LLM Multimodal Highlights

Meteor: Mamba-based Traversal of Rationale for Large Language and Vision Models

Byung-Kwan Lee, Chae Won Kim, Beomchan Park, Yong Man Ro

NeurIPS 2024

Taeheon Kim*, Sangyun Chung*, Damin Yeom, Youngjoon Yu, Hak Gu Kim, Yong Man Ro (* equal contribution)

CoLLaVO: Crayon Large Language and Vision mOdel

Byung-Kwan Lee, Beomchan Park, Chae Won Kim, Yong Man Ro

ACL 2024

Let's Go Real Talk: Spoken Dialogue Model for Face-to-Face Conversation

Se Jin Park*, Chae Won Kim*, Hyeongseop Rha, Minsu Kim, Joanna Hong, Jeonghun Yeo, Yong Man Ro (* equal contribution)

ACL 2024

TMT: Tri-Modal Translation between Speech, Image, and Text by Processing Different Modalities as Different Languages

Minsu Kim, Jee-weon Jung, Hyeongseop Rha, Soumi Maiti, Siddhant Arora, Xuankai Chang, Shinji Watanabe, Yong Man Ro

Sungjune Park*, Hyunjun Kim*, Yong Man Ro (* equal contribution)

Pattern Recognition

Sungjune Park*, Hyunjun Kim*, Yong Man Ro (* equal contribution)

IEEE Transactions on Circuits and Systems for Video Technology

AV2AV: Direct Audio-Visual Speech to Audio-Visual Speech Translation with Unified Audio-Visual Speech Representation

Jeongsoo Choi*, Se Jin Park*, Minsu Kim*, Yong Man Ro (* equal contribution)

CVPR 2024

Towards Practical and Efficient Image-To-Speech Captioning With Vision-Language Pre-Training and Multi-Modal Tokens

Minsu Kim, Jeongsoo Choi, Soumi Maiti, Jeong Hun Yeo, Shinji Watanabe, Yong Man Ro

ICASSP 2024

Open Set Recognition (OSR) via Visual Prompts from Common-Sense Knowledge

Seongyeop Kim, Hyung-Il Kim, Yong Man Ro

AAAI 2024