2025-06-03 [IEEE TMM] TMT: Tri-Modal Translation between Speech, Image, and Text by Processing Different Modalities as Different Languages (by Minsu Kim) is accepted to IEEE Transactions on Multimedia.
2025-05-28 [ACL 2025] MMS-LLaMA: Efficient LLM-based Audio-Visual Speech Recognition with Minimal Multimodal Speech Tokens (by Jeong Hun Yeo, Hyeongseop Rha) is accepted to the Findings of ACL 2025.
2025-05-14 [ICML 2025] Long-Form Speech Generation with Spoken Language Models (by Se Jin Park) is accepted as an Oral presentation (~1%) at ICML 2025.
2025-04-18 [Fall 2025 Lab Student Recruitment] We invite talented students to conduct research in MLLMs (Multimodal large language models) + (Vision, Audio, Language).
2025-03-12 [Recruited by DeepMind] Dr. Minsu Kim and Dr. Joanna Hong have been recruited by DeepMind.
2025-02-27 [CVPR 2025] SALOVA: Segment-Augmented Long Video Assistance for Targeted Retrieval and Routing in Long-Form Video Analysis (by Junho Kim, Hyunjun Kim) is accepted in CVPR 2025.
2025-02-27 [CVPR 2025] VLsI: Verbalized Layers-to-Interactions from Large to Small Vision Language Models (by Byung-Kwan Lee) is accepted in CVPR 2025.
2024-12-24 [IEEE TCSVT] MSCoTDet: Language-driven Multi-modal Fusion for Improved Multispectral Pedestrian Detection (by Taeheon Kim, Sangyun Chung) is accepted in IEEE Transactions on Circuits and Systems for Video Technology.
2024-12-10 [AAAI 2025] Personalized Lip Reading: Adapting to Your Unique Lip Movements with Vision and Language (by Jeong Hun Yeo) is accepted in AAAI 2025.
2024-10-18 [IEEE TPAMI] Prompt Tuning of Deep Neural Networks for Speaker-adaptive Visual Speech Recognition (by Minsu Kim) is accepted in IEEE Transactions on Pattern Analysis and Machine Intelligence.
2024-10-15 [NVIDIA Internship] Byung Kwan Lee will join NVIDIA for a research internship.
2024-10-09 [IEEE TNNLS] Advancing Causal Intervention in Image Captioning with Causal Prompt (by Youngjoon Yu) is accepted in IEEE Transactions on Neural Networks and Learning Systems.
2024-09-26 [NeurIPS 2024] Meteor: Mamba-based Traversal of Rationale for Large Language and Vision Models (by Byung-Kwan Lee) is accepted at NeurIPS 2024.
2024-09-26 [NeurIPS 2024] CODE: Contrasting Self-generated Description to Combat Hallucination in Large Multi-modal Models (by Junho Kim, Hyunjun Kim) is accepted at NeurIPS 2024.
2024-09-21 [EMNLP 2024] From CollaVo (ACL 24) to MoAI (ECCV 24), Now TroL: Advancing Large Language and Vision Models (by Byung-Kwan Lee) is accepted at EMNLP 2024.
2024-09-21 [EMNLP 2024] Where Visual Speech Meets Language: VSP-LLM (by Jeong Hun Yeo, Seunghee Han) is accepted at the Findings of EMNLP 2024.
2024-09-21 [EMNLP 2024] What if...?: Thinking Counterfactual Keywords Helps to Mitigate Hallucination in Large Multi-modal Models (by Junho Kim) is accepted at the Findings of EMNLP 2024.
2024-08-19 [Outstanding Paper Award in ACL 2024] Se Jin Park and Chae Won Kim have won the Outstanding Paper Award at the ACL (Association for Computational Linguistics) 2024 conference.
2024-08-03 [IEEE TASLP] Textless Unit-to-Unit training for Many-to-Many Multilingual Speech-to-Speech Translation (by Minsu Kim) is accepted in IEEE Transactions on Audio, Speech, and Language Processing.
2024-07-17 [ACM MM 2024] Efficient Training for Multilingual Visual Speech Recognition (by Minsu Kim, Jeong Hun Yeo) is accepted in ACM MM 2024.
2024-07-03 [ECCV 2024] MoAI: Mixture of All Intelligence for Large Language and Vision Models (by Byung-Kwan Lee) is accepted in ECCV 2024.
2024-07-03 [Pattern Recognition] Text-Guided Distillation Learning to Diversify Video Embeddings (by Sangmin Lee) is accepted in Pattern Recognition.
2024-02-21 Prof. Yong Man Ro Named ICT Endowed Chair Professor at KAIST.