Lab Overview

Lab Introduction

The Image and Video Systems (IVY) Lab at KAIST was founded in 1997 and has been led by Prof. Ro since its establishment. Over the years, IVY Lab has conducted research across a wide spectrum of multimedia, including image and video processing and multimodal deep learning. Recent research topics include multimodal deep learning; integrating vision, speech, and language for AI; vision with large-scale models; inclusive human multimodal conversation; interpretability and robustness of deep learning models; and computer vision and multimedia. IVY Lab has produced about 130 journal papers and 350 conference papers to date.

The collaborative lab environment and the enthusiasm of its members have kept the lab in touch with the latest developments in standards and AI. For example, the lab developed the homogeneous texture descriptor for the MPEG-7 standard, the ROI descriptor in SVC, and various description schemes for user characteristics as part of MPEG standardization. In AI, the lab has recently accomplished several outstanding research achievements: deep-learning-based visual recognition, distinguishing homophenes using multi-head visual-audio memory, distilling robust and non-robust features in adversarial examples, SyncTalkFace for talking face generation, lip-to-speech synthesis with visual context attentional GAN, CroMM-VSR for cross-modal memory augmented visual speech recognition, multi-modality associative bridging through memory, video prediction recalling long-term motion context, structure-boundary-preserving segmentation, BMAN (bidirectional multi-scale aggregation networks), mode variational LSTM robust to unseen modes of variation, and multi-objective spatio-temporal feature representation learning.

The lab continuously works hand in hand with industry to innovate and challenge the state of the art in multiple aspects of multimodal AI. The lab is currently interested in the following research topics:

Integrating Vision, Speech, and Language for AI

Multimodal Learning with Pretrained Large-Scale Models

Multimodal Deep Learning 

Inclusive Human Multimodal Conversation

Competency, Interpretability, Memorability, and Robustness of Deep Learning Models

Computer Vision and Multimedia