Research Demo

Integrating Language-Derived Appearance Elements with Visual Cues in Pedestrian Detection

Sungjune Park*, Hyunjun Kim*, Yong Man Ro (* equal contributor)

IEEE Transactions on Circuits and Systems for Video Technology

Causal Mode Multiplexer: A Novel Framework for Unbiased Multispectral Pedestrian Detection

Taeheon Kim*, Sebin Shin*, Youngjoon Yu, Hak Gu Kim, and Yong Man Ro (* equal contributor)

CVPR 2024

AV2AV: Direct Audio-Visual Speech to Audio-Visual Speech Translation with Unified Audio-Visual Speech Representation

Jeongsoo Choi*, Se Jin Park*, Minsu Kim*, and Yong Man Ro (* equally contributed)

CVPR 2024

Exploring Phonetic Context-Aware Lip-Sync for Talking Face Generation

Se Jin Park, Minsu Kim, Jeongsoo Choi, and Yong Man Ro


Improving Open Set Recognition via Visual Prompts Distilled from Common-Sense Knowledge

Seongyeop Kim, Hyung-Il Kim, and Yong Man Ro

AAAI 2024

DiffV2S: Diffusion-based Video-to-Speech Synthesis with Vision-guided Speaker Embedding

Jeongsoo Choi*, Joanna Hong*, and Yong Man Ro (* equally contributed)

ICCV 2023

Mitigating Dataset Bias in Image Captioning through CLIP Confounder-free Captioning Network

YeonJu Kim, Junho Kim, Byung-Kwan Lee, Sebin Shin, and Yong Man Ro

ICIP 2023

Watch or Listen: Robust Audio-Visual Speech Recognition with Visual Corruption Modeling and Reliability Scoring

Joanna Hong*, Minsu Kim*, Jeongsoo Choi, and Yong Man Ro (* equally contributed)

CVPR 2023

Lip-to-speech Synthesis in the Wild with Multi-task Learning

Minsu Kim*, Joanna Hong*, and Yong Man Ro (* equally contributed)


Audio-Visual Mismatch-Aware Video Retrieval via Association and Adjustment

Sangmin Lee, Sungjune Park, Yong Man Ro

ECCV 2022

VisageSynTalk: Unseen Speaker Video to Speech Synthesis via Speech Visage Feature Selection

Joanna Hong, Minsu Kim, Yong Man Ro

ECCV 2022

Weakly Paired Associative Learning for Sound-Image Representation

Sangmin Lee, Hyung-Il Kim, Yong Man Ro

CVPR 2022

Masking Adversarial Damage: Finding Adversarial Saliency for Robust and Sparse Network

Byung-Kwan Lee, Junho Kim, Yong Man Ro

CVPR 2022

Distinguishing Homophenes using Multi-head Visual-Audio Memory for Lip Reading

Minsu Kim, Jeong Hun Yeo, Yong Man Ro

AAAI 2022

SyncTalkFace: Talking Face Generation with Precise Lip-Syncing via Audio-Lip Memory

Se Jin Park, Minsu Kim, Joanna Hong, Jeongsoo Choi, Yong Man Ro

AAAI 2022

Lip to Speech Synthesis with Visual Context Attentional GAN

Minsu Kim, Joanna Hong, Yong Man Ro

NeurIPS 2021

Distilling Robust and Non-Robust Features in Adversarial Examples

Junho Kim, Byung-Kwan Lee, Yong Man Ro

NeurIPS 2021

Multi-Modality Associative Bridging Through Memory: Speech Sound Recollected From Face Video

Minsu Kim, Joanna Hong, Se Jin Park, Yong Man Ro

ICCV 2021

Robust Small-scale Pedestrian Detection with Cued Recall via Memory Learning

Jung Uk Kim, Sungjune Park, Yong Man Ro

ICCV 2021

Video Prediction Recalling Long-term Motion Context via Memory Alignment Learning

Sangmin Lee, Hak Gu Kim, Dae Hwi Choi, Hyung-Il Kim, Yong Man Ro

CVPR 2021

Video-Based Facial Expression Recognition with Appearance-Suppressed Dynamic Features for On-the-Fly Prediction

Wissam J. Baddar, Sangmin Lee, Yong Man Ro

IEEE Transactions on Affective Computing 2019

ICADX: Interpretable Computer Aided Diagnosis of Breast Masses

Seong Tae Kim, Hakmin Lee, Hak Gu Kim, Yong Man Ro

Medical Imaging 2018

Facial Expression Based Face Identification

Seong Tae Kim, Yong Man Ro

ICIP 2018

Ultra Fast CGH Calculation using Sparse FFT

Hak Gu Kim, Yong Man Ro

Optics Express 2017

Deep Learning based Recognition: DeepSensus, deep facial expression recognition

Wissam J. Baddar, Daehoe Kim, Yong Man Ro

MMM 2017

Free-view Generation for 3D Displays

Hak Gu Kim, Yong Man Ro


Automatically Masking Faces for Privacy Protection First and Recognizing Enrolled Faces Later


Measure of Visual Discomfort While Watching 3D TV


Emotion TV: Emotion Measure While Watching TV Contents


S3D quality analyzer


Automatic Privacy Protection in Surveillance (Face Masking) and Real Application to ATM Surveillance


Facial Expression Recognition in Real-world Situation


Digital Breast Tomosynthesis (DBT) Computer-Aided Detection (CAD)