Jeongsoo Choi

Postdoctoral Researcher at KAIST

I am a postdoctoral researcher in the Multimodal AI Lab at KAIST EE, advised by Prof. Joon Son Chung. My research interests include multimodal conversational AI, with a focus on spoken language modeling, audio-visual speech understanding and generation, multilingual communication, and human-AI interaction.

About Me


Work Experience

  • KAIST, Daejeon, South Korea (Mar. 2026 - Present)
    Postdoctoral Researcher
    Alternative Military Service as Technical Research Personnel
  • Meta, Seattle, WA, United States (Jun. 2025 - Sep. 2025)
    Research Scientist Intern, FAIR Team
    Advisor: Dr. Hongyu Gong
  • Microsoft, Beijing, China (May 2024 - Oct. 2024)
    Research Intern, Speech Team
    Advisors: Dr. Shujie Liu and Dr. Jinyu Li

Education

  • KAIST, Daejeon, South Korea (Sep. 2020 - Feb. 2026)
    Ph.D. in Electrical Engineering
    Advisor: Prof. Joon Son Chung
  • KAIST, Daejeon, South Korea (Mar. 2015 - Aug. 2020)
    B.S. in Electrical Engineering

Publications


2026

  • ProsoCodec: Prosody-Oriented Speech Codec for Voice Conversion
    Jeongsoo Choi, Ji-Hoon Kim, Shujie Hu, Joon Son Chung
    Interspeech 2026
  • Semantic-VAE: Semantic-Alignment Latent Representation for Better Speech Synthesis
    Zhikang Niu, Shujie Hu, Jeongsoo Choi, Yushen Chen, Peining Chen, Pengcheng Zhu, Yunting Yang, Bowen Zhang, Jian Zhao, Chunhui Wang, Xie Chen
    Interspeech 2026
    [ paper | code | demo ]
  • DiFlowDubber: Discrete Flow Matching for Automated Video Dubbing via Cross-Modal Alignment and Synchronization
    Ngoc-Son Nguyen, Thanh V. T. Tran, Jeongsoo Choi, Hieu-Nghia Huynh-Nguyen, Truong-Son Hy, Van Nguyen
    CVPR 2026 Findings
    [ paper | code | demo ]
  • Deep Understanding of Sign Language for Sign to Subtitle Alignment
    Youngjoon Jang*, Jeongsoo Choi*, Junseok Ahn, Joon Son Chung
    IEEE Transactions on Multimedia (TMM)
    [ paper ]

2025

  • Dub-S2ST: Textless Speech-to-Speech Translation for Seamless Dubbing
    Jeongsoo Choi*, Jaehun Kim*, Joon Son Chung
    EMNLP 2025 Findings
    [ paper ]
  • AlignDiT: Multimodal Aligned Diffusion Transformer for Synchronized Speech Generation
    Jeongsoo Choi, Ji-Hoon Kim, Kim Sung-Bin, Tae-Hyun Oh, Joon Son Chung
    ACM MM 2025
    [ paper | code | demo ]
  • VoiceCraft-Dub: Automated Video Dubbing with Neural Codec Language Models
    Kim Sung-Bin, Jeongsoo Choi, Puyuan Peng, Joon Son Chung, Tae-Hyun Oh, David Harwath
    ICCV 2025
    [ paper | demo ]
  • MAVFlow: Preserving Paralinguistic Elements with Conditional Flow Matching for Zero-Shot AV2AV Multilingual Translation
    Sungwoo Cho, Jeongsoo Choi, Sungnyun Kim, Se-Young Yun
    ICCV 2025
    [ paper ]
  • Accelerating Diffusion-based Text-to-Speech Model Training with Dual Modality Alignment
    Jeongsoo Choi*, Zhikang Niu*, Ji-Hoon Kim, Chunhui Wang, Joon Son Chung, Xie Chen
    Interspeech 2025
    [ paper | code | demo ]
  • From Faces to Voices: Learning Hierarchical Representations for High-quality Video-to-Speech
    Ji-Hoon Kim, Jeongsoo Choi, Jaehun Kim, Chaeyoung Jung, Joon Son Chung
    CVPR 2025 Highlight presentation
    [ paper | demo ]
  • ARLON: Boosting Diffusion Transformers with Autoregressive Models for Long Video Generation
    Zongyi Li*, Shujie Hu*, Shujie Liu, Long Zhou, Jeongsoo Choi, Lingwei Meng, Xun Guo, Jinyu Li, Hefei Ling, Furu Wei
    ICLR 2025
    [ paper | demo ]
  • V2SFlow: Video-to-Speech Generation with Speech Decomposition and Rectified Flow
    Jeongsoo Choi*, Ji-Hoon Kim*, Jinyu Li, Joon Son Chung, Shujie Liu
    ICASSP 2025
    [ paper | demo ]
  • Accelerating Codec-based Speech Synthesis with Multi-Token Prediction and Speculative Decoding
    Tan Dat Nguyen, Ji-Hoon Kim, Jeongsoo Choi, Shukjae Choi, Jinseok Park, Younglo Lee, Joon Son Chung
    ICASSP 2025
    [ paper | demo ]

2024

  • Textless Unit-to-Unit training for Many-to-Many Multilingual Speech-to-Speech Translation
    Minsu Kim*, Jeongsoo Choi*, Dahun Kim, Yong Man Ro
    IEEE/ACM Transactions on Audio, Speech, and Language Processing (TASLP)
    [ paper | code | demo ]
  • AV2AV: Direct Audio-Visual Speech to Audio-Visual Speech Translation with Unified Audio-Visual Speech Representation
    Jeongsoo Choi*, Se Jin Park*, Minsu Kim*, Yong Man Ro
    CVPR 2024 Highlight presentation
    [ paper | code | demo ]
  • Text-driven Talking Face Synthesis by Reprogramming Audio-driven Models
    Jeongsoo Choi, Minsu Kim, Se Jin Park, Yong Man Ro
    ICASSP 2024
    [ paper | demo ]
  • Towards Practical and Efficient Image-to-Speech Captioning with Vision-Language Pre-training and Multi-modal Tokens
    Minsu Kim, Jeongsoo Choi, Soumi Maiti, Jeong Hun Yeo, Shinji Watanabe, Yong Man Ro
    ICASSP 2024
    [ paper | code | demo ]
  • Exploring Phonetic Context-Aware Lip-Sync For Talking Face Generation
    Se Jin Park, Minsu Kim, Jeongsoo Choi, Yong Man Ro
    ICASSP 2024
    [ paper ]
  • AKVSR: Audio Knowledge Empowered Visual Speech Recognition by Compressing Audio Knowledge of a Pretrained Model
    Jeong Hun Yeo, Minsu Kim, Jeongsoo Choi, Dae Hoe Kim, Yong Man Ro
    IEEE Transactions on Multimedia (TMM)
    [ paper ]

2023

  • DiffV2S: Diffusion-based Video-to-Speech Synthesis with Vision-guided Speaker Embedding
    Jeongsoo Choi*, Joanna Hong*, Yong Man Ro
    ICCV 2023
    [ paper | demo ]
  • Lip Reading for Low-resource Languages by Learning and Combining General Speech Knowledge and Language-specific Knowledge
    Minsu Kim*, Jeong Hun Yeo*, Jeongsoo Choi, Yong Man Ro
    ICCV 2023
    [ paper ]
  • Intelligible Lip-to-Speech Synthesis with Speech Units
    Jeongsoo Choi, Minsu Kim, Yong Man Ro
    Interspeech 2023
    [ paper | code | demo ]
  • Watch or Listen: Robust Audio-Visual Speech Recognition With Visual Corruption Modeling and Reliability Scoring
    Joanna Hong*, Minsu Kim*, Jeongsoo Choi, Yong Man Ro
    CVPR 2023
    [ paper | code | demo | data ]

2022

  • SyncTalkFace: Talking Face Generation with Precise Lip-Syncing via Audio-Lip Memory
    Se Jin Park, Minsu Kim, Joanna Hong, Jeongsoo Choi, Yong Man Ro
    AAAI 2022 Oral presentation
    [ paper ]

Awards & Honors


Academic Services


Conference Reviewer: AAAI, CVPR, ECCV, ICASSP, ICCV, Interspeech, ACM MM, NeurIPS

Journal Reviewer: International Journal of Computer Vision (IJCV)

