
Jeongsoo Choi

Ph.D. candidate at KAIST

I am a Ph.D. student in the School of Electrical Engineering at KAIST. My research interests include 1) Multilingual and Multimodal Machine Translation and 2) Audio-Visual Speech Recognition and Generation.

About Me


Work Experience

  • Microsoft, Beijing, China (May 2024 - Oct. 2024)
    Research Intern, Speech Team
    Advisors: Shujie Liu and Jinyu Li

Education

  • KAIST, Daejeon, Korea (Sep. 2020 - Present)
    Ph.D. in Electrical Engineering
    Advisor: Prof. Joon Son Chung
  • KAIST, Daejeon, Korea (Mar. 2015 - Aug. 2020)
    B.S. in Electrical Engineering

Publications


2024

  • Textless Unit-to-Unit training for Many-to-Many Multilingual Speech-to-Speech Translation
    Minsu Kim*, Jeongsoo Choi*, Dahun Kim, and Yong Man Ro
    IEEE/ACM Transactions on Audio, Speech, and Language Processing (TASLP)
    [ paper | code | demo ]
  • AV2AV: Direct Audio-Visual Speech to Audio-Visual Speech Translation with Unified Audio-Visual Speech Representation
    Jeongsoo Choi*, Se Jin Park*, Minsu Kim*, and Yong Man Ro
    CVPR 2024 Highlight presentation
    [ paper | code | demo ]
  • Text-driven Talking Face Synthesis by Reprogramming Audio-driven Models
    Jeongsoo Choi, Minsu Kim, Se Jin Park, and Yong Man Ro
    ICASSP 2024
    [ paper | demo ]
  • Towards Practical and Efficient Image-to-Speech Captioning with Vision-Language Pre-training and Multi-modal Tokens
    Minsu Kim, Jeongsoo Choi, Soumi Maiti, Jeong Hun Yeo, Shinji Watanabe, and Yong Man Ro
    ICASSP 2024
    [ paper | code | demo ]
  • Exploring Phonetic Context-Aware Lip-Sync For Talking Face Generation
    Se Jin Park, Minsu Kim, Jeongsoo Choi, and Yong Man Ro
    ICASSP 2024
    [ paper ]
  • AKVSR: Audio Knowledge Empowered Visual Speech Recognition by Compressing Audio Knowledge of a Pretrained Model
    Jeong Hun Yeo, Minsu Kim, Jeongsoo Choi, Dae Hoe Kim, and Yong Man Ro
    IEEE Transactions on Multimedia (TMM)
    [ paper ]

2023

  • DiffV2S: Diffusion-based Video-to-Speech Synthesis with Vision-guided Speaker Embedding
    Jeongsoo Choi*, Joanna Hong*, and Yong Man Ro
    ICCV 2023
    [ paper | demo ]
  • Lip Reading for Low-resource Languages by Learning and Combining General Speech Knowledge and Language-specific Knowledge
    Minsu Kim*, Jeong Hun Yeo*, Jeongsoo Choi, and Yong Man Ro
    ICCV 2023
    [ paper ]
  • Intelligible Lip-to-Speech Synthesis with Speech Units
    Jeongsoo Choi, Minsu Kim, and Yong Man Ro
    Interspeech 2023
    [ paper | code | demo ]
  • Watch or Listen: Robust Audio-Visual Speech Recognition With Visual Corruption Modeling and Reliability Scoring
    Joanna Hong*, Minsu Kim*, Jeongsoo Choi, and Yong Man Ro
    CVPR 2023
    [ paper | code | demo | data ]

2022

  • SyncTalkFace: Talking Face Generation with Precise Lip-Syncing via Audio-Lip Memory
    Se Jin Park, Minsu Kim, Joanna Hong, Jeongsoo Choi, and Yong Man Ro
    AAAI 2022 Oral presentation
    [ paper ]

Jeongsoo Choi @ KAIST