Spring 2025 MO Seminar

5/22 (Thu) 16:00 – 17:30, Prof. 정준선 (KAIST)

1. Speaker: Prof. 정준선 (KAIST)
2. Topic: AI with a Voice and a Face: Technologies for Multimodal Communication
3. Date and time: 5/22 (Thursday), 4:00 PM – 5:30 PM
4. Venue: Room L409
* Abstract
Deep neural networks excel in various domains such as speech or image recognition, yet humans learn by combining multiple senses. We emulate this by leveraging the natural alignment of sight and sound—e.g., a guitarist’s image with its audio—to train models without labels. The resulting representations support tasks like sound-source localization and retrieval. Moving beyond monaural settings, our framework integrates binaural cues, enabling it to disentangle scenes where visual and auditory signals misalign or overlap, and to unify semantic and spatial reasoning.