Seminar / Workshop

Toward Real-Time, Real-World Spoken Dialogue Robots

4 June 2025, 11:30

https://maps.app.goo.gl/r8NcqAhukJMYmDyC8

Ferrari 1 Building, Via Sommarive 5, Povo (Trento)

Room n. 259, Augmented Health Environments Lab

Free

Organizer: Department of Information Engineering and Computer Science

Target audience: University community


Contact: Seyed Mahed Mousavi

Speaker: Koichiro Yoshino (Institute of Science Tokyo)

Abstract

With advances in the core technologies of spoken dialogue systems, such as automatic speech recognition (ASR) and dialogue generation, real-world dialogue agents are becoming feasible. However, several challenges remain in running such systems in real time and in the real world. Cascaded systems often suffer from significant latency in each module, resulting in interactions that are far from natural human dialogue. Recent research has begun to focus on real-time dialogue systems, such as Voice Activity Projection (VAP) and Moshi. While fast systems that predict turn-taking using acoustic cues have been widely studied, it is equally important to explore slower systems that anticipate the course of a dialogue based on the content of utterances.
In this talk, we introduce ongoing research toward real-time dialogue systems and present approaches that integrate both real-world robotic dialogue and insights from human cognitive mechanisms.

About the Speaker

Koichiro Yoshino is an Associate Professor at the Institute of Science Tokyo (formerly known as Tokyo-Tech) and a Team Director at RIKEN. He received his Ph.D. in informatics from Kyoto University in 2014. He worked at Kyoto University as a postdoc and at NAIST as an Assistant Professor. Since 2020, he has been the PI of the intelligent robot dialogue research laboratory at RIKEN. He is also an Affiliate Professor at NAIST. From 2019 to 2020, he was a visiting researcher at Heinrich-Heine-Universität Düsseldorf, Germany. He works on spoken and natural language processing, especially robot dialogue systems.
Dr. Koichiro Yoshino has received several honors, including the best paper awards of IWSDS2020 and IWSDS2024 and the best paper award of the 1st NLP4ConvAI workshop. He is a member of the IEEE Speech and Language Processing Technical Committee (SLTC), a member of the Dialogue System Technology Challenge (DSTC) Steering Committee, an action editor of ACL Rolling Review (ARR), a board member of SIGdial, a member at large of AFNLP, and a board member of the Association for Natural Language Processing.