PMMTD: Towards Proactive Multimodal Mixed-Type Dialogues
Hongfei XIA, Yuhang GUO, Bao CHEN, Linhao ZHENG, Zeming LIU, Haifeng WANG
Mixed-type dialogue systems aim to handle complex conversations by integrating multiple dialogue types within a single interaction. However, existing approaches are predominantly text-based and lack the ability to proactively guide the conversation, which significantly limits their effectiveness in real-world scenarios. For instance, when a user is unfamiliar with a concept being discussed, a more natural and effective system response would be to proactively present an image rather than continue with additional text-based explanations. In this paper, we formally identify this limitation and define the challenge of building a proactive multimodal mixed-type dialogue system capable of handling realistic, dynamic dialogue situations. To address this challenge, we propose a new task and introduce a novel Proactive Multimodal Mixed-Type Dialogue dataset, PMMTD, which spans four dialogue types: conversational recommendation, task-oriented dialogue, Q&A, and chitchat. Each dialogue in PMMTD involves multimodal information and rich dialogue types with natural topic transitions. Additionally, we propose a proactive multimodal mixed-type dialogue generation framework with a novel Composite Structure-Guiding mechanism, termed CSG, and build baselines for PMMTD to address this task. Experimental results show the effectiveness of CSG. We will open-source PMMTD and CSG at https://github.com/BITHLP/PMMTD.
Keywords: multimodal mixed-type dialogues, mixed-type dialogues, dialogue systems
Higher Education Press 2026