Bridging modalities: a unified framework for textual and multimodal dialogue discourse parsing
Chen GONG, Nan YU, Guo-Hong FU
Front. Comput. Sci., 2026, Vol. 20, Issue 9: 2009351
Dialogue discourse parsing is a fundamental task in natural language understanding. It aims to capture the relationships between utterances in a dialogue, which supports a deeper understanding of dialogue structure and semantics, especially in long and complex dialogues. Existing research often develops separate dialogue discourse parsers for text-only and multimodal scenarios, largely because parallel multimodal annotated datasets are scarce. This separation makes it difficult to exploit data from different modalities and poses challenges for real-world artificial intelligence applications. To address this limitation, we propose a unified dialogue discourse parsing framework that bridges text-only and multimodal parsing within a single model. We first develop a basic text-only parser, pre-trained on textual datasets. We then extend it to multimodal scenarios by adding multimodal encoders and fusion modules, while freezing the parameters learned during the text-only stage. We conduct extensive experiments on three datasets covering both text-only and multimodal dialogues. Experimental results show that our approach achieves significant average improvements over several existing benchmarks, demonstrating the generalizability and effectiveness of our framework for dialogue discourse parsing across modalities.
dialogue discourse parsing / dialogue systems / multimodal data / unified framework / natural language processing
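The two-stage idea described in the abstract (train a text-only parser first, then add multimodal modules while keeping the text-stage parameters frozen) can be illustrated with a minimal, framework-free sketch. Everything below is an assumption made for illustration: the `Param` class, the `sgd_step` helper, the parameter names, and the constant gradients are hypothetical and are not taken from the paper's implementation.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class Param:
    """A single scalar parameter with a flag saying whether it may be updated."""
    value: float
    trainable: bool = True


def sgd_step(params: Dict[str, Param], grads: Dict[str, float], lr: float = 0.1) -> None:
    """Apply one gradient step, skipping every parameter marked as frozen."""
    for name, p in params.items():
        if p.trainable:
            p.value -= lr * grads.get(name, 0.0)


# Stage 1 (hypothetical): text-only parser parameters trained on textual data.
params = {"text_encoder.w": Param(0.5), "link_scorer.w": Param(-0.2)}
sgd_step(params, {"text_encoder.w": 1.0, "link_scorer.w": 1.0})

# Stage 2 (hypothetical): freeze what stage 1 learned, then add new modules.
for p in params.values():
    p.trainable = False
stage1_snapshot = {name: p.value for name, p in params.items()}

params["image_encoder.w"] = Param(0.0)
params["fusion.w"] = Param(0.0)
sgd_step(params, {name: 1.0 for name in params})

# Frozen text-stage parameters keep their values; only the new modules move.
frozen_unchanged = all(params[n].value == v for n, v in stage1_snapshot.items())
new_modules_updated = (
    params["image_encoder.w"].value != 0.0 and params["fusion.w"].value != 0.0
)
```

In a real deep learning framework the same effect is usually obtained by disabling gradient tracking on the pre-trained modules; the sketch only shows the bookkeeping, not the authors' architecture.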
Higher Education Press