
MPFToD: a modularized pre-training framework for consistency identification in task-oriented dialogue
Libo QIN, Shijue HUANG, Qiguang CHEN, Qian LIU, Wanxiang CHE, Ruifeng XU
Front. Comput. Sci., 2025, 19(10): 1910351.
Consistency identification in task-oriented dialogue (CI-ToD) can prevent inconsistent dialogue response generation, and has recently emerged as an important and growing research area. This paper takes the first step toward exploring a pre-training paradigm for CI-ToD. Pre-training for CI-ToD is non-trivial, however, because it requires a large amount of multi-turn KB-grounded dialogues, which are extremely hard to collect. To alleviate this data scarcity problem, we introduce a modularized pre-training framework (MPFToD) that is capable of utilizing large amounts of KB-free dialogues. Specifically, this modularization allows us to decouple CI-ToD into three sub-modules and propose three pre-training tasks: (i) query-response matching pre-training; (ii) dialogue history consistency identification pre-training; and (iii) KB masked language modeling, which enhance different abilities of the CI-ToD model. Because the different sub-tasks are solved separately, MPFToD can learn each module from large amounts of KB-free dialogues, which are much easier to obtain. Results on the CI-ToD benchmark show that MPFToD pushes the state-of-the-art performance from 56.3% to 61.0%. Furthermore, we demonstrate its transferability, with promising performance on other downstream tasks (i.e., dialog act recognition, sentiment classification, and table fact checking).
task-oriented dialogue / consistency identification / modularized pre-training framework
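To make the modular design concrete, the sketch below illustrates, under stated assumptions, how one shared encoder could serve the three pre-training objectives named in the abstract. It is a minimal sketch, not the authors' released implementation: the class and head names, the tiny two-layer encoder, and the string-based task routing are illustrative assumptions rather than details taken from the paper.

```python
# Minimal sketch (NOT the authors' released code) of three modularized
# pre-training objectives. All names and sizes here are illustrative.
import torch
import torch.nn as nn

class MPFToDSketch(nn.Module):
    def __init__(self, vocab_size=30522, hidden_size=768):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, hidden_size)
        # A tiny Transformer stands in for a pre-trained encoder such as BERT.
        layer = nn.TransformerEncoderLayer(
            d_model=hidden_size, nhead=12, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)
        # (i) Query-response matching: binary "does this response answer the query?"
        self.qrm_head = nn.Linear(hidden_size, 2)
        # (ii) Dialogue-history consistency identification: binary
        # "is this response consistent with the dialogue history?"
        self.hci_head = nn.Linear(hidden_size, 2)
        # (iii) KB masked language modeling: recover masked KB-entity tokens.
        self.mlm_head = nn.Linear(hidden_size, vocab_size)

    def forward(self, token_ids, task):
        h = self.encoder(self.embed(token_ids))   # (batch, seq, hidden)
        if task == "qrm":
            return self.qrm_head(h[:, 0])         # pooled first-token logits
        if task == "hci":
            return self.hci_head(h[:, 0])
        if task == "kb_mlm":
            return self.mlm_head(h)               # per-token vocabulary logits
        raise ValueError(f"unknown task: {task}")

# The matching and history-consistency objectives can be trained on KB-free
# dialogues; only the masked-LM objective needs KB/table text, which is how
# the framework sidesteps the scarcity of multi-turn KB-grounded dialogues.
model = MPFToDSketch()
batch = torch.randint(0, 30522, (2, 16))          # fake token ids
qrm_loss = nn.CrossEntropyLoss()(model(batch, "qrm"), torch.tensor([0, 1]))
```

The routing makes the claimed benefit visible: because each head is pre-trained independently, each sub-module can draw on whichever large, easily collected corpus suits it.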
Libo Qin received his PhD degree in computer science from the Harbin Institute of Technology (HIT), China. He is a professor at Central South University, China. His current research interests include natural language processing and dialogue systems
Shijue Huang is currently working toward the master’s degree with the School of Computer Science and Technology, Harbin Institute of Technology (Shenzhen), China. His research interests include natural language processing and dialogue systems
Qiguang Chen is a master’s student at the Harbin Institute of Technology (HIT), China. His research fields include natural language processing and dialogue systems
Qian Liu is a Research Scientist at Sea AI Lab, Singapore. His research interests include semantic parsing, dialogue systems, and natural language processing. He has published several papers in top-tier conferences (ICLR/NeurIPS/ACL/EMNLP)
Wanxiang Che received his PhD degree in computer science from the Harbin Institute of Technology (HIT), China in 2008. He is a Full Professor in the School of Computer Science and Technology, HIT. His current research interests include natural language processing and dialogue systems
Ruifeng Xu received his PhD degree in computer science from The Hong Kong Polytechnic University, China. He is currently a Professor with the School of Computer Science and Technology, Harbin Institute of Technology (Shenzhen), China. He has published more than 100 papers in natural language processing, sentiment analysis, and social media analysis