RiverEcho-2.0: A Real-Time Interactive System for Yellow River Culture via Enhanced MultiModal Document RAG
Haofeng Wang, Yilin Guo, Tiange Zhang, Zehao Li, Tong Yue, Yizong Wang, Rongqun Lin, Feng Gao, Shiqi Wang, Siwei Ma
Transactions on Artificial Intelligence, 2025, Vol. 1, Issue 1: 212-226.
The Yellow River culture is a cornerstone of Chinese civilization, embodying rich historical, social, and ecological significance. To conserve and promote this invaluable cultural heritage, we propose RiverEcho-2.0, a real-time interactive digital system designed to facilitate user engagement with Yellow River culture. As the foundation of our system, we curated and digitized a comprehensive collection of books and documents related to Yellow River heritage, constructing a dedicated multimodal corpus. To effectively leverage this corpus, we introduce a novel multi-modal Document Retrieval-Augmented Generation (RAG) framework that enhances document retrieval through context-aware image-text alignment and joint embedding. Experimental results demonstrate that our method achieves a large improvement over existing state-of-the-art multi-modal RAG baselines, leading to significant gains in downstream tasks.
Yellow River culture / dataset construction / multi-modal document RAG