Artificial intelligence-collaborative folk music composition system based on gesture recognition: A real-time interactive framework integrating computer vision and folk music generation
Qinghao Liu, Tazul Izan Tajuddin
International Journal of Systematic Innovation, 2025, Vol. 9, Issue 6: 44-62.
Artificial intelligence (AI) and gesture recognition offer new creative possibilities, yet culturally sensitive, real-time systems for gestural folk music composition remain largely undeveloped. This study develops an AI-collaborative folk music composition system that integrates computer vision-based gesture recognition with specialized folk music generation algorithms. The result is a real-time interactive framework that supports traditional music composition while preserving cultural musical characteristics across multiple folk traditions. The system employs a four-layer architecture comprising gesture acquisition, computer vision processing, interpretation, and generation layers. A dataset of 1,643 folk music compositions representing English, American, Irish, and Chinese traditional music was curated from the Nottingham Dataset, the Irish Traditional Corpus, and self-recorded materials, supplemented by 6,127 successfully tracked gesture samples collected from 47 participants across 12 folk music gesture categories. The evaluation assessed gesture recognition accuracy, cultural authenticity preservation, real-time performance, and collaborative effectiveness. The system achieved 88.9% gesture recognition accuracy with 23.4 ms processing latency, while maintaining end-to-end response times of 86.8-91.6 ms during collaborative sessions. Cultural authenticity scores ranged from 7.6 to 8.3 across regional folk styles, with a user satisfaction rating of 7.8 and a 28% improvement in musical coherence compared to baseline approaches. The framework supports up to eight concurrent users while meeting the sub-100 ms real-time performance requirement.
The integrated system demonstrates effective coordination between the gesture recognition and folk music generation subsystems, validating the architectural design and optimization strategies for culturally sensitive AI applications across diverse folk music traditions. The validated framework provides a foundation for educational, performance, and cultural preservation applications, and contributes methodological insights for multimodal human-AI interaction systems and culturally aware creative technologies in traditional music contexts.
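The four-layer flow and the sub-100 ms requirement described above can be illustrated with a minimal sketch. This is not the authors' implementation: the layer functions, the landmark values, the gesture labels, and the pitch rule are all hypothetical placeholders, and only the four layer names and the 100 ms budget come from the abstract.

```python
import time
from typing import Any, Callable, Dict, List, Tuple

Layer = Tuple[str, Callable[[Any], Any]]

# Hypothetical stand-ins for the four layers named in the abstract.
def acquire(frame: Any) -> Dict[str, Any]:
    # Gesture acquisition: wrap the raw camera frame.
    return {"frame": frame}

def vision(data: Dict[str, Any]) -> Dict[str, Any]:
    # Computer vision processing: assumed fixed landmarks for illustration.
    return {**data, "landmarks": [0.1, 0.9]}

def interpret(data: Dict[str, Any]) -> Dict[str, Any]:
    # Interpretation: map landmarks to a made-up gesture label.
    label = "raise" if data["landmarks"][1] > 0.5 else "lower"
    return {**data, "gesture": label}

def generate(data: Dict[str, Any]) -> Dict[str, Any]:
    # Generation: toy rule that moves a MIDI pitch up or down from 60.
    step = 2 if data["gesture"] == "raise" else -2
    return {**data, "next_pitch": 60 + step}

LAYERS: List[Layer] = [
    ("acquisition", acquire),
    ("vision", vision),
    ("interpretation", interpret),
    ("generation", generate),
]

def run_pipeline(frame: Any, layers: List[Layer] = LAYERS):
    """Run each layer in order and record its wall-clock time in ms."""
    timings: Dict[str, float] = {}
    data = frame
    for name, fn in layers:
        start = time.perf_counter()
        data = fn(data)
        timings[name] = (time.perf_counter() - start) * 1000.0
    return data, timings

def within_budget(timings_ms: Dict[str, float], budget_ms: float = 100.0) -> bool:
    """True when the summed per-stage latency stays under the budget."""
    return sum(timings_ms.values()) < budget_ms

if __name__ == "__main__":
    result, timings = run_pipeline("frame-0")
    print(result["gesture"], result["next_pitch"], within_budget(timings))
```

Timing each layer separately, rather than only end to end, makes it possible to see how much of the budget a single stage (such as the reported 23.4 ms recognition step) consumes.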
Artificial Intelligence-Collaborative Music Composition / Computer Vision / Folk Music Generation / Gesture Recognition / Real-Time Interactive Framework / Traditional Music