Artificial intelligence-collaborative folk music composition system based on gesture recognition: A real-time interactive framework integrating computer vision and folk music generation

Qinghao Liu, Tazul Izan Tajuddin

International Journal of Systematic Innovation ›› 2025, Vol. 9 ›› Issue (6): 44-62. DOI: 10.6977/IJoSI.202512_9(6).0004


Abstract

Artificial intelligence (AI) and gesture recognition offer new creative possibilities, yet culturally sensitive, real-time systems for gestural folk music composition remain largely undeveloped. This study develops an AI-collaborative folk music composition system that integrates computer vision-based gesture recognition with specialized folk music generation algorithms to create a real-time interactive framework that supports traditional music composition while preserving cultural musical characteristics across multiple folk traditions. The system employs a four-layer architecture encompassing gesture acquisition, computer vision processing, interpretation, and generation layers. A comprehensive dataset of 1,643 folk music compositions from established repositories representing English, American, Irish, and Chinese traditional music (Nottingham Dataset, Irish Traditional Corpus, and self-recorded materials) was curated, supplemented by 6,127 successfully tracked gesture samples collected from 47 participants across 12 folk music gesture categories. The evaluation framework assessed gesture recognition accuracy, cultural authenticity preservation, real-time performance, and collaborative effectiveness through extensive experimental validation. The system achieved robust gesture recognition performance with 88.9% accuracy and 23.4 ms processing latency, while maintaining end-to-end response times of 86.8-91.6 ms during collaborative sessions. Cultural authenticity scores ranged from 7.6 to 8.3 across different regional folk styles, with a user satisfaction rating of 7.8 and a 28% improvement in musical coherence compared to baseline approaches. The framework successfully supports up to eight concurrent users while maintaining sub-100 ms real-time performance requirements. 
The integrated system successfully demonstrates effective coordination between gesture recognition and folk music generation subsystems, validating the architectural design and optimization strategies for culturally sensitive AI applications across diverse folk music traditions. The validated framework provides a foundation for educational, performance, and cultural preservation applications, contributing methodological insights for multimodal human-AI interaction systems and culturally aware creative technologies applicable to traditional music contexts.
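The four-layer architecture summarized above (gesture acquisition, computer vision processing, interpretation, and generation, coordinated under a sub-100 ms end-to-end budget) can be sketched as a simple staged pipeline. This is a minimal illustrative sketch only: the class and function names below are hypothetical stand-ins invented for this example and do not come from the authors' implementation.

```python
import time
from dataclasses import dataclass

# Hypothetical stand-ins for the paper's four layers; none of these names
# come from the authors' codebase.

@dataclass
class GestureEvent:
    label: str        # e.g., one of the 12 folk music gesture categories
    confidence: float

def acquire_frame():
    """Stand-in for the gesture acquisition layer (camera capture)."""
    return "frame"

def vision_process(frame):
    """Stand-in for the computer vision layer (tracking + classification)."""
    return GestureEvent(label="tremolo", confidence=0.92)

def interpret(event):
    """Stand-in for the interpretation layer: map a gesture to musical intent."""
    return {"articulation": event.label, "weight": event.confidence}

def generate(intent):
    """Stand-in for the folk music generation layer."""
    return f"phrase<{intent['articulation']}>"

def pipeline_step(budget_ms=100.0):
    """Run one end-to-end step and check it against a sub-100 ms budget,
    mirroring the real-time requirement reported in the abstract."""
    start = time.perf_counter()
    phrase = generate(interpret(vision_process(acquire_frame())))
    elapsed_ms = (time.perf_counter() - start) * 1000.0
    return phrase, elapsed_ms, elapsed_ms <= budget_ms

phrase, elapsed_ms, within_budget = pipeline_step()
print(phrase, within_budget)
```

The staged decomposition makes the latency budget auditable per layer, which is presumably how the reported 23.4 ms vision latency fits inside the 86.8-91.6 ms end-to-end figures.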

Keywords

Artificial Intelligence-Collaborative Music Composition / Computer Vision / Folk Music Generation / Gesture Recognition / Real-Time Interactive Framework / Traditional Music

Cite this article

Qinghao Liu, Tazul Izan Tajuddin. Artificial intelligence-collaborative folk music composition system based on gesture recognition: A real-time interactive framework integrating computer vision and folk music generation. International Journal of Systematic Innovation, 2025, 9(6): 44-62. DOI: 10.6977/IJoSI.202512_9(6).0004


Funding

This study is supported by Journal Support Fund, Universiti Teknologi MARA (UiTM).

