Artificial intelligence-collaborative folk music composition system based on gesture recognition: A real-time interactive framework integrating computer vision and folk music generation

Qinghao Liu, Tazul Izan Tajuddin

International Journal of Systematic Innovation ›› 2025, Vol. 9 ›› Issue (6): 44-62. DOI: 10.6977/IJoSI.202512_9(6).0004


Abstract

Artificial intelligence (AI) and gesture recognition offer new creative possibilities, yet culturally sensitive, real-time systems for gestural folk music composition remain largely undeveloped. This study develops an AI-collaborative folk music composition system that integrates computer vision-based gesture recognition with specialized folk music generation algorithms to create a real-time interactive framework that supports traditional music composition while preserving cultural musical characteristics across multiple folk traditions. The system employs a four-layer architecture encompassing gesture acquisition, computer vision processing, interpretation, and generation layers. A comprehensive dataset of 1,643 folk music compositions from established repositories representing English, American, Irish, and Chinese traditional music (Nottingham Dataset, Irish Traditional Corpus, and self-recorded materials) was curated, supplemented by 6,127 successfully tracked gesture samples collected from 47 participants across 12 folk music gesture categories. The evaluation framework assessed gesture recognition accuracy, cultural authenticity preservation, real-time performance, and collaborative effectiveness through extensive experimental validation. The system achieved robust gesture recognition performance with 88.9% accuracy and 23.4 ms processing latency, while maintaining end-to-end response times of 86.8-91.6 ms during collaborative sessions. Cultural authenticity scores ranged from 7.6 to 8.3 across different regional folk styles, with a user satisfaction rating of 7.8 and a 28% improvement in musical coherence compared to baseline approaches. The framework successfully supports up to eight concurrent users while maintaining sub-100 ms real-time performance requirements. 
The integrated system successfully demonstrates effective coordination between gesture recognition and folk music generation subsystems, validating the architectural design and optimization strategies for culturally sensitive AI applications across diverse folk music traditions. The validated framework provides a foundation for educational, performance, and cultural preservation applications, contributing methodological insights for multimodal human-AI interaction systems and culturally aware creative technologies applicable to traditional music contexts.
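The four-layer architecture summarized above (gesture acquisition, computer vision processing, interpretation, and generation, coordinated under a sub-100 ms end-to-end budget) can be sketched as a simple staged pipeline. This is a minimal illustrative sketch only: the class and function names below are hypothetical stand-ins invented for this example and do not come from the authors' implementation.

```python
import time
from dataclasses import dataclass

# Hypothetical stand-ins for the paper's four layers; none of these names
# come from the authors' codebase.

@dataclass
class GestureEvent:
    label: str        # e.g., one of the 12 folk music gesture categories
    confidence: float

def acquire_frame():
    """Stand-in for the gesture acquisition layer (camera capture)."""
    return "frame"

def vision_process(frame):
    """Stand-in for the computer vision layer (tracking + classification)."""
    return GestureEvent(label="tremolo", confidence=0.92)

def interpret(event):
    """Stand-in for the interpretation layer: map a gesture to musical intent."""
    return {"articulation": event.label, "weight": event.confidence}

def generate(intent):
    """Stand-in for the folk music generation layer."""
    return f"phrase<{intent['articulation']}>"

def pipeline_step(budget_ms=100.0):
    """Run one end-to-end step and check it against a sub-100 ms budget,
    mirroring the real-time requirement reported in the abstract."""
    start = time.perf_counter()
    phrase = generate(interpret(vision_process(acquire_frame())))
    elapsed_ms = (time.perf_counter() - start) * 1000.0
    return phrase, elapsed_ms, elapsed_ms <= budget_ms

phrase, elapsed_ms, within_budget = pipeline_step()
print(phrase, within_budget)
```

The staged decomposition makes the latency budget auditable per layer, which is presumably how the reported 23.4 ms vision latency fits inside the 86.8-91.6 ms end-to-end figures.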

Keywords

Artificial Intelligence-Collaborative Music Composition / Computer Vision / Folk Music Generation / Gesture Recognition / Real-Time Interactive Framework / Traditional Music

Cite this article

Qinghao Liu, Tazul Izan Tajuddin. Artificial intelligence-collaborative folk music composition system based on gesture recognition: A real-time interactive framework integrating computer vision and folk music generation. International Journal of Systematic Innovation, 2025, 9(6): 44-62. DOI: 10.6977/IJoSI.202512_9(6).0004


Funding

This study is supported by Journal Support Fund, Universiti Teknologi MARA (UiTM).

