User story clustering in agile development: a framework and an empirical study
Bo YANG, Xiuyin MA, Chunhui WANG, Haoran GUO, Huai LIU, Zhi JIN
Agile development aims to deliver software rapidly while embracing the continuous evolution of user requirements throughout the development process. User stories are the primary means of requirements collection and elicitation in agile development. A project can involve a large number of user stories, which should be clustered into groups according to their functional similarity to support systematic requirements analysis, effective mapping to developed features, and efficient maintenance. However, user story clustering is currently performed mainly by hand, which is time-consuming and subject to human bias. In this paper, we propose a novel approach that clusters user stories automatically on the basis of natural language processing. Specifically, the sentence patterns of each component of a user story are first analysed and determined, so that the critical structure in the representative tasks can be extracted automatically based on the user story meta-model. The similarity between user stories is then calculated and used to generate a connected graph, which serves as the basis for automatic user story clustering. We evaluate the approach on thirteen datasets against ten baseline techniques. Experimental results show that our clustering approach achieves higher accuracy, recall, and F1-score than these baselines. The results demonstrate that the proposed approach can significantly improve the efficacy of user story clustering and thus enhance the overall performance of agile development. The study also highlights promising research directions for more accurate requirements elicitation.
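To make the similarity-graph idea in the abstract concrete, the following is a minimal sketch and not the authors' implementation. It assumes a plain Jaccard overlap of content words as a stand-in for the paper's similarity measure (which is not specified here), a small hand-picked set of template words to drop, and a hypothetical `threshold` parameter; stories whose similarity reaches the threshold are linked, and each connected component of the resulting graph is treated as one cluster.

```python
from __future__ import annotations

from itertools import combinations

# Assumption: words from the usual "As a ..., I want to ..., so that ..." template
# carry no functional meaning, so they are dropped before comparison.
TEMPLATE_WORDS = {"as", "a", "an", "i", "want", "to", "so", "that", "can", "my", "the"}


def content_words(story: str) -> set[str]:
    """Lower-case the story, strip punctuation, and drop template words."""
    words = (w.strip(".,;:").lower() for w in story.split())
    return {w for w in words if w and w not in TEMPLATE_WORDS}


def jaccard(a: set[str], b: set[str]) -> float:
    """Jaccard overlap; a stand-in for the paper's similarity measure."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def cluster_stories(stories: list[str], threshold: float = 0.2) -> list[list[int]]:
    """Link stories whose similarity is >= threshold; return connected components."""
    parent = list(range(len(stories)))

    def find(i: int) -> int:
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    word_sets = [content_words(s) for s in stories]
    for i, j in combinations(range(len(stories)), 2):
        if jaccard(word_sets[i], word_sets[j]) >= threshold:
            parent[find(i)] = find(j)  # add an edge: merge the two components

    groups: dict[int, list[int]] = {}
    for i in range(len(stories)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


stories = [
    "As a user, I want to reset my password so that I can regain access",
    "As a user, I want to change my password so that I can stay secure",
    "As an admin, I want to export monthly sales reports",
]
print(cluster_stories(stories, threshold=0.2))  # → [[0, 1], [2]]
```

In this toy example the two password-related stories share two of eight distinct content words (similarity 0.25) and fall into one cluster, while the reporting story stays on its own. The threshold controls how aggressively stories are merged and would need tuning for any real dataset.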
user story / agile development / user story mapping / clustering
Bo Yang received the PhD degree in computer software and theory from Beihang University, China. He is an associate professor at the School of Information Science and Technology, Beijing Forestry University, China. His research interests include deep learning, software testing, software fault localization, and software requirements analysis. He is a member of CCF.
Xiuyin Ma received the BEng degree from the North China University of Technology, China. Her research interests include software requirements analysis and software testing.
Chunhui Wang received her PhD degree in computer science from the School of Electronics Engineering and Computer Science, Peking University, China in 2020. Currently, she is an associate professor at the School of Computer Science, Inner Mongolia Normal University, China. Her research interests include requirements engineering and collective-intelligence-based software engineering. She is a member of CCF.
Haoran Guo received the BEng degree from the North China University of Technology, China. His research interests include software fault localization and software testing.
Huai Liu received the PhD degree in software engineering from the Swinburne University of Technology, Australia. He is a senior lecturer in the Department of Computing Technologies, Swinburne University of Technology, Australia. He has worked as a lecturer at Victoria University and a research fellow at RMIT University, Australia. His current research interests include software testing, cloud computing, and end-user software engineering.
Zhi Jin obtained her BSc from Zhejiang University, China in 1984, and her PhD from National University of Defense Technology, China in 1992. She is a professor in the School of Computer Science, Peking University (PKU), China and has served as the Deputy Director of High-Confidence Software Technologies (PKU), Ministry of Education, China since 2009. Her research interests include requirements engineering, knowledge engineering, and knowledge-based software engineering.