Random forest-based weighted majority voting for crowdsourcing

Liangjun YU , Wenjun ZHANG , Liangxiao JIANG

Front. Comput. Sci., 2027, 21(3): 2103603. DOI: 10.1007/s11704-025-51186-2
Information Systems
RESEARCH ARTICLE

Abstract

In crowdsourcing scenarios, we can obtain a multiple noisy label set for each instance from crowd workers and then infer its unknown true label via label integration. Recent studies show that label integration performs well when the label quality of most workers is high, but they seldom consider the crowdsourcing scenario in which the label quality of most workers is low. In this work, we argue that even when the label quality of most workers is low, label integration can still perform well as long as the label quality of a few workers is high. Based on this premise, we propose a novel label integration algorithm called random forest-based weighted majority voting (RFWMV). RFWMV uses a random forest to learn multiple labeling rules for each worker and uses the consistency of these labeling rules to evaluate each worker's label quality. Specifically, RFWMV first trains a separate random forest on the instances labeled by each worker. Then, RFWMV estimates each worker's label quality from the outputs of the corresponding random forest's base classifiers. Finally, RFWMV infers the integrated label of each instance by weighted majority voting based on each worker's label quality and the output of its corresponding random forest. Extensive experiments show that RFWMV significantly outperforms all the other state-of-the-art label integration algorithms.
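The pipeline described in the abstract (one forest per worker, quality from base-classifier consistency, then weighted voting) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the "forest" is a set of bagged decision stumps, worker quality is taken as the mean fraction of base classifiers that agree with the forest's own majority output, and the function name `rfwmv` and all parameters are assumptions for illustration.

```python
import numpy as np

def fit_stump(X, y):
    # Best single-feature threshold split; a stand-in for a full decision tree.
    best = None
    for j in range(X.shape[1]):
        for t in np.unique(X[:, j]):
            left, right = y[X[:, j] <= t], y[X[:, j] > t]
            if len(left) == 0 or len(right) == 0:
                continue
            lc, rc = np.bincount(left).argmax(), np.bincount(right).argmax()
            acc = ((left == lc).sum() + (right == rc).sum()) / len(y)
            if best is None or acc > best[0]:
                best = (acc, j, t, lc, rc)
    _, j, t, lc, rc = best
    return lambda Z: np.where(Z[:, j] <= t, lc, rc)

def rfwmv(X, worker_labels, n_trees=15, seed=0):
    """worker_labels: list of integer label arrays, one per worker."""
    rng = np.random.default_rng(seed)
    n = len(X)
    n_classes = int(max(np.max(l) for l in worker_labels)) + 1
    votes = np.zeros((n, n_classes))
    for labels in worker_labels:
        labels = np.asarray(labels)
        # Train a bagged ensemble ("random forest") on this worker's labels.
        preds = np.array([
            fit_stump(X[idx], labels[idx])(X)
            for idx in (rng.integers(0, n, n) for _ in range(n_trees))
        ])  # shape (n_trees, n)
        # Forest output: per-instance majority over base classifiers.
        forest_pred = np.array([np.bincount(p, minlength=n_classes).argmax()
                                for p in preds.T])
        # Worker quality: how consistently the base classifiers agree.
        quality = (preds == forest_pred).mean()
        # Weighted majority voting with the worker's forest output.
        votes[np.arange(n), forest_pred] += quality
    return votes.argmax(axis=1)
```

On a toy task with two reliable workers and one random-guessing worker, the inconsistent worker's forest earns a low weight, so the integrated labels track the reliable workers.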


Keywords

crowdsourcing / label integration / label quality / random forest

Cite this article

Liangjun YU, Wenjun ZHANG, Liangxiao JIANG. Random forest-based weighted majority voting for crowdsourcing. Front. Comput. Sci., 2027, 21(3): 2103603 DOI:10.1007/s11704-025-51186-2



RIGHTS & PERMISSIONS

Higher Education Press

