Classification-oriented Dawid-Skene model for transferring intelligence from crowds to machines

Jiaran LI , Richong ZHANG , Samuel MENSAH , Wenyi QIN , Chunming HU

Front. Comput. Sci. ›› 2023, Vol. 17 ›› Issue (5) : 175332

DOI: 10.1007/s11704-022-2245-8
Artificial Intelligence
RESEARCH ARTICLE

Abstract

When crowdsourcing is used to help classify a set of items, the main objective is to classify the items by aggregating the worker-provided labels. A secondary objective is to assess the workers’ skill levels in the process. A classical model that achieves both objectives is the Dawid-Skene model. In this paper, we consider a third objective: to learn a classifier capable of labelling future items without further assistance from crowd workers. By extending the Dawid-Skene model to take item features into account, we develop a Classification-Oriented Dawid-Skene (CODS) model, which achieves the three objectives simultaneously. The effectiveness of CODS on these three dimensions of the problem space is demonstrated experimentally.
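
To make the first two objectives concrete, the following is a minimal sketch of the classical Dawid-Skene model fitted with EM, not of CODS itself, whose use of item features is not specified on this page. The function name `dawid_skene`, the `smoothing` parameter, and the toy votes are illustrative assumptions. The sketch alternates between estimating each worker's confusion matrix (skill assessment) and each item's posterior over its true class (label aggregation).

```python
from collections import defaultdict


def dawid_skene(labels, n_classes, n_iter=50, smoothing=0.01):
    """Illustrative EM for the classical Dawid-Skene model (not CODS).

    labels: list of (item, worker, label) triples, label in range(n_classes).
    Returns (post, conf): post[item] is a list of class probabilities and
    conf[worker][true][observed] is that worker's estimated confusion matrix.
    """
    by_item = defaultdict(list)
    for item, worker, label in labels:
        by_item[item].append((worker, label))
    workers = sorted({w for _, w, _ in labels})

    # Initialise posteriors with per-item vote fractions (soft majority vote).
    post = {}
    for item, votes in by_item.items():
        counts = [0.0] * n_classes
        for _, label in votes:
            counts[label] += 1.0
        post[item] = [c / len(votes) for c in counts]

    conf = {}
    for _ in range(n_iter):
        # M-step: class prior and smoothed per-worker confusion matrices.
        prior = [sum(p[k] for p in post.values()) / len(post)
                 for k in range(n_classes)]
        conf = {w: [[smoothing] * n_classes for _ in range(n_classes)]
                for w in workers}
        for item, votes in by_item.items():
            for worker, label in votes:
                for k in range(n_classes):
                    conf[worker][k][label] += post[item][k]
        for w in workers:
            for k in range(n_classes):
                row_sum = sum(conf[w][k])
                conf[w][k] = [v / row_sum for v in conf[w][k]]

        # E-step: posterior over each item's true class.
        for item, votes in by_item.items():
            scores = list(prior)
            for worker, label in votes:
                for k in range(n_classes):
                    scores[k] *= conf[worker][k][label]
            total = sum(scores)
            post[item] = [s / total for s in scores]
    return post, conf


# Toy data (hypothetical): workers "a" and "b" agree; "c" often disagrees.
votes = [
    (0, "a", 0), (0, "b", 0), (0, "c", 1),
    (1, "a", 1), (1, "b", 1), (1, "c", 0),
    (2, "a", 0), (2, "b", 0), (2, "c", 1),
    (3, "a", 1), (3, "b", 1), (3, "c", 1),
]
posteriors, confusion = dawid_skene(votes, n_classes=2)
predicted = {item: max(range(2), key=lambda k: p[k])
             for item, p in posteriors.items()}
```

Here `predicted` holds the aggregated labels and `confusion` the per-worker skill estimates. The third objective described in the abstract, a classifier for unseen items, would additionally require item features, which this sketch deliberately leaves out.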

Keywords

crowdsourcing / information aggregation / learning from crowds

Cite this article

Jiaran LI, Richong ZHANG, Samuel MENSAH, Wenyi QIN, Chunming HU. Classification-oriented Dawid-Skene model for transferring intelligence from crowds to machines. Front. Comput. Sci., 2023, 17(5): 175332. DOI: 10.1007/s11704-022-2245-8

RIGHTS & PERMISSIONS

Higher Education Press

Supplementary files

FCS-22245-OF-JL_suppl_1
