Find truth in the hands of the few: acquiring specific knowledge with crowdsourcing

Tao HAN; Hailong SUN; Yangqiu SONG; Yili FANG; Xudong LIU

doi:10.1007/s11704-020-9364-x


Front. Comput. Sci., 2021, Vol. 15, Issue 4: 154315. DOI: 10.1007/s11704-020-9364-x

RESEARCH ARTICLE



Abstract

Crowdsourcing is a useful mechanism for leveraging human intelligence to acquire knowledge. However, aggregating crowd answers with existing voting algorithms often yields common knowledge rather than the specific knowledge that is wanted. In this paper, we consider the problem of collecting specific knowledge via crowdsourcing. Using an external knowledge base such as WordNet, we incorporate the semantic relations between alternative answers into a probabilistic model to determine which answer is more specific. We formulate the probabilistic model from a basic assumption, taking both worker ability and task difficulty into account, and solve it with the expectation-maximization (EM) algorithm. To make the algorithm more widely compatible, we also extend our method to a semi-supervised variant. Experimental results show that our approach is robust to hyper-parameter settings and outperforms majority voting and other algorithms when more specific answers are expected, especially on sparse data.
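The motivating problem can be illustrated with a small sketch. It is not the probabilistic model of the paper, which is solved with EM and accounts for worker ability and task difficulty. It only shows, under stated assumptions, why plain majority voting tends to return the general answer and how hypernym relations might shift support toward a more specific one. The toy hierarchy `HYPERNYM` (a hand-written stand-in for WordNet), the function `specific_vote`, and the `discount` parameter are all hypothetical names introduced for illustration.

```python
from collections import Counter

# Hypothetical toy hierarchy (child -> parent), standing in for WordNet hypernyms.
HYPERNYM = {"husky": "dog", "poodle": "dog", "dog": "animal", "cat": "animal"}


def ancestors(label):
    """Return all hypernyms of `label`, nearest first."""
    chain = []
    while label in HYPERNYM:
        label = HYPERNYM[label]
        chain.append(label)
    return chain


def majority_vote(answers):
    """Plain majority voting: the most frequent answer wins."""
    return Counter(answers).most_common(1)[0][0]


def specific_vote(answers, discount=0.6):
    """Illustrative rule (an assumption, not the paper's model).

    A vote for a general label is compatible with the specific labels
    beneath it, so each candidate receives its own votes plus discounted
    support from votes cast for its ancestors.
    """
    counts = Counter(answers)
    scores = {}
    for candidate in counts:
        score = float(counts[candidate])
        for depth, anc in enumerate(ancestors(candidate), start=1):
            score += counts.get(anc, 0) * discount ** depth
        scores[candidate] = score
    # Break score ties in favour of the deeper (more specific) label.
    best = max(scores, key=lambda c: (scores[c], len(ancestors(c))))
    return best, scores


answers = ["dog", "dog", "dog", "husky", "husky", "animal"]
general = majority_vote(answers)
specific, scores = specific_vote(answers)
print(general, specific, scores)
```

On this toy input, majority voting selects "dog", while the hierarchy-aware rule selects "husky" because the three "dog" votes and the one "animal" vote are treated as partial support for it. A single "husky" vote against three "dog" votes would not be enough to win under the same `discount`, so the rule still requires some direct agreement among workers.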

Keywords

crowdsourcing / knowledge acquisition / EM algorithm / label aggregation

Cite this article


Tao HAN, Hailong SUN, Yangqiu SONG, Yili FANG, Xudong LIU. Find truth in the hands of the few: acquiring specific knowledge with crowdsourcing. Front. Comput. Sci., 2021, 15(4): 154315 https://doi.org/10.1007/s11704-020-9364-x

