Find truth in the hands of the few: acquiring specific knowledge with crowdsourcing
Tao HAN, Hailong SUN, Yangqiu SONG, Yili FANG, Xudong LIU
Find truth in the hands of the few: acquiring specific knowledge with crowdsourcing
Crowdsourcing has been a helpful mechanism to leverage human intelligence to acquire useful knowledge.However, when we aggregate the crowd knowledge based on the currently developed voting algorithms, it often results in common knowledge that may not be expected. In this paper, we consider the problem of collecting specific knowledge via crowdsourcing. With the help of using external knowledge base such as WordNet, we incorporate the semantic relations between the alternative answers into a probabilisticmodel to determine which answer is more specific. We formulate the probabilistic model considering both worker’s ability and task’s difficulty from the basic assumption, and solve it by the expectation-maximization (EM) algorithm. To increase algorithm compatibility, we also refine our method into semi-supervised one. Experimental results show that our approach is robust with hyper-parameters and achieves better improvement thanmajority voting and other algorithms when more specific answers are expected, especially for sparse data.
crowdsourcing / knowledge acquisition / EM algorithm / label aggregation
[1] |
Howe J. The rise of crowdsourcing. Wired Magazine, 2006, 14(6): 1–4
|
[2] |
Wang J, Li G, Kraska T, Franklin M J, Feng J. Leveraging transitive relations for crowdsourced joins. In: Proceedings of ACM Conference on Management of Data. 2013, 229–240
CrossRef
Google scholar
|
[3] |
Russell B C, Torralba A, Murphy K P, Freeman W T. Labelme: a database and Web-based tool for image annotation. International Journal of Computer Vision, 2008, 77(1–3): 157–173
CrossRef
Google scholar
|
[4] |
Hwang K, Lee S Y. Environmental audio scene and activity recognition through mobile-based crowdsourcing. IEEE Transactions on Consumer Electronics, 2012, 58(2): 700–705
CrossRef
Google scholar
|
[5] |
Vondrick C, Patterson D, Ramanan D. Efficiently scaling up crowdsourced video annotation. International Journal of Computer Vision, 2013, 101(1): 184–204
CrossRef
Google scholar
|
[6] |
Waggoner B, Chen Y. Output agreement mechanisms and common knowledge. In: Proceedings of the 2nd AAAI Conference on Human Computation and Crowdsourcing. 2014
|
[7] |
Ordonez V, Deng J, Choi Y, Berg A C, Berg T. From large scale image categorization to entry-level categories. In: Proceedings of IEEE International Conference on Computer Vision. 2013, 2768–2775
CrossRef
Google scholar
|
[8] |
Feng S, Ravi S, Kumar R, Kuznetsova P, Liu W, Berg A C, Berg T L, Choi Y. Refer-to-as relations as semantic knowledge. In: Proceedings of International Conference on Automated Planning and Scheduling. 2015
|
[9] |
Dawid A P, Skene A M. Maximum likelihood estimation of observer error-rates using the em algorithm. Applied Statistics, 1979, 28(1): 20–28
CrossRef
Google scholar
|
[10] |
Whitehill J, Wu T f, Bergsma J, Movellan J R, Ruvolo P L. Whose vote should count more: optimal integration of labels from labelers of unknown expertise. In: Proceedings of Annual Conference on Neural Information Processing Systems. 2009, 2035–2043
|
[11] |
Salek M, Bachrach Y, Key P. Hotspotting-a probabilistic graphical model for image object localization through crowdsourcing. In: Proceedings of International Conference on Automated Planning and Scheduling. 2013
|
[12] |
Bachrach Y, Minka T, Guiver J, Graepel T. How to grade a test without knowing the answers—a bayesian graphical model for adaptive crowdsourcing and aptitude testing. In: Proceedings of the 29th International Conference on Machine Learning. 2012, 819–826
|
[13] |
Raykar V C, Yu S, Zhao L H, Valadez G H, Florin C, Bogoni L, Moy L. Learning from crowds. Journal of Machine Learning Research, 2010, 11(43): 1297–1322
|
[14] |
Demartini G, Difallah D E, Cudré-Mauroux P. Zencrowd: leveraging probabilistic reasoning and crowdsourcing techniques for large-scale entity linking. In: Proceedings of the 21st International Conference on World Wide Web. 2012, 469–478
CrossRef
Google scholar
|
[15] |
Zhou D, Basu S, Mao Y, Platt J C. Learning from the wisdom of crowds= by minimax entropy. In: Proceedings of Annual Conference on Neural Information Processing Systems. 2012, 2195–2203
|
[16] |
Han T, Sun H, Song Y, Fang Y, Liu X. Incorporating external knowledge into crowd intelligence for more specific knowledge acquisition. In: Proceedings of International Joint Conference on Artificial Intelligence. 2016, 1541–1547
|
[17] |
Chilton L B, Little G, Edge D, Weld D S, Landay J A. Cascade: crowdsourcing taxonomy creation. In: Proceedings of SIGCHI Conference on Human Factors in Computing Systems. 2013, 1999–2008
CrossRef
Google scholar
|
[18] |
Bragg J, Weld D S. Crowdsourcing multi-label classification for taxonomy creation. In: Proceedings of the 1st AAAI Conference on Human Computation and Crowdsourcing. 2013
|
[19] |
Sun Y, Singla A, Fox D, Krause A. Building hierarchies of concepts via crowdsourcing. In: Proceedings of International Joint Conference on Artificial Intelligence. 2015, 844–851
|
[20] |
Fellbaum C. WordNet: An Electronic Lexical Database. MIT Press, 1998
CrossRef
Google scholar
|
[21] |
Lenat D B, Guha R V. Building Large Knowledge-Based Systems: Representation and Inference in the Cyc Project. Addison-Wesley, 1989
|
[22] |
Speer R, Havasi C. Representing general relational knowledge in conceptnet 5. In: Proceedings of Language Resources and Evaluation Conference. 2012, 3679–3686
|
[23] |
Wu W, Li H, Wang H, Zhu K Q. Probase: a probabilistic taxonomy for text understanding. In: Proceedings of ACM Conference on Management of Data. 2012, 481–492
CrossRef
Google scholar
|
[24] |
Prelec D, Seung H S, McCoy J. A solution to the single-question crowd wisdom problem. Nature, 2017, 541(7638): 532–535
CrossRef
Google scholar
|
[25] |
Divvala S K, Farhadi A, Guestrin C. Learning everything about anything: webly-supervised visual concept learning. In: Proceedings of IEEE Conference on Computer Vision and Pattern Recognition. 2014, 3270–3277
CrossRef
Google scholar
|
[26] |
Sheng V S, Provost F, Ipeirotis P G. Get another label? improving data quality and data mining using multiple, noisy labelers. In: Proceedings of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. 2008, 614–622
CrossRef
Google scholar
|
[27] |
Ipeirotis P G, Provost F, Wang J. Quality management on amazon mechanical turk. In: Proceedings of the ACM SIGKDD Workshop on Human Computation. 2010, 64–67
CrossRef
Google scholar
|
[28] |
Han T, Sun H, Song Y, Wang Z, Liu X. Budgeted task scheduling for crowdsourced knowledge acquisition. In: Proceedings of the 2017 ACM on Conference on Information and Knowledge Management. 2017, 1059–1068
CrossRef
Google scholar
|
[29] |
Callison-Burch C. Fast, cheap, and creative: evaluating translation quality using amazon’s mechanical turk. In: Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing. 2009, 286–295
CrossRef
Google scholar
|
[30] |
Hu C, Bederson B B, Resnik P. Translation by iterative collaboration between monolingual users. In: Proceedings of Graphics Interface 2010. 2010, 39–46
CrossRef
Google scholar
|
[31] |
Ambati V, Vogel S, Carbonell J. Active learning and crowd-sourcing for machine translation. In: Proceedings of the 7th International Conference on Language Resources and Evaluation. 2010
|
[32] |
Dong X L, Gabrilovich E, Heitz G, Horn W, Murphy K, Sun S, Zhang W. From data fusion to knowledge fusion. Proceedings of the VLDB Endowment, 2014, 7(10): 881–892
CrossRef
Google scholar
|
[33] |
Ma F, Li Y, Li Q, Qiu M, Gao J, Zhi S, Su L, Zhao B, Ji H, Han J. Faitcrowd: fine grained truth discovery for crowdsourced data aggregation. In: Proceedings of the 21st ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. 2015, 745–754
CrossRef
Google scholar
|
[34] |
Fang Y, Sun H, Chen P, Huai J. On the cost complexity of crowdsourcing. In: Proceedings of International Joint Conference on Artificial Intelligence. 2018, 1531–1537
CrossRef
Google scholar
|
[35] |
Luengo-Oroz M A, Arranz A, Frean J. Crowdsourcing malaria parasite quantification: an online game for analyzing images of infected thick blood smears. Journal of Medical Internet Research, 2012, 14(6): e167
CrossRef
Google scholar
|
[36] |
Kalman R E. A new approach to linear filtering and prediction problems. Journal of Basic Engineering, 1960, 82(1): 35–45
CrossRef
Google scholar
|
[37] |
Sun H, Hu K, Fang Y, Song Y. Adaptive result inference for collecting quantitative data with crowdsourcing. IEEE Internet of Things Journal, 2017, 4(5): 1389–1398
CrossRef
Google scholar
|
[38] |
Dai P, Lin C H, Weld D S. Pomdp-based control of workflows for crowdsourcing. Artificial Intelligence, 2013, 202: 52–85
CrossRef
Google scholar
|
[39] |
Dai P, Weld D S. Artificial intelligence for artificial artificial intelligence. In: Proceedings of the 25th AAAI Conference on Artificial Intelligence. 2011
|
[40] |
Fang Y, Sun H, Li G, Zhang R, Huai J. Context-aware result inference in crowdsourcing. Information Sciences, 2018, 460: 346–363
CrossRef
Google scholar
|
[41] |
Otani N, Baba Y, Kashima H. Quality control of crowdsourced classification using hierarchical class structures. Expert Systems with Applications, 2016, 58: 155–163
CrossRef
Google scholar
|
[42] |
Deng J, Dong W, Socher R, Li L J, Li K, Fei-Fei L. Imagenet: a largescale hierarchical image database. In: Proceedings of IEEE Conference on Computer Vision and Pattern Recognition. 2009, 248–255
CrossRef
Google scholar
|
/
〈 | 〉 |