Diversification on big data in query processing

Meifan ZHANG, Hongzhi WANG, Jianzhong LI, Hong GAO

Front. Comput. Sci., 2020, Vol. 14, Issue 4: 144607. DOI: 10.1007/s11704-019-8324-9
RESEARCH ARTICLE

Abstract

Recently, in the area of big data, popular applications such as web search engines and recommendation systems have faced the problem of diversifying results during query processing. It is therefore both significant and essential to develop methods that handle big data while increasing the diversity of the result set. In this paper, we first define the diversity of a set and the ability of an element to improve the overall diversity. Based on these definitions, we propose a diversification framework that performs well in terms of both effectiveness and efficiency. The framework also has a theoretical guarantee on its probability of success. Second, we design implementation algorithms based on this framework for both numerical and string data. Third, for numerical and string data respectively, we carry out extensive experiments on real data to verify the performance of the proposed framework, and we also perform scalability experiments on synthetic data.
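The abstract does not give the paper's formal definitions, so the following is only a minimal illustrative sketch under stated assumptions. It assumes that the diversity of a set is its smallest pairwise distance (max-min dispersion) and that an element's ability to improve diversity is its distance to the nearest already-selected element. The helper names (`set_diversity`, `contribution`, `greedy_diversify`, `levenshtein`) are hypothetical. The selection loop is the standard greedy farthest-point heuristic, not the authors' framework, and it carries no probabilistic guarantee. Absolute difference stands in for numerical data and edit distance for string data.

```python
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")


def levenshtein(a: str, b: str) -> int:
    """Edit distance: minimum number of single-character inserts, deletes, substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,                # delete ca
                           cur[j - 1] + 1,             # insert cb
                           prev[j - 1] + (ca != cb)))  # substitute (free if equal)
        prev = cur
    return prev[-1]


def set_diversity(items: Sequence[T], dist: Callable[[T, T], float]) -> float:
    """Assumed set diversity (hypothetical): the smallest pairwise distance in the set."""
    if len(items) < 2:
        return float("inf")
    return min(dist(items[i], items[j])
               for i in range(len(items)) for j in range(i + 1, len(items)))


def contribution(x: T, chosen: Sequence[T], dist: Callable[[T, T], float]) -> float:
    """Assumed ability of x to improve diversity (hypothetical): distance to nearest chosen element."""
    return min(dist(x, c) for c in chosen)


def greedy_diversify(data: Sequence[T], k: int,
                     dist: Callable[[T, T], float]) -> List[T]:
    """Greedy farthest-point heuristic: repeatedly add the element with the largest contribution."""
    if k <= 0 or not data:
        return []
    chosen = [data[0]]          # arbitrary seed element
    remaining = list(data[1:])
    while len(chosen) < k and remaining:
        best = max(remaining, key=lambda x: contribution(x, chosen, dist))
        chosen.append(best)
        remaining.remove(best)
    return chosen


if __name__ == "__main__":
    nums = [1.0, 1.1, 1.2, 5.0, 9.9, 10.0]
    print(greedy_diversify(nums, 3, lambda a, b: abs(a - b)))  # [1.0, 10.0, 5.0]
    words = ["apple", "apply", "banana", "bandana", "cherry"]
    print(greedy_diversify(words, 3, levenshtein))
```

As written, each greedy step compares every remaining element with every chosen element, so the loop costs roughly O(n·k²) distance evaluations; caching each element's nearest-chosen distance would reduce this to O(n·k). The abstract describes the paper's framework as targeting efficiency on big data, which this baseline does not attempt to reproduce.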

Keywords

diversification / query processing / big data

Cite this article

Meifan ZHANG, Hongzhi WANG, Jianzhong LI, Hong GAO. Diversification on big data in query processing. Front. Comput. Sci., 2020, 14(4): 144607 https://doi.org/10.1007/s11704-019-8324-9

RIGHTS & PERMISSIONS

© 2020 Higher Education Press and Springer-Verlag GmbH Germany, part of Springer Nature