Budgeted spatial data acquisition: when coverage and connectivity matter

Wenzhe YANG , Shixun HUANG , Sheng WANG , Zhiyong PENG

Front. Comput. Sci. ›› 2026, Vol. 20 ›› Issue (3) : 2003612

PDF (4305KB)
Front. Comput. Sci. ›› 2026, Vol. 20 ›› Issue (3) : 2003612 DOI: 10.1007/s11704-025-41427-9
Information Systems
RESEARCH ARTICLE

Budgeted spatial data acquisition: when coverage and connectivity matter

Author information +
History +
PDF (4305KB)

Abstract

Data is undoubtedly becoming a commodity like oil, land, and labour in the 21st century. Although there have been many successful marketplaces for data trading, the existing data marketplaces lack consideration of the case where buyers want to acquire a collection of datasets (instead of one), and the overall spatial coverage and connectivity matter. In this paper, we make the first attempt to formulate this problem as Budgeted Maximum Coverage with Connectivity Constraint (BMCC), which aims to acquire a dataset collection with the maximum spatial coverage under a limited budget while maintaining spatial connectivity. To solve the problem, we propose two approximate algorithms with detailed theoretical guarantees and time complexity analysis, followed by two acceleration strategies to further improve the efficiency of the algorithm. Experiments are conducted on five real-world spatial dataset collections to verify the efficiency and effectiveness of our algorithms.

Graphical abstract

Keywords

spatial data acquisition / data marketplace / budget / spatial coverage / spatial connectivity

Cite this article

Download citation ▾
Wenzhe YANG, Shixun HUANG, Sheng WANG, Zhiyong PENG. Budgeted spatial data acquisition: when coverage and connectivity matter. Front. Comput. Sci., 2026, 20(3): 2003612 DOI:10.1007/s11704-025-41427-9

登录浏览全文

4963

注册一个新账户 忘记密码

References

[1]

Chapman A, Simperl E, Koesten L, Konstantinidis G, Ibáñez L D, Kacprzak E, Groth P . Dataset search: a survey. The VLDB Journal, 2020, 29( 1): 251–272

[2]

Fernandez R C, Subramaniam P, Franklin M J . Data market platforms: Trading data assets to solve data problems. Proceedings of the VLDB Endowment, 2020, 13( 12): 1933–1947

[3]

Ionescu A, Alexandridou A, Ikonomou L, Psarakis K, Patroumpas K, Chatzigeorgakidis G, Skoutas D, Athanasiou S, Hai R, Katsifodimos A. Topio marketplace: Search and discovery of geospatial data. In: Proceedings of the 26th International Conference on Extending Database Technology. 2023, 819−822

[4]

Wang S, Sun Y, Musco C, Bao Z. Public transport planning: When transit network connectivity meets commuting demand. In: Proceedings of 2021 International Conference on Management of Data. 2021, 1906−1919

[5]

Wang S, Bao Z, Culpepper J S, Sellis T, Cong G . Reverse k nearest neighbor search over trajectories. IEEE Transactions on Knowledge and Data Engineering, 2018, 30( 4): 757–771

[6]

Wang J, Cheng P, Zheng L, Feng C, Chen L, Lin X, Wang Z . Demand-aware route planning for shared mobility services. Proceedings of the VLDB Endowment, 2020, 13( 7): 979–991

[7]

Ali M E, Eusuf S S, Abdullah K, Choudhury F M, Culpepper J S, Sellis T . The maximum trajectory coverage query in spatial databases. Proceedings of the VLDB Endowment, 2018, 12( 3): 197–209

[8]

He D, Zhou T, Zhou X, Kim J . An efficient algorithm for maximum trajectory coverage query with approximation guarantee. IEEE Transactions on Intelligent Transportation Systems, 2022, 23( 12): 24031–24043

[9]

Hochbaum D S, Pathria A . Analysis of the greedy approach in problems of maximum k-coverage. Naval Research Logistics, 1998, 45( 6): 615–627

[10]

Feige U . A threshold of ln n for approximating set cover. Journal of the ACM, 1998, 45( 4): 634–652

[11]

Schomm F, Stahl F, Vossen G . Marketplaces for data: an initial survey. ACM SIGMOD Record, 2013, 42( 1): 15–26

[12]

Balazinska M, Howe B, Suciu D . Data markets in the cloud: An opportunity for the database community. Proceedings of the VLDB Endowment, 2011, 4( 12): 1482–1485

[13]

Li Y, Sun H, Dong B, Wang H . Cost-efficient data acquisition on online data marketplaces for correlation analysis. Proceedings of the VLDB Endowment, 2018, 12( 4): 362–375

[14]

Qiao S, Jindal A . PikePlace: Generating intelligence for marketplace datasets. Proceedings of the VLDB Endowment, 2023, 16( 12): 4106–4109

[15]

Asudeh A, Nargesian F . Towards distribution-aware query answering in data markets. Proceedings of the VLDB Endowment, 2022, 15( 11): 3137–3144

[16]

Kanza Y, Samet H. An online marketplace for geosocial data. In: Proceedings of the 23rd SIGSPATIAL International Conference on Advances in Geographic Information Systems. 2015, 10

[17]

Schrijver A. Combinatorial Optimization: Polyhedra and Efficiency. Berlin: Springer, 2003

[18]

Papadimitriou C H, Steiglitz K. Combinatorial Optimization: Algorithms and Complexity. North Chelmsford: Courier Corporation, 1998

[19]

Agarwal A, Dahleh M, Sarkar T. A marketplace for data: an algorithmic solution. In: Proceedings of 2019 ACM Conference on Economics and Computation. 2019, 701−726

[20]

Moor D. Data markets with dynamic arrival of buyers and sellers. In: Proceedings of the 14th Workshop on the Economics of Networks, Systems and Computation. 2019, 3

[21]

Niu C, Zheng Z, Wu F, Tang S, Chen G . Online pricing with reserve price constraint for personal data markets. IEEE Transactions on Knowledge and Data Engineering, 2022, 34( 4): 1928–1943

[22]

Koutris P, Upadhyaya P, Balazinska M, Howe B, Suciu D. Toward practical query pricing with QueryMarket. In: Proceedings of 2013 ACM SIGMOD International Conference on Management of Data. 2013, 613−624

[23]

Zhang M, Beltrán F, Liu J . A survey of data pricing for data marketplaces. IEEE Transactions on Big Data, 2023, 9( 4): 1038–1056

[24]

Chawla S, Deep S, Koutrisw P, Teng Y . Revenue maximization for query pricing. Proceedings of the VLDB Endowment, 2019, 13( 1): 1–14

[25]

Chen L, Koutris P, Kumar A. Towards model-based pricing for machine learning in a data marketplace. In: Proceedings of 2019 International Conference on Management of Data. 2019, 1535−1552

[26]

Lin B R, Kifer D . On arbitrage-free pricing for general data queries. Proceedings of the VLDB Endowment, 2014, 7( 9): 757–768

[27]

Li Y, Yu X, Koudas N . Data acquisition for improving machine learning models. Proceedings of the VLDB Endowment, 2021, 14( 10): 1832–1844

[28]

Sakr M . A data model and algorithms for a spatial data marketplace. International Journal of Geographical Information Science, 2018, 32( 11): 2140–2168

[29]

Azcoitia S A, Laoutaris N . A survey of data marketplaces and their business models. ACM SIGMOD Record, 2022, 51( 3): 18–29

[30]

Pei J, Fernandez R C, Yu X . Data and AI model markets: Opportunities for data and model sharing, discovery, and integration. Proceedings of the VLDB Endowment, 2023, 16( 12): 3872–3873

[31]

Chen C, Yuan Y, Wen Z, Wang G, Li A. GQP: a framework for scalable and effective graph query-based pricing. In: Proceedings of the 38th International Conference on Data Engineering. 2022, 1573−1585

[32]

Tong Y, Wang L, Zhou Z, Chen L, Du B, Ye J. Dynamic pricing in spatial crowdsourcing: A matching-based approach. In: Proceedings of 2018 International Conference on Management of Data. 2018, 773−788

[33]

Pei J. Data pricing -- from economics to data science. In: Proceedings of the 26th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining. 2020, 3553−3554

[34]

Pei J . A survey on data pricing: from economics to data science. IEEE Transactions on Knowledge and Data Engineering, 2022, 34( 10): 4586–4608

[35]

Koutris P, Upadhyaya P, Balazinska M, Howe B, Suciu D. Query-based data pricing. In: Proceedings of the 31st ACM SIGMOD-SIGACT-SIGAI symposium on Principles of Database Systems. 2012, 167−178

[36]

Deep S, Koutris P. QIRANA: A framework for scalable query pricing. In: Proceedings of 2017 ACM International Conference on Management of Data. 2017, 699−713

[37]

Liu J. Dealer: End-to-end data marketplace with model-based pricing. 2020, arXiv preprint arXiv: 2003.13103

[38]

Miao X, Gao Y, Chen L, Peng H, Yin J, Li Q . Towards query pricing on incomplete data. IEEE Transactions on Knowledge and Data Engineering, 2022, 34( 8): 4024–4036

[39]

Yu H, Zhang M . Data pricing strategy based on data quality. Computers & Industrial Engineering, 2017, 112: 1–10

[40]

Azcoitia S A, Iordanou C, Laoutaris N. Understanding the price of data in commercial data marketplaces. In: Proceedings of the 39th International Conference on Data Engineering. 2023, 3718−3728

[41]

Chen Y, Immorlica N, Lucier B, Syrgkanis V, Ziani J . Optimal data acquisition for statistical estimation. In: Proceedings of 2018 ACM Conference on Economics and Computation, 2018, 2018: 27–44

[42]

Huang S, Gan J, Bao Z, Lin W . Managing conflicting interests of stakeholders in influencer marketing. Proceedings of the ACM on Management of Data, 2023, 1( 1): 80

[43]

Li Y, Fan J, Wang Y, Tan K L . Influence maximization on social graphs: A survey. IEEE Transactions on Knowledge and Data Engineering, 2018, 30( 10): 1852–1872

[44]

Bian S, Guo Q, Wang S, Yu J X . Efficient algorithms for budgeted influence maximization on massive social networks. Proceedings of the VLDB Endowment, 2020, 13( 9): 1498–1510

[45]

Huang S, Lin W, Bao Z, Sun J . Influence maximization in real-world closed social networks. Proceedings of the VLDB Endowment, 2022, 16( 2): 180–192

[46]

Kempe D, Kleinberg J, Tardos É. Maximizing the spread of influence through a social network. In: Proceedings of the 9th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. 2003, 137−146

[47]

Tang Y, Xiao X, Shi Y. Influence maximization: near-optimal time complexity meets practical efficiency. In: Proceedings of 2014 ACM SIGMOD International Conference on Management of Data. 2014, 75−86

[48]

Huang K, Tang J, Han K, Xiao X, Chen W, Sun A, Tang X, Lim A . Efficient approximation algorithms for adaptive influence maximization. The VLDB Journal, 2020, 29( 6): 1385–1406

[49]

Li H, Bhowmick S S, Sun A, Cui J . Conformity-aware influence maximization in online social networks. The VLDB Journal, 2015, 24( 1): 117–141

[50]

Khuller S, Purohit M, Sarpatwar K K . Analyzing the optimal neighborhood: algorithms for partial and budgeted connected dominating set problems. SIAM Journal on Discrete Mathematics, 2020, 34( 1): 251–270

[51]

Alber J, Fellows M R, Niedermeier R . Polynomial-time data reduction for dominating set. Journal of the ACM, 2004, 51( 3): 363–384

[52]

Guha S, Khuller S . Approximation algorithms for connected dominating sets. Algorithmica, 1998, 20( 4): 374–387

[53]

Wan P J, Alzoubi K M, Frieder O. Distributed construction of connected dominating set in wireless ad hoc networks. In: Proceedings of the 21st Annual Joint Conference of the IEEE Computer and Communications Societies. 2002, 1597−1604

[54]

Gandhi R, Parthasarathy S . Distributed algorithms for connected domination in wireless networks. Journal of Parallel and Distributed Computing, 2007, 67( 7): 848–862

[55]

Du D Z, Wan P J. Connected Dominating Set: Theory and Applications. New York: Springer, 2013

[56]

Zhang W, Lin X, Zhang Y, Pei J, Wang W . Threshold-based probabilistic top-k dominating queries. The VLDB Journal, 2010, 19( 2): 283–305

[57]

Chekuri C, Kumar A. Maximum coverage problem with group budget constraints and applications. In: Proceedings of the 7th International Workshop on Approximation Algorithms for Combinatorial Optimization Problems, APPROX 2004 and 8th International Workshop on Randomization and Computation. 2004, 72−83

[58]

Rawitz D, Rosén A . Online budgeted maximum coverage. Algorithmica, 2021, 83( 9): 2989–3014

[59]

Hochbaum D S, Rao X . Approximation algorithms for connected maximum coverage problem for the discovery of mutated driver pathways in cancer. Information Processing Letters, 2020, 158: 105940

[60]

Farbstein B, Levin A . Maximum coverage problem with group budget constraints. Journal of Combinatorial Optimization, 2017, 34( 3): 725–735

[61]

Guo L, Li M, Xu D . Efficient approximation algorithms for maximum coverage with group budget constraints. Theoretical Computer Science, 2019, 788: 53–65

[62]

Indyk P, Vakilian A. Tight trade-offs for the maximum k-coverage problem in the general streaming model. In: Proceedings of the 38th ACM SIGMOD-SIGACT-SIGAI Symposium on Principles of Database Systems. 2019, 200−217

[63]

Perrault P, Healey J, Wen Z, Valko M. Budgeted online influence maximization. In: Proceedings of the 37th International Conference on Machine Learning. 2020, 7620−7631

[64]

Banerjee S, Jenamani M, Pratihar D K . ComBIM: A community-based solution approach for the budgeted influence maximization problem. Expert Systems with Applications, 2019, 125: 1–13

[65]

Han S, Zhuang F, He Q, Shi Z. Balanced seed selection for budgeted influence maximization in social networks. In: Proceedings of the 18th Pacific-Asia Conference on Advances in Knowledge Discovery and Data Mining. 2014, 65−77

[66]

Khuller S, Moss A, Naor J . The budgeted maximum coverage problem. Information Processing Letters, 1999, 70( 1): 39–45

[67]

Vazirani V V. Approximation Algorithms. Berlin: Springer, 2003

[68]

Vandin F, Upfal E, Raphael B J . Algorithms for detecting significantly mutated pathways in cancer. Journal of Computational Biology, 2011, 18( 3): 507–522

[69]

Cohen R, Katzir L. The generalized maximum coverage problem. Information Processing Letters, 2008, 108(1): 15−22

[70]

Dorigo M, Birattari M, Stutzle T . Ant colony optimization. IEEE Computational Intelligence Magazine, 2006, 1( 4): 28–39

[71]

Zacharatou E T, Doraiswamy H, Ailamaki A, Silva C T, Freire J . GPU rasterization for real-time spatial aggregation over arbitrary polygons. Proceedings of the VLDB Endowment, 2017, 11( 3): 352–365

[72]

Cao C, Li M. Generating mobility trajectories with retained data utility. In: Proceedings of the 27th ACM SIGKDD Conference on Knowledge Discovery & Data Mining. 2021, 2610−2620

[73]

Peng J, Wang H, Li J, Gao H. Set-based similarity search for time series. In: Proceedings of 2016 International Conference on Management of Data. 2016, 2039−2052

[74]

Yang W, Wang S, Sun Y, Peng Z . Fast dataset search with earth mover’s distance. Proceedings of the VLDB Endowment, 2022, 15( 11): 2517–2529

[75]

Su H, Liu S, Zheng B, Zhou X, Zheng K . A survey of trajectory distance measures and performance evaluation. The VLDB Journal, 2020, 29( 1): 3–32

[76]

Nutanong S, Jacox E H, Samet H . An incremental hausdorff distance calculation algorithm. Proceedings of the VLDB Endowment, 2011, 4( 8): 506–517

[77]

Corral A, Manolopoulos Y, Theodoridis Y, Vassilakopoulos M. Closest pair queries in spatial databases. In: Proceedings of 2000 ACM SIGMOD International Conference on Management of Data. 2000, 189−200

[78]

Vandin F, Upfal E, Raphael B . Algorithms and genome sequencing: identifying driver pathways in cancer. Computer, 2012, 45( 3): 39–46

[79]

Omohundro S M. Five Balltree Construction Algorithms. Technical Report, International Computer Science Institute, Berkeley, California 1989, 1−22

RIGHTS & PERMISSIONS

Higher Education Press

AI Summary AI Mindmap
PDF (4305KB)

Supplementary files

Highlights

517

Accesses

0

Citation

Detail

Sections
Recommended

AI思维导图

/