A survey of uncertain data management
Lingli LI, Hongzhi WANG, Jianzhong LI, Hong GAO
Uncertain data are data accompanied by uncertainty information, and they arise widely in database applications. In recent years, uncertainty in data has posed challenges in almost all areas of database management, including data modeling, query representation, query processing, and data mining. Uncertain data management has consequently become an active research topic in the field of data management. In this study, we explore the problems in managing uncertain data, present state-of-the-art solutions, and outline future research directions in this area. The techniques discussed cover data modeling, query processing, and data mining for uncertain data in relational, XML, graph, and stream forms.
uncertain data / probabilistic database / probabilistic XML / semi-structured data / data stream
[1] |
Fuhr N, Rölleke T. A probabilistic relational algebra for the integration of information retrieval and database systems. ACM Transactions on Information Systems, 1997, 15(1): 32–66
CrossRef
Google scholar
|
[2] |
Imieliński T, Lipski W. Incomplete information in relational databases. Journal of the ACM, 1984, 31(4): 761–791
CrossRef
Google scholar
|
[3] |
Barbará D, Garcia-Molina H, Porter D. The management of probabilistic data. IEEE Transactions on Knowledge and Data Engineering, 1992, 4(5): 487–502
CrossRef
Google scholar
|
[4] |
Lakshmanan L V, Leone N, Ross R, Subrahmanian V S. Probview: a flexible probabilistic database system. ACM Transactions on Database Systems, 1997, 22(3): 419–469
CrossRef
Google scholar
|
[5] |
Zimányi E. Query evaluation in probabilistic relational databases. Theoretical Computer Science, 1997, 171(1): 179–219
CrossRef
Google scholar
|
[6] |
Sen P, Deshpande A. Representing and querying correlated tuples in probabilistic databases. In: Proceedings of the 23rd International Conference on Data Engineering. 2007, 596–605
CrossRef
Google scholar
|
[7] |
Suciu D. Probabilistic databases. SIGACT News, 2008, 39(2): 111–124
CrossRef
Google scholar
|
[8] |
Cavallo R, Pittarelli M. The theory of probabilistic databases. In: Proceedings of the 13th International Conference on Very Large Data Bases. 1987, 71–81
|
[9] |
Benjelloun O, Sarma A D, Halevy A, Widom J. ULDBS: databases with uncertainty and lineage. In: Proceedings of the 32nd International Conference on Very Large Data Bases. VLDB Endowment, 2006, 953–964
|
[10] |
Sen P, Deshpande A, Getoor L. Read-once functions and query evaluation in probabilistic databases. Proceedings of the VLDB Endowment, 2010, 3(1–2): 1068–1079
CrossRef
Google scholar
|
[11] |
Olteanu D, Huang J. Using OBDDs for efficient query evaluation on probabilistic databases. In: Proceedings of the International Conference on Scalable Uncertainty Management. 2008, 326–340
CrossRef
Google scholar
|
[12] |
Roy S, Perduca V, Tannen V. Faster query answering in probabilistic databases using read-once functions. In: Proceedings of the 14th International Conference on Database Theory. 2011, 232–243
CrossRef
Google scholar
|
[13] |
Kenig B, Gal A, Strichman O. A new class of lineage expressions over probabilistic databases computable in P-time. In: Proceedings of the 7th International Conference on Scalable Uncertainty Management. 2013, 219–232
CrossRef
Google scholar
|
[14] |
Widom J. Trio: a system for integrated management of data, accuracy, and lineage. Stanford Infolab, 2004
|
[15] |
Antova L, Koch C, Olteanu D. Maybms: managing incomplete information with probabilistic world-set decompositions. In: Proceedings of the 23rd International Conference on Data Engineering. 2007, 1479–1480
CrossRef
Google scholar
|
[16] |
Cheng R, Singh S, Prabhakar S. U-DBMS: a database system for managing constantly-evolving data. In: Proceedings of the 31st International Conference on Very Large Data Bases. VLDB Endowment, 2005, 1271–1274
|
[17] |
Boulos J, Dalvi N, Mandhani B, Mathur S, Re C, Suciu D. Mystiq: a system for finding more answers by using probabilities. In: Proceedings of the 2005 ACM SIGMOD International Conference on Management of Data. 2005, 891–893
CrossRef
Google scholar
|
[18] |
Olteanu D, Huang J, Koch C. Sprout: lazy vs. eager query plans for tuple-independent probabilistic databases. In: Proceedings of the 25th International Conference on Data Engineering. 2009, 640–651
CrossRef
Google scholar
|
[19] |
Kimelfeld B, Kosharovsky Y, Sagiv Y. Query evaluation over probabilistic XML. The International Journal on Very Large Data Bases, 2009, 18(5): 1117–1140
CrossRef
Google scholar
|
[20] |
Senellart P, Souihli A. Proapprox: a lightweight approximation query processor over probabilistic trees. In: Proceedings of the 2011 ACM SIGMOD International Conference on Management of Data. 2011, 1295–1298
CrossRef
Google scholar
|
[21] |
Welbourne E, Khoussainova N, Letchner J, Li Y, Balazinska M, Borriello G, Suciu D. Cascadia: a system for specifying, detecting, and managing rfid events. In: Proceedings of the 6th International Conference on Mobile Systems, Applications, and Services. 2008, 281–294
CrossRef
Google scholar
|
[22] |
Tran T T, Peng L, Li B, Diao Y, Liu A. PODS: a new model and processing algorithms for uncertain data streams. In: Proceedings of the 2010 ACM SIGMOD International Conference on Management of Data. 2010, 159–170
CrossRef
Google scholar
|
[23] |
Tran T T, Peng L, Diao Y, McGregor A, Liu A. Claro: modeling and processing uncertain data streams. The International Journal on Very Large Data Bases, 2012, 21(5): 651–676
CrossRef
Google scholar
|
[24] |
Aggarwal C C, Yu P S. A survey of uncertain data algorithms and applications. IEEE Transactions on Knowledge and Data Engineering, 2009, 21(5): 609–623
CrossRef
Google scholar
|
[25] |
Zhou A Y. A survey on the management of uncertain data. Chinese Journal of Computers, 2009, 32(1): 1–16
CrossRef
Google scholar
|
[26] |
Kimelfeld B, Senellart P. Probabilistic XML: Models and Complexity. Advances in Probabilistic Databases for Uncertain Information Management, Springer, Berlin, Heidelberg, 2013, 39–66
CrossRef
Google scholar
|
[27] |
Sarma A D, Benjelloun O, Halevy A, Widom J. Working models for uncertain data. In: Proceedings of the 22nd International Conference on Data Engineering. 2006, 7
CrossRef
Google scholar
|
[28] |
Green T J, Tannen V. Models for incomplete and probabilistic information. In: Proceedings of the International Conference on Extending Database Technology. 2006, 278–296
CrossRef
Google scholar
|
[29] |
Sen P, Deshpande A, Getoor L. PRDB: managing and exploiting rich correlations in probabilistic databases. The International Journal on Very Large Data Bases, 2009, 18(5): 1065–1090
CrossRef
Google scholar
|
[30] |
Chen R, Mao Y, Kiringa I. GRN model of probabilistic databases: construction, transition and querying. In: Proceedings of the 2010 ACM SIGMOD International Conference on Management of Data. 2010, 291–302
CrossRef
Google scholar
|
[31] |
Cheng R, Xia Y, Prabhakar S, Shah R, Vitter J S. Efficient indexing methods for probabilistic threshold queries over uncertain data. In: Proceedings of the 30th International Conference on Very Large Data Bases. VLDB Endowment, 2004, 876–887
CrossRef
Google scholar
|
[32] |
Tao Y, Cheng R, Xiao X, Ngai W K, Kao B, Prabhakar S. Indexing multi-dimensional uncertain data with arbitrary probability density functions. In: Proceedings of the 31st International Conference on Very Large Data Bases. VLDB Endowment, 2005, 922–933
|
[33] |
Burdick D, Deshpande P M, Jayram T, Ramakrishnan R, Vaithyanathan S. Olap over uncertain and imprecise data. In: Proceedings of the 31st International Conference on Very Large Data Bases. VLDB Endowment, 2005, 970–981
|
[34] |
Jayram T, Kale S, Vee E. Efficient aggregation algorithms for probabilistic data. In: Proceedings of the 18th Annual ACM-SIAM Symposium on Discrete Algorithms. Society for Industrial and Applied Mathematics, 2007, 346–355
|
[35] |
Dalvi N, Suciu D. Efficient query evaluation on probabilistic databases. In: Proceedings of the 30th International Conference on Very Large Data Bases. VLDB Endowment, 2004, 864–875
CrossRef
Google scholar
|
[36] |
Cormode G, Garofalakis M. Sketching probabilistic data streams. In: Proceedings of the ACM SIGMOD International Conference onManagement of Data. 2007, 281–292
CrossRef
Google scholar
|
[37] |
Ross R, Subrahmanian V, Grant J. Aggregate operators in probabilistic databases. Journal of the ACM, 2005, 52(1): 54–101
CrossRef
Google scholar
|
[38] |
Kanagal B, Deshpande A. Efficient query evaluation over temporally correlated probabilistic streams. In: Proceedings of the 25th International Conference on Data Engineering. 2009, 1315–1318
|
[39] |
Burdick D, Deshpande P M, Jayram T, Ramakrishnan R, Vaithyanathan S. Efficient allocation algorithms for olap over imprecise data. In: Proceedings of the 32nd International Conference on Very Large Data Bases. VLDB Endowment, 2006, 391–402
|
[40] |
Ré C, Suciu D. The trichotomy of having queries on a probabilistic database. The International Journal on Very Large Data Bases, 2009, 18(5): 1091–1116
CrossRef
Google scholar
|
[41] |
Fink R, Han L, Olteanu D. Aggregation in probabilistic databases via knowledge compilation. Proceedings of the VLDB Endowment, 2012, 5(5): 490–501
CrossRef
Google scholar
|
[42] |
Ngai W K, Kao B, Chui C K, Cheng R, Chau M, Yip K Y. Efficient clustering of uncertain data. In: Proceedings of the 6th International Conference on Data Mining. 2006, 436–445
CrossRef
Google scholar
|
[43] |
Agrawal P, Widom J. Confidence-aware join algorithms. In: Proceedings of the 25th International Conference on Data Engineering. 2009, 628–639
CrossRef
Google scholar
|
[44] |
Cheng R, Singh S, Prabhakar S, Shah R, Vitter J S, Xia Y. Efficient join processing over uncertain data. In: Proceedings of the 15th ACM International Conference on Information and Knowledge Management. 2006, 738–747
CrossRef
Google scholar
|
[45] |
Kriegel H P, Kunath P, Pfeifle M, Renz M. Probabilistic similarity join on uncertain data. In: Proceedings of the International Conference on Database Systems for Advanced Applications. 2006, 295–309
CrossRef
Google scholar
|
[46] |
Ljosa V, Singh A K. Top-k spatial joins of probabilistic objects. In: Proceedings of the 24th International Conference on Data Engineering. 2008, 566–575
CrossRef
Google scholar
|
[47] |
Jestes J, Li F, Yan Z, Yi K. Probabilistic string similarity joins. In: Proceedings of the 2010 ACM SIGMOD International Conference on Management of Data. 2010, 327–338
CrossRef
Google scholar
|
[48] |
Lian X, Chen L. Set similarity join on probabilistic data. Proceedings of the VLDB Endowment, 2010, 3(1–2): 650–659
CrossRef
Google scholar
|
[49] |
Andritsos P, Fuxman A, Miller R J. Clean answers over dirty databases: probabilistic approach. In: Proceedings of the 22nd International Conference on Data Engineering. 2006, 30
CrossRef
Google scholar
|
[50] |
Wick M, McCallum A, Miklau G. Scalable probabilistic databases with factor graphs and mcmc. Proceedings of the VLDB Endowment, 2010, 3(1–2): 794–804
CrossRef
Google scholar
|
[51] |
Qi Y, Jain R, Singh S, Prabhakar S. Threshold query optimization for uncertain data. In: Proceedings of the 2010 ACM SIGMOD International Conference on Management of Data. 2010, 315–326
CrossRef
Google scholar
|
[52] |
Moore K F, Rastogi V, Ré C, Suciu D. Query containment of tier-2 queries over a probabilistic database. In: Proceedings of the VLDB Workshop on Management of Uncertain Data. 2010, 47–62
|
[53] |
Ge T, Grabiner D, Zdonik S. Monte carlo query processing of uncertain multidimensional array data. In: Proceedings of the 27th International Conference on Data Engineering. 2011, 936–947
CrossRef
Google scholar
|
[54] |
Soliman M A, Ilyas I F, Chang K C C. Top-k query processing in uncertain databases. In: Proceedings of the 23rd International Conference on Data Engineering. 2007, 896–905
CrossRef
Google scholar
|
[55] |
Yi K, Li F, Kollios G, Srivastava D. Efficient processing of top-k queries in uncertain databases with x-relations. IEEE Transactions on Knowledge and Data Engineering, 2008, 20(12): 1669–1682
CrossRef
Google scholar
|
[56] |
Huang Y K, Chen C C, Lee C. Continuous k-nearest neighbor query for moving objects with uncertain velocity. GeoInformatica, 2009, 13(1): 1–25
CrossRef
Google scholar
|
[57] |
Zhang X, Chomicki J. Semantics and evaluation of top-k queries in probabilistic databases. Distributed and Parallel Databases, 2009, 26(1): 67–126
CrossRef
Google scholar
|
[58] |
Hua M, Pei J, Zhang W, Lin X. Ranking queries on uncertain data: a probabilistic threshold approach. In: Proceedings of the 2008 ACM SIGMOD International Conference on Management of Data. 2008, 673–686
CrossRef
Google scholar
|
[59] |
Cormode G, Li F, Yi K. Semantics of ranking queries for probabilistic data and expected ranks. In: Proceedings of the 25th International Conference on Data Engineering. 2009, 305–316
CrossRef
Google scholar
|
[60] |
Ge T, Zdonik S, Madden S. Top-k queries on uncertain data: on score distribution and typical answers. In: Proceedings of the 35th ACM SIGMOD International Conference on Management of Data. 2009, 375–388
CrossRef
Google scholar
|
[61] |
Soliman M A, Ilyas I F. Ranking with uncertain scores. In: Proceedings of the 25th International Conference on Data Engineering. 2009, 317–328
CrossRef
Google scholar
|
[62] |
Li J, Deshpande A. Ranking continuous probabilistic datasets. Proceedings of the VLDB Endowment, 2010, 3(1–2): 638–649
CrossRef
Google scholar
|
[63] |
Cheng R, Chen J, Mokbel M, Chow C Y. Probabilistic verifiers: evaluating constrained nearest-neighbor queries over uncertain data. In: Proceedings of the 24th International Conference on Data Engineering. 2008, 973–982
CrossRef
Google scholar
|
[64] |
Cheng R, Chen L, Chen J, Xie X. Evaluating probability threshold k-nearest-neighbor queries over uncertain data. In: Proceedings of the 12th International Conference on Extending Database Technology: Advances in Database Technology. 2009, 672–683
CrossRef
Google scholar
|
[65] |
Zhang Y, Lin X, Zhu G, Zhang W, Lin Q. Efficient rank based knn query processing over uncertain data. In: Proceedings of the 26th International Conference on Data Engineering. 2010, 28–39
CrossRef
Google scholar
|
[66] |
Lian X, Chen L. Probabilistic group nearest neighbor queries in uncertain databases. IEEE Transactions on Knowledge and Data Engineering, 2008, 20(6): 809–824
CrossRef
Google scholar
|
[67] |
Yuen S M, Tao Y, Xiao X, Pei J, Zhang D. Superseding nearest neighbor search on uncertain spatial databases. IEEE Transactions on Knowledge and Data Engineering, 2010, 22(7): 1041–1055
CrossRef
Google scholar
|
[68] |
Cheema M A, Lin X, Wang W, Zhang W, Pei J. Probabilistic reverse nearest neighbor queries on uncertain data. IEEE Transactions on Knowledge and Data Engineering, 2010, 22(4): 550–564
CrossRef
Google scholar
|
[69] |
Lian X, Chen L. Probabilistic inverse ranking queries in uncertain databases. The International Journal on Very Large Data Bases, 2011, 20(1): 107–127
CrossRef
Google scholar
|
[70] |
Lian X, Chen L. Efficient processing of probabilistic reverse nearest neighbor queries over uncertain data. The International Journal on Very Large Data Bases, 2009, 18(3): 787–808
CrossRef
Google scholar
|
[71] |
Pei J, Jiang B, Lin X, Yuan Y. Probabilistic skylines on uncertain data. In: Proceedings of the 33rd International Conference on Very Large Data Bases. VLDB Endowment, 2007, 15–26
|
[72] |
Yuan Y, Wang G. Answering probabilistic reachability queries over uncertain graphs. Chinese Journal of Computers, 2010, 33(8): 1378–1386
CrossRef
Google scholar
|
[73] |
Lian X, Chen L. Top-k dominating queries in uncertain databases. In: Proceedings of the 12th International Conference on Extending Database Technology: Advances in Database Technology. 2009, 660–671
CrossRef
Google scholar
|
[74] |
Grädel E, Gurevich Y, Hirsch C. The complexity of query reliability. In: Proceedings of the 17th ACM SIGACT-SIGMOD-SIGART Symposium on Principles of Database Systems. 1998, 227–234
CrossRef
Google scholar
|
[75] |
Dalvi N, Suciu D. The dichotomy of conjunctive queries on probabilistic structures. In: Proceedings of the 26th ACM SIGMODSIGACT- SIGART Symposium on Principles of Database Systems. 2007, 293–302
CrossRef
Google scholar
|
[76] |
Fagin R, Lotem A, Naor M. Optimal aggregation algorithms for middleware. In: Proceedings of the 20th ACM SIGMOD-SIGACTSIGART Symposium on Principles of Database Systems. 2001, 102–113
CrossRef
Google scholar
|
[77] |
Li J, Saha B, Deshpande A. A unified approach to ranking in probabilistic databases. Proceedings of the VLDB Endowment, 2009, 2(1): 502–513
CrossRef
Google scholar
|
[78] |
Li F, Yi K, Jestes J. Ranking distributed probabilistic data. In: Proceedings of the 2009 ACM SIGMOD International Conference on Management of Data. 2009, 361–374
CrossRef
Google scholar
|
[79] |
Dai X, Yiu M L, Mamoulis N, Tao Y, Vaitis M. Probabilistic spatial queries on existentially uncertain data. Advances in Spatial and Temporal Databases, 2005, 400–417
CrossRef
Google scholar
|
[80] |
Yiu M L, Mamoulis N, Dai X, Tao Y, Vaitis M. Efficient evaluation of probabilistic advanced spatial queries on existentially uncertain data. IEEE Transactions on Knowledge and Data Engineering, 2009, 21(1): 108–122
CrossRef
Google scholar
|
[81] |
Cheng R, Kalashnikov D V, Prabhakar S. Evaluating probabilistic queries over imprecise data. In: Proceedings of the 2003 ACM SIGMOD International Conference on Management of Data. 2003, 551–562
CrossRef
Google scholar
|
[82] |
Kriegel H P, Kunath P, Renz M. Probabilistic nearest-neighbor query on uncertain objects. In: Proceedings of the International Conference on Database Systems for Advanced Applications. 2007, 337–348
CrossRef
Google scholar
|
[83] |
Lian X, Chen L. Probabilistic inverse ranking queries over uncertain data. In: Proceedings of the International Conference on Database Systems for Advanced Applications. 2009, 35–50
CrossRef
Google scholar
|
[84] |
Lian X, Chen L. Monochromatic and bichromatic reverse skyline search over uncertain databases. In: Proceedings of the 2008 ACM SIGMOD International Conference on Management of Data. 2008, 213–226
CrossRef
Google scholar
|
[85] |
Tao Y, Xiao X, Cheng R. Range search on multidimensional uncertain data. ACM Transactions on Database Systems, 2007, 32(3): 15
CrossRef
Google scholar
|
[86] |
Bohm C, Pryakhin A, Schubert M. The gauss-tree: efficient object identification in databases of probabilistic feature vectors. In: Proceedings of the 22nd International Conference on Data Engineering. 2006, 9
CrossRef
Google scholar
|
[87] |
Ljosa V, Singh A K. APLA: indexing arbitrary probability distributions. In: Proceedings of the 23rd International Conference on Data Engineering. 2007, 946–955
CrossRef
Google scholar
|
[88] |
Cheng R, Xie X, Yiu M L, Chen J, Sun L. UV-diagram: a voronoi diagram for uncertain data. In: Proceedings of the 26th International Conference on Data Engineering. 2010, 796–807
CrossRef
Google scholar
|
[89] |
Angiulli F, Fassetti F. Indexing uncertain data in general metric spaces. IEEE Transactions on Knowledge and Data Engineering, 2012, 24(9): 1640–1657
CrossRef
Google scholar
|
[90] |
Singh S, Mayfield C, Prabhakar S, Shah R, Hambrusch S. Indexing uncertain categorical data. In: Proceedings of the 23rd International Conference on Data Engineering. 2007, 616–625
CrossRef
Google scholar
|
[91] |
Kanagal B, Deshpande A. Indexing correlated probabilistic databases. In: Proceedings of the 35th SIGMOD International Conference on Management of Data. 2009, 455–468
CrossRef
Google scholar
|
[92] |
Chau M, Cheng R, Kao B, Ng J. Uncertain data mining: an example in clustering location data. In: Proceedings of the Pacific-Asia Conference on Knowledge Discovery and Data Mining. 2006, 199–204
CrossRef
Google scholar
|
[93] |
Li Y, Han J, Yang J. Clustering moving objects. In: Proceedings of the 10th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. 2004, 617–622
CrossRef
Google scholar
|
[94] |
Lee S D, Kao B, Cheng R. Reducing UK-means to K-means. In: Proceedings of the 7th International Conference on Data Mining Workshops. 2007, 483–488
CrossRef
Google scholar
|
[95] |
Kao B, Lee S D, Cheung D W, Ho W S, Chan K. Clustering uncertain data using voronoi diagrams. In: Proceedings of the 8th International Conference on Data Mining. 2008, 333–342
CrossRef
Google scholar
|
[96] |
Dehne F, Noltemeier H. Voronoi trees and clustering problems. Information Systems, 1987, 12(2): 171–175
CrossRef
Google scholar
|
[97] |
Gullo F, Ponti G, Tagarelli A. Clustering uncertain data via Kmedoids. In: Proceedings of the International Conference on Scalable Uncertainty Management. 2008, 229–242
CrossRef
Google scholar
|
[98] |
Cormode G, McGregor A. Approximation algorithms for clustering uncertain data. In: Proceedings of the 27th ACM SIGMOD-SIGACTSIGART Symposium on Principles of Database Systems. 2008, 191–200
CrossRef
Google scholar
|
[99] |
Kriegel H P, Pfeifle M. Density-based clustering of uncertain data. In: Proceedings of the 11th ACM SIGKDD International Conference on Knowledge Discovery in Data Mining. 2005, 672–677
CrossRef
Google scholar
|
[100] |
Kriegel H P, Pfeifle M. Hierarchical density-based clustering of uncertain data. In: Proceedings of the 5th IEEE International Conference on Data Mining. 2005, 4
CrossRef
Google scholar
|
[101] |
Xu H, Li G. Density-based probabilistic clustering of uncertain data. In: Proceedings of the International Conference on Computer Science and Software Engineering. 2008, 474–477
CrossRef
Google scholar
|
[102] |
Hamdan H, Govaert G. Mixture model clustering of uncertain data. In: Proceedings of the 14th IEEE International Conference on Fuzzy Systems. 2005, 879–884
CrossRef
Google scholar
|
[103] |
Xiao L, Hung E. An efficient distance calculation method for uncertain objects. In: Proceedings of the IEEE Symposium on Computational Intelligence and Data Mining. 2007, 10–17
CrossRef
Google scholar
|
[104] |
Bi J, Zhang T. Support vector classification with input data uncertainty. Advances in Neural Information Processing Systems, 2004, 17: 161–169
|
[105] |
Bhattacharyya C, Pannagadatta K, Smola A J. A second order cone programming formulation for classifying missing data. Advances in Neural Information Processing Systems, 2005, 17: 153–160
|
[106] |
Yang J, Gunn S. Exploiting uncertain data in support vector classification. In: Proceedings of the International Conference on Knowledge– Based Intelligent Information and Engineering Systems. 2007, 148–155
CrossRef
Google scholar
|
[107] |
Yang J, Gunn S. Iterative constraints in support vector classification with uncertain information. Constraint-based Mining and Learning, 2007, 49
|
[108] |
Demichelis F, Magni P, Piergiorgi P, Rubin M A, Bellazzi R. A hierarchical naive bayes model for handling sample heterogeneity in classification problems: an application to tissue microarrays. BMC Bioinformatics, 2006, 7(1): 514
CrossRef
Google scholar
|
[109] |
Chui C K, Kao B, Hung E. Mining frequent itemsets from uncertain data. In: Proceedings of the Pacific-Asia Conference on Knowledge Discovery and Data Mining. 2007, 47–58
CrossRef
Google scholar
|
[110] |
Chui C K, Kao B. A decremental approach for mining frequent itemsets from uncertain data. In: Proceedings of the Pacific-Asia Conference on Knowledge Discovery and Data Mining. 2008, 64–75
CrossRef
Google scholar
|
[111] |
Leung C S, Carmichael C L, Hao B. Efficient mining of frequent patterns from uncertain data. In: Proceedings of the 7th International Conference on Data Mining Workshops. 2007, 489–494
CrossRef
Google scholar
|
[112] |
Leung C K S, Brajczuk D A. Efficient mining of frequent itemsets from data streams. In: Proceedings of the British National Conference on Databases. 2008, 2–14
CrossRef
Google scholar
|
[113] |
Leung C K S, Mateo M A F, Brajczuk D A. A tree-based approach for frequent pattern mining from uncertain data. In: Proceedings of the Pacific-Asia Conference on Knowledge Discovery and Data Mining. 2008, 653–661
CrossRef
Google scholar
|
[114] |
Hewawasam K, Premaratne K, Subasingha S, Shyu M L. Rule mining and classification in imperfect databases. In: Proceedings of the 8th International Conference on Information Fusion. 2005, 661–668
CrossRef
Google scholar
|
[115] |
Tobji M A B, Yaghlane B B, Mellouli K. A new algorithm for mining frequent itemsets from evidential databases. Proceedings of Information Processing and Management of Uncertainty. 2008, 8: 1535–1542
|
[116] |
Tobji M A B, Yaghlane B B, Mellouli K. Frequent itemset mining from databases including one evidential attribute. In: Proceedings of the International Conference on Scalable Uncertainty Management. 2008, 19–32
CrossRef
Google scholar
|
[117] |
Abiteboul S, Kimelfeld B, Sagiv Y, Senellart P. On the expressiveness of probabilistic XML models. The International Journal on Very Large Data Bases, 2009, 18(5): 1041–1064
CrossRef
Google scholar
|
[118] |
Li T, Shao Q, Chen Y. PEPX: a query-friendly probabilistic XML database. In: Proceedings of the 15th ACM International Conference on Information and Knowledge Management. 2006, 848–849
CrossRef
Google scholar
|
[119] |
Nierman A, Jagadish H. ProTDB: probabilistic data in XML. In: Proceedings of the 28th International Conference on Very Large Data Bases. VLDB Endowment, 2002, 646–657
CrossRef
Google scholar
|
[120] |
Abiteboul S, Senellart P. Querying and updating probabilistic information in XML. In: Proceedings of the International Conference on Extending Database Technology. 2006, 1059–1068
CrossRef
Google scholar
|
[121] |
Senellart P, Abiteboul S. On the complexity of managing probabilistic XML data. In: Proceedings of the 26th ACM SIGMOD-SIGACTSIGART Symposium on Principles of Database Systems. 2007, 283–292
CrossRef
Google scholar
|
[122] |
Hung E, Getoor L, Subrahmanian V. Probabilistic interval XML. In: Proceedings of International Conference on Database Theory. 2003, 361–377
CrossRef
Google scholar
|
[123] |
Hung E, Getoor L, Subrahmanian V. PXML: a probabilistic semistructured data model and algebra. In: Proceedings of the 19th International Conference on Data Engineering. 2003, 467–478
CrossRef
Google scholar
|
[124] |
Abiteboul S, Chan T H H, Kharlamov E, Nutt W, Senellart P. Aggregate queries for discrete and continuous probabilistic XML. In: Proceedings of the 13th International Conference on Database Theory. 2010, 50–61
CrossRef
Google scholar
|
[125] |
Kimelfeld B, Kosharovsky Y, Sagiv Y. Query efficiency in probabilistic XML models. In: Proceedings of the 2008 ACM SIGMOD International Conference on Management of Data. 2008, 701–714
CrossRef
Google scholar
|
[126] |
Zhao W, Dekhtyar A, Goldsmith J. Databases for interval probabilities. International Journal of Intelligent Systems, 2004, 19(9): 789–815
CrossRef
Google scholar
|
[127] |
Zhao W, Dekhtyar A, Goldsmith J. A framework for management of semistructured probabilistic data. Journal of Intelligent Information Systems, 2005, 25(3): 293–332
CrossRef
Google scholar
|
[128] |
Dekhtyar A, Goldsmith J, Hawkes S R. Semistructured probabilistic databases. In: Proceedings of the 13th International Conference on Scientific and Statistical Database Management. 2001, 36–45
CrossRef
Google scholar
|
[129] |
Hung E. Managing uncertainty and ontologies in databases. UMD Theses and Dissertations, 2005
|
[130] |
Magnani M, Montesi D. Management of interval probabilistic data. Acta Informatica, 2008, 45(2): 93–130
CrossRef
Google scholar
|
[131] |
Cohen S, Kimelfeld B, Sagiv Y. Incorporating constraints in probabilistic XML. In: Proceedings of the 27th ACM SIGMOD-SIGACTSIGART Symposium on Principles of Database Systems. 2008, 109–118
CrossRef
Google scholar
|
[132] |
Kimelfeld B, Sagiv Y. Matching twigs in probabilistic XML. In: Proceedings of the 33rd International Conference on Very Large Data Bases. VLDB Endowment, 2007, 27–38
|
[133] |
Adar E, Ré C. Managing uncertainty in social networks. IEEE Data Eng. Bull, 2007, 30(2): 15–22
|
[134] |
Hintsanen P. The most reliable subgraph problem. In: Proceedings of the European Conference on Principles of Data Mining and Knowledge Discovery. 2007, 471–478
CrossRef
Google scholar
|
[135] |
Hintsanen P, Toivonen H. Finding reliable subgraphs from large probabilistic graphs. Data Mining and Knowledge Discovery, 2008, 17(1): 3–23
CrossRef
Google scholar
|
[136] |
Zou Z, Li J, Gao H, Zhang S. Frequent subgraph pattern mining on uncertain graph data. In: Proceedings of the 18th ACM Conference on Information and Knowledge Management. 2009, 583–592
CrossRef
Google scholar
|
[137] |
Zou Z, Li J, Gao H, Zhang S. Mining frequent subgraph patterns from uncertain graphs. Journal of Software, 2009, 20(11): 2965–2976
CrossRef
Google scholar
|
[138] |
Zou Z, Li J, Gao H, Zhang S. Mining frequent subgraph patterns from uncertain graph data. IEEE Transactions on Knowledge and Data Engineering, 2010, 22(9): 1203–1218
CrossRef
Google scholar
|
[139] |
Potamias M, Bonchi F, Gionis A, Kollios G. K-nearest neighbors in uncertain graphs. Proceedings of the VLDB Endowment, 2010, 3(1–2): 997–1008
CrossRef
Google scholar
|
[140] |
Yuan Y, Chen L, Wang G. Efficiently answering probability thresholdbased shortest path queries over uncertain graphs. In: Proceedings of the International Conference on Database Systems for Advanced Applications. 2010, 155–170
CrossRef
Google scholar
|
[141] |
Papapetrou O, Ioannou E, Skoutas D. Efficient discovery of frequent subgraph patterns in uncertain graph databases. In: Proceedings of the 14th International Conference on Extending Database Technology. 2011, 355–366
CrossRef
Google scholar
|
[142] |
Han M, Zhang W, Li J Z. Raking: an efficient k-maximal frequent pattern mining algorithm on uncertain graph database. Chinese Journal of Computers, 2010, 33(8): 1387–1395
CrossRef
Google scholar
|
[143] |
Zou Z, Gao H, Li J. Discovering frequent subgraphs over uncertain graph databases under probabilistic semantics. In: Proceedings of the 16th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. 2010, 633–642
CrossRef
Google scholar
|
[144] |
Zou Z, Li J, Gao H, Zhang S. Finding top-k maximal cliques in an uncertain graph. In: Proceedings of the 26th International Conference on Data Engineering. 2010, 649–652
CrossRef
Google scholar
|
[145] |
Yuan Y, Wang G, Wang H, Chen L. Efficient subgraph search over large uncertain graphs. Proceedings of the VLDB Endowment, 2011, 4(11): 876–886
|
[146] |
Yuan Y, Wang G, Chen L, Wang H. Efficient subgraph similarity search on large probabilistic graph databases. Proceedings of the VLDB Endowment, 2012, 5(9): 800–811
CrossRef
Google scholar
|
[147] |
Koyutürk M, Grama A, Szpankowski W. An efficient algorithm for detecting frequent subgraphs in biological networks. Bioinformatics. 2004, 20(Suppl 1): 200–207
CrossRef
Google scholar
|
[148] |
Valiant L G. The complexity of enumeration and reliability problems. SIAM Journal on Computing, 1979, 8(3): 410–421
CrossRef
Google scholar
|
[149] |
Jin C, Yi K, Chen L, Yu J X, Lin X. Sliding-window top-k queries on uncertain streams. Proceedings of the VLDB Endowment, 2008, 1(1): 301–312
CrossRef
Google scholar
|
[150] |
Ré C, Letchner J, Balazinksa M, Suciu D. Event queries on correlated probabilistic streams. In: Proceedings of the 2008 ACM SIGMOD International Conference on Management of Data. 2008, 715–728
CrossRef
Google scholar
|
[151] |
Alon N, Matias Y, Szegedy M. The space complexity of approximating the frequency moments. In: Proceedings of the 28th Annual ACM Symposium on Theory of Computing. 1996, 20–29
CrossRef
Google scholar
|
[152] |
Flajolet P, Martin G N. Probabilistic counting algorithms for data base applications. Journal of Computer and System Sciences, 1985, 31(2): 182–209
CrossRef
Google scholar
|
[153] |
Zhang T, Ramakrishnan R, Livny M. BIRCH: an efficient data clustering method for very large databases. ACM SIGMOD Record, 1996, 25(2): 103–114
|
[154] |
Aggarwal C C, Han J, Wang J, Yu P S. A framework for clustering evolving data streams. In: Proceedings of the 29th International Conference on Very Large Data Bases. VLDB Endowment, 2003, 81–92
|
[155] |
Aggarwal C C, Yu P S. A framework for clustering uncertain data streams. In: Proceedings of the 24th International Conference on Data Engineering. 2008, 150–159
|
[156] |
Li Z, Ge T. Online windowed subsequence matching over probabilistic sequences. In: Proceedings of the International Conference on Management of Data. 2012, 277–288
|
[157] |
Lian X, Chen L. Efficient join processing on uncertain data streams. In: Proceedings of the 18th ACM Conference on Information and Knowledge Management. 2009, 857–866
|
[158] |
Ge T, Liu F. Accuracy-aware uncertain stream databases. In: Proceedings of the 28th International Conference on Data Engineering. 2012, 174–185
|
[159] |
Peng L, Diao Y, Liu A. Optimizing probabilistic query processing on continuous uncertain data. Proceedings of the VLDB Endowment, 2011, 4(11): 1169–1180
|
[160] |
Jayram T, McGregor A, Muthukrishnan S, Vee E. Estimating statistical aggregates on probabilistic data streams. In: Proceedings of the 26th ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems. 2007, 243–252
|
[161] |
Zhang Q, Li F, Yi K. Finding frequent items in probabilistic data. In: Proceedings of the ACM SIGMOD International Conference on Management of Data. 2008, 819–832
|
[162] |
Aggarwal C C, Han J, Wang J, Yu P S. On high dimensional projected clustering of data streams. Data Mining and Knowledge Discovery, 2005, 10(3): 251–273
|
[163] |
Zhang C, Gao M, Zhou A. Tracking high quality clusters over uncertain data streams. In: Proceedings of the 25th International Conference on Data Engineering. 2009, 1641–1648
|
[164] |
Zhang W, Lin X, Zhang Y, Wang W, Zhu G, Yu J X. Probabilistic skyline operator over sliding windows. Information Systems, 2013, 38(8): 1212–1233
|
[165] |
Subramaniam S, Palpanas T, Papadopoulos D, Kalogeraki V, Gunopulos D. Online outlier detection in sensor data using non-parametric models. In: Proceedings of the 32nd International Conference on Very Large Data Bases. VLDB Endowment, 2006, 187–198
|
[166] |
Deshpande A, Guestrin C, Madden S R, Hellerstein J M, Hong W. Model-driven data acquisition in sensor networks. In: Proceedings of the 30th International Conference on Very Large Data Bases. VLDB Endowment, 2004, 588–599
|
[167] |
Hida Y, Huang P, Nishtala R. Aggregation query under uncertainty in sensor networks. Technical Report, 2004
|
[168] |
Welbourne E, Khoussainova N, Letchner J, Li Y, Balazinska M, Borriello G, Suciu D. Cascadia: a system for specifying, detecting, and managing RFID events. In: Proceedings of the 6th International Conference on Mobile Systems, Applications, and Services. 2008, 281–294
|
[169] |
Kanagal B, Deshpande A. Online filtering, smoothing and probabilistic modeling of streaming data. In: Proceedings of the 24th IEEE International Conference on Data Engineering. 2008, 1160–1169
|
[170] |
Zhang C J, Chen L, Tong Y, Liu Z. Cleaning uncertain data with a noisy crowd. In: Proceedings of the 31st IEEE International Conference on Data Engineering. 2015, 6–17
|
[171] |
Mo L, Cheng R, Li X, Cheung D W, Yang X S. Cleaning uncertain data for top-k queries. In: Proceedings of the 29th IEEE International Conference on Data Engineering. 2013, 134–145
|
[172] |
Panse F, Van Keulen M, De Keijzer A, Ritter N. Duplicate detection in probabilistic data. In: Proceedings of the 26th International Conference on Data Engineering Workshops. 2010, 179–182
|
[173] |
Van Keulen M, De Keijzer A. Qualitative effects of knowledge rules and user feedback in probabilistic data integration. The VLDB Journal, 2009, 18(5): 1191–1217
|
[174] |
Cheng R, Chen J, Xie X. Cleaning uncertain data with quality guarantees. Proceedings of the VLDB Endowment, 2008, 1(1): 722–735
|
[175] |
Dong X L, Halevy A, Yu C. Data integration with uncertainty. The VLDB Journal, 2009, 18(2): 469–500
|