Discovering context-aware conditional functional dependencies

Yuefeng DU , Derong SHEN , Tiezheng NIE , Yue KOU , Ge YU

Front. Comput. Sci. ›› 2017, Vol. 11 ›› Issue (4) : 688 -701.

PDF (603KB)
Front. Comput. Sci. ›› 2017, Vol. 11 ›› Issue (4) : 688 -701. DOI: 10.1007/s11704-016-5265-4
RESEARCH ARTICLE

Discovering context-aware conditional functional dependencies

Author information +
History +
PDF (603KB)

Abstract

Conditional functional dependencies(CFDs) are important techniques for data consistency. However, CFDs are limited to 1) provide the reasonable values for consistency repairing and 2) detect potential errors. This paper presents context-aware conditional functional dependencies(CCFDs) which contribute to provide reasonable values and detect potential errors. Especially, we focus on automatically discovering minimal CCFDs. In this paper, we present context relativity to measure the relationship of CFDs. The overlap of the related CFDs can provide reasonable values which result in more accuracy consistency repairing, and some related CFDs are combined into CCFDs.Moreover,we prove that discovering minimal CCFDs is NP-complete and we design the precise method and the heuristic method. We also present the dominating value to facilitate the process in both the precise method and the heuristic method. Additionally, the context relativity of the CFDs affects the cleaning results. We will give an approximate threshold of context relativity according to data distribution for suggestion. The repairing results are approvedmore accuracy, even evidenced by our empirical evaluation.

Keywords

conditional functional dependencies / context aware / rules discovery

Cite this article

Download citation ▾
Yuefeng DU, Derong SHEN, Tiezheng NIE, Yue KOU, Ge YU. Discovering context-aware conditional functional dependencies. Front. Comput. Sci., 2017, 11(4): 688-701 DOI:10.1007/s11704-016-5265-4

登录浏览全文

4963

注册一个新账户 忘记密码

References

[1]

BittonD, Millman J, TorgersenS . A feasibility and performance study of dependency inference (database design). In: Proceedings of the 5th International Conference on Data Engineering. 1989, 635–641

[2]

AbiteboulS, HullR, VianuV. Foundations of Databases. Boston: Addison-Wesley, 1995

[3]

KivinenJ, Mannila H. Approximate inference of functional dependencies from relations. Theoretical Computer Science, 1995, 149(1): 129–149

[4]

MaherM. Constrained dependencies. Theoretical Computer Science, 1997, 173(1): 113–149

[5]

FanW F, GeertsF, JiaX B, Kementsietsidis A. Conditional functional dependencies for capturing data inconsistencies. ACM Transactions on Database Systems (TODS), 2008, 33(2): 1–44

[6]

FanW F, GeertsF. Foundations of Data Quality Management. San Rafael, Calif: Morgan and Claypool, 2012

[7]

BravoL, FanW F, GeertsF, Ma S. Increasing the expressivity of conditional functional dependencies without extra complexity. In: Proceedings of the 24th International Conference on Data Engineering. 2008, 516–525

[8]

RamanV, Hellerstein J. Potter’s wheel: an interactive data cleaning system. In: Proceedings of the 27th International Conference on Very Large Data Bases. 2001, 381–390

[9]

IlyasI, MarklV, HaasP, Brown P, AboulnagaA . Cords: automatic discovery of correlations and soft functional dependencies. In: Proceedings of the 30th ACM SIGMOD International Conference on Management of Data. 2004, 647–658

[10]

MayfieldC, Neville J, PrabhakarS . Eracer: a database approach for statistical inference and data cleaning. In: Proceedings of the 36th ACM SIGMOD International Conference on Management of Data. 2010, 75–86

[11]

DallachiesaM, EbaidA, EldawyA, Elmagarmid A, IlyasI , OuzzaniM, TangN. Nadeef: a commodity data cleaning system. In: Proceedings of the 39th ACM SIGMOD International Conference on Management of Data. 2013, 541–552

[12]

BohannonP, FanW F, FlasterM, Rastogi R. A cost-based model and effective heuristic for repairing constraints by value modification. In: Proceedings of the 31st ACM SIGMOD International Conference on Management of Data. 2005, 143–154

[13]

MaS, FanW F, BravoL. Extending inclusion dependencies with conditions. Theoretical Computer Science, 2014, 515: 64–95

[14]

FanW F, GeertsF, LiJ Z, Xiong M. Discovering conditional functional dependencies. IEEE Transactions on Knowledge and Data Engineering, 2011, 23(5): 683–698

[15]

CormenH, Leiserson C, RivestR , SteinC. Introduction to algorithms. Cambridge: MIT Press, 2001

[16]

CongG, FanW F, GeertsF, Jia X, MaS . Improving data quality: consistency and accuracy. In: Proceedings of the 33rd International Conference on Very Large Data Bases. 2007, 315–326

[17]

ChuX, IlyasI, PapottiP. Discovering denial constraints. Proceedings of the VLDB Endowment, 2013, 6(13): 1498–1509

[18]

FanW F, GeertsF, TangN, Yu W Y. Inferring data currency and consistency for conflict resolution. In: Proceedings of the 29th International Conference on Data Engineering. 2013, 470–481

[19]

CaoY, FanW F, YuW Y. Determining the relative accuracy of attributes. In: Proceedings of the 39th ACM SIGMOD International Conference on Management of Data. 2013, 565–576

[20]

HaasL, Hernández M, HoH , PopaL, RothM. Clio grows up: from research prototype to industrial tool. In: Proceedings of the 2005 ACM SIGMOD International Conference on Management of Data. 2005, 805–810

[21]

MaS, DuanL, FanW F, Hu C, ChenW G . Extending conditional dependencies with built-in predicates. IEEE Transactions on Knowledge and Data Engineering, 2015, 27(12): 3274–3288

[22]

ChenW G, FanW F, MaS. Incorporating cardinality constraints and synonym rules into conditional functional dependencies. Information Processing Letters, 2009, 109(14): 783–789

[23]

HuhtalaY, Kärkkäinen J, PorkkaP , ToivonenH. Tane: an efficient algorithm for discovering functional and approximate dependencies. The Computer Journal, 1999, 42(2): 100–111

[24]

HuhtalaY, Karkkainen J, PorkkaP , ToivonenH. Efficient discovery of functional and approximate dependencies using partitions. In: Proceedings of the 4th International Conference on Data Engineering. 1998, 392–401

[25]

ChiangF, MillerR. Discovering data quality rules. Proceedings of the VLDB Endowment, 2008, 1(1): 1166–1177

[26]

ChiangF, MillerR.A unified model for data and constraint repair. In: Proceedings of the 27th International Conference on Data Engineering. 2011, 446–457

[27]

FanW F, MaS, TangN, Yu W Y. Interaction between record matching and data repairing. Journal of Data and Information Quality (JDIQ), 2014, 4(4): 1–16

[28]

WangJ N, TangN. Towards dependable data repairing with fixing rules. In: Proceedings of the 40th ACM SIGMOD International Conference on Management of Data. 2014, 457–468

[29]

InterlandiM, TangN. Proof positive and negative in data cleaning. In: Proceedings of the 31st International Conference on Data Engineering. 2015, 18–29

RIGHTS & PERMISSIONS

Higher Education Press and Springer-Verlag Berlin Heidelberg

AI Summary AI Mindmap
PDF (603KB)

Supplementary files

FCS-0688-15265-XFD_suppl_1

921

Accesses

0

Citation

Detail

Sections
Recommended

AI思维导图

/