CLS-Miner: efficient and effective closed high-utility itemset mining

Thu-Lan DAM , Kenli LI , Philippe FOURNIER-VIGER , Quang-Huy DUONG

Front. Comput. Sci. ›› 2019, Vol. 13 ›› Issue (2) : 357 -381.

PDF (963KB)
Front. Comput. Sci. ›› 2019, Vol. 13 ›› Issue (2) : 357 -381. DOI: 10.1007/s11704-016-6245-4
RESEARCH ARTICLE

CLS-Miner: efficient and effective closed high-utility itemset mining

Author information +
History +
PDF (963KB)

Abstract

High-utility itemset mining (HUIM) is a popular data mining task with applications in numerous domains. However, traditional HUIM algorithms often produce a very large set of high-utility itemsets (HUIs). As a result, analyzing HUIs can be very time consuming for users. Moreover, a large set of HUIs also makes HUIM algorithms less efficient in terms of execution time and memory consumption. To address this problem, closed high-utility itemsets (CHUIs), concise and lossless representations of all HUIs, were proposed recently. Although mining CHUIs is useful and desirable, it remains a computationally expensive task. This is because current algorithms often generate a huge number of candidate itemsets and are unable to prune the search space effectively. In this paper, we address these issues by proposing a novel algorithm called CLS-Miner. The proposed algorithm utilizes the utility-list structure to directly compute the utilities of itemsets without producing candidates. It also introduces three novel strategies to reduce the search space, namely chain-estimated utility co-occurrence pruning, lower branch pruning, and pruning by coverage. Moreover, an effective method for checking whether an itemset is a subset of another itemset is introduced to further reduce the time required for discovering CHUIs. To evaluate the performance of the proposed algorithm and its novel strategies, extensive experiments have been conducted on six benchmark datasets having various characteristics. Results show that the proposed strategies are highly efficient and effective, that the proposed CLS-Miner algorithmoutperforms the current state-ofthe- art CHUD and CHUI-Miner algorithms, and that CLSMiner scales linearly.

Keywords

utility mining / high-utility itemset mining / closed itemset mining / closed high-utility itemset mining

Cite this article

Download citation ▾
Thu-Lan DAM, Kenli LI, Philippe FOURNIER-VIGER, Quang-Huy DUONG. CLS-Miner: efficient and effective closed high-utility itemset mining. Front. Comput. Sci., 2019, 13(2): 357-381 DOI:10.1007/s11704-016-6245-4

登录浏览全文

4963

注册一个新账户 忘记密码

References

[1]

Agrawal R, Srikant R. Fast algorithms for mining association rules. In: Proceedings of the 20th International Conference on Very Large Data Bases. 1994, 487–499

[2]

Zaki M J, Gouda K. Fast vertical mining using diffsets. In: Proceedings of the 9th ACMSIGKDD International Conference on Knowledge Discovery and Data Mining. 2003, 326–335

[3]

Han J W, Pei J, Yin Y W. Mining frequent patterns without candidate generation: a frequent-pattern tree approach. Data Mining and Knowledge Discovery, 2004, 8(1): 53–87

[4]

Han J, Wang J, Lu Y, Tzvetkov P. Mining top-k frequent closed patterns without minimum support. In: Proceedings of IEEE International Conference on Data Mining. 2002, 211–218

[5]

Grahne G, Zhu J. Fast algorithms for frequent itemset mining using fptrees. IEEE Transactions on Knowledge and Data Engineering, 2005, 17(10): 1347–1362

[6]

Deng Z H, Lv S L. PrePost+: an efficient N-lists-based algorithm for mining frequent itemsets via children-parent equivalence pruning. Expert Systems with Applications, 2015, 42(13): 5424–5432

[7]

Dam T L, Li K, Fournier-Viger P, Duong Q H. An efficient algorithm for mining top-rank-k frequent patterns. Applied Intelligence, 2016, 45(1): 96–111

[8]

Liu Y, Liao W K, Choudhary A. A two-phase algorithm for fast discovery of high utility itemsets. In: Proceedings of Pacific-Asia Conference on Knowledge Discovery and Data Mining. 2005, 689–695

[9]

Liu M, Qu J. Mining high utility itemsets without candidate generation. In: Proceedings of the 21st ACM International Conference on Information and Knowledge Management. 2012, 55–64

[10]

Tseng V, Shie B E, Wu C W, Yu P. Efficient algorithms for mining high utility itemsets from transactional databases. IEEE Transactions on Knowledge and Data Engineering, 2013, 25(8): 1772–1786

[11]

Fournier-Viger P, Wu C W, Zida S, Tseng V. FHM: faster high-utility itemset mining using estimated utility co-occurrence pruning. In: Proceedings of International Symposium on Methodologies for Intelligent Systems. 2014, 83–92

[12]

Song W, Liu Y, Li J. BAHUI: fast and memory efficient mining of high utility itemsets based on bitmap. International Journal of Data Warehousing and Mining, 2014, 10(1): 1–15

[13]

Song W, Zhang Z, Li J. A high utility itemset mining algorithm based on subsume index. Knowledge and Information Systems, 2016, 49(1): 315–340

[14]

Duong Q H, Liao B, Fournier-Viger P, Dam T L. An efficient algorithm for mining the top-k high utility itemsets, using novel threshold raising and pruning strategies. Knowledge-Based Systems, 2016, 104: 106–122

[15]

Ahmed C, Tanbeer S, Jeong B S, Lee Y K. Efficient tree structures for high utility pattern mining in incremental databases. IEEE Transactions on Knowledge and Data Engineering, 2009, 21(12): 1708–1721

[16]

Yang Q. Three challenges in data mining. Frontiers of Computer Science, 2010, 4(3): 324–333

[17]

Chen J, Li K, Tang Z, Bilal K, Yu S, Weng C, Li K. A parallel random forest algorithm for big data in a spark cloud computing environment. IEEE Transactions on Parallel and Distributed Systems, 2017, 28(4): 919–933

[18]

Song W, Liu Y, Li J. Mining high utility itemsets by dynamically pruning the tree structure. Applied Intelligence, 2014, 40(1): 29–43

[19]

Fournier-Viger P, Lin J C W, Duong Q H, Dam T L. In: Fujita H, Ali M, Selamat A, et al, eds. FHM+: Faster High-Utility Itemset Mining Using Length Upper-Bound Reduction. Cham: Springer International 2016, 115–127

[20]

Tseng V S, Wu C W, Fournier-Viger P, Yu P S. Efficient algorithms for mining the concise and lossless representation of high utility itemsets. IEEE Transactions on Knowledge and Data Engineering, 2015, 27(3): 726–739

[21]

Pasquier N, Bastide Y, Taouil R, Lakhal L. Efficient mining of association rules using closed itemset lattices. Information Systems, 1999, 24(1): 25–46

[22]

Lucchese C, Orlando S, Perego R. Fast and memory efficient mining of frequent closed itemsets. IEEE Transactions on Knowledge and Data Engineering, 2006, 18(1): 21–36

[23]

Zaki M J, Hsiao C J. Efficient algorithms for mining closed itemsets and their lattice structure. IEEE Transactions on Knowledge and Data Engineering, 2005, 17(4): 462–478

[24]

Fournier-Viger P. FHN: efficient mining of high-utility itemsets with negative unit profits. In: Proceedings of International Conference on Advanced Data Mining and Applications. 2014, 16–29

[25]

Tseng V, Wu C W, Fournier-Viger P, Yu P. Efficient algorithms for mining top-k high utility itemsets. IEEE Transactions on Knowledge and Data Engineering, 2016, 28(1): 54–67

[26]

Wu C W, Fournier-Viger P, Gu J Y, Tseng V S. Mining closed+ high utility itemsets without candidate generation. In: Proceedings of Conference on Technologies and Applications of Artificial Intelligence. 2015, 187–194

[27]

Chan R, Yang Q, Shen Y D. Mining high utility itemsets. In: Proceedings of the 3rd IEEE International Conference on Data Mining. 2003, 19–26

[28]

Lan G C, Hong T P, Tseng V S. An efficient projection-based indexing approach for mining high utility itemsets. Knowledge and Information Systems, 2014, 38(1): 85–107

[29]

Gouda K, Zaki M J. Genmax: an efficient algorithm for mining maximal frequent itemsets. Data Mining and Knowledge Discovery, 2005, 11(3): 223–242

[30]

Uno T, Kiyomi M, Arimura H. LCMver. 2: efficient mining algorithms for frequent/closed/maximal itemsets. In: Proceedings of IEEE ICDM Workshop on Frequent Itemset Mining Implementations. 2004

[31]

Szathmary L, Valtchev P, Napoli A, Godin R, Boc A, Makarenkov V. A fast compound algorithm for mining generators, closed itemsets, and computing links between equivalence classes. Annals of Mathematics and Artificial Intelligence, 2014, 70(1–2): 81–105

[32]

Fournier-Viger P, Wu C W, Tseng V S. Novel concise representations of high utility itemsets using generator patterns. In: Proceedings of the International Conference on Advanced Data Mining and Applications. 2014, 30–43

[33]

Shie B E, Philip S Y, Tseng V S. Efficient algorithms for mining maximal high utility itemsets from data streams with different models. Expert Systems with Applications, 2012, 39(17): 12947–12960

[34]

Pasquier N, Bastide Y, Taouil R, Lakhal L. Discovering frequent closed itemsets for association rules. In: Proceedings of the International Conference on Database Theory. 1999, 398–416

[35]

Fournier-Viger P, Gomariz A, Gueniche T, Soltani A, Wu C W, Tseng V S. SPMF: a Java open-source pattern mining library. Journal of Machine Learning Research, 2014, 15: 3569–3573

[36]

Pisharath J, Liu Y, Liao W K, Choudhary A, Memik G, Parhi J. NUMineBench 2.0. CUCIS Technical Report CUCIS-2005-08-01. 2005

RIGHTS & PERMISSIONS

Higher Education Press and Springer-Verlag GmbH Germany, part of Springer Nature

AI Summary AI Mindmap
PDF (963KB)

Supplementary files

Supplementary Material

1463

Accesses

0

Citation

Detail

Sections
Recommended

AI思维导图

/