Improved pattern tree for incremental frequent-pattern mining
Ming Zhou , Taiyong Wang
Transactions of Tianjin University ›› 2010, Vol. 16 ›› Issue (2) : 129 -134.
By analyzing the existing prefix-tree data structure, an improved pattern tree was introduced for processing new transactions. It firstly stored transactions in a lexicographic order tree and then restructured the tree by sorting each path in a frequency-descending order. While updating the improved pattern tree, there was no need to rescan the entire new database or reconstruct a new tree for incremental updating. A test was performed on synthetic dataset T10I4D100K with 100,000 transactions and 870 items. Experimental results show that the smaller the minimum support threshold, the faster the improved pattern tree achieves over CanTree for all datasets. As the minimum support threshold increased from 2% to 3.5%, the runtime decreased from 452.71 s to 186.26 s. Meanwhile, the runtime required by CanTree decreased from 1,367.03 s to 432.19 s. When the database was updated, the execution time of improved pattern tree consisted of construction of original improved pattern trees and reconstruction of initial tree. The experiment results showed that the runtime was saved by about 15% compared with that of CanTree. As the number of transactions increased, the runtime of improved pattern tree was about 25% shorter than that of FP-tree. The improved pattern tree also required less memory than CanTree.
data mining / association rules / improved pattern tree / incremental mining
| [1] |
Agrawal R, Imielinski T, Swami A. Mining association rules between sets of items in large databases[C]. In: Proceedings of the 1993 ACM-SIGMOD International Conference on Management of Data (SIGMOD’93). Washington DC, 1993. 207–216. |
| [2] |
|
| [3] |
|
| [4] |
Park J S, Chen M S, Yu P S. An effective hash-based algorithm for mining association rules[C]. In: Proceedings of the 1995 ACM-SIGMOD International Conference on Management of Data (SIGMOD’95). San Jose, 1995. 175–186. |
| [5] |
Han J, Pei J, Yin Y. Mining frequent patterns without candidate generation[C]. In: Proceedings of the 2000 ACMSIGMOD International Conference on Management of Data (SIGMOD’2000). Dallas, 2000. 1–12. |
| [6] |
|
| [7] |
|
| [8] |
|
| [9] |
|
| [10] |
Pei J, Han J, Lu H et al. H-Mine: Hyper-structure mining of frequent patterns in large databases[C]. In: First IEEE International Conference on Data Mining (ICDM’2001). Midland, 2001. 441–448. |
| [11] |
|
| [12] |
|
| [13] |
Li X, Deng Z H, Tang S. A fast algorithm for maintenance of association rules in incremental databases[C]. In: ADMA. Xi’an, 2006. 56–63. |
| [14] |
Cheung W, Zaïane OR. Incremental mining of frequent patterns without candidate generation or support constraint [C]. In: Proc IDEAS 2003. Hongkong, China, 2003. 111–116. |
| [15] |
IBM. QUEST Data Mining Project [EB/OL]. http://www.almaden.ibm.com/cs/quest, 2009-06-30. |
/
| 〈 |
|
〉 |