Adaptive Interval Configuration to Enhance Dynamic Approach for Mining Association Rules
Author affiliations: Department of Automation, Tsinghua University, Beijing 100084; Department of Computer Science, University of Hong Kong, Hong Kong
Publication: Tsinghua Science and Technology (Journal of Tsinghua University, Natural Science Edition, English Edition)
Year, volume, issue: 1999, Vol. 4, No. 1
Pages: 57-65
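Subject classification: 0810 [Engineering: Information and Communication Engineering]; 08 [Engineering]; 080401 [Engineering: Precision Instruments and Machinery]; 0804 [Engineering: Instrument Science and Technology]; 080402 [Engineering: Measurement Technology and Instruments]; 0835 [Engineering: Software Engineering]; 081002 [Engineering: Signal and Information Processing]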
Funding: Project funded by the Chinese Academy of Sciences (79770052); RGC (the Hong Kong Research Grants Council) (338/065/0032)
Keywords: association rules; data mining; dynamic process; adaptive algorithm
Abstract: Most proposed algorithms for mining association rules follow the conventional level-wise approach. The dynamic candidate generation idea introduced in the dynamic itemset counting (DIC) algorithm broke away from the level-wise limitation and can find the large itemsets using fewer passes over the database than level-wise algorithms. However, the dynamic approach is very sensitive to the data distribution of the database, and it requires a proper interval size. In this paper, an optimization technique named adaptive interval configuration (AIC) is developed to enhance the dynamic approach. The AIC optimization has two functions. First, it achieves a homogeneous distribution of large itemsets over intervals, so that fewer unnecessary candidates are generated and fewer database scanning passes are guaranteed. Second, it determines a near-optimal interval size adaptively to produce the best response time. We also developed a candidate pruning technique named virtual partition pruning to reduce the size-2 candidate set and incorporated it into the AIC optimization. Based on this optimization technique, we propose the efficient AIC algorithm for mining association rules. The AIC, DIC, and classic Apriori algorithms were implemented on a Sun Ultra Enterprise 4000 for performance comparison. The results show that AIC performed much better than both DIC and Apriori and showed strong robustness.
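For orientation, the level-wise baseline that the abstract contrasts with the dynamic approach can be sketched as follows. This is a minimal illustration of the textbook Apriori procedure, not the paper's AIC or DIC implementation. The function name `apriori`, the use of an absolute count for `min_support`, and the returned pass counter are assumptions made for this example. The point it shows is that a level-wise algorithm makes one full database pass per itemset size, which is the cost the dynamic approach tries to reduce.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Level-wise mining of large (frequent) itemsets.

    Illustrative sketch only. `min_support` is assumed to be an absolute
    transaction count. Returns ({itemset: count}, number_of_passes).
    """
    transactions = [frozenset(t) for t in transactions]
    # Level 1 candidates: every single item seen in the database.
    candidates = {frozenset([item]) for t in transactions for item in t}
    large = {}
    passes = 0
    k = 1
    while candidates:
        # One full pass over the database for each level k.
        passes += 1
        counts = {c: 0 for c in candidates}
        for t in transactions:
            for c in candidates:
                if c <= t:
                    counts[c] += 1
        level = {c: n for c, n in counts.items() if n >= min_support}
        large.update(level)
        # Join step: combine large k-itemsets into (k+1)-item candidates.
        k += 1
        joined = {a | b for a in level for b in level if len(a | b) == k}
        # Prune step: keep a candidate only if all its (k-1)-subsets are large.
        candidates = {
            c for c in joined
            if all(frozenset(s) in level for s in combinations(c, k - 1))
        }
    return large, passes
```

On a toy database of five transactions over items a, b, c with `min_support=3`, this sketch needs three passes: one each for the size-1, size-2, and size-3 candidates. A dynamic approach such as DIC instead starts counting larger candidates partway through a pass, at interval boundaries, which is why the interval size matters.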