Clustering Text Data Streams

Authors: Liu Yubao (刘玉葆), Cai Jiarong (蔡嘉荣), Yin Jian (印鉴), Fu Weici (傅蔚慈)

Affiliations: Department of Computer Science, Sun Yat-Sen University; Department of Computer Science and Engineering, The Chinese University of Hong Kong

Journal: Journal of Computer Science & Technology (Chinese title: 计算机科学技术学报, English edition)

Year, volume, issue: 2008, Vol. 23, No. 1

Pages: 112-128

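Subject classification: 0808 [Engineering - Electrical Engineering]; 08 [Engineering]; 081203 [Engineering - Computer Application Technology]; 0835 [Engineering - Software Engineering]; 0701 [Science - Mathematics]; 0811 [Engineering - Control Science and Engineering]; 0812 [Engineering - Computer Science and Technology (engineering or science degrees may be conferred)]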

Funding: Supported by the National Natural Science Foundation of China under Grant Nos. 60573097, 60703111, and 60773198; the Natural Science Foundation of Guangdong Province under Grant No. 06104916; the Specialized Research Foundation for the Doctoral Program of Higher Education under Grant No. 20050558017; and the Program for New Century Excellent Talents in University of China under Grant No. NCET-06-0727.

Subject terms: clustering; database applications; data mining; text data streams

Abstract: Clustering text data streams is an important problem in the data mining community, with applications such as newsgroup filtering, text crawling, document organization, and topic detection and tracking. However, most existing methods are similarity-based and represent the semantics of text data only with the TF-IDF scheme, which often leads to poor clustering quality. Recently, researchers have argued that the semantic smoothing model is more effective than the TF-IDF scheme at improving text clustering quality. However, the existing semantic smoothing model is not suited to dynamic text data. In this paper, we first extend the semantic smoothing model to the text data stream setting. Based on the extended model, we then present two online clustering algorithms, OCTS and OCTSM, for clustering massive text data streams. For both algorithms, we also introduce a new cluster statistics structure, called the cluster profile, which captures the semantics of text data streams dynamically and at the same time speeds up the clustering process. Efficient implementations of both algorithms are also given. Finally, we present a series of experimental results that demonstrate the effectiveness of our technique.
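
The abstract gives neither the extended model's equations nor the layout of the cluster profile, so what follows is only a rough illustration of the general idea, not OCTS or OCTSM. It assumes a simplified version of the mixture form used in earlier semantic-smoothing work for static collections, p(w|d) = (1 - λ) p_ml(w|d) + λ Σ_t p(w|t) p(t|d), with a toy hand-made topic table. It also assumes a generic single-pass threshold ("leader") clustering in which each cluster keeps running sums instead of its documents. The names TOPIC_OF, TRANSLATION, smoothed_model, ClusterProfile, cluster_stream, lam, and threshold are all hypothetical.

```python
from collections import Counter, defaultdict
import math

# Hypothetical toy tables (assumption, not from the paper): which tokens signal
# which topic, and the word distribution p(w|t) each topic "translates" into.
TOPIC_OF = {"goal": "sport", "match": "sport", "stock": "finance", "market": "finance"}
TRANSLATION = {
    "sport": {"goal": 0.4, "match": 0.4, "team": 0.2},
    "finance": {"stock": 0.4, "market": 0.4, "price": 0.2},
}

def smoothed_model(tokens, lam=0.3):
    """Simplified mixture: p(w|d) = (1 - lam) * p_ml(w|d) + lam * sum_t p(w|t) * p(t|d)."""
    counts = Counter(tokens)
    n = sum(counts.values())
    topics = Counter(TOPIC_OF[w] for w in tokens if w in TOPIC_OF)
    m = sum(topics.values())
    lam = lam if m else 0.0  # no topic evidence: use the maximum-likelihood model alone
    model = defaultdict(float)
    for w, c in counts.items():
        model[w] += (1 - lam) * c / n
    for t, c in topics.items():
        for w, p in TRANSLATION[t].items():
            model[w] += lam * (c / m) * p
    return dict(model)

def cosine(a, b):
    """Cosine similarity between two sparse word-probability vectors."""
    dot = sum(v * b.get(k, 0.0) for k, v in a.items())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

class ClusterProfile:
    """Hypothetical stand-in for a cluster summary: summed document models plus a
    document count, so the centroid is available without storing the documents."""
    def __init__(self):
        self.sums = defaultdict(float)
        self.n = 0

    def add(self, model):
        for w, p in model.items():
            self.sums[w] += p
        self.n += 1

    def centroid(self):
        return {w: s / self.n for w, s in self.sums.items()}

def cluster_stream(docs, threshold=0.3):
    """Single pass: assign each arriving document to the most similar profile,
    or open a new cluster when no similarity reaches `threshold`."""
    profiles, labels = [], []
    for tokens in docs:
        model = smoothed_model(tokens)
        scores = [cosine(model, p.centroid()) for p in profiles]
        best = max(range(len(scores)), key=scores.__getitem__, default=None)
        if best is None or scores[best] < threshold:
            profiles.append(ClusterProfile())
            best = len(profiles) - 1
        profiles[best].add(model)
        labels.append(best)
    return labels

docs = [["goal", "team", "match"], ["stock", "price", "market"],
        ["match", "goal"], ["market", "stock", "stock"]]
print(cluster_stream(docs))  # [0, 1, 0, 1]
```

Keeping per-cluster sums rather than documents means each arrival only updates the winning profile, which is the kind of saving a stream setting needs; how the paper's cluster profile actually achieves this is not stated in the abstract.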