
Meaningful String Extraction Based on Clustering for Improving Webpage Classification

Authors: Chen Jie, Tan Jianlong, Liao Hao, Zhou Yanquan

Affiliations: Beijing Univ Posts & Telecommun, Beijing 100876, Peoples R China; Chinese Acad Sci, Inst Comp Technol, Beijing 100190, Peoples R China; Natl Engn Lab Informat Secur Technol, Beijing 100190, Peoples R China

Publication: China Communications (中国通信, English edition)

Year, volume, issue: 2012, Vol. 9, No. 3

Pages: 68-77

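Subject classification: 12 [Management]; 0810 [Engineering - Information and Communication Engineering]; 1201 [Management - Management Science and Engineering (degrees awardable in Management or Engineering)]; 0808 [Engineering - Electrical Engineering]; 0809 [Engineering - Electronic Science and Technology (degrees awardable in Engineering or Science)]; 08 [Engineering]; 0839 [Engineering - Cyberspace Security]; 081201 [Engineering - Computer System Architecture]; 0812 [Engineering - Computer Science and Technology (degrees awardable in Engineering or Science)]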

Funding: Supported by the National Natural Science Foundation of China under Grants No. 61100205 and No. 60873001; the Hi-Tech Research and Development Program of China under Grant No. 2011AA010705; and the Fundamental Research Funds for the Central Universities under Grant No. 2009RC0212.

Subjects: document clustering; webpage classification; string; extraction; vector space model; webpage files; text classification; VSM

Abstract: Webpage classification differs from traditional text classification in its irregular words and phrases and its massive, unlabeled features, which make it harder to obtain effective features. To cope with this problem, we propose two scenarios for extracting meaningful strings, based on document clustering and on term clustering with multiple strategies, to optimize a Vector Space Model (VSM) and thereby improve webpage classification. The results show that document clustering works better than term clustering in coping with document content. However, a better overall performance is obtained by spectral clustering with document clustering. Moreover, because images appear in the same webpage as document content, the proposed method is also applied to extract meaningful terms for images, and the experimental results also show its effectiveness in improving webpage classification.
