Blog Post Extraction Using Title Finding
Conference: 《第五届全国信息检索学术会议》 (Fifth National Conference on Information Retrieval)
Conference date: 2009
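Subject classification: 081203 [Engineering - Computer Application Technology] 08 [Engineering] 0835 [Engineering - Software Engineering] 0812 [Engineering - Computer Science and Technology (degrees in engineering or science may be awarded)]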
Funding: Supported by the National High Technology Research and Development Program of China (863 Program), grant no. 2007AA01Z438
Keywords: Blog Post; Title Finding; VIPS; SVM
Abstract: With the development of Web 2.0, web mining applications pay more attention to blog *** order to prevent noise in blog pages from affecting the precision of web mining algorithms, it is necessary to acquire posts from blog pages *** this paper, we propose a blog post extraction algorithm which uses title *** are two stages in the *** the first stage, text nodes which indicate the title of the post are found and used as the beginning of the *** take a machine learning approach to realize this stage, and employ SVM as classification *** the second stage, we find the end of the *** methods are introduced in this stage: one uses VIPS segmentation results, and the other is based on hand-coded rules. Experiments are conducted to see how titles are found and how posts are *** results show that our algorithm can obtain promising results.
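
The two-stage idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the `TextNode` fields, the feature weights, and the end-marker phrases are all hypothetical, a hand-set linear scorer stands in for the trained SVM, and only the rule-based variant of the second stage is shown (the VIPS-based variant is omitted).

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class TextNode:
    """A text node from a blog page (hypothetical, simplified representation)."""
    text: str
    tag: str        # enclosing HTML tag, e.g. "h2", "p", "div"
    font_size: int  # rendered font size in px (assumed feature)


# Hand-set weights standing in for a trained linear SVM (assumption).
WEIGHTS = {"heading_tag": 2.0, "large_font": 1.5, "short_text": 1.0}
BIAS = -2.5

# Hypothetical hand-coded rules marking where a post ends.
END_MARKERS = ("posted by", "comments", "leave a reply")


def title_score(node: TextNode) -> float:
    """Linear decision value; a positive value means 'looks like a title'."""
    score = BIAS
    if node.tag in {"h1", "h2", "h3"}:
        score += WEIGHTS["heading_tag"]
    if node.font_size >= 18:
        score += WEIGHTS["large_font"]
    if len(node.text.split()) <= 12:
        score += WEIGHTS["short_text"]
    return score


def find_title(nodes: List[TextNode]) -> Optional[int]:
    """Stage 1: index of the highest-scoring node with a positive score."""
    best_index, best_score = None, 0.0
    for i, node in enumerate(nodes):
        score = title_score(node)
        if score > best_score:
            best_index, best_score = i, score
    return best_index


def find_end(nodes: List[TextNode], start: int) -> int:
    """Stage 2 (rule-based): index of the first node after the title that
    starts with an end marker, or len(nodes) if no marker is found."""
    for i in range(start + 1, len(nodes)):
        if nodes[i].text.strip().lower().startswith(END_MARKERS):
            return i
    return len(nodes)


def extract_post(nodes: List[TextNode]) -> List[str]:
    """Return the text from the title node up to (not including) the end node."""
    start = find_title(nodes)
    if start is None:
        return []
    return [n.text for n in nodes[start:find_end(nodes, start)]]


page = [
    TextNode("Home | About", "div", 12),
    TextNode("My Trip to Beijing", "h2", 22),
    TextNode("We arrived on Monday.", "p", 14),
    TextNode("The food was great.", "p", 14),
    TextNode("Comments (3)", "div", 12),
    TextNode("Nice post!", "p", 14),
]
post = extract_post(page)
```

Here `post` holds the title and the two body paragraphs, while the navigation bar and the comment section are dropped. In a real system the scorer would be replaced by a classifier trained on labeled text nodes.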