Parallel Resource Mining From Bilingual Web Pages
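Conference: Fifth National Conference on Information Retrieval (第五届全国信息检索学术会议)
Conference date: 2009
Subject classification: 12 [Management] 1201 [Management: Management Science and Engineering (degrees may be conferred in Management or Engineering)]
Funding: Supported by the National Natural Science Foundation of China under Grant No. 60970057
Keywords: Parallel resource; Bilingual web pages; Web mining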
Abstract: A new method is reported for extracting parallel linguistic resources from bilingual web pages, which differs from previous work on mining parallel web pages. The candidate bilingual resources are extracted from web pages using heuristic information from the HTML structures and hint *** pair resources are verified by a maximum entropy classifier combining length, word-overlap, alignment and text location *** shows satisfactory parallel resource mining *** study enriches parallel text mining methods from various web pages.
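
The verification step named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the record does not give feature definitions, weights, or the language pair, so the function names, the toy English-French lexicon, and the weights below are all hypothetical. Only two of the four feature types (length and word overlap) are shown. The sketch relies on the fact that a two-class maximum entropy classifier has the same form as logistic regression.

```python
import math


def length_ratio(src_tokens, tgt_tokens):
    """Ratio of the shorter token count to the longer one, in [0, 1]."""
    if not src_tokens or not tgt_tokens:
        return 0.0
    a, b = len(src_tokens), len(tgt_tokens)
    return min(a, b) / max(a, b)


def word_overlap(src_tokens, tgt_tokens, lexicon):
    """Fraction of source tokens with a lexicon translation present in the target."""
    if not src_tokens:
        return 0.0
    tgt = set(tgt_tokens)
    hits = sum(1 for w in src_tokens if lexicon.get(w, set()) & tgt)
    return hits / len(src_tokens)


def parallel_probability(features, weights, bias):
    """Two-class maximum entropy score: a logistic function of the weighted features."""
    z = bias + sum(weights[name] * value for name, value in features.items())
    return 1.0 / (1.0 + math.exp(-z))


def score_pair(src_tokens, tgt_tokens, lexicon, weights, bias):
    """Build the feature vector for one candidate pair and score it."""
    features = {
        "length_ratio": length_ratio(src_tokens, tgt_tokens),
        "word_overlap": word_overlap(src_tokens, tgt_tokens, lexicon),
    }
    return parallel_probability(features, weights, bias)


# Hypothetical toy lexicon and hand-set weights, for illustration only.
# In practice the weights would be learned from labeled pairs.
TOY_LEXICON = {"black": {"noir"}, "cat": {"chat"}}
TOY_WEIGHTS = {"length_ratio": 2.0, "word_overlap": 4.0}
TOY_BIAS = -3.0

if __name__ == "__main__":
    good = score_pair(["black", "cat"], ["chat", "noir"], TOY_LEXICON, TOY_WEIGHTS, TOY_BIAS)
    bad = score_pair(["black", "cat"], ["bonjour"], TOY_LEXICON, TOY_WEIGHTS, TOY_BIAS)
    print(f"likely parallel: {good:.3f}, likely not parallel: {bad:.3f}")
```

A candidate pair would be accepted when its score exceeds a chosen threshold such as 0.5. Alignment and text-location features could be added as further entries in the feature dictionary.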