Unsupervised Graph-Based Tibetan Multi-Document Summarization
Author affiliations: School of Information and Engineering, Minzu University of China, Beijing 100081, China; National Language Resource Monitoring & Research Center, Minority Languages Branch, Beijing 100081, China; University of California, Irvine, California 92617, USA; Department of Physics, New Jersey Institute of Technology, Newark, New Jersey 07102-1982, USA
Publication: Computers, Materials & Continua (English-language journal)
Year/Volume/Issue: 2022, Vol. 73, No. 10
Pages: 1769-1781
Core indexing:
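Subject classification: 0502 [Literature: Foreign Languages and Literatures]; 050201 [Literature: English Language and Literature]; 05 [Literature]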
Funding: This work was supported in part by the National Science Foundation Project of P.R. China under Grant No. 52071349 and partially supported by the Young and Middle-aged Talents Project of the State Ethnic Affairs Commission
Keywords: Multi-document summarization; text clustering; topic feature fusion; graphic model
Abstract: Text summarization creates a subset that represents the most important or relevant information in the original content, which effectively reduces information *** neural network method has achieved good results in the task of text summarization in both Chinese and English, but research on text summarization in low-resource languages is still in the exploratory stage, especially in ***’s more, there is no large-scale annotated corpus for text *** lack of dataset severely limits the development of low-resource text *** this case, unsupervised learning approaches are more appealing in low-resource languages, as they do not require labeled *** this paper, we propose an unsupervised graph-based Tibetan multi-document summarization method, which divides a large number of Tibetan news documents into topics and extracts the summarization of each *** obtained by using traditional graph-based methods have high redundancy, and the division of document topics is not detailed *** terms of topic division, we adopt a two-level clustering method that converts the original documents into document-level and sentence-level graphs; next, we take both linguistic and deep representations into account and integrate an external corpus into the graph to obtain the sentence semantic *** the shortcomings of the traditional K-Means clustering method and perform more detailed clustering of *** model sentence clusters into graphs, and finally re-score sentence nodes based on the topic semantic information and the impact of topic features on sentences, so that a summary with higher topic relevance is *** order to promote the development of Tibetan text summarization, and to meet the needs of relevant researchers for high-quality Tibetan text summarization datasets, this paper manually constructs a Tibetan summarization dataset and carries out relevant *** experiment results show that our method can effectively improve the