TOPIC SPLITTING: A HIERARCHICAL TOPIC MODEL BASED ON NON-NEGATIVE MATRIX FACTORIZATION
Author affiliations: SKLSDE, School of Computer Science, Beihang University, Beijing 100191, China; Department of Scientific Research, Academy of Armored Force Engineering, Beijing 100072, China
Publication: Journal of Systems Science and Systems Engineering (系统科学与系统工程学报, English edition)
Year, volume, issue: 2018, Vol. 27, No. 4
Pages: 479-496
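Subject classification: 0810 [Engineering: Information and Communication Engineering]; 1205 [Management: Library, Information and Archives Management]; 07 [Science]; 070104 [Science: Applied Mathematics]; 0802 [Engineering: Mechanical Engineering]; 0811 [Engineering: Control Science and Engineering]; 0701 [Science: Mathematics]; 0812 [Engineering: Computer Science and Technology (engineering or science degrees may be awarded)]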
Funding: National Natural Science Foundation of China; supported by the National Natural Science and Technology Basic Platform Projects
Keywords: hierarchical topic model; non-negative matrix factorization; hierarchical NMF; topic splitting
Abstract: Hierarchical topic models have been widely applied in real applications because they build a hierarchy over topics while preserving topic quality. Most traditional methods build the hierarchy by adopting low-level topics as new features to construct high-level ones, which often causes semantic confusion between low-level and high-level topics. To address this problem, we propose a novel topic model named hierarchical sparse NMF with orthogonal constraint (HSOC), which is based on non-negative matrix factorization and builds the topic hierarchy by splitting super-topics into sub-topics. In HSOC, we introduce global independence, local independence and information consistency to constrain the split topics. Extensive experimental results on real-world corpora show that the proposed model achieves comparable performance on topic quality and better performance on semantic feature representation of documents compared with baseline methods.
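
The general idea of building a hierarchy by splitting a super-topic into sub-topics with NMF can be illustrated with a minimal sketch. This is not the HSOC algorithm: it uses plain multiplicative-update NMF with no sparsity, orthogonality, independence or information-consistency constraints. The function names (`nmf`, `topic_share`, `split_topic`), the toy document-term matrix `V`, and the rule used to attribute counts to a super-topic are all assumptions made for illustration only.

```python
import numpy as np

def nmf(V, k, iters=200, seed=0, eps=1e-9):
    """Approximate V (docs x terms) as W @ H with W, H >= 0,
    using standard multiplicative updates for the Frobenius loss."""
    rng = np.random.default_rng(seed)
    n, m = V.shape
    W = rng.random((n, k)) + eps   # document-topic weights
    H = rng.random((k, m)) + eps   # topic-term weights
    for _ in range(iters):
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H

def topic_share(V, W, H, t, eps=1e-9):
    """Assumed attribution rule: the part of each count in V that the
    factorization credits to topic t (element-wise, never exceeds V)."""
    return V * np.outer(W[:, t], H[t]) / (W @ H + eps)

def split_topic(V, W, H, t, k_sub=2, **kw):
    """Split super-topic t into k_sub sub-topics by factorizing only
    the counts attributed to t."""
    return nmf(topic_share(V, W, H, t), k_sub, **kw)

# Toy corpus (hypothetical): 4 documents, 6 terms, two rough themes.
V = np.array([
    [3, 2, 0, 0, 0, 0],
    [2, 3, 1, 0, 0, 0],
    [0, 0, 0, 2, 3, 1],
    [0, 0, 1, 3, 2, 2],
], dtype=float)

W, H = nmf(V, k=2)                                  # top-level topics
W_sub, H_sub = split_topic(V, W, H, t=0, k_sub=2)   # sub-topics of topic 0
```

Each row of `H_sub` is a sub-topic expressed over the same vocabulary as its parent, so parent and children remain directly comparable. Constraints of the kind named in the abstract would enter as extra terms in the factorization objective, which this sketch omits.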