Multi-Label Chinese Comments Categorization: Comparison of Multi-Label Learning Algorithms
Author affiliations: School of Computer and Software, Nanjing University of Information Science & Technology, Nanjing 210044, China; School of Information Technology, Deakin University, Victoria, Australia
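Publication: Journal of New Media (English-language journal)
Year, volume, issue: 2019, Vol. 1, No. 2
Pages: 51-61
Subject classification: 0502 [Literature: Foreign Languages and Literatures]; 050201 [Literature: English Language and Literature]; 05 [Literature]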
Funding: Supported by the NSFC (Grant Nos. 61772281, 61703212, 61602254), the Jiangsu Province Natural Science Foundation (grant number BK2160968), the Priority Academic Program Development of Jiangsu Higher Education Institutions (PAPD), and the Jiangsu Collaborative Innovation Center on Atmospheric Environment and Equipment Technology (CICAEET)
Keywords: multi-label classification; Chinese text classification; problem transformation; adapted algorithms
Abstract: Multi-label text categorization refers to the problem of categorizing text through a multi-label learning algorithm. Text classification for Asian languages such as Chinese differs from that for languages such as English, which use spaces to separate *** classifying text, it is necessary to perform a word segmentation operation to convert continuous text into a list of separate words and then convert that list into a vector of a certain dimension. Generally, multi-label learning algorithms can be divided into two categories: problem transformation methods and adapted algorithms. This work uses customers' comments about hotels as a training data set, labeled for all aspects of the hotel evaluation, to analyze and compare the performance of various multi-label learning algorithms on Chinese text classification. The experiment involves three basic problem transformation methods (Support Vector Machine, Random Forest, and k-Nearest-Neighbor) and one adapted algorithm, a Convolutional Neural Network. The experimental results show that the Support Vector Machine achieves better performance.
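The pipeline the abstract describes (segment the text, turn the words into a vector, then apply a problem transformation method) can be illustrated with a minimal sketch. Everything specific in it is an assumption made for illustration: the abstract does not say which transformation the authors used, so binary relevance (one independent binary classifier per label) is chosen here as one common option; the tiny `VOCAB` dictionary, the forward-maximum-matching segmenter, the sample comments, and the label names are all hypothetical; and a simple k-nearest-neighbor vote over bag-of-words vectors stands in for the base classifiers named in the abstract.

```python
import math
from collections import Counter

# Hypothetical toy dictionary; a real system would use a proper segmenter.
VOCAB = {"房间", "干净", "服务", "很好", "位置", "方便", "早餐", "一般", "很差"}
MAX_WORD_LEN = 2

def segment(text):
    """Forward maximum matching: take the longest dictionary word at each position."""
    words, i = [], 0
    while i < len(text):
        for size in range(min(MAX_WORD_LEN, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if size == 1 or piece in VOCAB:  # fall back to a single character
                words.append(piece)
                i += size
                break
    return words

def vectorize(text):
    """Bag-of-words vector stored as a sparse word -> count mapping."""
    return Counter(segment(text))

def cosine(a, b):
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def knn_binary(train_vecs, targets, query, k):
    """Majority vote (0 or 1) among the k training vectors most similar to query."""
    ranked = sorted(zip(train_vecs, targets), key=lambda p: cosine(p[0], query), reverse=True)
    votes = sum(t for _, t in ranked[:k])
    return 1 if votes * 2 > k else 0

def binary_relevance_predict(train, labels, text, k=1):
    """Binary relevance: solve one yes/no problem per label, then collect the yes labels."""
    vecs = [vectorize(t) for t, _ in train]
    query = vectorize(text)
    predicted = set()
    for label in labels:
        targets = [1 if label in tags else 0 for _, tags in train]
        if knn_binary(vecs, targets, query, k):
            predicted.add(label)
    return predicted

# Hypothetical hotel comments, each tagged with the aspects it mentions.
LABELS = ["room", "service", "location"]
TRAIN = [
    ("房间干净服务很好", {"room", "service"}),
    ("位置方便", {"location"}),
    ("早餐一般", set()),
    ("服务很差", {"service"}),
]

print(segment("房间干净"))  # ['房间', '干净']
print(sorted(binary_relevance_predict(TRAIN, LABELS, "房间干净服务很好")))  # ['room', 'service']
```

Because each label is decided independently, any binary classifier could replace `knn_binary` without changing the transformation itself; that separation is what distinguishes problem transformation methods from adapted algorithms, which handle the multi-label output directly.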