Automatic Classification of Swedish Metadata Using Dewey Decimal Classification: A Comparison of Approaches

Authors: Koraljka Golub, Johan Hagelback, Anders Ardo

Affiliations: Department of Cultural Sciences, Faculty of Arts and Humanities, Linnaeus University, Vaxjo, Sweden; Department of Computer Science and Media Technology, Faculty of Technology, Linnaeus University, Kalmar, Sweden; Department of Electrical and Information Technology, Lund University, Lund, Sweden

Publication: Journal of Data and Information Science (数据与情报科学学报, English edition)

Year, volume, issue: 2020, Vol. 5, No. 1

Pages: 18-38

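Subject classification: 1205 [Management: Library, Information and Archives Management]; 12 [Management]; 120501 [Management: Library Science]; 1201 [Management: Management Science and Engineering (management or engineering degrees may be awarded)]; 081104 [Engineering: Pattern Recognition and Intelligent Systems]; 08 [Engineering]; 0835 [Engineering: Software Engineering]; 120502 [Management: Information Science]; 0811 [Engineering: Control Science and Engineering]; 0812 [Engineering: Computer Science and Technology (engineering or science degrees may be awarded)]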

Keywords: LIBRIS; Dewey Decimal Classification; Automatic classification; Machine learning; Support Vector Machine; Multinomial Naive Bayes; Simple linear network; Standard neural network; 1D convolutional neural network; Recurrent neural network; Word embeddings; String matching

Abstract: Purpose: With more and more digital collections of various information resources becoming available, the challenge of assigning subject index terms and classes from quality knowledge organization *** the ultimate purpose is to understand the value of automatically produced Dewey Decimal Classification (DDC) classes for Swedish digital collections, the paper aims to evaluate the performance of six machine learning algorithms as well as a string-matching algorithm based on characteristics of ***/methodology/approach: State-of-the-art machine learning algorithms require at least 1,000 training examples per *** complete data set at the time of research involved 143,838 records, which had to be reduced to the top three hierarchical levels of DDC in order to provide sufficient training data (totaling 802 classes in the training and testing sample, out of 14,413 classes at all levels). Findings: Evaluation shows that Support Vector Machine with linear kernel outperforms the other machine learning algorithms as well as the string-matching algorithm on average; the string-matching algorithm outperforms machine learning for specific classes when characteristics of DDC are most suitable for the *** embeddings combined with different types of neural networks (simple linear network, standard neural network, 1D convolutional neural network, and recurrent neural network) produced worse results than Support Vector Machine, though the results are close, with the benefit of a smaller representation *** of features in machine learning shows that using keywords or combining titles and keywords gives better results than using only titles as *** only marginally improves the *** stop-words reduced accuracy in most cases, while removing less frequent words increased it *** greatest impact is produced by the number of training examples: 81.90% accuracy on the training set is achieved when at least 1,000 records per class are available
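
The data reduction mentioned in the abstract (cutting DDC numbers back to the top three hierarchical levels and requiring a minimum number of training records per class) might look roughly like the sketch below. This is an illustration only, not the authors' code: the function names `truncate_ddc` and `select_training_records`, the `(text, ddc)` record layout, and the assumption that the top three levels correspond to the first three digits of a DDC number are all hypothetical.

```python
from collections import Counter


def truncate_ddc(ddc: str, levels: int = 3) -> str:
    """Keep the first `levels` digits of a DDC number, e.g. '839.738' -> '839'.

    Assumption: each of the top three hierarchical levels maps to one
    leading digit of the notation.
    """
    digits = [ch for ch in ddc if ch.isdigit()]
    return "".join(digits[:levels])


def select_training_records(records, min_per_class=1000, levels=3):
    """Return (text, truncated_class) pairs for sufficiently populated classes.

    `records` is a hypothetical list of (text, ddc_number) tuples, where
    `text` could be a title, keywords, or both combined.
    """
    truncated = [(text, truncate_ddc(ddc, levels)) for text, ddc in records]
    # Count records per truncated class, then drop sparse classes.
    counts = Counter(label for _, label in truncated)
    return [(text, label) for text, label in truncated
            if counts[label] >= min_per_class]
```

The default `min_per_class=1000` mirrors the 1,000-examples-per-class figure quoted in the abstract; a smaller value is convenient when trying the sketch on toy data.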
