Data Cleaning About Student Information Based on Massive Open Online Course System
Author affiliation: Harbin Institute of Technology, Harbin, China
Publication: Proceedings of the International Conference of Pioneering Computer Scientists, Engineers and Educators (ICPCSEE)
Year/Volume/Issue: 2020, Issue 1
Pages: 33-43
Keywords: MOOC; Data cleaning; Time series; Intermittent missing; Dimension reduction
Abstract: Massive Open Online Courses (MOOCs) have recently become a major way of online learning for millions of people around the world, and they generate a large amount of data in the *** Because of errors introduced during collection, by the system, and from other sources, these data contain various inconsistencies and missing *** To support accurate analysis, this paper studies data cleaning technology for online open course systems, including missing-value filling for time series and rule-based input error *** The data cleaning algorithm designed in this paper is divided into six parts: pre-processing, missing data processing, format and content error processing, logical error processing, irrelevant data processing, and correlation *** The paper designs and implements a missing-value-filling algorithm based on time series in the missing data processing *** Because the format and content error processing module contains a large number of descriptive variables, the paper proposes one-hot-based and separability-based criteria (Hot+J3+***) The online course data cleaning algorithm is analyzed in detail in terms of algorithm design, implementation, and *** After extensive rigorous testing, each module functions normally, and the cleaning performance of the algorithm meets expectations.
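The abstract mentions missing-value filling for time series but does not give the algorithm itself. As a rough illustration of the general idea only, the sketch below fills gaps by linear interpolation over timestamps, a common baseline that is swapped in here and is not claimed to be the paper's method. The function name `fill_missing`, the `(timestamp, value)` record layout, and the nearest-value rule for leading and trailing gaps are all assumptions made for this example.

```python
import bisect
from datetime import datetime
from typing import List, Optional, Tuple

# Hypothetical record layout: (timestamp, observed value or None if missing).
Sample = Tuple[datetime, Optional[float]]


def fill_missing(series: List[Sample]) -> List[Tuple[datetime, float]]:
    """Fill None values in a time-sorted series.

    Interior gaps are linearly interpolated between the nearest observed
    neighbours, weighted by elapsed time. Leading and trailing gaps copy
    the nearest observed value (an assumption of this sketch).
    """
    known = [(t, v) for t, v in series if v is not None]
    if not known:
        raise ValueError("series has no observed values to interpolate from")
    known_times = [t for t, _ in known]

    filled = []
    for t, v in series:
        if v is None:
            # Index of the first observed sample at or after time t.
            i = bisect.bisect_left(known_times, t)
            if i == 0:
                v = known[0][1]          # gap before the first observation
            elif i == len(known):
                v = known[-1][1]         # gap after the last observation
            else:
                (t0, v0), (t1, v1) = known[i - 1], known[i]
                weight = (t - t0) / (t1 - t0)  # timedelta ratio is a float
                v = v0 + weight * (v1 - v0)
        filled.append((t, v))
    return filled


if __name__ == "__main__":
    # Made-up daily activity counts for one student, with gaps.
    raw = [
        (datetime(2020, 1, 1), 10.0),
        (datetime(2020, 1, 2), None),
        (datetime(2020, 1, 3), None),
        (datetime(2020, 1, 4), 40.0),
        (datetime(2020, 1, 5), None),
    ]
    for ts, value in fill_missing(raw):
        print(ts.date(), value)
```

Weighting by elapsed time rather than by position keeps the estimate sensible when records are unevenly spaced, which is typical of intermittent activity logs. A real pipeline would likely also cap the gap length it is willing to fill.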