Chi-Square and PCA Based Feature Selection for Diabetes Detection with Ensemble Classifier
Author affiliations: School of Computing and Information Sciences, Florida International University, USA; Department of Computer Science, Khwaja Fareed University of Engineering and Information Technology, Rahim Yar Khan 64200, Pakistan; Department of Computer Science, Broward College, Broward County, Florida, USA; Department of Information and Communication Engineering, Yeungnam University, Gyeongsan-si 38541, Korea
Publication: Intelligent Automation & Soft Computing
Year, volume, issue: 2023, Vol. 36, No. 5
Pages: 1931-1949
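Subject classification: 1002 [Medicine: Clinical Medicine]; 100201 [Medicine: Internal Medicine (including cardiovascular, hematological, respiratory, digestive, endocrine and metabolic, renal, rheumatic, and infectious diseases)]; 10 [Medicine]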
Keywords: Diabetes mellitus prediction; feature fusion; ensemble classifier; principal component analysis; chi-square
Abstract: Diabetes mellitus is a metabolic disease ranked among the top 10 causes of death by the World Health Organization. In the last few years, an alarming increase has been observed worldwide, with a 70% rise in the disease since 2000 and an 80% rise in male ***. If untreated, it results in complications in many vital organs of the human body, which may lead to ***. *** detection of diabetes is a task of significant importance for starting timely ***. This study introduces a methodology for classifying diabetic and normal people using an ensemble machine learning model and a feature fusion of Chi-square and principal component analysis. An ensemble model, the logistic tree classifier (LTC), is proposed, which combines logistic regression and an extra tree classifier through a soft voting ***. Experiments are also performed with several well-known machine learning algorithms to analyze their performance, including logistic regression, extra tree classifier, AdaBoost, Gaussian naive Bayes, decision tree, random forest, and k-nearest neighbors. In addition, several experiments are carried out using principal component analysis (PCA) and Chi-square (Chi-2) features to analyze the influence of feature selection on the performance of machine learning classifiers. Results indicate that Chi-2 features perform better than both PCA features and the original features. ***, the highest accuracy is obtained when the proposed ensemble model LTC is used with the proposed feature fusion framework: it achieves an accuracy score of 0.85, the highest among the available approaches for diabetes ***. In addition, a statistical T-test shows the statistical significance of the proposed approach over the other approaches.
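The abstract names the building blocks (Chi-2 feature selection, PCA, feature fusion, and a soft-voting ensemble of logistic regression and extra trees) but gives no implementation details. The following is a minimal sketch under stated assumptions, not the authors' code: it assumes scikit-learn, interprets "feature fusion" as column-wise concatenation of the Chi-2-selected features and the PCA components, and uses synthetic data in place of the paper's dataset. The function name `build_ltc_pipeline` and the values `k_chi2=4`, `n_pca=4`, and `n_estimators=200` are hypothetical.

```python
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.ensemble import ExtraTreesClassifier, VotingClassifier
from sklearn.feature_selection import SelectKBest, chi2
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import FeatureUnion, Pipeline
from sklearn.preprocessing import MinMaxScaler


def build_ltc_pipeline(k_chi2=4, n_pca=4, random_state=0):
    """Hypothetical LTC-style sketch: Chi-2 + PCA feature fusion feeding a
    soft-voting ensemble of logistic regression and extra trees.
    k_chi2, n_pca and n_estimators are assumed values, not from the paper."""
    # Assumed fusion: concatenate both feature sets side by side.
    fusion = FeatureUnion([
        # Keep the k_chi2 features with the highest Chi-2 score against the label.
        ("chi2", SelectKBest(score_func=chi2, k=k_chi2)),
        # Project the same scaled features onto the first n_pca principal components.
        ("pca", PCA(n_components=n_pca, random_state=random_state)),
    ])
    # Soft voting averages the two models' predicted class probabilities.
    ensemble = VotingClassifier(
        estimators=[
            ("lr", LogisticRegression(max_iter=1000)),
            ("et", ExtraTreesClassifier(n_estimators=200, random_state=random_state)),
        ],
        voting="soft",
    )
    return Pipeline([
        # Chi-2 scoring requires non-negative inputs, so scale features to [0, 1] first.
        ("scale", MinMaxScaler()),
        ("fuse", fusion),
        ("clf", ensemble),
    ])


if __name__ == "__main__":
    # Synthetic stand-in for a tabular diabetes dataset (8 features, binary label).
    X, y = make_classification(n_samples=500, n_features=8, n_informative=5, random_state=0)
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.25, stratify=y, random_state=0
    )
    model = build_ltc_pipeline()
    model.fit(X_train, y_train)
    # Everything except the final classifier: scaling plus fused features.
    print("fused feature count:", model[:-1].transform(X_test).shape[1])
    print("test accuracy:", model.score(X_test, y_test))
```

Because Chi-2 scores are computed only when the selector is fitted, test samples that fall slightly outside the training range after scaling do not cause errors at prediction time. The T-test comparison mentioned in the abstract is not reproduced here.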