Comparative Study of Variable Selection Using Genetic Algorithm with Various Types of Chromosomes
Author affiliations: School of Chemistry and Pharmaceutical Engineering, Sichuan University of Science & Engineering; College of Materials Science and Engineering, Chongqing University; College of Bioengineering, Chongqing University
Publication: Chinese Journal of Structural Chemistry (结构化学, English edition)
Year, volume, issue: 2010, Vol. 29, No. 9
Pages: 1431-1437
Subject classification: 0710 [Science - Biology]; 07 [Science]; 071007 [Science - Genetics]; 0703 [Science - Chemistry]
Funding: Supported by the Youth Foundation of the Education Department of Sichuan Province (No. 09ZB038)
Keywords: support vector regression; genetic algorithm; variable selection; quantitative structure-activity relationship; multiple linear regression
Abstract: In this study, variable selection methods based on multilinear stepwise regression (MLR) and support vector regression (SVR) were compared using genetic algorithms (GAs) with different types of chromosomes. The first method is a GA with a binary chromosome (GA-BC) and the other is a GA with a fixed-length character chromosome (GA-FCC). The overall prediction accuracy on the training set was assessed by 7-fold cross-validation, and all regression models were evaluated on the test set. The poor prediction for the test set shows that the forward stepwise regression (FSR) model overfits the training set more easily. The results obtained with the SVR methods showed that this overfitting could be overcome. Furthermore, the GA-BC-SVR method is more prone to overfitting because too many variables are rapidly introduced into the model. The final optimal model, obtained with the GA-FCC-SVR method, had good predictive ability (R² = 0.885, S = 0.469, R²cv = 0.700, Scv = 0.757, R²ex = 0.692, Sex = 0.675). Our investigation indicates that variable selection using GA-FCC is the most appropriate for the MLR and SVR methods.
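The two chromosome types named in the abstract can be illustrated with a minimal sketch. The abstract does not define the encodings in detail, so the following is an assumption: a binary chromosome is taken to hold one bit per candidate variable, and a fixed-length character chromosome is taken to hold exactly k distinct variable indices. All function names, the pool of 50 variables, and k = 5 are hypothetical; the fold-splitting helper only shows what a 7-fold partition of the training set could look like and is not the authors' procedure.

```python
import random


def random_binary_chromosome(n_variables, rng):
    """Assumed GA-BC encoding: one bit per variable, 1 = selected."""
    return [rng.randint(0, 1) for _ in range(n_variables)]


def random_fixed_length_chromosome(n_variables, k, rng):
    """Assumed GA-FCC encoding: exactly k distinct variable indices."""
    return sorted(rng.sample(range(n_variables), k))


def decode_binary(chromosome):
    """Return the indices of the variables switched on in a bit string."""
    return [i for i, bit in enumerate(chromosome) if bit == 1]


def k_fold_indices(n_samples, n_folds=7):
    """Split sample indices into n_folds contiguous, near-equal folds."""
    base, extra = divmod(n_samples, n_folds)
    folds, start = [], 0
    for fold in range(n_folds):
        size = base + (1 if fold < extra else 0)
        folds.append(list(range(start, start + size)))
        start += size
    return folds


rng = random.Random(0)            # fixed seed so the sketch is repeatable
bc = random_binary_chromosome(50, rng)
fcc = random_fixed_length_chromosome(50, 5, rng)
folds = k_fold_indices(30, 7)     # 30 training samples is a made-up figure

print("GA-BC selects", len(decode_binary(bc)), "variables")
print("GA-FCC selects", len(fcc), "variables")
```

Under these assumptions, the binary encoding lets the number of selected variables vary freely, while the fixed-length encoding caps it at k. This is one plausible reading of why the abstract reports GA-BC-SVR as more prone to overfitting.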