Dealing with the Data Imbalance Problem in Pulsar Candidate Sifting Based on Feature Selection
Author affiliations: School of Mathematics and Statistics, Hanshan Normal University, Chaozhou 521000, China; School of Computer Science, South China Normal University, Guangzhou 510631, China
Publication: Research in Astronomy and Astrophysics (English edition)
Year, volume, issue: 2024, Vol. 24, No. 2
Pages: 125-137
Subject classification: 07 [Science]; 070401 [Science - Astrophysics]; 0704 [Science - Astronomy]
Funding: Supported by the National Natural Science Foundation of China (NSFC, grant Nos. 11973022 and 12373108), the Natural Science Foundation of Guangdong Province (No. 2020A1515010710), and the Hanshan Normal University Startup Foundation for Doctor Scientific Research (No. QD202129)
Keywords: methods: data analysis - (stars:) pulsars: general - methods: statistical
Abstract: Pulsar detection has become an active research topic in radio astronomy. One of the essential procedures for pulsar detection is pulsar candidate sifting (PCS), a procedure for identifying potential pulsar signals in a survey. However, pulsar candidates are always class-imbalanced: most candidates are non-pulsars such as radio frequency interference (RFI), and only a tiny fraction come from real pulsars. Class imbalance can greatly degrade the performance of machine learning (ML) models, at a heavy cost because some real pulsars are missed. To deal with the problem, this work focuses on choosing relevant features that discriminate pulsars from non-pulsars, which is known as feature selection. Feature selection is the process of selecting a subset of the most relevant features from a feature set. Features that distinguish pulsars from non-pulsars can significantly improve classifier performance even if the data are highly imbalanced. In this work, a feature selection algorithm called the K-fold Relief-Greedy (KFRG) algorithm is proposed. KFRG is a two-stage algorithm. In the first stage, it filters out some irrelevant features according to their K-fold Relief scores; in the second stage, it removes redundant features and selects the most relevant ones by a forward greedy search. Experiments on the data set of the High Time Resolution Universe survey verified that ML models based on KFRG are capable of PCS, correctly separating pulsars from non-pulsars even if the candidates are highly class-imbalanced.
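The two-stage idea in the abstract (a Relief-based filter followed by a forward greedy search) can be illustrated with a minimal sketch. This is not the authors' KFRG implementation: the abstract does not define "K-fold Relief", so it is assumed here to mean averaging basic Relief scores over K training splits. The function names (`relief_scores`, `kfold_relief`, `f1_nearest_centroid`, `kfrg_sketch`), the `keep` cutoff, the nearest-centroid classifier, the F1 criterion, and the synthetic data are all hypothetical choices made for illustration.

```python
import random

def relief_scores(X, y):
    """Basic Relief: a feature scores high if it differs at the nearest
    miss (other class) and agrees at the nearest hit (same class)."""
    n, d = len(X), len(X[0])
    span = [(max(r[j] for r in X) - min(r[j] for r in X)) or 1.0 for j in range(d)]
    diff = lambda a, b, j: abs(a[j] - b[j]) / span[j]
    dist = lambda a, b: sum(diff(a, b, j) for j in range(d))
    w = [0.0] * d
    for i in range(n):
        hits = [k for k in range(n) if k != i and y[k] == y[i]]
        misses = [k for k in range(n) if y[k] != y[i]]
        if not hits or not misses:
            continue
        h = min(hits, key=lambda k: dist(X[i], X[k]))
        m = min(misses, key=lambda k: dist(X[i], X[k]))
        for j in range(d):
            w[j] += (diff(X[i], X[m], j) - diff(X[i], X[h], j)) / n
    return w

def kfold_relief(X, y, k=5, seed=0):
    """Assumed reading of 'K-fold Relief': average the Relief scores
    computed on each of the K training splits."""
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    total = [0.0] * len(X[0])
    for f in range(k):
        held_out = set(idx[f::k])
        train = [i for i in idx if i not in held_out]
        s = relief_scores([X[i] for i in train], [y[i] for i in train])
        total = [t + v / k for t, v in zip(total, s)]
    return total

def f1_nearest_centroid(X, y, feats):
    """F1 of the minority class (label 1) for a nearest-centroid rule
    restricted to the features in `feats` (an illustrative criterion)."""
    pos = [r for r, t in zip(X, y) if t == 1]
    neg = [r for r, t in zip(X, y) if t == 0]
    cp = [sum(r[j] for r in pos) / len(pos) for j in feats]
    cn = [sum(r[j] for r in neg) / len(neg) for j in feats]
    tp = fp = fn = 0
    for r, t in zip(X, y):
        dp = sum((r[j] - c) ** 2 for j, c in zip(feats, cp))
        dn = sum((r[j] - c) ** 2 for j, c in zip(feats, cn))
        pred = 1 if dp < dn else 0
        tp += pred == 1 and t == 1
        fp += pred == 1 and t == 0
        fn += pred == 0 and t == 1
    return 2 * tp / (2 * tp + fp + fn) if tp else 0.0

def kfrg_sketch(X, y, keep=3, k=5):
    # Stage 1: keep only the `keep` features with the highest averaged scores.
    scores = kfold_relief(X, y, k)
    candidates = sorted(range(len(scores)), key=lambda j: -scores[j])[:keep]
    # Stage 2: forward greedy search; add the feature with the best F1 and
    # stop when nothing improves, so redundant features are never added.
    selected, best = [], 0.0
    while candidates:
        gains = {j: f1_nearest_centroid(X, y, selected + [j]) for j in candidates}
        j = max(gains, key=gains.get)
        if gains[j] <= best:
            break
        selected.append(j)
        best = gains[j]
        candidates.remove(j)
    return selected, scores

# Synthetic, imbalanced toy data (20 positives vs. 200 negatives):
# feature 0 is informative, feature 1 duplicates it, features 2-3 are noise.
rng = random.Random(42)
X, y = [], []
for i in range(220):
    label = 1 if i < 20 else 0
    signal = rng.uniform(2, 3) if label else rng.uniform(0, 1)
    X.append([signal, signal, rng.random(), rng.random()])
    y.append(label)

selected, scores = kfrg_sketch(X, y, keep=3, k=5)
print("selected features:", selected)
```

In this toy setup the greedy stage stops after one feature, because the duplicate and noise features cannot raise the F1 score any further. Scoring by minority-class F1 rather than accuracy is one way to keep the search sensitive to the rare class; a real pipeline would evaluate on held-out candidates rather than on the training data.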