Probabilistic outlier detection for sparse multivariate geotechnical site investigation data using Bayesian learning
Author affiliations: State Key Laboratory of Water Resources and Hydropower Engineering Science, Institute of Engineering Risk and Disaster Prevention, Wuhan University, 299 Bayi Road, Wuhan 430072, China; Department of Civil and Environmental Engineering, National University of Singapore, Blk E1A, #07-03, 1 Engineering Drive 2, Singapore 117576, Singapore
Publication: Geoscience Frontiers (地学前缘, English edition)
Year, volume, issue: 2021, Vol. 12, No. 1
Pages: 425-439
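Subject classification: 081401 [Engineering - Geotechnical Engineering]; 08 [Engineering]; 0813 [Engineering - Architecture]; 0814 [Engineering - Civil Engineering]; 081301 [Engineering - Architectural History and Theory]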
Funding: Supported by the National Key R&D Program of China (Project No. 2016YFC0800200), the NRF-NSFC 3rd Joint Research Grant (Earth Science) (Project No. 41861144022), the National Natural Science Foundation of China (Project Nos. 51679174 and 51779189), and the Shenzhen Key Technology R&D Program (Project No. 20170324). The financial support is gratefully acknowledged.
Keywords: Outlier detection; Site investigation; Sparse multivariate data; Mahalanobis distance; Resampling by half-means; Bayesian machine learning
Abstract: Various uncertainties arising during the acquisition of geoscience data may result in anomalous data instances (i.e., outliers) that do not conform with the expected pattern of regular data. For sparse multivariate data obtained from geotechnical site investigation, it is impossible to identify outliers with certainty, owing to the distortion of the statistics of geotechnical parameters caused by outliers and the associated statistical uncertainty resulting from data sparsity. This paper develops a probabilistic outlier detection method for sparse multivariate data obtained from geotechnical site investigation. The proposed approach quantifies the outlying probability of each data instance based on the Mahalanobis distance and determines outliers as those data instances with outlying probabilities greater than ***. It tackles the distortion of statistics estimated from a dataset containing outliers by a re-sampling technique and accounts, rationally, for the statistical uncertainty by Bayesian machine learning. The proposed approach also suggests an exclusive method to determine the outlying components of each outlier. The proposed approach is illustrated and verified using simulated and real-life data. The results show that the proposed approach properly identifies outliers among sparse multivariate data, and their corresponding outlying components, in a probabilistic manner. It can significantly reduce the masking effect (i.e., missing some actual outliers due to the distortion of statistics by the outliers and statistical uncertainty). It is also found that outliers among sparse multivariate data instances significantly affect the construction of the multivariate distribution of geotechnical parameters for uncertainty quantification. This emphasizes the necessity of a data cleaning process (e.g., outlier detection) for uncertainty quantification based on geoscience data.
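
The re-sampling idea named in the keywords (Mahalanobis distance combined with half-sample resampling) can be illustrated with a minimal sketch. This is not the paper's method: it omits the Bayesian treatment of statistical uncertainty entirely, is restricted to bivariate data so that the 2x2 covariance inverse and the chi-square cutoff have closed forms, and reports only the fraction of resamples in which each point is flagged. The function names, the `level` and `n_resamples` parameters, and the data values are all assumptions made for illustration; the data are synthetic and not taken from the paper.

```python
import math
import random


def mean_cov_2d(points):
    """Sample mean and covariance terms (sxx, sxy, syy) of bivariate points."""
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    sxx = sum((x - mx) ** 2 for x, _ in points) / (n - 1)
    syy = sum((y - my) ** 2 for _, y in points) / (n - 1)
    sxy = sum((x - mx) * (y - my) for x, y in points) / (n - 1)
    return (mx, my), (sxx, sxy, syy)


def mahalanobis_sq(point, mean, cov):
    """Squared Mahalanobis distance, using the closed-form 2x2 inverse."""
    sxx, sxy, syy = cov
    det = sxx * syy - sxy ** 2
    dx, dy = point[0] - mean[0], point[1] - mean[1]
    return (syy * dx * dx - 2.0 * sxy * dx * dy + sxx * dy * dy) / det


def outlying_fraction(points, n_resamples=500, level=0.975, seed=0):
    """Fraction of half-sample resamples in which each point is flagged.

    Hypothetical helper: a frequentist stand-in for an outlying
    probability, not the Bayesian quantity described in the abstract.
    """
    rng = random.Random(seed)
    # Chi-square quantile with 2 degrees of freedom: F(x) = 1 - exp(-x / 2).
    cutoff = -2.0 * math.log(1.0 - level)
    half = len(points) // 2
    counts = [0] * len(points)
    for _ in range(n_resamples):
        # Estimate statistics from a random half of the data only.
        mean, cov = mean_cov_2d(rng.sample(points, half))
        for i, p in enumerate(points):
            if mahalanobis_sq(p, mean, cov) > cutoff:
                counts[i] += 1
    return [c / n_resamples for c in counts]


# Synthetic data (assumed): eleven points near y = 2x, then one outlier.
data = [
    (1.0, 2.1), (1.5, 2.9), (2.0, 4.2), (2.5, 4.8), (3.0, 6.1), (3.5, 7.2),
    (4.0, 7.9), (4.5, 9.1), (5.0, 9.8), (5.5, 11.2), (6.0, 12.1),
    (3.0, 12.0),
]
fractions = outlying_fraction(data)
for point, frac in zip(data, fractions):
    print(point, round(frac, 3))
```

In this sketch the last point is flagged whenever it is left out of the half-sample, because the statistics are then estimated from regular points only. When it is drawn into the half-sample it inflates the covariance and escapes detection, which is a small-scale picture of the masking effect the abstract mentions.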