The quest for conditional independence in prospectivity modeling: weights-of-evidence, boost weights-of-evidence, and logistic regression
Author affiliation: Department of Geophysics and Geoinformatics, TU Bergakademie Freiberg, Freiberg 09596, Germany
Publication: Frontiers of Earth Science
Year, volume, issue: 2016, Vol. 10, No. 3
Pages: 389-408
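Subject classification: 02 [Economics]; 0202 [Economics: Applied Economics]; 020208 [Economics: Statistics]; 081803 [Engineering: Geological Engineering]; 07 [Science]; 08 [Engineering]; 0818 [Engineering: Geological Resources and Geological Engineering]; 0714 [Science: Statistics (degrees in science or economics may be conferred)]; 070103 [Science: Probability Theory and Mathematical Statistics]; 0701 [Science: Mathematics]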
Keywords: general weights of evidence; joint conditional independence; naive Bayes model; Hammersley-Clifford theorem; interaction terms; statistical significance
Abstract: The objective of prospectivity modeling is prediction of the conditional probability of the presence T = 1 or absence T = 0 of a target T given favorable or prohibitive predictors B, or construction of a two-class {0,1} classification of T. A special case of logistic regression called weights-of-evidence (WofE) is geologists' favorite method of prospectivity modeling due to its apparent simplicity. However, the numerical simplicity is deceiving, as it is implied by the severe mathematical modeling assumption of joint conditional independence of all predictors given the target. General weights of evidence are explicitly introduced which are as simple to estimate as conventional weights, i.e., by counting, but do not require conditional independence. Complementary to the regression view is the classification view on prospectivity modeling. Boosting is the construction of a strong classifier from a set of weak classifiers. From the regression point of view it is closely related to logistic regression. Boost weights-of-evidence (BoostWofE) was introduced into prospectivity modeling to counterbalance violations of the assumption of conditional independence, even though relaxation of modeling assumptions with respect to weak classifiers was not the (initial) purpose of boosting. In the original publication of BoostWofE a fabricated dataset was used to "validate" this approach. Using the same fabricated dataset, it is shown that BoostWofE cannot generally compensate for lacking conditional independence, whatever the consecutive processing order of the predictors. Thus the alleged features of BoostWofE are disproved by way of counterexamples, while theoretical findings are confirmed that logistic regression including interaction terms can exactly compensate for violations of joint conditional independence if the predictors are indicators.
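
The contrast drawn in the abstract between weights-of-evidence and logistic regression with interaction terms can be illustrated with a minimal sketch. It assumes two indicator predictors B1 and B2 and a small table of counts that is purely hypothetical (it is not the fabricated dataset discussed in the paper). The variable names (`n`, `p_wofe`, `p_logit`) are illustrative, and the coefficients of the saturated logistic model are written in closed form from the cell logits rather than fitted iteratively. The sketch does not implement the paper's general weights of evidence or BoostWofE.

```python
import numpy as np

# Hypothetical counts n[b1, b2, t] for two indicator predictors and a binary
# target; chosen so that B1 and B2 are NOT conditionally independent given T.
n = np.array([[[40, 2], [20, 3]],
              [[15, 5], [5, 10]]], dtype=float)

def logit(p):
    return np.log(p / (1.0 - p))

def expit(z):
    return 1.0 / (1.0 + np.exp(-z))

# Empirical conditional probabilities P(T=1 | B1=b1, B2=b2), obtained by counting.
p_emp = n[..., 1] / n.sum(axis=-1)

# --- Conventional weights of evidence (assumes conditional independence) ---
n_t = n.sum(axis=(0, 1))                 # totals for T=0 and T=1
prior_logit = np.log(n_t[1] / n_t[0])    # log prior odds of T=1
p_b1_given_t = n.sum(axis=1) / n_t       # P(B1=b | T=t), indexed [b, t]
p_b2_given_t = n.sum(axis=0) / n_t       # P(B2=b | T=t), indexed [b, t]
w1 = np.log(p_b1_given_t[:, 1] / p_b1_given_t[:, 0])  # weights for B1=0, B1=1
w2 = np.log(p_b2_given_t[:, 1] / p_b2_given_t[:, 0])  # weights for B2=0, B2=1
p_wofe = expit(prior_logit + w1[:, None] + w2[None, :])

# --- Logistic regression with an interaction term (saturated model) ---
# With indicator predictors and all cells populated, the coefficients follow
# directly from the logits of the empirical cell probabilities.
L = logit(p_emp)
beta0 = L[0, 0]
beta1 = L[1, 0] - L[0, 0]
beta2 = L[0, 1] - L[0, 0]
beta12 = L[1, 1] - L[1, 0] - L[0, 1] + L[0, 0]
b1, b2 = np.meshgrid([0, 1], [0, 1], indexing="ij")
p_logit = expit(beta0 + beta1 * b1 + beta2 * b2 + beta12 * b1 * b2)

print("empirical          :", np.round(p_emp, 3))
print("WofE               :", np.round(p_wofe, 3))
print("logit + interaction:", np.round(p_logit, 3))
```

Under these assumptions the model with the interaction term reproduces the counted conditional probabilities, while the conventional weights deviate from them because the hypothetical counts violate conditional independence.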