The quest for conditional independence in prospectivity modeling: weights-of-evidence, boost weights-of-evidence, and logistic regression

Helmut SCHAEBEN, Georg SEMMLER

Front. Earth Sci. ›› 2016, Vol. 10 ›› Issue (3) : 389-408. DOI: 10.1007/s11707-016-0595-y
RESEARCH ARTICLE

Abstract

The objective of prospectivity modeling is to predict the conditional probability of the presence (T=1) or absence (T=0) of a target T given favorable or prohibitive predictors B, or to construct a two-class {0,1} classification of T. A special case of logistic regression called weights-of-evidence (WofE) is geologists’ favorite method of prospectivity modeling because of its apparent simplicity. However, the numerical simplicity is deceptive: it follows from the severe modeling assumption of joint conditional independence of all predictors given the target. General weights of evidence are explicitly introduced; they are as simple to estimate as conventional weights, i.e., by counting, but do not require conditional independence. Complementary to the regression view is the classification view of prospectivity modeling. Boosting is the construction of a strong classifier from a set of weak classifiers. From the regression point of view it is closely related to logistic regression. Boost weights-of-evidence (BoostWofE) was introduced into prospectivity modeling to counterbalance violations of the assumption of conditional independence, even though relaxing modeling assumptions about weak classifiers was not the (initial) purpose of boosting. In the original publication of BoostWofE a fabricated dataset was used to “validate” this approach. Using the same fabricated dataset, it is shown that BoostWofE cannot generally compensate for a lack of conditional independence, whatever the order in which the predictors are processed consecutively. Thus the alleged features of BoostWofE are disproved by counterexample, while the theoretical finding is confirmed that logistic regression including interaction terms can exactly compensate for violations of joint conditional independence if the predictors are indicators.
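
The contrast drawn in the abstract can be illustrated with a minimal Python sketch for two indicator predictors. Everything in it is an assumption made for illustration: the cell counts are hypothetical and are not the fabricated dataset discussed in the article, and the function names are invented here. The sketch estimates conventional weights of evidence by counting, compares the resulting posterior with the empirical conditional probability, and writes down the saturated logistic model (main effects plus one interaction term), whose coefficients follow in closed form from the cell logits.

```python
import math
from itertools import product

def logit(p):
    return math.log(p / (1.0 - p))

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

CELLS = list(product((0, 1), repeat=2))  # all (b1, b2) combinations

def wofe_posterior(counts):
    """Conventional WofE estimate of P(T=1 | b1, b2), obtained by counting.

    counts maps (t, b1, b2) to a number of unit cells (hypothetical data).
    """
    n = sum(counts.values())
    n_t = {t: sum(c for k, c in counts.items() if k[0] == t) for t in (0, 1)}
    prior_logit = logit(n_t[1] / n)
    weights = {}
    for i in (1, 2):          # predictor index within the key (t, b1, b2)
        for b in (0, 1):      # weight for B_i = b
            p1 = sum(c for k, c in counts.items() if k[0] == 1 and k[i] == b) / n_t[1]
            p0 = sum(c for k, c in counts.items() if k[0] == 0 and k[i] == b) / n_t[0]
            weights[(i, b)] = math.log(p1 / p0)
    return {(b1, b2): sigmoid(prior_logit + weights[(1, b1)] + weights[(2, b2)])
            for b1, b2 in CELLS}

def exact_posterior(counts):
    """Empirical P(T=1 | b1, b2), with no independence assumption."""
    return {(b1, b2): counts[(1, b1, b2)] / (counts[(0, b1, b2)] + counts[(1, b1, b2)])
            for b1, b2 in CELLS}

def saturated_logistic_posterior(counts):
    """Logistic model with main effects and one interaction term.

    For indicator predictors this model is saturated, so its coefficients
    are determined by the logits of the four empirical cell probabilities.
    """
    l = {k: logit(p) for k, p in exact_posterior(counts).items()}
    beta0 = l[(0, 0)]
    beta1 = l[(1, 0)] - l[(0, 0)]
    beta2 = l[(0, 1)] - l[(0, 0)]
    beta12 = l[(1, 1)] - l[(1, 0)] - l[(0, 1)] + l[(0, 0)]
    return {(b1, b2): sigmoid(beta0 + beta1 * b1 + beta2 * b2 + beta12 * b1 * b2)
            for b1, b2 in CELLS}

# Hypothetical counts, keyed by (t, b1, b2). Absences are the same in both tables.
ABSENT = {(0, 1, 1): 10, (0, 1, 0): 30, (0, 0, 1): 30, (0, 0, 0): 90}
# B1 and B2 conditionally independent given T.
COUNTS_CI = {**ABSENT, (1, 1, 1): 10, (1, 1, 0): 10, (1, 0, 1): 10, (1, 0, 0): 10}
# Same marginals given T=1, but B1 and B2 strongly dependent given T=1.
COUNTS_DEP = {**ABSENT, (1, 1, 1): 18, (1, 1, 0): 2, (1, 0, 1): 2, (1, 0, 0): 18}

if __name__ == "__main__":
    for name, counts in (("CI holds", COUNTS_CI), ("CI violated", COUNTS_DEP)):
        w = wofe_posterior(counts)
        e = exact_posterior(counts)
        s = saturated_logistic_posterior(counts)
        for cell in CELLS:
            print(name, cell, round(w[cell], 4), round(e[cell], 4), round(s[cell], 4))
```

With these hypothetical counts, the WofE posterior matches the empirical conditional probability in every cell when conditional independence holds. In the dependent table the marginal counts are unchanged, so the WofE posterior is unchanged as well, although the empirical conditional probabilities differ; the model with the interaction term reproduces them in both tables.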

Keywords

general weights of evidence / joint conditional independence / naïve Bayes model / Hammersley–Clifford theorem / interaction terms / statistical significance

Cite this article

Helmut SCHAEBEN, Georg SEMMLER. The quest for conditional independence in prospectivity modeling: weights-of-evidence, boost weights-of-evidence, and logistic regression. Front. Earth Sci., 2016, 10(3): 389‒408 https://doi.org/10.1007/s11707-016-0595-y


Acknowledgments

The authors would like to thank two anonymous reviewers for their thorough and constructive efforts to help us improve our manuscript. The authors gratefully acknowledge funding by the German Federal Ministry for Economic Affairs and Energy (BMWi) within the “Zentrales Innovationsprogramm Mittelstand” (ZIM) program for the project “Entwicklung eines Verfahrens zur dreidimensionalen Prognose von verdeckten Rohstofflagerstätten am Beispiel des Erzgebirges” (Development of a method for the three-dimensional prediction of concealed mineral deposits, using the Erzgebirge as an example). Last but not least, the authors greatly appreciate the painstaking effort of H. Konstanze Zschoke, MSc Geophysics, in converting our manuscript from the high-quality typesetting system LaTeX into the word processor MS Word.

RIGHTS & PERMISSIONS

2016 Higher Education Press and Springer-Verlag Berlin Heidelberg