The quest for conditional independence in prospectivity modeling: weights-of-evidence, boost weights-of-evidence, and logistic regression

Helmut SCHAEBEN , Georg SEMMLER

Front. Earth Sci., 2016, 10(3): 389–408. DOI: 10.1007/s11707-016-0595-y

RESEARCH ARTICLE

Abstract

The objective of prospectivity modeling is prediction of the conditional probability of the presence ($T=1$) or absence ($T=0$) of a target $T$ given favorable or prohibitive predictors $B$, or construction of a two-class $\{0,1\}$ classification of $T$. A special case of logistic regression called weights-of-evidence (WofE) is geologists' favorite method of prospectivity modeling due to its apparent simplicity. However, the numerical simplicity is deceiving, as it is implied by the severe mathematical modeling assumption of joint conditional independence of all predictors given the target. General weights of evidence are explicitly introduced which are as simple to estimate as conventional weights, i.e., by counting, but do not require conditional independence. Complementary to the regression view is the classification view on prospectivity modeling. Boosting is the construction of a strong classifier from a set of weak classifiers. From the regression point of view it is closely related to logistic regression. Boost weights-of-evidence (BoostWofE) was introduced into prospectivity modeling to counterbalance violations of the assumption of conditional independence, even though relaxation of modeling assumptions with respect to weak classifiers was not the (initial) purpose of boosting. In the original publication of BoostWofE a fabricated dataset was used to “validate” this approach. Using the same fabricated dataset it is shown that BoostWofE cannot generally compensate lacking conditional independence, whatever the consecutive processing order of predictors. Thus the alleged features of BoostWofE are disproved by way of counterexamples, while theoretical findings are confirmed that logistic regression including interaction terms can exactly compensate violations of joint conditional independence if the predictors are indicators.

Keywords

general weights of evidence / joint conditional independence / naïve Bayes model / Hammersley–Clifford theorem / interaction terms / statistical significance

Cite this article

Helmut SCHAEBEN, Georg SEMMLER. The quest for conditional independence in prospectivity modeling: weights-of-evidence, boost weights-of-evidence, and logistic regression. Front. Earth Sci., 2016, 10(3): 389-408 DOI:10.1007/s11707-016-0595-y


Introduction

The objective of prospectivity modeling is to identify locations (pixels, voxels) $x \in D$ in some domain of definition $D$ for which the conditional probability $P(T(x)=1 \mid B(x))$ of the presence, $T(x)=1$, of a well defined target type of ore mineralization, given favorable or prohibitive factors $B(x)$, is a relative maximum. Of course, the major prerequisite for such predictions is a proper conceptual model of the specified ore mineralization. A proper conceptual model may be turned into a regression-type model using the factors as spatially referenced predictors. Generally, a model considers the predictor $B(x) = (B_0(x), B_1(x), \ldots, B_m(x))^{\mathrm T}$, with $B_0(x) \equiv 1$ for all $x \in D$, and assigns a parameter $\theta = (\theta_0, \ldots, \theta_m)^{\mathrm T}$, which quantifies by means of a link function $L$ the extent of dependence of the conditional probability $P(T(x)=1 \mid B(x))$ on the predictors, i.e.,

\[ L\bigl( P(T(x)=1 \mid B(x)) \bigr) = B(x)^{\mathrm T} \theta. \qquad (1) \]

The target $T(x)$ as well as the predictor $B(x)$ refer to locations $x \in D$ with areal or volumetric extent, pixels or voxels, which are assumed to provide their physical support. Once computed, the predicted conditional probabilities and the associated prediction errors will be assigned to them as additional properties.

Recent surveys of mathematical models and numerical methods for prospectivity modeling have been compiled in two special issues, Mineral prospectivity analysis and quantitative resource estimation of Ore Geology Reviews 38(3), 121–304, guest edited by Kreuzer and Porwal (2010), and GIS-based mineral potential modelling and geological data analyses for mineral exploration of Ore Geology Reviews 71, 477–881, guest edited by Porwal and Carranza (2015). Agterberg (2014) devotes a major chapter to this topic. ArcGIS by ESRI features ArcSDM, providing tools for prospectivity modeling. Despite its ubiquity in geological prospection, the mathematical assumptions that authorize an approach do not seem to be well communicated. Of special concern is the role of conditional independence, which is not yet another version of stochastic independence but a concept of its own.

Moreover, a preference to publish case studies of prospectivity modeling applying “novel” procedures or “novel” variants can be observed. Whatever new method of prediction or classification has been developed in statistics, fuzzy logic, or machine learning, it is being applied to case studies of prospectivity modeling, cf. the references of (Kreuzer and Porwal, 2010; Porwal and Carranza, 2015). Subsequently, these case studies give rise to plenty of empirical comparisons, for instance (Harris and Pan, 1999; Harris et al., 2003; Porwal et al., 2010; Rodriguez-Galiano et al., 2015; Ford et al., 2016), which often conclude superior or inferior performance in one way or another. Clarification of the origins and consideration of the mutual relationships of the various methods may render some differences in the results of their practical applications less surprising than others. Ultimately, mathematical models and corresponding methods cannot be validated, nor can their properties be derived, by way of case studies.

As a result, the practitioner of prospectivity modeling often seems lost in a vast variety of procedures, especially when asked to expose the essentials of his/her method of choice. The threefold objective of this communication is (i) to clarify the mathematical relationship of logistic regression, weights-of-evidence, and boost weights-of-evidence, (ii) to disprove the major claims put forward by Cheng (2012, 2015) with respect to his boost weights-of-evidence by virtue of counterexamples including the fabricated training dataset provided by Cheng (2015), and (iii) to exemplify the theoretical finding that interaction terms of logistic regression models compensate lack of conditional independence.

Boosting is the affirmative answer to the mathematical question of whether it is possible to construct a strong classifier (with superior properties) from a set of weak classifiers (with poor properties). Boosting was not meant to relax modeling assumptions possibly required by classifiers. Boost weights-of-evidence (BoostWofE) (Cheng, 2015) is the most recent attempt to relax the mathematical modeling assumption of weights-of-evidence (Good, 1950, 1960, 1968; Minsky and Selfridge, 1961; Agterberg et al., 1990; Bonham-Carter, 1994; Schaeben, 2014c), which is joint conditional independence of all predictors given the target. Applying weights-of-evidence despite lacking conditional independence corrupts not only the predicted conditional probabilities but also their rank transforms, and thus the spatial pattern of prospectivity (Schaeben, 2014a). Boosted weights of evidence differ from ordinary weights by additive terms (Cheng, 2015, Eqs. (26) and (27), pp. 602–603), introduced as approximations of conditional weights (Cheng, 2015, Eqs. (11) and (12), p. 597) taken from the $\nu$-model (Polyakova and Journel, 2007) without citing it. For reasons of completeness the $\nu$-model is recalled in Appendix A.

Weights-of-evidence is an application of Bayes' theorem for several variables, and the special case of logistic regression arising if the predictors $B$ are nominal (categorical) and jointly conditionally independent given the target $T$ (Schaeben, 2014b). In turn, logistic regression (Reed and Berkson, 1929; Berkson, 1944; Hosmer et al., 2013) is the canonical generalization of Bayesian weights-of-evidence, allowing for deviations from conditional independence of a restricted form. Applying the Hammersley–Clifford theorem, it was proven that logistic regression including interaction terms corresponding to violations of conditional independence compensates this lack exactly and is optimum, i.e., recovers the true conditional probability, if the joint distribution of predictors and target is of log-linear form (Schaeben, 2014c).

Weights-of-evidence, the novel boost weights-of-evidence, and classical logistic regression are discussed in basic mathematical terms, and then applied to the fabricated training dataset used by Cheng (2015) for the purpose of empirical comparison. To gain additional insight, the methods are also applied to the fabricated training dataset RANKIT used in earlier communications, e.g., (Schaeben, 2014a, b).

Fundamentals: stochastic independence, conditional independence

Definition

For a set $\{0, \ldots, m\}$ of indexes, the $\otimes$-product denotes both the product of random variables $Z_\ell$ defined as $\bigotimes_{\ell=0}^{m} Z_\ell = (Z_0, \ldots, Z_m)$, and the product of their probability measures $P_{Z_\ell}$. If the random variables $Z_\ell$, $\ell = 0, \ldots, m$, are independent, then the joint probability of any subset of the random variables $Z_\ell$ can be factorized into the product of the individual probabilities, i.e.,

\[ P_{\bigotimes_{\ell \in M} Z_\ell} = \bigotimes_{\ell \in M} P_{Z_\ell}, \]

where $M$ denotes any non-empty subset of the set $\{0, \ldots, m\}$. In particular,

\[ P_Z = P_{\bigotimes_{\ell=0}^{m} Z_\ell} = \bigotimes_{\ell=0}^{m} P_{Z_\ell}. \]

If the random variables $Z_\ell$, $\ell = 1, \ldots, m$, are conditionally independent given $Z_0$, then the joint conditional probability of any subset of the random variables $Z_\ell$ given $Z_0$ can be factorized into the product of the individual conditional probabilities, i.e.,

\[ P_{\bigotimes_{\ell \in M} Z_\ell \mid Z_0} = \bigotimes_{\ell \in M} P_{Z_\ell \mid Z_0}, \]

and in particular

\[ P_{\bigotimes_{\ell=1}^{m} Z_\ell \mid Z_0} = \bigotimes_{\ell=1}^{m} P_{Z_\ell \mid Z_0}. \]

Given $Z_0$, the observable variables $Z_\ell$, $\ell \in M$, become independent. In practice, the interpretation of $Z_0$ as a common cause of the $Z_\ell$, $\ell \in M$, often applies. Thus, conditional independence is a probabilistic approach to causality (Suppes, 1970; Dawid, 1979, 2004, 2007; Pearl, 2009; Chalak and White, 2012), while correlation, i.e., linear dependence, is not. Correlated random variables may be conditionally independent or not; conditionally independent random variables may be (significantly) correlated or not. Independence does not imply conditional independence and vice versa; pairwise conditional independence does not imply joint conditional independence.

Weak conditional independence was introduced by Wong and Butz (1999), and elaborated on by Butz and Sanscartier (2002). The definition of weak conditional independence by Cheng (2015) is irrelevant at this time as a statistical significance test is not provided.

Testing conditional independence

If the predictor variables $B_\ell$, $\ell = 1, \ldots, m$, and the target variable $T$ are indicator variables, i.e., binary, then the joint probability is of log-linear form. If the predictor variables are jointly conditionally independent given the target variable, then by virtue of the Hammersley–Clifford theorem the log-linear model factorized into terms corresponding to the target variable $T$, the individual predictor variables $B_\ell$, and the individual products $T B_\ell$, $\ell = 1, \ldots, m$, is sufficiently large to represent the joint probability (Schaeben, 2014c). The sufficient size of such a factorized form of an appropriate log-linear model can be turned into the null-hypothesis of a statistical significance test. Thus, if the likelihood ratio test of this null-hypothesis leads to its rejection, then the assumption of joint conditional independence can be rejected, too.
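For binary variables this test operates directly on the $2^{m+1}$ cell counts: under the null-hypothesis the expected counts factorize, and the likelihood-ratio statistic $G^2$ is asymptotically $\chi^2$-distributed with $2(2^m - m - 1)$ degrees of freedom. The following is a minimal sketch (Python with NumPy/SciPy; the function name and array layout are our own, and empty expected cells are assumed away); it is essentially the kind of test reported for dataset Q in Table 4 below.

```python
import numpy as np
from scipy.stats import chi2

def ci_given_t_test(B, t):
    """Likelihood-ratio (G^2) test of joint conditional independence of the
    binary predictor columns of B given the binary target t."""
    B, t = np.asarray(B, dtype=int), np.asarray(t, dtype=int)
    n, m = B.shape
    G2 = 0.0
    for tv in (0, 1):
        sub = B[t == tv]                      # predictor rows with T = tv
        p = sub.mean(axis=0)                  # estimates of P(B_l = 1 | T = tv)
        # observed counts of the 2^m predictor patterns (bit l of k <-> column l)
        patterns = sub @ (2 ** np.arange(m))
        obs = np.bincount(patterns, minlength=2 ** m)
        for k in range(2 ** m):
            bits = (k >> np.arange(m)) & 1
            exp = len(sub) * np.prod(np.where(bits == 1, p, 1 - p))
            if obs[k] > 0:                    # empty observed cells contribute 0
                G2 += 2 * obs[k] * np.log(obs[k] / exp)
    df = 2 * (2 ** m - m - 1)
    return G2, df, chi2.sf(G2, df)            # statistic, dof, p-value
```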

Logistic regression, weights-of-evidence, boost weights-of-evidence

To fit a model, i.e., to estimate the model parameters $\theta$ of the model ansatz Eq. (1), data within a training region are required. However, in contrast to geostatistics (Chilès and Delfiner, 2012), and to their major detriment, the methods of prospectivity modeling considered here consider neither spatially induced dependencies between the target and the predictors nor dependencies between the predictor variables themselves. Alternatives have been suggested by van den Boogaart and Schaeben (2012) and Tolosana-Delgado et al. (2014).

A proper definition of the notion of spatial association does not exist in the realm of potential mapping or prospectivity modeling. In fact, the classical assumption of independent identically distributed random variables applies; distributions do not depend on location. Therefore, any spatial reference can be dropped, and only models of the form

\[ L\bigl( P(T=1 \mid B) \bigr) = B^{\mathrm T} \theta \]

are considered. Instead of an elusive spatial association, the ordinary correlation matrix may provide some instructive information on how to choose the predictors of a proper regression model.

Logistic regression

The conditional expectation of a binary random target variable $T$ given an $(m+1)$-variate random predictor variable $B = (B_0, B_1, \ldots, B_m)^{\mathrm T}$ with $B_0 \equiv 1$ is equal to a conditional probability, i.e.,

\[ E(T \mid B) = P(T=1 \mid B). \]

Neglecting the error term, as is common, the ordinary logistic regression model (without interaction terms) of a logit-transformed conditional probability in terms of a linear combination of predictors reads

\[ \operatorname{logit} P(T=1 \mid B) = \beta_0 + \sum_{\ell=1}^{m} \beta_\ell B_\ell, \qquad (5) \]

which can be rewritten in terms of a conditional probability as

\[ P(T=1 \mid B) = \Lambda\Bigl( \beta_0 + \sum_{\ell=1}^{m} \beta_\ell B_\ell \Bigr), \qquad (6) \]

where the logistic function, denoted $\Lambda$, is the inverse of the logit transform.

If the joint probability of the indicator target variable and the predictor variables is of log–linear form and all predictor variables are conditionally independent given the target variable, then the conditional probability of the target variable given the predictors is of the form of Eq. (6) of the ordinary logistic regression model. Thus, in this case the ordinary logistic regression model is optimum. In particular, it is optimum if the predictor variables are categorical or discrete and jointly conditionally independent given the target variable ( Schaeben, 2014a).

The logistic regression model with interaction terms reads in terms of a logit

\[ \operatorname{logit} P(T=1 \mid B) = \beta_0 + \sum_{\ell=1}^{m} \beta_\ell B_\ell + \sum_{i, \ldots, j} \beta_{i, \ldots, j} B_i \cdots B_j, \]

and in terms of a probability

\[ P(T=1 \mid B) = \Lambda\Bigl( \beta_0 + \sum_{\ell=1}^{m} \beta_\ell B_\ell + \sum_{i, \ldots, j} \beta_{i, \ldots, j} B_i \cdots B_j \Bigr). \qquad (7) \]

If the joint probability of the indicator target variable and the predictor variables is of log-linear form including interaction terms corresponding to lacking conditional independence given the target variable, then the conditional probability of the target variable given the predictors is of the form of Eq. (7) of the logistic regression model enlarged by interaction terms. Thus, in this case the augmented logistic regression model is optimum. In particular, for categorical or discrete predictor variables, interaction terms can compensate any lack of conditional independence exactly, i.e., logistic regression with interaction terms is optimum in case of lacking conditional independence (Schaeben, 2014a).

Given the sample $b_{\ell,i}, t_i$, $i = 1, \ldots, n$, $\ell = 1, \ldots, m$, the parameters of the logistic regression model are estimated with well established, well understood methods based on probability theory, encoded in any major statistical software package: (i) the method of maximum likelihood estimation, (ii) numerically realized with the Fisher scoring algorithm (a form of Newton–Raphson, a special case of the iteratively reweighted least squares algorithm), ensuring nice statistical properties of the estimates such as consistency, asymptotic normality, and efficiency.
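As an illustration, here is a minimal, self-contained sketch of Fisher scoring for the logistic model (NumPy only; the function name and the AIC bookkeeping below are ours, not the paper's); interaction terms enter simply as additional product columns of the design matrix.

```python
import numpy as np

def fit_logistic(X, y, n_iter=25, tol=1e-10):
    """Maximum likelihood fit of a logistic regression model by Fisher
    scoring (iteratively reweighted least squares).  X must contain the
    column of ones for the intercept; y is the binary target."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        mu = 1.0 / (1.0 + np.exp(-(X @ beta)))   # Lambda(X beta)
        w = mu * (1.0 - mu)                      # Fisher information weights
        step = np.linalg.solve(X.T @ (w[:, None] * X), X.T @ (y - mu))
        beta += step
        if np.max(np.abs(step)) < tol:
            break
    mu = 1.0 / (1.0 + np.exp(-(X @ beta)))
    loglik = np.sum(y * np.log(mu) + (1 - y) * np.log(1 - mu))
    aic = -2.0 * loglik + 2 * X.shape[1]         # AIC = -2 ln(L) + 2k, see below
    return beta, aic

# e.g., a model with the interaction term B2:B3 (arrays b1, b2, b3, t assumed):
# X = np.column_stack([np.ones(len(t)), b1, b2, b3, b2 * b3])
# beta, aic = fit_logistic(X, t)
```

Note that under (quasi-)complete separation some coefficients diverge, which is one symptom of the overfitting discussed below.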

The properties of a fitted logistic regression model are assessed by the significance of the estimated regression parameters, and by the Akaike information criterion (AIC)

\[ \mathrm{AIC} = -2 \ln(L) + 2k, \]

where $L$ is the maximized value of the likelihood function, and $k$ is the total number of estimated parameters of the model. AIC provides a measure of the relative quality of a statistical model for a given training dataset. More measures to assess the fit of a logistic model, including the receiver operating characteristic curve, are discussed in (Hosmer et al., 2013). Despite a small AIC, a logistic regression model is discarded if at least one of its fitted parameters is not significant, to prevent overfitting to the given training dataset and poor predictive performance for any other application dataset.

A fitted regression model is mathematically authorized for prediction if the model assumptions are satisfied, if it resembles the true relationship of the target and the predictors, and if its fitted regression parameters are significant. Large errors of prediction render a fitted model inappropriate for prediction in any case.

Weights-of-evidence

Weights-of-evidence is an application of Bayes' theorem. Bayes' theorem for several variables $B_0, B_1, \ldots, B_m$, $B_0 \equiv 1$, reads in terms of odds

\[ O(T=1 \mid B = b) = O(T=1) \prod_{\ell=1}^{m} \frac{P\bigl( B_\ell = b_\ell \,\big|\, \bigotimes_{j=0}^{\ell-1} B_j = (1, b_1, \ldots, b_{\ell-1})^{\mathrm T},\, T=1 \bigr)}{P\bigl( B_\ell = b_\ell \,\big|\, \bigotimes_{j=0}^{\ell-1} B_j = (1, b_1, \ldots, b_{\ell-1})^{\mathrm T},\, T=0 \bigr)} = O(T=1) \prod_{\ell=1}^{m} F_\ell \]

with Bayes factors (Good, 1968)

\[ F_\ell = \frac{P\bigl( B_\ell = b_\ell \,\big|\, \bigotimes_{j=0}^{\ell-1} B_j = (1, b_1, \ldots, b_{\ell-1})^{\mathrm T},\, T=1 \bigr)}{P\bigl( B_\ell = b_\ell \,\big|\, \bigotimes_{j=0}^{\ell-1} B_j = (1, b_1, \ldots, b_{\ell-1})^{\mathrm T},\, T=0 \bigr)}, \quad \ell = 1, \ldots, m. \qquad (8) \]

Applying the logit transform yields

\[ \operatorname{logit} P(T=1 \mid B) = \operatorname{logit} P(T=1) + \sum_{\ell=1}^{m} \ln F_\ell. \]

General weights of evidence

Let $V_\ell$, $\ell = 1, \ldots, m$, denote the set of $\ell$-variations of the set $\{0, 1\}$, and $v_\ell = 2^\ell$ its total number of elements. Let $Z_\ell = \bigotimes_{j=1}^{\ell} B_j$, $\ell = 1, \ldots, m$. Given a realization $b_{k,\ell} \in V_\ell$, $k = 1, \ldots, v_\ell$, of $Z_\ell$, $\ell = 1, \ldots, m$, $F_\ell$ is rewritten for $\ell = 2, \ldots, m$ in more detail as

\[ F_\ell = F_\ell(b_{k,\ell}) = \frac{P(Z_\ell = b_{k,\ell},\, T=1)}{P(Z_\ell = b_{k,\ell},\, T=0)} \, \frac{P(Z_{\ell-1} = b_{k,\ell-1},\, T=0)}{P(Z_{\ell-1} = b_{k,\ell-1},\, T=1)}, \quad k = 1, \ldots, v_\ell, \]

where $b_{k,\ell-1}$ agrees with $b_{k,\ell}$ in the first $(\ell-1)$ entries. Then

\[ F_\ell(Z_\ell) = \sum_{k=1}^{v_\ell} F_\ell(b_{k,\ell}) \, I_{\{b_{k,\ell}\}}(Z_\ell), \]

where $I_{\{b_{k,\ell}\}}$ denotes the indicator function with respect to the set $\{b_{k,\ell}\}$ containing the vector $b_{k,\ell}$. Then

\[ O(T=1 \mid B) = O(T=1) \prod_{\ell=1}^{m} \sum_{k=1}^{v_\ell} F_\ell(b_{k,\ell}) \, I_{\{b_{k,\ell}\}}(Z_\ell), \]

and

\[ \operatorname{logit} P(T=1 \mid B) = \operatorname{logit} P(T=1) + \sum_{\ell=1}^{m} \sum_{k=1}^{v_\ell} \ln F_\ell(b_{k,\ell}) \, I_{\{b_{k,\ell}\}}(Z_\ell), \qquad (11) \]

where the $\ln F_\ell(b_{k,\ell})$ of Eq. (11) may be referred to as general weights of evidence, emphasizing that conditional independence is not required. Since the tacit assumption, as usual, is that the Bayes factors and their logarithms exist, i.e., that none of the involved probabilities is 0 or 1, they can be estimated elementarily by corresponding frequencies, i.e., by counting occurrences within a given training region.
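A minimal counting sketch of these estimates (Python/NumPy; the function name and data layout are our own): for $\ell = 1$ the Bayes factor is the plain conditional ratio of Eq. (8), for $\ell \geq 2$ the factor of the first $\ell - 1$ predictors is divided out, and all involved frequencies are assumed to lie strictly between 0 and 1.

```python
import numpy as np
from itertools import product

def general_weights(B, t):
    """General weights ln F_l(b_{k,l}) of Eq. (11), estimated by counting.
    B: (n, m) array of binary predictors in the chosen order; t: binary target.
    Returns a dict mapping each pattern b_{k,l} (a tuple of length l) to its weight."""
    B, t = np.asarray(B), np.asarray(t)
    m = B.shape[1]

    def cond(l, pattern, tv):
        # relative frequency of B_1..B_l = pattern among cells with T = tv
        return np.all(B[t == tv, :l] == pattern, axis=1).mean()

    weights = {}
    for l in range(1, m + 1):
        for b in product((0, 1), repeat=l):
            F = cond(l, b, 1) / cond(l, b, 0)
            if l > 1:   # divide out the Bayes factor of the first l-1 predictors
                F *= cond(l - 1, b[:-1], 0) / cond(l - 1, b[:-1], 1)
            weights[b] = np.log(F)
    return weights
```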

Weights of evidence assuming conditional independence

Assuming joint conditional independence of all predictors $B_1, \ldots, B_m$ given the target $T$ simplifies $F_\ell$ to

\[ F_\ell^{\mathrm{CI}} = \frac{P(B_\ell \mid T=1)}{P(B_\ell \mid T=0)}, \quad \ell = 1, \ldots, m, \]

and results in

\[ \operatorname{logit} P(T=1 \mid B) = \operatorname{logit} P(T=1) + \sum_{\ell=1}^{m} \ln \frac{P(B_\ell \mid T=1)}{P(B_\ell \mid T=0)} = \operatorname{logit} P(T=1) + \sum_{\ell=1}^{m} W_\ell \qquad (12) \]

with

\[ W_\ell = \ln \frac{P(B_\ell \mid T=1)}{P(B_\ell \mid T=0)}, \qquad (13) \]

provided that neither numerator nor denominator in the definition of $W_\ell$, Eq. (13), vanishes. Eq. (12) tells us how to update and improve the unconditional logit $P(T=1)$ considering the information provided by $B_1, \ldots, B_m$, assuming their joint conditional independence given the target. Introducing

\[ W_\ell(1) = \ln \frac{P(B_\ell = 1 \mid T=1)}{P(B_\ell = 1 \mid T=0)}, \qquad W_\ell(0) = \ln \frac{P(B_\ell = 0 \mid T=1)}{P(B_\ell = 0 \mid T=0)}, \]

Eq. (12) can be rewritten in terms of predictor variables as

\[ \operatorname{logit} P(T=1 \mid B) = \operatorname{logit} P(T=1) + \sum_{\ell=1}^{m} \bigl( W_\ell(1) B_\ell + W_\ell(0)(1 - B_\ell) \bigr) = \operatorname{logit} P(T=1) + W(0) + \sum_{\ell=1}^{m} C_\ell B_\ell, \qquad (15) \]

with contrasts

\[ C_\ell = W_\ell(1) - W_\ell(0), \quad \ell = 1, \ldots, m, \]

and $W(0) = \sum_{\ell=1}^{m} W_\ell(0)$. Besides less involved weights, the major difference between the two models, conventional weights-of-evidence assuming joint conditional independence, Eq. (12), and general weights-of-evidence, Eq. (11), is the presence of interaction terms, i.e., product terms of predictors, in the latter. Moreover, comparing Eq. (5) and Eq. (15) reveals that in case of joint conditional independence of all predictors given the target variable the regression coefficients simplify to

\[ \beta_0 = \operatorname{logit} P(T=1) + W(0), \qquad \beta_\ell = C_\ell, \quad \ell = 1, \ldots, m \]

(Schaeben, 2014a, b). Obviously the model parameters become independent of one another, and can be estimated by mere counting. This special case of a logistic regression model is usually referred to as the method of weights of evidence. Its practical application is restricted by the modeling assumption of joint conditional independence of all predictors given the target. Conversely, logistic regression is the canonical generalization of weights of evidence.
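Estimation by counting is then elementary; a sketch (our own function names, assuming no vanishing frequencies), where wofe_logit evaluates the first right-hand side of Eq. (15):

```python
import numpy as np

def wofe_weights(b, t):
    """W(1), W(0), and contrast C = W(1) - W(0) of a single binary
    predictor b given the binary target t, estimated by counting."""
    b, t = np.asarray(b), np.asarray(t)
    def W(val):
        return np.log(np.mean(b[t == 1] == val) / np.mean(b[t == 0] == val))
    return W(1), W(0), W(1) - W(0)

def wofe_logit(prior_logit, b_row, weights):
    """logit P(T=1) + sum_l (W_l(1) B_l + W_l(0)(1 - B_l)), Eq. (15);
    weights is a list of (W_l(1), W_l(0)) pairs, one per predictor."""
    return prior_logit + sum(w1 if v == 1 else w0
                             for v, (w1, w0) in zip(b_row, weights))
```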

Boost weights-of-evidence (BoostWofE)

Regarding weights-of-evidence, numerous attempts aim at relaxing its modeling assumption of joint conditional independence and at subsequent corrections of the weights, e.g., the multiplicative $\tau$- and the additive $\nu$-correction of weights (Journel, 2002; Polyakova and Journel, 2007; Krishnan, 2008).

BoostWofE has recently been introduced by Cheng (2015) as combining elements of weights-of-evidence and AdaBoost (Freund and Schapire, 1997; Freund and Schapire, 1999; Hastie et al., 2009) to simplify estimation of the general weights of evidence, Eq. (11), and of the additive $\nu$ correction term of Polyakova and Journel (2007), respectively (Cheng, 2015, p. 597).

In general, boosting turns several weak learners into a single strong learner. For the two-class problem, boosting can be viewed as an approximation to additive modeling on the logistic scale using maximum Bernoulli likelihood as a criterion (Friedman et al., 2000), the criterion used in logistic regression. AdaBoost is a linear classifier with all its desirable properties; its output converges to the logarithm of the likelihood ratio (Šochman and Matas, 2004). AdaBoost fits an additive logistic regression model, using a criterion similar to the binomial log-likelihood. LogitBoost directly optimizes the binomial log-likelihood (Friedman et al., 2000).

Weights-of-evidence is the special case of logistic regression characterized by the additional assumption of jointly conditionally independent indicator predictors. Boosting weights of evidence thus appears to aim at improving an especially weak learner depending on joint conditional independence, while the first canonical improvement would be to proceed from weights-of-evidence to logistic regression.

Boost weights-of-evidence (Cheng, 2015) processes indicator predictors to update a prior unconditional probability sequentially, with the conventional weight of evidence assigned to the first predictor and successively boosted weights assigned to the subsequent predictors (Cheng, 2015, Eq. (26), p. 602), resulting in

\[ {}^{\mathrm{boost}}W_\ell(i) = W_\ell(i) + Q_\ell(i), \quad i = 0, 1, \; \ell = 1, \ldots, m, \]

with $Q_1(i) = 0$, and $Q_\ell(i)$ presumably approximating $\ln \nu_\ell(i)$, the correction term provided by the $\nu$-model (Polyakova and Journel, 2007), Eq. (A2) of Appendix A, by a weighted mean of corresponding conditional probabilities (Cheng, 2015, p. 597). The boost terms $Q_\ell$ are explicitly given as a sum of $(\ell - 1)$ terms, where each term is the logarithm of a ratio of sums of two ratios of conditional probabilities each (Cheng, 2015, Eqs. (26) and (27), pp. 602–603). For more details the reader is referred to Appendix B.

In a similar way as the additive $\nu$ modifications of conventional weights of evidence (Appendix A), the additive modifications by successive boosting (Cheng, 2015) of weights of evidence may allow for some small deviations from joint conditional independence of a very restricted form. They cannot emulate the effect of multiplicative interaction terms of predictors included in logistic regression models. Since $Q_1 = 0$, there are as many different boost weights-of-evidence models as permutations of the $m$ predictors $B_\ell$, $\ell = 1, \ldots, m$, i.e., $m!$ different boost models. Whether the procedure presented by Cheng (2015) to estimate the boost weights of evidence permutes correspondingly was not addressed; there is no obvious reason to assume that the order of predictors is irrelevant.
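The sequential structure common to weights-of-evidence and BoostWofE can be sketched as follows (our own naming; Cheng's explicit boost terms are deliberately not reproduced and enter as a precomputed lookup table). With all $Q_\ell = 0$ the scheme reduces to conventional weights-of-evidence, and permuting the predictor order permutes both tables, which is exactly where the $m!$ distinct models originate:

```python
from itertools import permutations

def boost_logit(prior_logit, b_row, W, Q):
    """Sequential update logit <- logit + W_l(b_l) + Q_l(b_l), i.e., with the
    boosted weights boostW_l(i) = W_l(i) + Q_l(i).  Conventions: W[l][i] = W_l(i),
    Q[l][i] = Q_l(i); Q[0] must be (0, 0), and the Q_l for l >= 2 are Cheng's
    (2015) correction terms (his Eqs. (26) and (27)), not reproduced here."""
    logit = prior_logit
    for l, b_l in enumerate(b_row):
        logit += W[l][b_l] + Q[l][b_l]
    return logit

# the m! orderings of the predictors yield m! generally different models:
# for order in permutations(range(m)): ... reorder columns, W, and Q alike
```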

From an application with fabricated training data and a case study with observed training data, Cheng (2015, p. 618) concludes that his novel boost weights-of-evidence method can significantly reduce the effect of conditional dependence in a simple and intuitive way, as derived by Cheng (2015, pp. 596–607), and in fact in a more generic way than other approaches (Cheng, 2015, p. 620).

Empirical comparison

Since case studies cannot generally provide insight into the method applied, instructive examples with fabricated or simulated data are used to exemplify, expose, and confirm theoretical findings. Here we primarily use the same dataset Q as Cheng (2015) for the particular purposes (i) to criticize and refute validation by case studies in general, and (ii) to disprove the major claim put forward by Cheng (2012, 2015) that BoostWofE significantly reduces the effect of lack of joint conditional independence (Cheng, 2015, p. 618). Since the assumption of conditional independence is only mildly violated for the dataset Q, we demonstrate the effect of its serious violation using once more the dataset RANKIT (Schaeben, 2014a, b).

Chengs’s (2015) fabricated training dataset Q

The training dataset Q is taken from the publication (Cheng, 2015). The digital map images of the spatial distributions of the three indicator predictor variables $B_\ell$, $\ell = 1, 2, 3$, and the indicator target variable $T$, given their realizations $b_{\ell,i}, t_i$, $i = 1, \ldots, 100$, $\ell = 1, \ldots, 3$, are displayed in Fig. 1.

For three indicator predictor variables there are eight different combinations of their joint possible realizations. Then there are eight corresponding conditional frequencies $h(T=1 \mid (B_1, B_2, B_3) = (b_1, b_2, b_3))$, $b_1, b_2, b_3 = 0, 1$, referred to as the ground truth of the training dataset Q because these conditional frequencies, determined by counting, are unbiased estimates of the corresponding conditional probabilities $P(T=1 \mid (B_1, B_2, B_3) = (b_1, b_2, b_3))$. The spatial distribution of the conditional frequencies is displayed in Fig. 2; the numbers are compiled in Table 15.
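For a fabricated dataset of this size the ground truth is obtained by mere counting; a minimal sketch (NumPy; the array names are our own):

```python
import numpy as np
from itertools import product

def ground_truth(B, t):
    """Conditional frequencies h(T=1 | (B_1,...,B_m) = pattern) for all 2^m
    predictor patterns, by counting; returns a dict pattern -> frequency."""
    B, t = np.asarray(B), np.asarray(t)
    h = {}
    for pattern in product((0, 1), repeat=B.shape[1]):
        mask = np.all(B == pattern, axis=1)
        if mask.any():                 # pattern occurs in the training region
            h[pattern] = t[mask].mean()
    return h
```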

Pearson and Kendall correlation

A measure of spatial association like the variogram of geostatistics is unknown in all conventional methods of prospectivity modeling. Since the general assumption of independent identically distributed random variables applies to all conventional methods of prospectivity modeling, inspection of the correlation matrices is reasonable. In the case of indicator predictors and data, respectively, it is sufficient to check any of the three conventional correlations, e.g., Table 1, as Pearson, Kendall, and Spearman correlation coefficients agree.

Statistical tests of Kendall correlation coefficients are summarized in Table 2 and Table 3.

Thus, $B_1$ and $B_2$ are inferred to be significantly correlated with $T$, while $B_3$ seems to be rather uncorrelated with $T$.

The tests suggest that $B_1$ is neither correlated with $B_2$ nor with $B_3$, and that $B_2$ and $B_3$ are significantly correlated for every level of significance $\alpha > 3.26 \times 10^{-6}$.
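Such tests can be reproduced, for instance, with SciPy's kendalltau, which returns the coefficient together with the p-value of the associated significance test (a sketch; the column container is our own convention):

```python
from itertools import combinations
from scipy.stats import kendalltau

def correlation_tests(columns):
    """Pairwise Kendall tau with p-values; for indicator variables the
    Pearson, Kendall, and Spearman coefficients coincide.
    columns: dict like {"B1": b1, "B2": b2, "B3": b3, "T": t}."""
    for (na, xa), (nb, xb) in combinations(columns.items(), 2):
        tau, p = kendalltau(xa, xb)
        print(f"{na} vs {nb}: tau = {tau:.3f}, p = {p:.3g}")
```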

Conditional independence

Checking, for instance for $(B_1, B_2, B_3) = (1, 1, 1)$, whether the observed joint conditional frequencies, interpreted as elementary estimates of the corresponding probabilities, can be factorized given $T = 1$,

\[ \hat P\bigl( (B_1, B_2, B_3) = (1,1,1) \mid T=1 \bigr) = 0.300, \]

\[ \hat P(B_1 = 1 \mid T=1)\, \hat P(B_2 = 1 \mid T=1)\, \hat P(B_3 = 1 \mid T=1) = 0.196, \]

or given $T = 0$,

\[ \hat P\bigl( (B_1, B_2, B_3) = (1,1,1) \mid T=0 \bigr) = 0.011, \]

\[ \hat P(B_1 = 1 \mid T=0)\, \hat P(B_2 = 1 \mid T=0)\, \hat P(B_3 = 1 \mid T=0) = 0.010, \]

suggests that the mathematical modeling assumption of joint conditional independence is violated to a minor extent only. Whether these minor deviations are statistically significant is the objective of a corresponding statistical test.
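The elementary factorization check amounts to a few lines of counting; a sketch (NumPy, our own function name), where factorization_check(B, t, (1, 1, 1), 1) reproduces the comparison of 0.300 with 0.196:

```python
import numpy as np

def factorization_check(B, t, pattern, tv):
    """Compare the joint conditional frequency of a predictor pattern given
    T = tv with the product of the marginal conditional frequencies."""
    rows = np.asarray(B)[np.asarray(t) == tv]
    joint = np.all(rows == pattern, axis=1).mean()
    marginal_product = np.prod([np.mean(rows[:, l] == pattern[l])
                                for l in range(rows.shape[1])])
    return joint, marginal_product
```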

In terms of random variables, joint conditional independence of the indicator predictor variables given the indicator target variable can be tested with reference to a corresponding log-linear model. The full log-linear model is of course sufficiently large: as all variables are indicator variables, the joint probability, i.e., any contingency table, can be represented as a log-linear model.

The statistical test of joint conditional independence referring to the three-terms log-linear model of Table 4 leads to the inference that the null-hypothesis of joint conditional independence given $T$ can reasonably be rejected for all levels of significance $\alpha > 0.008$ with respect to the likelihood ratio statistic, or $\alpha > 0.003$ with respect to the Pearson statistic, i.e., the modeling assumption of joint conditional independence is significantly violated. However, the statistical tests of pairwise conditional independence referring to the corresponding two-terms log-linear models of Tables 5, 6, and 7, respectively, reveal that only $B_2$ and $B_3$ are significantly not conditionally independent given $T$, while conditional independence of $B_1$ and $B_2$, and of $B_1$ and $B_3$, respectively, given $T$ is violated to a statistically insignificant extent only.

In (Cheng, 2015) it is merely concluded that joint conditional independence does not apply because pairwise conditional independence of $B_2$ and $B_3$ given $T$ does not apply.

Application of general weights-of-evidence

Applying the general weights of evidence, Eq. (11), estimated by counting like conventional weights of evidence, results in

\[
\begin{aligned}
O(T=1 \mid B_1 B_2 B_3) = 0.111 &\times \bigl[\, 2.739\, B_1 + 0.402\, (1-B_1) \,\bigr] \\
&\times \bigl[\, 5.476\, B_2 B_1 + 2.481\, B_2 (1-B_1) + 0.328\, (1-B_2) B_1 + 0.455\, (1-B_2)(1-B_1) \,\bigr] \\
&\times \bigl[\, 1.800\, B_3 B_1 B_2 + 1.000\, B_3 (1-B_1) B_2 + 0.600\, (1-B_3) B_1 B_2 \\
&\qquad + 1.111\, (1-B_3) B_1 (1-B_2) + 1.000\, (1-B_3)(1-B_1) B_2 + 1.088\, (1-B_3)(1-B_1)(1-B_2) \,\bigr]
\end{aligned}
\]

and of course perfectly recovers the ground truth, cf. Fig. 2. It should be noted that the interaction terms $B_3 B_1 (1-B_2)$ and $B_3 (1-B_1)(1-B_2)$ are not included as their weights vanish. The general weights of evidence could be used for the purpose of prediction in the same way as conventional weights. However, as with conventional weights of evidence, a measure of the predictive power or the reliability of the prediction is missing. Caution seems appropriate, as the general weights of evidence may result in overfitting.

Application of conventional weights-of-evidence assuming conditional independence

Despite the obvious violation of the required mathematical modeling assumption of joint conditional independence of all predictors given the target, weights-of-evidence is applied to the training dataset Q and results in the weights and contrasts compiled in Table 8. (The figures 0.898 and $-0.477$ for the conventional weights of $B_2$ given with the first and second equation of Eq. (49) by Cheng (2015, p. 612) are wrong, probably typos. Consequently, the figures for the conditional probabilities estimated with conventional weights of evidence given in the right column of Eq. (52) by Cheng (2015, p. 613) are erroneous, too.)

With

\[ P(T=1) = 0.100, \qquad O(T=1) = 0.111, \qquad \ln O(T=1) = -2.197, \]

and

\[ W(0) = -2.162, \qquad \ln O(T=1) + W(0) = -4.359, \]

the weights-of-evidence model reads explicitly

\[ \hat P_{\mathrm{Wof3E}}(T=1 \mid B_1 B_2 B_3) = \Lambda(-4.359 + 1.917\, B_1 + 2.037\, B_2 + 1.126\, B_3). \qquad (17) \]

Comparing the total number of occurrences of the target event $T=1$, given by the sum of the realizations $t_i$, $i = 1, \ldots, 100$, with the sum of all estimated conditional probabilities, interpreted as the estimated total number of occurrences of the target event $T=1$ (Cheng, 2015),

\[ \sum_{i} t_i = 10, \qquad \sum_{i} \hat P_{\mathrm{Wof3E}}(T=1 \mid B_1 B_2 B_3) = 10.257, \]

\[ \sum_{i} \hat P_{\mathrm{Wof3E}}(T=1 \mid B_1 B_2 B_3) - \sum_{i} t_i = 0.257, \qquad (18) \]

which is the test statistic of the so-called “new omnibus test” of conditional independence (Agterberg and Cheng, 2002, p. 252), confirms that joint conditional independence is disturbed to a small extent only.
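The omnibus statistic is computed directly from the vector of predicted conditional probabilities; a one-line sketch (NumPy, our own naming):

```python
import numpy as np

def omnibus_statistic(p_hat, t):
    """Test statistic of the 'new omnibus test' (Agterberg and Cheng, 2002):
    sum of predicted conditional probabilities minus the observed number of
    target occurrences; values near 0 are consistent with conditional
    independence of the predictors given the target."""
    return float(np.sum(p_hat) - np.sum(t))
```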

For the two-terms weights-of-evidence model

\[ \hat P_{\mathrm{Wof2E}}(T=1 \mid B_1 B_2) = \Lambda(-4.044 + 1.917\, B_1 + 2.037\, B_2), \qquad (19) \]

the difference

\[ \sum_{i} \hat P_{\mathrm{Wof2E}}(T=1 \mid B_1 B_2) - \sum_{i} t_i = -0.611 \qquad (20) \]

reveals in particular a change of sign of the error compared to the three-terms weights-of-evidence model of Eq. (17).

Comparing the digital map images of Fig. 3, displaying the ground truth estimated elementarily by conditional frequencies and the conditional probabilities $\hat P_{\mathrm{Wof3E}}(T=1 \mid B_1 B_2 B_3)$ and $\hat P_{\mathrm{Wof2E}}(T=1 \mid B_1 B_2)$, respectively, estimated with weights of evidence, reveals that weights-of-evidence with three predictors yields a corrupted pattern of predicted conditional probabilities due to the violation of the modeling assumption. Weights-of-evidence with the two predictors $B_1$ and $B_2$, for which the null-hypothesis of conditional independence given $T$ could not reasonably be rejected, cf. Table 5, results in a simplified pattern reflecting merely the spatial distribution of the two predictors used for prediction.

Application of BoostWofE

Processing the predictor variables in the same order $B_1, B_2, B_3$ as Cheng (2015), the weights and the contrast with respect to the predictor $B_1$ remain unchanged, while the weights and the contrasts with respect to the predictors $B_2$ and $B_3$ are affected by boosting, cf. Table 9. (The figures 0.018 and $-0.016$ for the boosted weights of the third predictor $B_3$ given with Eq. (48) by Cheng (2015, p. 611) are likely to be typos. However, their difference 0.034 is about right, such that the computed predicted probabilities are not much affected.)

With

\[ {}^{\mathrm{boost123}}W(0) = -1.876, \qquad \ln O(T=1) + {}^{\mathrm{boost123}}W(0) = -4.073, \]

the Boost123WofE model by Cheng (2015) reads explicitly

\[ {}^{\mathrm{boost123}}\hat P_{\mathrm{WofE}}(T=1 \mid B_1 B_2 B_3) = \Lambda(-4.073 + 1.917\, B_1 + 2.192\, B_2 + 0.034\, B_3), \qquad (21) \]

and results in

\[ \sum_{i} {}^{\mathrm{boost123}}\hat P_{\mathrm{WofE}}(T=1 \mid B_1 B_2 B_3) = 9.919, \]

\[ \sum_{i} {}^{\mathrm{boost123}}\hat P_{\mathrm{WofE}}(T=1 \mid B_1 B_2 B_3) - \sum_{i} t_i = -0.081. \qquad (22) \]

Comparing WofE, Eq. (17), and Boost123WofE, Eq. (21): in this example boosting as introduced by Cheng (2015) puts less weight on $B_3$ and more weight on $B_2$, resulting in a largely improved error, Eq. (22), of the same sign as the error, Eq. (20), of the two-terms weights-of-evidence model, Eq. (19).

Comparing the map images of Fig. 4 shows that the result of Boost123WofE largely agrees with the result of Wof2E, the application of weights-of-evidence considering $B_1$ and $B_2$ only. The map images just confirm the comparison of the two models given explicitly with Eq. (19) and Eq. (21), respectively.

Next, the order of boosting is changed from $(B_1, B_2, B_3)$ to $(B_2, B_1, B_3)$ and $(B_3, B_1, B_2)$, respectively.

For $(B_2, B_1, B_3)$, Table 10 gives the corresponding figures.

With

\[ {}^{\mathrm{boost213}}W(0) = -1.903, \qquad \ln O(T=1) + {}^{\mathrm{boost213}}W(0) = -4.100, \]

the Boost213WofE model now reads explicitly

\[ {}^{\mathrm{boost213}}\hat P_{\mathrm{WofE}}(T=1 \mid B_1 B_2 B_3) = \Lambda(-4.100 + 2.088\, B_1 + 2.036\, B_2 + 0.140\, B_3), \qquad (23) \]

and results in

\[ \sum_{i} {}^{\mathrm{boost213}}\hat P_{\mathrm{WofE}}(T=1 \mid B_1 B_2 B_3) = 10.045, \]

\[ \sum_{i} {}^{\mathrm{boost213}}\hat P_{\mathrm{WofE}}(T=1 \mid B_1 B_2 B_3) - \sum_{i} t_i = 0.045. \qquad (24) \]

For $(B_3, B_1, B_2)$, Table 11 gives the corresponding figures.

With

\[ {}^{\mathrm{boost312}}W(0) = -2.247, \qquad \ln O(T=1) + {}^{\mathrm{boost312}}W(0) = -4.444, \]

the Boost312WofE model now reads explicitly

\[ {}^{\mathrm{boost312}}\hat P_{\mathrm{WofE}}(T=1 \mid B_1 B_2 B_3) = \Lambda(-4.444 + 2.101\, B_1 + 1.798\, B_2 + 1.125\, B_3), \qquad (25) \]

and results in

\[ \sum_{i} {}^{\mathrm{boost312}}\hat P_{\mathrm{WofE}}(T=1 \mid B_1 B_2 B_3) = 9.377, \]

\[ \sum_{i} {}^{\mathrm{boost312}}\hat P_{\mathrm{WofE}}(T=1 \mid B_1 B_2 B_3) - \sum_{i} t_i = -0.623. \qquad (26) \]

The error, Eq. (24), of the Boost213WofE model, Eq. (23), is small and of the opposite sign as the error, Eq. (22), of the Boost123WofE model, Eq. (21), while the error, Eq. (26), of the Boost312WofE model, Eq. (25), is about the same as that of the two-terms weights-of-evidence model, Eq. (20).

Visual inspection of the map images of Fig. 5 clearly indicates that the BoostWofE model (Cheng, 2015) depends on the choice of the first predictor variable to initiate the boost algorithm, and on the order of the subsequent predictors. Checking all permutations, cf. Fig. 5, the dataset designed by Cheng (2015) to “prove” properties of BoostWofE indicates that the relative order of the predictors $B_2$ and $B_3$, which were inferred to be not conditionally independent given the target $T$, is particularly sensitive. When the predictor $B_2$ precedes the predictor $B_3$, the patterns of the predictors $B_1$ and $B_2$ are visible in the pattern of predicted conditional probabilities while the pattern of $B_3$ is not. Conversely, when the predictor $B_3$ precedes the predictor $B_2$, the pattern of the predictor $B_3$ becomes visible in the pattern of the predicted conditional probabilities and apparently contributes spatial resolution to the pattern provided by $B_2$. Unfortunately, Cheng (2015) discusses neither how to choose the first predictor variable nor the influence of the order of the subsequently included predictors.

Application of logistic regression

Three logistic regression models are fitted to the dataset Q. The first uses $B_1$ and $B_2$ only, Table 12; the second adds $B_3$, Table 13; and the third adds the interaction term $B_2{:}B_3$ to compensate for the lack of conditional independence, Table 14.

For all logistic regression models, $\sum_i \hat P_{\mathrm{lrM}}(T=1 \mid B_1 B_2 B_3) = \sum_i t_i$, as this is a constitutive equation of the maximum likelihood fit.

Obviously, the full logistic regression model recovers the ground truth perfectly, cf. Fig. 6. However, none of its fitted parameters is significant except the so-called intercept, and it therefore represents an instance of overfitting. Thus, for the training dataset Q the full logistic model is not at all appropriate for prediction.

Generally, for indicator predictors, the general weights-of-evidence approach and the full logistic regression model will always recover the ground truth. As for the training dataset Q, the full logistic regression model with $\sum_{\ell=1}^{m} \binom{m}{\ell} + 1 = 2^m = 8$ terms may even be considered parsimonious compared to the general weights-of-evidence model with $\sum_{\ell=1}^{m} 2^\ell + 1 = 2^{m+1} - 1 = 15$ terms. Moreover, only logistic regression provides statistical significance of its fitted parameters to judge whether a model is appropriate for prediction or not.

The only significant logistic regression model, with predictors $B_1$ and $B_2$, results in a simplified pattern quite similar to the patterns of Wof2E, cf. Fig. 3, and Boost123WofE, cf. Fig. 4. However, the notion of significance exists neither in WofE nor in BoostWofE by Cheng (2015).

If there were any doubt about the inappropriateness of the full logistic regression model for prospectivity modeling, the large standard errors depicted in Fig. 7 confirm that the predicted conditional probabilities cannot be considered reasonably reliable.

Summary of training dataset Q

All fitted models are explicitly compiled in Eqs. (27) to (34).

\[ \hat P_{\mathrm{Wof3E}}(T=1 \mid B_1 B_2 B_3) = \Lambda(-4.359 + 1.917\, B_1 + 2.037\, B_2 + 1.126\, B_3), \qquad (27) \]

\[ \hat P_{\mathrm{Wof2E}}(T=1 \mid B_1 B_2 B_3) = \Lambda(-4.044 + 1.917\, B_1 + 2.037\, B_2), \qquad (28) \]

\[ {}^{\mathrm{boost123}}\hat P_{\mathrm{WofE}}(T=1 \mid B_1 B_2 B_3) = \Lambda(-4.073 + 1.917\, B_1 + 2.192\, B_2 + 0.034\, B_3), \qquad (29) \]

\[ {}^{\mathrm{boost213}}\hat P_{\mathrm{WofE}}(T=1 \mid B_1 B_2 B_3) = \Lambda(-4.100 + 2.088\, B_1 + 2.036\, B_2 + 0.140\, B_3), \qquad (30) \]

\[ {}^{\mathrm{boost312}}\hat P_{\mathrm{WofE}}(T=1 \mid B_1 B_2 B_3) = \Lambda(-4.444 + 2.101\, B_1 + 1.798\, B_2 + 1.125\, B_3), \qquad (31) \]

\[ \hat P_{\mathrm{fit}}(T=1 \mid B_1 B_2 B_3) = \Lambda(-4.402 + 2.299\, B_1 + 2.407\, B_2), \qquad (32) \]

\[ \hat P_{3}(T=1 \mid B_1 B_2 B_3) = \Lambda(-4.421 + 2.301\, B_1 + 2.330\, B_2 + 0.187\, B_3), \qquad (33) \]

\[ \hat P_{\mathrm{full}}(T=1 \mid B_1 B_2 B_3) = \Lambda(-3.806 + 1.609\, B_1 + 1.609\, B_2 - 14.759\, B_3 + 0.587\, B_1 B_2 - 1.609\, B_1 B_3 + 14.759\, B_2 B_3 + 2.708\, B_1 B_2 B_3). \qquad (34) \]

Conditional probabilities $\hat P(T=1 \mid B)$ predicted with several fitted models are compiled in Table 15.

Fabricated training dataset RANKIT

The fabricated training dataset RANKIT, Fig. 8, was already used earlier to demonstrate the effects of a serious violation of the mathematical modeling assumption of conditional independence ( Schaeben, 2014a, b). Here, applying boost weights-of-evidence as suggested by Cheng (2015) results in the fitted models explicitly given by

\[ {}^{\mathrm{boost12}}\hat P_{\mathrm{WofE}}(T=1 \mid B_1 B_2) = \Lambda(-3.074 + 1.725\, B_1 + 1.157\, B_2), \qquad (35) \]

\[ {}^{\mathrm{boost21}}\hat P_{\mathrm{WofE}}(T=1 \mid B_1 B_2) = \Lambda(-2.997 + 1.372\, B_1 + 1.349\, B_2). \qquad (36) \]

First, Eqs. (35) and (36) confirm again that the fitted models depend on the order in which the two predictors are processed. Here, however, they are quite similar, but neither of them reduces the effect of lacking conditional independence. In fact, Fig. 9 clearly reveals that, whatever the order, boosting does not generally provide a means to compensate the lack of conditional independence but yields similar patterns of predicted conditional probabilities as conventional weights-of-evidence or logistic regression without the interaction term $B_1{:}B_2$. Including the interaction term results in the full logistic regression model, which is significant (Schaeben, 2014b) and recovers the ground truth given in terms of conditional frequencies obtained by counting.

Discussion of results

General weights of evidence are not “very difficult if not impossible” to estimate “given a limited number of training data” (Cheng, 2015, p. 597). In fact, if conditional probabilities of 0 or 1 can reasonably be excluded, they are estimated by counting frequencies of occurrences of several events, just like conventional weights. Since the target event $T=1$ is usually a very rare event in prospectivity modeling, chances are fairly small that the denominators $P((Z_\ell, T) = (b_{j,\ell}, 0))$ involved in the Bayes factors, Eq. (8), referring to $T=0$, vanish. When applying general weights-of-evidence to the limited number of training data provided by the dataset Q (Cheng, 2015), no problems were encountered. Thus, the difficulties to be resolved by boost weights of evidence do not seem to exist.

Given the ten pages it takes (Cheng, 2015, pp. 596–607) to derive the procedure of boost weights-of-evidence, relying on an ad hoc approximation that is not justified in any way, it cannot be confirmed that BoostWofE is simple and intuitively appealing compared with logistic regression, Eq. (5), for instance. Like conventional weights-of-evidence, and as opposed to logistic regression, boost weights-of-evidence lacks the notion of significance of fitted model parameters.

As for the fabricated training dataset Q used by Cheng (2015), the statistical test of the null-hypothesis of joint conditional independence of all three predictors $B_\ell$, $\ell = 1, 2, 3$, given the target $T$ leads to the inference that it can reasonably be rejected. However, it can be rejected because $B_2$ and $B_3$ are not conditionally independent given $T$, while the null-hypothesis cannot reasonably be rejected for the other pairs of predictors.

The fabricated training dataset Q then obviously exemplifies that the results of boost weights-of-evidence largely depend on the boosting sequence of the predictors. It has to be noted that criteria for the user's decision on how to choose the sequence are not provided in (Cheng, 2015). Whatever the boosting sequence, boost weights-of-evidence does not reduce the effect of violated conditional independence but leads either to corrupted patterns of predicted conditional probabilities, as conventional weights-of-evidence does, or to simplified patterns similar to those accomplished by omitting $B_3$ from the prediction.

With respect to the training dataset RANKIT, which significantly lacks joint conditional independence, the application of boost weights-of-evidence results in the same corrupted pattern of predicted conditional probabilities as conventional weights-of-evidence.

In fact, the boost weights-of-evidence method as introduced by Cheng (2015) does not generally reduce the effect of lacking conditional independence, let alone in the simple, intuitive and more generic way claimed by Cheng (2015, p. 620). None of the claims by Cheng (2015) could be verified.

It is commonplace that examples cannot substitute for mathematical proofs, while one counterexample is sufficient to disprove a mathematical statement. In the same way, it is not possible to derive the properties of an ad hoc procedure, nor to validate it, by way of an example with fabricated or observed training data. It is impossible to judge the performance of an ad hoc procedure by way of its application to observed training data. It takes mathematical-statistical analysis to derive a method, to turn it into an algorithm, and to encode it in software.

Conclusions

Boost weights-of-evidence (Cheng, 2015) is another improper attempt to relax the mathematical modeling assumption of joint conditional independence of all predictors given the target, whose violation hampers reliable predictions of prospectivity. Boost weights-of-evidence does not generally reduce, let alone significantly reduce, the effect of lacking joint conditional independence. Its application yields corrupted predicted conditional probabilities and corrupted spatial patterns of prospectivity, as weights-of-evidence does. In particular, its results depend on the sequential processing order of the predictors. General weights of evidence do not require the modeling assumption of joint conditional independence of all predictors given the target, and can be estimated by mere counting like conventional weights. Logistic regression provides a relative measure of fit, and can distinguish significantly fitted models appropriate for prediction.

References

[1] Agterberg F P (2014). Geomathematics: Theoretical Foundations, Applications and Future Developments. Cham, Heidelberg, New York, Dordrecht, London: Springer
[2] Agterberg F P, Bonham-Carter G F, Wright D F (1990). Statistical pattern integration for mineral exploration. In: Gaál G, Merriam D F, eds. Computer Applications in Resource Estimation: Prediction and Assessment for Metals and Petroleum. Oxford, New York: Pergamon Press, 1–21
[3] Agterberg F P, Cheng Q (2002). Conditional independence test for weights-of-evidence modeling. Nat Resour Res, 11(4): 249–255
[4] Berkson J (1944). Application of the logistic function to bio-assay. J Am Stat Assoc, 39(227): 357–365
[5] Bonham-Carter G (1994). Geographic Information Systems for Geoscientists: Modeling with GIS. New York: Pergamon, Elsevier Science
[6] Butz C J, Sanscartier M J (2002). Properties of weak conditional independence. In: Alpigini J J, Peters J F, Skowron A, Zhong N, eds. Rough Sets and Current Trends in Computing, Lecture Notes in Computer Science (Volume 2475). Berlin, Heidelberg: Springer, 349–356
[7] Chalak K, White H (2012). Causality, conditional independence, and graphical separation in settable systems. Neural Comput, 24(7): 1611–1668
[8] Cheng Q (2012). Application of a newly developed boost weights of evidence model (BoostWofE) for mineral resources quantitative assessments. Journal of Jilin University, Earth Sci Ed, 42(6): 1976–1985
[9] Cheng Q (2015). BoostWofE: a new sequential weights of evidence model reducing the effect of conditional dependency. Math Geosci, 47(5): 591–621
[10] Chilès J P, Delfiner P (2012). Geostatistics: Modeling Spatial Uncertainty (2nd ed). New York, Chichester, Weinheim, Brisbane, Singapore, Toronto: John Wiley & Sons
[11] Dawid A P (1979). Conditional independence in statistical theory. J R Stat Soc B, 41(1): 1–31
[12] Dawid A P (2004). Probability, causality and the empirical world: a Bayes-de Finetti-Popper-Borel synthesis. Stat Sci, 19(1): 44–57
[13] Dawid A P (2007). Fundamentals of Statistical Causality. Research Report 279, Department of Statistical Science, University College London
[14] Ford A, Miller J M, Mol A G (2016). A comparative analysis of weights of evidence, evidential belief functions, and fuzzy logic for mineral potential mapping using incomplete data at the scale of investigation. Nat Resour Res, 25(1): 19–33
[15] Freund Y, Schapire R E (1997). A decision theoretic generalization of on-line learning and an application to boosting. J Comput Syst Sci, 55(1): 119–139
[16] Freund Y, Schapire R E (1999). A short introduction to boosting. Jinko Chino Gakkaishi, 14(5): 771–780
[17] Friedman J, Hastie T, Tibshirani R (2000). Additive logistic regression: a statistical view of boosting. Ann Stat, 28(2): 337–407
[18] Good I J (1950). Probability and the Weighing of Evidence. London: Griffin
[19] Good I J (1960). Weight of evidence, corroboration, explanatory power, information and the utility of experiments. J R Stat Soc B, 22(2): 319–331
[20] Good I J (1968). The Estimation of Probabilities: An Essay on Modern Bayesian Methods. MIT Research Monograph No. 30. Cambridge, MA: The MIT Press, 109
[21] Harris D P, Pan G C (1999). Mineral favorability mapping: a comparison of artificial neural networks, logistic regression and discriminant analysis. Nat Resour Res, 8(2): 93–109
[22] Harris D P, Zurcher L, Stanley M, Marlow J, Pan G (2003). A comparative analysis of favorability mappings by weights of evidence, probabilistic neural networks, discriminant analysis, and logistic regression. Nat Resour Res, 12(4): 241–255
[23] Hastie T, Tibshirani R, Friedman J (2009). The Elements of Statistical Learning (2nd ed). New York: Springer
[24] Hosmer D W, Lemeshow S, Sturdivant R X (2013). Applied Logistic Regression (3rd ed). Hoboken, NJ: Wiley & Sons
[25] Journel A G (2002). Combining knowledge from diverse sources: an alternative to traditional data independence hypotheses. Math Geol, 34(5): 573–596
[26] Kreuzer O, Porwal A, eds. (2010). Special Issue “Mineral Prospectivity Analysis and Quantitative Resource Estimation”. Ore Geol Rev, 38(3): 121–304
[27] Krishnan S (2008). The τ-model for data redundancy and information combination in Earth sciences: theory and application. Math Geol, 40(6): 705–727
[28] Minsky M, Selfridge O G (1961). Learning in random nets. In: Cherry C, ed. 4th London Symposium on Information Theory. London: Butterworths, 335–347
[29] Pearl J (2009). Causality: Models, Reasoning, and Inference (2nd ed). New York: Cambridge University Press
[30] Polyakova E I, Journel A G (2007). The Nu expression for probabilistic data integration. Math Geol, 39(8): 715–733
[31] Porwal A, Carranza E J M (2015). Introduction to the Special Issue: GIS-based mineral potential modelling and geological data analyses for mineral exploration. Ore Geol Rev, 71: 477–483
[32] Porwal A, González-Álvarez I, Markwitz V, McCuaig T C, Mamuse A (2010). Weights of evidence and logistic regression modeling of magmatic nickel sulfide prospectivity in the Yilgarn Craton, Western Australia. Ore Geol Rev, 38(3): 184–196
[33] Reed L J, Berkson J (1929). The application of the logistic function to experimental data. J Phys Chem, 33(5): 760–779
[34] Rodriguez-Galiano V, Sanchez-Castillo M, Chica-Olmo M, Chica-Rivas M (2015). Machine learning predictive models for mineral prospectivity: an evaluation of neural networks, random forest, regression trees and support vector machines. Ore Geol Rev, 71: 804–818
[35] Schaeben H (2014a). Targeting: logistic regression, special cases and extensions. ISPRS Int J Geoinf, 3(4): 1387–1411
[36] Schaeben H (2014b). Potential modeling: conditional independence matters. GEM International Journal on Geomathematics, 5(1): 99–116
[37] Schaeben H (2014c). A mathematical view of weights-of-evidence, conditional independence, and logistic regression in terms of Markov random fields. Math Geosci, 46(6): 691–709
[38] Šochman J, Matas J (2004). AdaBoost with totally corrective updates for fast face detection. In: Proc. 6th IEEE International Conference on Automatic Face and Gesture Recognition, Seoul, South Korea, 445–450
[39] Suppes P (1970). A Probabilistic Theory of Causality. Amsterdam: North-Holland
[40] Tolosana-Delgado R, van den Boogaart K G, Schaeben H (2014). Potential mapping from geochemical surveys using a Cox process. 10th Conference on Geostatistics for Environmental Applications, Paris, July 9–11, 2014
[41] van den Boogaart K G, Schaeben H (2012). Mineral potential mapping using Cox-type regression for marked point processes. 34th IGC, Brisbane, Australia
[42] Wong S K M, Butz C J (1999). Contextual weak independence in Bayesian networks. In: Proc. 15th Conference on Uncertainty in Artificial Intelligence, Stockholm, Sweden, 670–679

RIGHTS & PERMISSIONS

Higher Education Press and Springer-Verlag Berlin Heidelberg
