A data representation method using distance correlation

Xinyan LIANG; Yuhua QIAN; Qian GUO; Keyin ZHENG

doi:10.1007/s11704-023-3396-y

Front. Comput. Sci. ›› 2025, Vol. 19 ›› Issue (1) : 191303. DOI: 10.1007/s11704-023-3396-y

Excellent Young Computer Scientists Forum

RESEARCH ARTICLE

Abstract

Association between features has been demonstrated to improve the representation ability of data. However, the original association data reconstruction method faces two issues: the dimension of the reconstructed data is necessarily higher than that of the original data, and the adopted association measure does not balance effectiveness and efficiency well. To address these two issues, this paper proposes a novel association-based representation improvement method named AssoRep. AssoRep first obtains the association between features via the distance correlation method, which has several advantages over Pearson’s correlation coefficient. Then an enhancement matrix is formed by stacking the association values of every pair of features. Next, an improved feature representation is obtained by aggregating the original features with the enhancement matrix. Finally, the improved feature representation is mapped to a low-dimensional space via principal component analysis. The effectiveness of AssoRep is validated on 120 datasets, and the results further refine our previous work on association data reconstruction.

Keywords

association / representation / distance correlation / classification

Cite this article

Xinyan LIANG, Yuhua QIAN, Qian GUO, Keyin ZHENG. A data representation method using distance correlation. Front. Comput. Sci., 2025, 19(1): 191303 https://doi.org/10.1007/s11704-023-3396-y

1 Introduction

The success of deep learning [1, 2], multi-label learning [3, 4], and kernel learning [5, 6] shows that learning with enhanced features instead of the original features may be more effective. For example, the multilayer perceptron and attention mechanisms have been designed to enhance the representation ability of data in an implicit manner, improving the performance of machine learning models [7]. However, their poor interpretability strongly limits their application in trusted domains. In this article, interpretability denotes the transparency of a model, specifically humans’ ability to understand it [8]. Hence, it is necessary to develop an interpretable representation enhancement method. Recently, some researchers have attempted to enhance the representation ability of data by fully mining and utilizing the latent information in data with transparent techniques [9, 10].

Association information, which characterizes the relationships among features/variables, is an important kind of latent information in data. Because the datasets to be analyzed are mostly collected from real applications, they often contain important and rich forms of association [11–13]. However, most researchers in the machine learning domain prefer to obtain an independent feature representation by imposing an orthogonality constraint on a new feature space, for reasons such as feature decoupling and simplicity of modeling. This strategy removes the association among features, which not only wastes information but also may not be a good strategy for learning on association data. Our recent work (the method is named AF) [9] applies the association among features, calculated using Pearson’s correlation coefficient (pCor), to data reconstruction, and finds that association between features can improve the representation ability of data. However, AF has two limitations:

1. The data representation obtained by AF is high-dimensional or sparse. AF consists of a feature boosting process and an association-based fusion process. To model the high-order information of features and improve the nonlinear representation ability of the original data, the feature boosting process adopts a simple but effective approach: it adds the powers of each feature value to the original feature space. This process achieves its goal, but it also causes a tricky problem: the dimension of the new representation must be higher than that of the original data representation. For example, if the dimension of a given dataset is 100, the dimension of the new representation will be 1000 when the parameter $L$ takes the value 10. The curse of dimensionality limits the application of AF to high-dimensional data. Hence, it is desirable to develop an association-based data reconstruction method that can generate a lower-dimensional data representation.

2. pCor, which AF uses to capture the association between features, does not balance effectiveness and efficiency well. One core task of AF is to measure the degree of association between two feature vectors. Some association measures, like pCor, are computationally efficient but have their own limitations. For example, the value of pCor does not accurately reveal whether two features are independent; moreover, pCor is only appropriate for calculating the association between two feature vectors with the same dimension. Others, like MIC and MNC, can mine more relationships but are computationally inefficient. Overall, the association computed by simple measures is inaccurate, while advanced measures are computationally inefficient. Hence, it is necessary to explore a more practical association measure that balances effectiveness and efficiency for the association-based data reconstruction task.

Based on the above analysis, our aim is to develop a novel association data reconstruction method that balances efficiency and effectiveness well by using a more suitable association measure and low-dimensional embedding techniques. To this end, we develop an association-based representation enhancement method, abbreviated as AssoRep. AssoRep first obtains the association between features via the distance correlation method, which has several advantages over Pearson’s correlation coefficient. Then an enhancement matrix is formed by stacking the association values of every pair of features. Next, an improved feature representation is obtained by aggregating the original features with the enhancement matrix. Finally, the improved feature representation is mapped to a low-dimensional space via principal component analysis. Note that the working mechanism of each process in AssoRep is transparent.

The contributions of this work are as follows:

1. We introduce a fresh perspective on data representation improvement through association between features, which complements relationship-based learning that mainly focuses on relationships among samples, such as graph neural networks and spectral clustering.

2. A novel distance correlation-based data representation method is proposed, and it balances effectiveness and efficiency better than its counterpart AF [9].

3. The experimental results on 120 benchmark datasets show that, in most cases, the proposed AssoRep outperforms the other methods in terms of five popular evaluation metrics widely used for classification.

The remainder of this paper is organized as follows: Section 2 reviews the related work, including learning with association and feature augmentation. Section 3 details AssoRep, a representation framework for associated data. Section 4 details the experimental setup and the results on the classification task. Section 5 presents the conclusions and future work.

2 Related work

Our work falls into the categories of association mining, learning based on association, and feature enhancement. To position our work, we briefly review them as follows.

Association mining: To measure the association among variables, scholars have proposed many methods. For example, the well-known Pearson correlation coefficient was designed for measuring the strength of the linear trend between two variables [14]; Spearman’s rho [15] and Kendall’s tau [16] were developed for measuring the degree of monotonic trend between two variables. For identifying complex association relationships among variables, such as trigonometric and inverse trigonometric functions, some advanced methods have been developed, such as distance correlation (DC) [17], maximal information coefficient (MIC) [18], and maximal neighborhood coefficient (MNC) [19].

Learning with association: Association has been proven to be an effective kind of latent information for performance improvement or other aims in some machine learning tasks, especially multi-label learning [20–22]. For example, to enable binary relevance to exploit label correlations, researchers have proposed the chaining structure, the stacking structure, and the controlling structure based on three assumptions: random label correlations, full-order label correlations, and pruned label correlations, respectively [23]. Recently, association has also been applied to other tasks. For example, Kou et al. [24] developed a label association rule mining method for automatically mining the mixed-order correlation among labels, and then applied the correlations to the multi-label feature selection task. Troncoso et al. [8] explained models for time series forecasting with the help of numeric association rules. Although the above methods succeed for different aims in various tasks, most of them come from multi-label learning and consider the association among labels.

Feature augmentation: Feature augmentation generally serves two purposes: producing new samples and boosting the representation ability. The former generates more diverse and discriminative features by noise injection [25] or by sampling from a hyperbolic normal distribution [26]; each generated feature corresponds to a new example. Similar to our work, the latter aims to re-represent the examples based on the original features and extra information such as distance information [27], multi-view features [28], or multi-scale information [29]. For example, Jia et al. [10] improved the performance of multi-dimensional classification on an augmented feature space that consists of counting statistics on the class membership of neighboring examples as well as distance information between examples and their $k$ nearest neighbors, obtained via $k$NN techniques. Wang et al. [30] induced an enhanced feature representation by fusing multi-scale discriminative information from different layers of a convolutional neural network into a single feature vector. Wang et al. [31] enriched the feature space using confidence-rated class prototype features to replenish discriminative characteristics of the underlying ground-truth labels for partial label training examples. The benefits of feature augmentation have been demonstrated in many applications such as multi-label learning [10], multi-modal classification [9], and multi-camera tracking [32]. Another kind of feature augmentation method is feature selection, which removes unimportant features [33]. For example, Liu et al. [34] removed weaker features from multiple candidate sets based on reinforcement learning with an exploration-exploitation strategy. It is worth noting that some existing feature augmentation methods, like CRAM$_{c}$, use discriminative information from the output space (label information). In this paper, we introduce a fresh perspective on data representation improvement that only uses association information from the input (feature) space.

3 The AssoRep method

This article proposes a framework for enhancing representation via association, named AssoRep. AssoRep includes three processes: (1) relationship boosting, (2) association mining, and (3) association embedding.

Let $X$ be a set with $n$ examples and $Y$ be its corresponding label set. Then a dataset can be represented as $D = (X, Y)$, where $X = \{x_{1}, x_{2}, \dots, x_{n}\} \in R^{n \times m}$, $x_{i} = \{x_{i1}, x_{i2}, \dots, x_{im}\} \in R^{m}$ denotes the $i$th example, and $n$ and $m$ are the numbers of examples and features, respectively; $Y = \{y_{1}, y_{2}, \dots, y_{n}\} \in R^{n}$, where $y_{i}$ is the label of $x_{i}$.

Let $F$ be the feature vector set of the dataset $D$. Then it is written as $F = \{f_{1}, f_{2}, \dots, f_{m}\}$, where $f_{i} = \{x_{1i}, x_{2i}, \dots, x_{ni}\} \in R^{n}$ denotes the $i$th feature vector from $X$.

3.1 Relationship boosting

The aim of relationship boosting is to enrich features by adding transform terms computed with different transform functions. This process can be viewed as the first enhancement of $X$. In this article, power functions of different integer orders are used to this end. The effectiveness of boosting relationships with power functions has been validated by works such as [9, 35].

Let $B \in R^{n \times mL}$ be the relationship boosting data representation of $X$. Given a set of power functions $ϕ = \{ϕ_{1}(x), ϕ_{2}(x), \dots, ϕ_{L}(x)\}$, where $L$ is the maximal order and $ϕ_{t}(x) = x^{t}$, $t \in \{1, 2, \dots, L\}$, we obtain $B$ as follows:

1. For each feature vector $f_{i} \in F$, compute its transform values using the power functions $ϕ$ and represent these transform values in the following matrix form:

$$
B_{i} = [ϕ_{1}(f_{i}), ϕ_{2}(f_{i}), \dots, ϕ_{L}(f_{i})] \in R^{n \times L}. \tag{1}
$$

2. Concatenate the transform values of the $m$ feature vectors from $F$ as follows:

$$
B = [B_{1}, B_{2}, \dots, B_{m}], \tag{2}
$$

where

$$
B_{i} = \begin{bmatrix} ϕ_{1}(x_{1i}) & ϕ_{2}(x_{1i}) & \dots & ϕ_{L}(x_{1i}) \\ ϕ_{1}(x_{2i}) & ϕ_{2}(x_{2i}) & \dots & ϕ_{L}(x_{2i}) \\ \vdots & \vdots & \ddots & \vdots \\ ϕ_{1}(x_{ni}) & ϕ_{2}(x_{ni}) & \dots & ϕ_{L}(x_{ni}) \end{bmatrix}. \tag{3}
$$
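The relationship boosting step of Eqs. (1)-(3) can be sketched with NumPy as follows. This is a minimal illustration rather than the authors' implementation; the function name `relationship_boosting` and the toy input are assumptions made for this example.

```python
import numpy as np

def relationship_boosting(X, L):
    """Return B = [B_1, ..., B_m], where B_i holds powers 1..L of feature i.

    Hypothetical helper: the name and signature are illustrative only.
    """
    X = np.asarray(X, dtype=float)
    n, m = X.shape
    powers = np.arange(1, L + 1)                  # t = 1, ..., L
    # Broadcasting (n, m, 1) ** (1, 1, L) gives an (n, m, L) array of x_{ki}^t.
    boosted = X[:, :, None] ** powers[None, None, :]
    # Row-major flattening places feature i, power t in 1-based column L*(i-1)+t.
    return boosted.reshape(n, m * L)

X_toy = np.array([[1.0, 2.0],
                  [3.0, 4.0]])
B_toy = relationship_boosting(X_toy, L=3)
print(B_toy)
```

With $m = 2$ features and $L = 3$, the output has $mL = 6$ columns, which illustrates why the dimension grows linearly in $L$.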

3.2 Association mining

The purpose of this article is to enhance the representation ability of given datasets via association information between feature vectors. Hence, one core task is to measure the association degree between two feature vectors, and the choice of association mining methods is important.

3.2.1 Choice of association mining method

If we view every feature as a variable in statistics, then the methods used to measure correlation coefficients in correlation analysis can be adopted for mining association among features. The basic aspects of correlation analysis can be found in [36]. In the following, we briefly introduce some correlation analysis methods and detail the distance correlation that is used in our work.

The widely used Pearson correlation coefficient (pCor), also called the Pearson product-moment correlation coefficient, gives the strength of the linear trend between two variables [14]. Spearman’s rho [15], which has been reprinted and reflected upon more than once (see [37, 38]), and Kendall’s tau [16] are two rank-order correlation coefficients. Both are often used to measure the degree of monotonic trend between two variables. A comparative analysis of Spearman’s rho and Kendall’s tau can be found in [39].

Mutual information, a frequently used concept from information theory, is often used to construct association measurement tools [40]. For example, in 2011, David et al. observed that if a relationship exists between two variables, then a grid can be drawn on the scatter plot of the two variables that partitions the data to encapsulate that relationship. Based on this idea, they proposed the maximal information coefficient (MIC), in which these grid partitions are applied to estimate mutual information [18]. Inspired by MIC, Cheng et al. developed effective bivariate and multivariate association mining techniques by replacing each example with its neighboring points from the perspective of neighborhood information [19, 41]. These techniques show a powerful ability to capture various kinds of functional relationships.

The above-mentioned methods either have their own shortcomings (e.g., pCor) or are typically computationally intensive (e.g., MIC, MNC). As a trade-off between measurement effectiveness and computational complexity, we choose the distance correlation (dCor) [17], a correlation analysis method based on characteristic functions, as the tool for mining association information. Given two random vectors $X \in R^{p}$ and $Y \in R^{q}$, where $p$ and $q$ are the dimensions of the two vectors, and denoting their distance covariance by $V(X, Y)$, the distance correlation [17] $R(X, Y)$ between them is defined by

$$
R^{2}(X, Y) = \begin{cases} \dfrac{V^{2}(X, Y)}{\sqrt{V^{2}(X) V^{2}(Y)}}, & V^{2}(X) V^{2}(Y) > 0; \\ 0, & V^{2}(X) V^{2}(Y) = 0, \end{cases} \tag{4}
$$

where

$$
\begin{aligned} V^{2}(X, Y) &= \| f_{X,Y}(t, s) - f_{X}(t) f_{Y}(s) \|^{2} \\ &= \frac{1}{c_{p} c_{q}} \int_{R^{p+q}} \frac{| f_{X,Y}(t, s) - f_{X}(t) f_{Y}(s) |^{2}}{| t |_{p}^{1+p} \, | s |_{q}^{1+q}} \, dt \, ds, \end{aligned} \tag{5}
$$

$$
V^{2}(X) = \| f_{X,X}(t, s) - f_{X}(t) f_{X}(s) \|^{2}, \tag{6}
$$

$$
V^{2}(Y) = \| f_{Y,Y}(t, s) - f_{Y}(t) f_{Y}(s) \|^{2}, \tag{7}
$$

where $f_{X}$, $f_{Y}$, and $f_{X,Y}$ denote the characteristic function of $X$, the characteristic function of $Y$, and the joint characteristic function of $X$ and $Y$, respectively.

The distance correlation possesses the following features:

● $0 \leq R \leq 1$;

● $R(X, Y)$ is defined for $X$ and $Y$ in arbitrary dimensions, whereas the widely used Pearson’s correlation coefficient (pCor) requires the two dimensions to be the same. That is, the constraint $p = q$ has to be met for pCor but not for dCor;

● $R(X, Y) = 0$ characterizes independence of $X$ and $Y$, whereas a zero pCor does not;

● Compared with MIC, MNC, etc., it is computationally efficient.

3.2.2 Computing the association in-between features

This step aims to obtain an association matrix, used as the enhancement matrix, by stacking the association values of every pair of feature vectors, where the association between features is computed via the distance correlation method.

To measure the association between any two feature vectors in a given dataset, the empirical distance correlation (dCor) [17] is introduced because of the good properties described above, especially its advantages over Pearson’s correlation coefficient.

Let $F^{ϕ}$ be the feature vector set of $B$, the relationship boosting data representation of $X$. Then it can be denoted as

$$
F^{ϕ} = \{h_{1}, h_{2}, \dots, h_{mL}\}, \tag{8}
$$

where $h_{j} = \{ϕ_{t}(x_{1i}), ϕ_{t}(x_{2i}), \dots, ϕ_{t}(x_{ni})\} \in R^{n}$ denotes the $j$th feature vector from $B$ shown in Eq. (2), with $j = L(i - 1) + t$.

Given two feature vectors $h_{i} = \{x_{1i}, x_{2i}, \dots, x_{ni}\} \in F^{ϕ}$ and $h_{j} = \{x_{1j}, x_{2j}, \dots, x_{nj}\} \in F^{ϕ}$, where $n$ is the number of examples, the empirical distance covariance $V_{n}(h_{i}, h_{j})$ of the two feature vectors is defined by

$$
V_{n}^{2}(h_{i}, h_{j}) = \frac{1}{n^{2}} \sum_{k, l = 1}^{n} A_{kl} B_{kl}, \tag{9}
$$

where $A_{kl} = a_{kl} - \bar{a}_{k \cdot} - \bar{a}_{\cdot l} + \bar{a}_{\cdot \cdot}$ and $B_{kl} = b_{kl} - \bar{b}_{k \cdot} - \bar{b}_{\cdot l} + \bar{b}_{\cdot \cdot}$ are the double-centered distances of [17] (the grand mean is added back), and each term is computed as follows:

$$
a_{kl} = | x_{ki} - x_{li} |_{p}, \quad b_{kl} = | x_{kj} - x_{lj} |_{p}, \tag{10}
$$

$$
\bar{a}_{k \cdot} = \frac{1}{n} \sum_{l = 1}^{n} a_{kl}, \quad \bar{b}_{k \cdot} = \frac{1}{n} \sum_{l = 1}^{n} b_{kl}, \tag{11}
$$

$$
\bar{a}_{\cdot l} = \frac{1}{n} \sum_{k = 1}^{n} a_{kl}, \quad \bar{b}_{\cdot l} = \frac{1}{n} \sum_{k = 1}^{n} b_{kl}, \tag{12}
$$

$$
\bar{a}_{\cdot \cdot} = \frac{1}{n^{2}} \sum_{k, l = 1}^{n} a_{kl}, \quad \bar{b}_{\cdot \cdot} = \frac{1}{n^{2}} \sum_{k, l = 1}^{n} b_{kl}. \tag{13}
$$

Similarly, $V_{n}(h_{i})$ and $V_{n}(h_{j})$ can be defined as

$$
V_{n}^{2}(h_{i}) = \frac{1}{n^{2}} \sum_{k, l = 1}^{n} A_{kl}^{2}, \tag{14}
$$

$$
V_{n}^{2}(h_{j}) = \frac{1}{n^{2}} \sum_{k, l = 1}^{n} B_{kl}^{2}. \tag{15}
$$

Based on Eqs. (9), (14), and (15), the empirical distance correlation $R_{n}(h_{i}, h_{j})$ of the two feature vectors can be obtained with Eq. (16):

$$
R_{n}^{2}(h_{i}, h_{j}) = \begin{cases} \dfrac{V_{n}^{2}(h_{i}, h_{j})}{\sqrt{V_{n}^{2}(h_{i}) V_{n}^{2}(h_{j})}}, & V_{n}^{2}(h_{i}) V_{n}^{2}(h_{j}) > 0; \\ 0, & V_{n}^{2}(h_{i}) V_{n}^{2}(h_{j}) = 0. \end{cases} \tag{16}
$$
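To make Eqs. (9)-(16) concrete, the following NumPy sketch computes the squared empirical distance correlation of two one-dimensional samples. It is an illustration under stated assumptions, not the authors' code: the function names are hypothetical, and the centring uses the standard form from [17], in which the grand mean of the distance matrix is added back.

```python
import numpy as np

def centered_distances(v):
    """Double-centred pairwise distance matrix of a 1-D sample (Eqs. (10)-(13))."""
    v = np.asarray(v, dtype=float)
    a = np.abs(v[:, None] - v[None, :])           # a_kl = |v_k - v_l|
    row_mean = a.mean(axis=1, keepdims=True)      # a-bar_{k.}
    col_mean = a.mean(axis=0, keepdims=True)      # a-bar_{.l}
    return a - row_mean - col_mean + a.mean()     # grand mean added back

def distance_correlation_sq(u, v):
    """Squared empirical distance correlation R_n^2(u, v) (Eq. (16))."""
    A, B = centered_distances(u), centered_distances(v)
    dcov2 = (A * B).mean()                        # Eq. (9)
    dvar_u = (A * A).mean()                       # Eq. (14)
    dvar_v = (B * B).mean()                       # Eq. (15)
    denom = np.sqrt(dvar_u * dvar_v)
    return dcov2 / denom if denom > 0 else 0.0

# A nonlinear but deterministic relationship: y = x^2 on a symmetric grid.
x = np.linspace(-1.0, 1.0, 201)
y = x ** 2
pearson = np.corrcoef(x, y)[0, 1]
dcor2 = distance_correlation_sq(x, y)
print(f"pCor = {pearson:.3f}, dCor^2 = {dcor2:.3f}")
```

On this toy data pCor is numerically zero although $y$ is a function of $x$, while the distance correlation is clearly positive, which matches the independence property listed for dCor.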

With Eq. (16), the enhancement matrix can be obtained and represented as

$$
R = \begin{bmatrix} R_{n}^{2}(h_{1}, h_{1}) & R_{n}^{2}(h_{1}, h_{2}) & \dots & R_{n}^{2}(h_{1}, h_{mL}) \\ R_{n}^{2}(h_{2}, h_{1}) & R_{n}^{2}(h_{2}, h_{2}) & \dots & R_{n}^{2}(h_{2}, h_{mL}) \\ \vdots & \vdots & \ddots & \vdots \\ R_{n}^{2}(h_{mL}, h_{1}) & R_{n}^{2}(h_{mL}, h_{2}) & \dots & R_{n}^{2}(h_{mL}, h_{mL}) \end{bmatrix}. \tag{17}
$$
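The enhancement matrix of Eq. (17) can be assembled for all column pairs at once. The sketch below is one possible vectorised formulation, assuming the standard double-centring of [17]; the function name `association_matrix` is hypothetical, and the approach stores one $n \times n$ distance matrix per column, so it is only suitable for small $n$ and $mL$.

```python
import numpy as np

def association_matrix(B):
    """Pairwise squared distance correlations between the columns of B (Eq. (17))."""
    B = np.asarray(B, dtype=float)
    n, d = B.shape
    cols = B.T                                         # shape (d, n)
    # D[c] is the n x n matrix of |x_k - x_l| for column c.
    D = np.abs(cols[:, :, None] - cols[:, None, :])
    # Double-centre every distance matrix (grand mean added back).
    D = (D - D.mean(axis=2, keepdims=True)
           - D.mean(axis=1, keepdims=True)
           + D.mean(axis=(1, 2), keepdims=True))
    # dcov2[i, j] = (1/n^2) * sum_{k,l} D[i, k, l] * D[j, k, l]
    dcov2 = np.einsum("ikl,jkl->ij", D, D) / n**2
    dvar = np.diag(dcov2)
    denom = np.sqrt(np.outer(dvar, dvar))
    R = np.zeros_like(dcov2)
    # Entries with a zero denominator (constant columns) stay 0, as in Eq. (16).
    np.divide(dcov2, denom, out=R, where=denom > 0)
    return R

rng = np.random.default_rng(0)
x_demo = rng.normal(size=50)
B_demo = np.column_stack([x_demo, x_demo ** 2, np.ones(50)])
R_demo = association_matrix(B_demo)
print(np.round(R_demo, 3))
```

The result is symmetric, has ones on the diagonal for non-constant columns, and has zero rows and columns for constant features.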

(3) Association embedding: This step aims to further enhance the feature representation of $X$ by aggregating the first enhancement result $B$ with the enhancement matrix $R$.

Let $B_{ij}$ and $R_{ij}$ denote the elements in the $i$th row and $j$th column of the matrices $B$ and $R$, respectively, and let $X^{'}$ be the final enhanced data representation. Then the element $X_{ij}^{'}$ in the $i$th row and $j$th column of the matrix $X^{'}$ can be computed by

$$
\begin{aligned} X_{ij}^{'} &= \sum_{m^{'} = 0}^{m - 1} \frac{1}{1!} R_{k_{1} j} B_{i k_{1}} + \sum_{m^{'} = 0}^{m - 1} \frac{1}{2!} R_{k_{2} j} B_{i k_{2}} + \dots + \sum_{m^{'} = 0}^{m - 1} \frac{1}{n!} R_{k_{n} j} B_{i k_{n}} + \dots \\ &= \sum_{k = 1}^{\infty} w_{k} R_{kj} B_{ik} \\ &= \sum_{k = 1}^{mL} w_{k} R_{kj} B_{ik} + ϵ \\ &\approx \sum_{k = 1}^{mL} w_{k} R_{kj} B_{ik}, \end{aligned} \tag{18}
$$

where $w = [w_{1}, w_{2}, \dots, w_{k}, \dots, w_{(mL - 1)}, w_{mL}] \in R^{mL}$ is the block $[1/(1!), 1/(2!), \dots, 1/(L!)]$ repeated $m$ times, $k_{l} = m^{'} L + l$, and $ϵ$ is an infinitesimal.

Further, let $R_{\cdot j}$ denote the $j$th column of the matrix $R$, $B_{i \cdot}$ denote the $i$th row of the matrix $B$, and $\otimes$ denote the element-wise product. Then $X_{ij}^{'}$ can be computed in the form of a vector inner product by

$$
X_{ij}^{'} \approx B_{i \cdot} (w^{T} \otimes R_{\cdot j}). \tag{19}
$$

Let $W = [w^{T}, w^{T}, \dots, w^{T}] \in R^{mL \times mL}$ be the matrix whose every column is $w^{T}$, so that it has the same shape as $R$. Then $X^{'}$ can be computed in the form of matrix multiplication by

$$
X^{'} \approx B (W \otimes R). \tag{20}
$$
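The matrix form in Eq. (20) can be sketched as follows. Two readings are assumed here so that the shapes agree and are not confirmed by the source: $W$ is taken to have the same shape as $R$ with entries $W_{kj} = w_{k}$, and the summand of Eq. (18) is read as $w_{k} R_{kj} B_{ik}$, consistent with Eq. (19). The function names are hypothetical.

```python
import math
import numpy as np

def taylor_weights(m, L):
    """w = [1/1!, 1/2!, ..., 1/L!] repeated m times (length m*L)."""
    block = np.array([1.0 / math.factorial(t) for t in range(1, L + 1)])
    return np.tile(block, m)

def association_embedding(B, R, m, L):
    """X' = B (W ⊗ R), with W[k, j] = w_k (assumed shape mL x mL)."""
    w = taylor_weights(m, L)
    # w[:, None] * R scales row k of R by w_k, i.e. the element-wise product W ⊗ R.
    return B @ (w[:, None] * R)

# Toy inputs: m = 2 features boosted with L = 2 powers gives mL = 4 columns.
rng = np.random.default_rng(1)
B_demo = rng.normal(size=(3, 4))
R_demo = rng.random((4, 4))
R_demo = (R_demo + R_demo.T) / 2                   # association matrices are symmetric
X_prime = association_embedding(B_demo, R_demo, m=2, L=2)
print(X_prime.shape)
```

Because the weights decay factorially with the power $t$, high-order columns of $B$, which can take very large values, contribute less to each entry of $X^{'}$.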

The behavior of Eq. (18) is similar to the self-attention mechanism [42]. Specifically, the association matrix $R$ in Eq. (18) corresponds to the similarity $A$ of the query matrix $Q$ and key matrix $K$ in the self-attention mechanism, i.e., $A = Q K^{T}$. $A_{ij}$ denotes the similarity between features $i$ and $j$, and a similarity based on the inner product of vectors can be thought of as a measure of the linear relationship, while $R_{ij}$ denotes the association between features $i$ and $j$, and its values can capture more complex relationships via advanced association mining techniques. $A V$ corresponds to $B R$, where $V$ denotes the values in the self-attention mechanism. Note that the power functions in the relationship boosting process make the feature values grow dramatically. Inspired by Taylor’s formula, a reweighting strategy, i.e., $W \otimes R$, is used to relieve this problem. The vast success of self-attention in various tasks has proven the effectiveness of the mechanism.

The AssoRep algorithm is only a representation method, and its output is $X^{'}$. Therefore, to complete downstream tasks such as classification or clustering, the AssoRep algorithm must be combined with existing machine learning algorithms. The combining process is very simple and does not require any modification of existing machine learning algorithms. In the following, we give the steps in the context of supervised learning.

For a supervised learning task, we first combine the enhanced representation $X^{'}$ and the label set $Y$ to obtain a new dataset $D^{'}$. It can be represented as $D^{'} = (X^{'}, Y)$.

Let $L(D)$ be a supervised machine learning model to be combined, which takes $D$ as input. Then we only need to let $L$ take $D^{'}$ as input, i.e., $L(D^{'})$, and the process of combining the AssoRep algorithm with the supervised algorithm $L$ is complete. We can instantiate $L$ with different classifiers such as logistic regression, support vector machine, and random forest.

It should be noted that the relationship boosting process in the AssoRep algorithm increases the dimension of the new representation obtained by AssoRep. To address this issue, principal component analysis (PCA) is used.
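The dimension-reduction step can be illustrated with a small PCA written directly on top of the singular value decomposition. This is a sketch, not the authors' pipeline: the function name `pca_reduce`, the random stand-in for $X^{'}$, and the choice of three components are all assumptions, since the text does not state how many components are kept. A library implementation such as scikit-learn's `PCA` could be used instead.

```python
import numpy as np

def pca_reduce(X_enh, n_components):
    """Project the rows of X_enh onto its top principal components via SVD."""
    X_enh = np.asarray(X_enh, dtype=float)
    mean = X_enh.mean(axis=0)
    Xc = X_enh - mean                               # centre each column
    # Rows of Vt are principal directions, ordered by decreasing singular value.
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    components = Vt[:n_components]
    return Xc @ components.T, mean, components

rng = np.random.default_rng(0)
X_enh = rng.normal(size=(100, 12))                  # stand-in for X' with mL = 12
Z, mean, components = pca_reduce(X_enh, n_components=3)
print(Z.shape)

# Unseen examples should reuse the mean and components fitted on training data.
X_new = rng.normal(size=(5, 12))
Z_new = (X_new - mean) @ components.T
```

The reduced matrix `Z` then replaces $X^{'}$ in $D^{'} = (X^{'}, Y)$ before being passed to a classifier.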

In summary, the efficiency of AssoRep comes from two aspects. The first is that dCor is more efficient than NMI, MIC, and MIC$_{e}$. The second is that the dimension of the new representation is reduced with PCA. With these advantages, AssoRep has many potential applications, such as drug property prediction and recommender systems. Taking drug property prediction as an example, there exist complex relationships among different types of structure descriptors [43], and this relationship information can be fully used to improve molecular representations via AssoRep.

4 Experiment

This section aims to validate the effectiveness of AssoRep on the classification task from four perspectives: comparison analysis on datasets with different sample sizes, generality when coupled with existing classification algorithms, comparison with other feature enhancement methods, and efficiency analysis of different association mining methods. For most datasets, 10-fold cross validation is adopted for all approaches to compute the mean of each performance metric. For a few datasets, the classification algorithms are very unstable under 10-fold cross validation; for these, $2 \times 5$-fold or $5 \times 2$-fold cross validation is adopted as needed.
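The evaluation protocol can be sketched as a generator of train/test index splits. The sketch reads $r \times k$-fold as $r$ repetitions of $k$-fold cross validation, which is an assumption about the notation; the function name and the seed are likewise illustrative, and no stratification by class is applied.

```python
import numpy as np

def repeated_kfold_indices(n_samples, n_splits, n_repeats, seed=0):
    """Yield (train_idx, test_idx) pairs for n_repeats rounds of n_splits-fold CV."""
    rng = np.random.default_rng(seed)
    for _ in range(n_repeats):
        perm = rng.permutation(n_samples)           # fresh shuffle per repetition
        folds = np.array_split(perm, n_splits)
        for i in range(n_splits):
            test_idx = folds[i]
            train_idx = np.concatenate(
                [folds[j] for j in range(n_splits) if j != i]
            )
            yield train_idx, test_idx

# 10-fold CV is the case n_splits=10, n_repeats=1; below, 2 x 5-fold on 10 samples.
splits = list(repeated_kfold_indices(n_samples=10, n_splits=5, n_repeats=2))
print(len(splits))
```

Each metric would then be averaged over all yielded splits.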

4.1 Evaluation metrics

To measure the performance of a classification result, we employ five frequently used metrics [44]: accuracy ($AC$), precision ($PE$), recall ($RE$), $F1$ score, and kappa ($K$). Larger values of these five evaluation measures indicate better classification performance. They are defined as follows.

$$
\begin{cases} AC = \dfrac{TP + TN}{n}, \\ PE = \dfrac{TP}{TP + FP}, \\ RE = \dfrac{TP}{TP + FN}, \\ F1 = \dfrac{2 \, PE \times RE}{PE + RE}, \\ K = \dfrac{p_{o} - p_{e}}{1 - p_{e}}, \end{cases} \tag{21}
$$

where

● $TP$ denotes the number of true positives;

● $TN$ denotes the number of true negatives;

● $FP$ denotes the number of false positives;

● $FN$ denotes the number of false negatives;

● $n = TP + TN + FN + FP$;

● $p_{o} = AC$ is the empirical probability of agreement on the label assigned to any sample (the observed agreement ratio), and $p_{e} = \frac{(TP + FN) \times (TP + FP)}{n^{2}} + \frac{(FP + TN) \times (FN + TN)}{n^{2}}$ is the expected agreement when both annotators assign labels randomly. The normalization by $n^{2}$ makes $p_{e}$ a probability.
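The five metrics can be computed from confusion-matrix counts as in the sketch below. It covers the binary case only (how the metrics are averaged over multi-class datasets is not specified here), the function name and the example counts are made up for illustration, and the chance agreement $p_{e}$ is normalised by $n^{2}$ as in the usual definition of Cohen's kappa.

```python
def binary_metrics(tp, tn, fp, fn):
    """Accuracy, precision, recall, F1 and kappa from binary confusion counts.

    No guard against zero denominators; degenerate inputs raise ZeroDivisionError.
    """
    n = tp + tn + fp + fn
    ac = (tp + tn) / n
    pe = tp / (tp + fp)
    re = tp / (tp + fn)
    f1 = 2 * pe * re / (pe + re)
    p_o = ac                                        # observed agreement
    # Chance agreement: product of marginals for each class, as a probability.
    p_e = ((tp + fn) * (tp + fp) + (fp + tn) * (fn + tn)) / n**2
    kappa = (p_o - p_e) / (1 - p_e)
    return {"AC": ac, "PE": pe, "RE": re, "F1": f1, "K": kappa}

# Hypothetical counts: 40 TP, 45 TN, 5 FP, 10 FN (n = 100).
metrics = binary_metrics(tp=40, tn=45, fp=5, fn=10)
print(metrics)
```

For these counts $p_{o} = 0.85$ and $p_{e} = 0.5$, so $K = 0.7$.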

4.2 Experimental results on 120 benchmark datasets with different sample size

The quality of the association matrix in Eq. (17) is key to the performance of AssoRep. Given any two random variables, the more observations (the larger the sample size) of the two variables there are, the more accurate the association degree measured by an association mining method is [45]. Hence, the main factor that influences the performance of AssoRep is the sample size of the datasets. To comprehensively show the behavior of AssoRep on datasets with different sample sizes, we report its results on 120 datasets whose sample sizes vary from 10 to 67557. Based on sample size, these datasets are divided equally into two groups:

● Group 1: The number of samples $n$ is larger than 700;

● Group 2: The number of samples $n$ is smaller than 700.

For a fair comparison, 115 of the 120 datasets directly use the versions pre-processed by Fernandez et al. [46]. AWArgsift-hist [47], MM-IMDB-T [48], MM-IMDB-I [48], and Gesture-R [49] are used as vector features for adaptation to the logistic regression algorithm.

All experiments are carried out in Python 3.6 on a server with an AMD EPYC 7542 32-Core Processor with 755 G RAM. The combined algorithms are from the Scikit-learn python library [50].

**4.2.1 Results on the Group 1**

In this experiment, we aim to validate the effectiveness of AssoRep on 60 datasets with larger sample sizes. Tab.1 displays the detailed characteristics of each dataset, including the number of examples ($n$), number of features ($d$), and number of class labels ($L$). As shown in Tab.1, the sample size $n$ varies from 748 to 67557. Specifically, let $L(D)$ and $L(D^{'})$ be the algorithms that learn from the original data representation and the AssoRep data representation, respectively. Taking $L$ to be the logistic regression algorithm (LR), we compare LR $(D)$ with LR $(D^{'})$ on 60 benchmark datasets. The experimental results are shown in Tab.2 and Tab.3, where LR $(D)$ and LR $(D^{'})$ denote that the classifier LR learns from the original data representation $D$ and the AssoRep data representation $D^{'}$, respectively. For each metric on each dataset, the better result between LR $(D)$ and LR $(D^{'})$ is marked in bold.

**Tab.1 Characteristics of the first group of datasets whose sample sizes are larger than 700 (Group 1)**

ID	Dataset	$n$	$d$	$L$	ID	Dataset	$n$	$d$	$L$	ID	Dataset	$n$	$d$	$L$
L1	abalone	4177	8	3	L2	adult	48842	14	2	L3	annealing	798	38	6
L4	bank	4521	17	2	L5	blood	748	4	2	L6	car	1728	6	4
L7	ctg-10classes	2126	21	10	L8	ctg-3classes	2126	21	3	L9	chess-krvk	28056	6	18
L10	chess-krvkp	3196	36	2	L11	connect-4	67557	42	2	L12	contrac	1473	9	3
L13	energy-y1	768	8	3	L14	wav-mfcc	15352	80	1215	L15	led-display	1000	7	10
L16	letter	20000	16	26	L17	magic	19020	10	2	L18	mammographic	961	5	2
L19	molec-biol-splice	3190	60	3	L20	monks-3	3190	6	2	L21	mushroom	8124	21	2
L22	musk-2	6598	166	2	L23	nursery	12960	8	5	L24	oocMerl2F	1022	25	3
L25	oocMerl4D	1022	41	2	L26	oocTris2F	912	25	2	L27	oocTris5B	912	32	3
L28	optical	3823	62	10	L29	ozone	2536	72	2	L30	page-blocks	5473	10	5
L31	pendigits	7494	16	10	L32	pima	768	5	2	L33	plant-margin	1600	64	100
L34	plant-shape	1600	64	100	L35	plant-texture	1600	36	100	L36	ringnorm	7400	20	2
L37	semeion	1593	256	10	L38	spambase	4601	57	2	L39	st-german-credit	1000	24	2
L40	st-image	2310	18	7	L41	st-landsat	4435	36	6	L42	st-shuttle	43500	9	7
L43	st-vehicle	846	18	4	L44	steel-plates	1941	27	7	L45	thyroid	3772	21	3
L46	tic-tac-toe	958	9	2	L47	titanic	2201	3	2	L48	twonorm	7400	20	2
L49	wall-following	5456	24	4	L50	waveform	5000	21	3	L51	wine-quality-red	1599	11	6
L52	wine-quality-white	4898	11	7	L53	yeast	1484	8	10	L54	robotnavigation	5456	25	4
L55	AWArgsift-hist	3048	2000	10	L56	UJIndoorLoc	21048	520	5	L57	MM-IMDB-T	7799	600	2
L58	MM-IMDB-I	7799	2048	2	L59	YouTubeFaces4	5074	838	31	L60	Gesture-R	4977	2048	83

Tab.2 Classification performance comparison between LR $(D)$ and LR $(D^{'})$ on benchmark datasets L1-L40

Data	Accuracy		Precision		Recall		F1		Kappa
Data	LR $(D)$	LR $(D^{'})$	LR $(D)$	LR $(D^{'})$	LR $(D)$	LR $(D^{'})$	LR $(D)$	LR $(D^{'})$	LR $(D)$	LR $(D^{'})$
L1	0.647±0.020	0.662±0.021 $∙$	0.636±0.021	0.652±0.022 $∙$	0.642±0.020	0.658±0.021 $∙$	0.636±0.021	0.653±0.021 $∙$	0.469±0.031	0.492±0.031 $∙$
L2	0.843±0.007	0.852±0.007 $∙$	0.796±0.010	0.809±0.012 $∙$	0.738±0.016	0.757±0.014 $∙$	0.759±0.014	0.777±0.013 $∙$	0.521±0.028	0.557±0.025 $∙$
L3	0.873±0.027	0.951±0.014 $∙$	0.792±0.116	0.924±0.077 $∙$	0.641±0.111	0.888±0.080 $∙$	0.678±0.113	0.901±0.077 $∙$	0.620±0.087	0.871±0.040 $∙$
L4	0.895±0.007	0.897±0.005 $\circ$	0.761±0.040	0.760±0.022 $\circ$	0.612±0.030	0.641±0.029 $∙$	0.644±0.035	0.675±0.027 $∙$	0.301±0.066	0.357±0.052 $∙$
L5	0.772±0.015	0.786±0.019 $\circ$	0.705±0.099	0.713±0.054 $\circ$	0.549±0.023	0.608±0.033 $∙$	0.535±0.036	0.621±0.040 $∙$	0.135±0.059	0.267±0.074 $∙$
L6	0.794±0.027	0.881±0.026 $∙$	0.577±0.123	0.809±0.090 $∙$	0.444±0.062	0.630±0.068 $∙$	0.470±0.079	0.667±0.075 $∙$	0.498±0.072	0.737±0.060 $∙$
L7	0.768±0.032	0.834±0.027 $∙$	0.762±0.054	0.837±0.041 $∙$	0.639±0.039	0.782±0.039 $∙$	0.668±0.042	0.796±0.035 $∙$	0.720±0.039	0.802±0.033 $∙$
L8	0.894±0.018	0.912±0.018 $∙$	0.827±0.048	0.861±0.044 $∙$	0.775±0.046	0.827±0.035 $∙$	0.796±0.042	0.841±0.035 $∙$	0.701±0.054	0.754±0.053 $∙$
L9	0.282±0.008	0.351±0.010 $∙$	0.242±0.038	0.337±0.041 $∙$	0.203±0.011	0.299±0.018 $∙$	0.188±0.010	0.294±0.021 $∙$	0.179±0.009	0.264±0.012 $∙$
L10	0.970±0.012	0.971±0.014 $\circ$	0.970±0.013	0.971±0.014 $\circ$	0.970±0.012	0.971±0.014 $\circ$	0.970±0.012	0.971±0.014 $\circ$	0.940±0.025	0.941±0.027 $\circ$
L11	0.754±0.000	0.830±0.004 $∙$	0.720±0.091	0.784±0.006 $∙$	0.502±0.001	0.728±0.006 $∙$	0.434±0.002	0.748±0.006 $∙$	0.005±0.002	0.499±0.012 $∙$
L12	0.507±0.042	0.568±0.055 $∙$	0.491±0.053	0.550±0.066 $∙$	0.472±0.044	0.531±0.055 $∙$	0.474±0.047	0.533±0.058 $∙$	0.221±0.067	0.318±0.085 $∙$
L13	0.874±0.013	0.881±0.012 $\circ$	0.847±0.026	0.862±0.022 $\circ$	0.786±0.020	0.796±0.020 $\circ$	0.795±0.023	0.807±0.024 $\circ$	0.792±0.021	0.804±0.020 $\circ$
L14	0.231±0.011	0.281±0.007 $∙$	0.137±0.009	0.164±0.007 $∙$	0.166±0.010	0.209±0.008 $∙$	0.142±0.009	0.174±0.007 $∙$	0.229±0.011	0.280±0.007 $∙$
L15	0.735±0.040	0.735±0.040 $\circ$	0.745±0.039	0.745±0.039 $\circ$	0.736±0.040	0.736±0.040 $\circ$	0.731±0.038	0.731±0.038 $\circ$	0.705±0.045	0.705±0.045 $\circ$
L16	0.723±0.013	0.846±0.009 $∙$	0.725±0.013	0.849±0.009 $∙$	0.721±0.013	0.845±0.009 $∙$	0.720±0.013	0.846±0.009 $∙$	0.712±0.014	0.840±0.010 $∙$
L17	0.791±0.006	0.850±0.008 $∙$	0.782±0.009	0.845±0.009 $∙$	0.745±0.007	0.820±0.011 $∙$	0.756±0.007	0.829±0.010 $∙$	0.517±0.014	0.660±0.019 $∙$
L18	0.823±0.035	0.832±0.035 $\circ$	0.825±0.035	0.834±0.034 $\circ$	0.823±0.035	0.831±0.034 $\circ$	0.822±0.035	0.831±0.035 $\circ$	0.645±0.070	0.663±0.069 $\circ$
L19	0.835±0.018	0.951±0.013 $∙$	0.819±0.020	0.942±0.014 $∙$	0.831±0.021	0.949±0.012 $∙$	0.824±0.020	0.945±0.013 $∙$	0.735±0.029	0.920±0.021 $∙$
L20	0.761±0.123	0.930±0.067 $∙$	0.777±0.127	0.937±0.063 $∙$	0.761±0.124	0.930±0.067 $∙$	0.757±0.125	0.929±0.068 $∙$	0.521±0.246	0.859±0.135 $∙$
L21	0.947±0.009	1.000±0.000 $∙$	0.947±0.009	1.000±0.000 $∙$	0.946±0.009	1.000±0.000 $∙$	0.947±0.009	1.000±0.000 $∙$	0.893±0.018	1.000±0.000 $∙$
L22	0.949±0.005	0.945±0.005 $\circ$	0.921±0.011	0.921±0.012 $\circ$	0.878±0.015	0.858±0.016 $\circ$	0.898±0.011	0.885±0.012 $\circ$	0.795±0.021	0.771±0.023 $\circ$
L23	0.899±0.007	0.916±0.007 $∙$	0.649±0.056	0.660±0.056 $∙$	0.664±0.057	0.676±0.057 $∙$	0.656±0.056	0.668±0.056 $∙$	0.851±0.010	0.876±0.010 $∙$
L24	0.918±0.021	0.930±0.021 $\circ$	0.881±0.046	0.923±0.034 $∙$	0.893±0.054	0.919±0.038 $\circ$	0.883±0.045	0.919±0.032 $∙$	0.823±0.045	0.847±0.047 $\circ$
L25	0.796±0.036	0.837±0.020 $∙$	0.788±0.051	0.819±0.038 $∙$	0.731±0.045	0.803±0.028 $∙$	0.746±0.047	0.809±0.030 $∙$	0.499±0.092	0.619±0.061 $∙$
L26	0.797±0.030	0.836±0.030 $∙$	0.800±0.033	0.834±0.029 $∙$	0.787±0.031	0.829±0.035 $∙$	0.789±0.031	0.830±0.032 $∙$	0.580±0.061	0.661±0.064 $∙$
L27	0.924±0.021	0.930±0.024 $\circ$	0.866±0.151	0.915±0.109 $\circ$	0.828±0.140	0.897±0.109 $\circ$	0.840±0.141	0.900±0.107 $\circ$	0.846±0.044	0.858±0.050 $\circ$
L28	0.964±0.016	0.968±0.013 $\circ$	0.965±0.016	0.969±0.013 $\circ$	0.964±0.016	0.968±0.013 $\circ$	0.964±0.016	0.968±0.013 $\circ$	0.960±0.018	0.965±0.015 $\circ$
L29	0.969±0.008	0.966±0.011 $\circ$	0.570±0.173	0.743±0.176 $∙$	0.533±0.074	0.584±0.045 $∙$	0.542±0.104	0.611±0.060 $∙$	0.092±0.205	0.226±0.120 $∙$
L30	0.954±0.003	0.959±0.004 $\circ$	0.862±0.043	0.842±0.049 $\circ$	0.659±0.039	0.701±0.029 $∙$	0.725±0.044	0.753±0.030 $∙$	0.720±0.023	0.763±0.024 $∙$
L31	0.943±0.010	0.983±0.004 $∙$	0.943±0.010	0.983±0.005 $∙$	0.943±0.010	0.983±0.004 $∙$	0.942±0.010	0.983±0.005 $∙$	0.937±0.011	0.981±0.005 $∙$
L32	0.779±0.029	0.779±0.029 $\circ$	0.768±0.039	0.768±0.039 $\circ$	0.734±0.029	0.734±0.029 $\circ$	0.743±0.031	0.743±0.031 $\circ$	0.490±0.062	0.490±0.062 $\circ$
L33	0.747±0.025	0.798±0.022 $∙$	0.724±0.025	0.779±0.019 $∙$	0.750±0.025	0.796±0.022 $∙$	0.714±0.023	0.767±0.020 $∙$	0.745±0.026	0.796±0.023 $∙$
L34	0.509±0.032	0.564±0.038 $∙$	0.444±0.033	0.502±0.045 $∙$	0.518±0.030	0.569±0.035 $∙$	0.446±0.033	0.501±0.042 $∙$	0.504±0.032	0.560±0.038 $∙$
L35	0.809±0.018	0.839±0.028 $∙$	0.789±0.028	0.823±0.047 $∙$	0.810±0.022	0.841±0.033 $∙$	0.776±0.025	0.811±0.040 $∙$	0.807±0.018	0.837±0.028 $∙$
L36	0.760±0.016	0.986±0.005 $∙$	0.763±0.015	0.986±0.005 $∙$	0.760±0.016	0.986±0.005 $∙$	0.759±0.016	0.986±0.005 $∙$	0.520±0.032	0.972±0.010 $∙$
L37	0.890±0.031	0.927±0.019 $∙$	0.896±0.032	0.932±0.018 $∙$	0.889±0.031	0.927±0.019 $∙$	0.889±0.032	0.927±0.019 $∙$	0.878±0.035	0.919±0.021 $∙$
L38	0.925±0.011	0.932±0.008 $∙$	0.925±0.012	0.931±0.010 $∙$	0.919±0.012	0.927±0.008 $∙$	0.921±0.012	0.929±0.009 $∙$	0.843±0.023	0.857±0.018 $∙$
L39	0.761±0.040	0.771±0.040 $\circ$	0.716±0.053	0.733±0.054 $∙$	0.684±0.060	0.685±0.062 $\circ$	0.691±0.062	0.695±0.065 $\circ$	0.389±0.117	0.400±0.122 $\circ$
L40	0.913±0.019	0.929±0.014 $∙$	0.917±0.016	0.930±0.013 $∙$	0.913±0.019	0.929±0.014 $∙$	0.913±0.018	0.928±0.013 $∙$	0.898±0.022	0.917±0.016 $∙$

Tab.3 Classification performance comparison between LR $(D)$ and LR $(D^{'})$ on benchmark datasets L41-L60

Data	Accuracy		Precision		Recall		F1		Kappa
Data	LR $(D)$	LR $(D^{'})$	LR $(D)$	LR $(D^{'})$	LR $(D)$	LR $(D^{'})$	LR $(D)$	LR $(D^{'})$	LR $(D)$	LR $(D^{'})$
L41	0.838±0.009	0.887±0.009 $∙$	0.803±0.029	0.867±0.012 $∙$	0.757±0.009	0.858±0.008 $∙$	0.751±0.011	0.861±0.009 $∙$	0.797±0.011	0.860±0.011 $∙$
L42	0.930±0.002	0.992±0.001 $∙$	0.522±0.079	0.780±0.116 $∙$	0.488±0.078	0.633±0.093 $∙$	0.501±0.078	0.653±0.087 $∙$	0.783±0.008	0.978±0.003 $∙$
L43	0.792±0.024	0.819±0.024 $∙$	0.790±0.029	0.821±0.024 $∙$	0.794±0.025	0.821±0.024 $∙$	0.786±0.027	0.819±0.025 $∙$	0.723±0.032	0.759±0.032 $∙$
L44	0.706±0.026	0.744±0.023 $∙$	0.731±0.049	0.773±0.028 $∙$	0.695±0.042	0.756±0.043 $∙$	0.702±0.048	0.758±0.033 $∙$	0.619±0.033	0.671±0.029 $∙$
L45	0.950±0.004	0.960±0.010 $∙$	0.876±0.063	0.897±0.074 $\circ$	0.664±0.041	0.705±0.088 $\circ$	0.672±0.025	0.763±0.083 $∙$	0.528±0.005	0.644±0.099 $∙$
L46	0.983±0.016	0.983±0.016 $\circ$	0.988±0.012	0.988±0.012 $\circ$	0.976±0.023	0.976±0.023 $\circ$	0.981±0.018	0.981±0.018 $\circ$	0.962±0.037	0.962±0.037 $\circ$
L47	0.776±0.019	0.778±0.019 $\circ$	0.760±0.025	0.763±0.025 $\circ$	0.700±0.029	0.703±0.028 $\circ$	0.714±0.030	0.718±0.030 $\circ$	0.437±0.056	0.444±0.055 $\circ$
L48	0.979±0.006	0.979±0.006 $\circ$	0.979±0.006	0.979±0.006 $\circ$	0.979±0.006	0.979±0.006 $\circ$	0.979±0.006	0.979±0.006 $\circ$	0.957±0.012	0.957±0.012 $\circ$
L49	0.688±0.013	0.922±0.011 $∙$	0.690±0.036	0.921±0.016 $∙$	0.593±0.023	0.918±0.019 $∙$	0.622±0.027	0.919±0.016 $∙$	0.514±0.021	0.882±0.017 $∙$
L50	0.869±0.015	0.869±0.015 $\circ$	0.869±0.015	0.869±0.015 $\circ$	0.869±0.015	0.869±0.015 $\circ$	0.868±0.015	0.868±0.015 $\circ$	0.803±0.023	0.803±0.023 $\circ$
L51	0.592±0.030	0.604±0.044 $\circ$	0.277±0.043	0.296±0.034 $\circ$	0.253±0.017	0.280±0.028 $∙$	0.246±0.022	0.281±0.031 $∙$	0.316±0.052	0.351±0.073 $\circ$
L52	0.537±0.014	0.541±0.018 $\circ$	0.289±0.047	0.365±0.157 $\circ$	0.228±0.018	0.257±0.049 $∙$	0.221±0.018	0.264±0.064 $∙$	0.234±0.026	0.260±0.029 $∙$
L53	0.588±0.044	0.611±0.030 $∙$	0.568±0.092	0.552±0.065 $\circ$	0.485±0.059	0.533±0.050 $∙$	0.499±0.064	0.529±0.054 $\circ$	0.458±0.059	0.493±0.041 $∙$
L54	0.688±0.013	0.900±0.014 $∙$	0.690±0.036	0.903±0.021 $∙$	0.593±0.023	0.893±0.019 $∙$	0.622±0.027	0.897±0.018 $∙$	0.514±0.021	0.849±0.022 $∙$
L55	0.137±0.011	0.192±0.019 $∙$	0.109±0.014	0.154±0.017 $∙$	0.109±0.010	0.157±0.017 $∙$	0.103±0.009	0.149±0.015 $∙$	0.113±0.012	0.170±0.020 $∙$
L56	0.930±0.005	0.981±0.002 $∙$	0.933±0.005	0.983±0.003 $∙$	0.932±0.007	0.982±0.003 $∙$	0.932±0.005	0.982±0.003 $∙$	0.909±0.007	0.976±0.003 $∙$
L57	0.709±0.021	0.725±0.018 $∙$	0.708±0.022	0.725±0.018 $∙$	0.707±0.022	0.722±0.019 $∙$	0.707±0.022	0.722±0.019 $∙$	0.415±0.043	0.445±0.037 $∙$
L58	0.612±0.021	0.644±0.014 $∙$	0.610±0.021	0.645±0.014 $∙$	0.608±0.021	0.638±0.014 $∙$	0.608±0.021	0.637±0.014 $∙$	0.218±0.042	0.279±0.028 $∙$
L59	0.470±0.026	0.496±0.020 $∙$	0.492±0.032	0.515±0.024 $∙$	0.443±0.031	0.479±0.018 $∙$	0.453±0.030	0.486±0.018 $∙$	0.412±0.030	0.441±0.020 $∙$
L60	0.928±0.005	0.936±0.007 $∙$	0.937±0.005	0.943±0.006 $∙$	0.928±0.006	0.936±0.007 $∙$	0.928±0.005	0.935±0.007 $∙$	0.928±0.006	0.935±0.007 $∙$

The following observations can be made from Tab.2 and Tab.3:

1. LR $(D^{'})$ is statistically much better than LR $(D)$ in terms of each performance metric. On these 60 datasets, LR $(D^{'})$ obtains higher accuracy, precision, recall, F1, and kappa on 55, 53, 55, 55, and 55 datasets, respectively, while LR $(D)$ obtains the best values on only 2, 2, 1, 1, and 1 datasets, respectively. Even in the cases where LR $(D)$ is best, the classification performance of LR $(D^{'})$ is very close to that of LR $(D)$. It is worth noting that LR $(D^{'})$ statistically and clearly improves every metric on most of the datasets. For example, on dataset L36, LR $(D^{'})$ achieves improvements of 0.986−0.760=0.226, 0.986−0.763=0.223, 0.986−0.760=0.226, 0.986−0.759=0.227, and 0.972−0.520=0.452 in terms of accuracy, precision, recall, F1, and kappa, respectively. In particular, with the new representation obtained by AssoRep on dataset L21, the five performance metrics of LR increase from 0.947, 0.947, 0.946, 0.947, and 0.893 to 1, respectively.

2. Moreover, AssoRep tends to bring larger gains when the original data representation performs poorly. For example, when the representation of dataset L9 is enhanced via AssoRep, its accuracy increases markedly from 0.282 to 0.351, whereas AssoRep yields no significant improvement on datasets L10 (accuracy 0.970) and L48 (accuracy 0.979).

Furthermore, we apply the paired $t$-test to assess whether LR $(D^{'})$ performs significantly better than LR $(D)$. Specifically, consider two compared algorithms $a$ and $b$ and an evaluation metric $m$. We run each algorithm $k$ times, so that algorithm $a$ yields $k$ values $m_{1}^{a}, m_{2}^{a}, \dots, m_{k}^{a}$ of the metric $m$, and algorithm $b$ yields $k$ values $m_{1}^{b}, m_{2}^{b}, \dots, m_{k}^{b}$. Let $\Delta_{i} = m_{i}^{a} - m_{i}^{b}$, and let $\mu$ and $\sigma$ denote the mean and standard deviation of $\Delta_{1}, \Delta_{2}, \dots, \Delta_{k}$, respectively. The test statistic, which follows a $t$ distribution with $k - 1$ degrees of freedom, is defined as

$$\tau_{t} = \left| \frac{\sqrt{k}\, \mu}{\sigma} \right|$$

In this paper, the null hypothesis that algorithms $a$ and $b$ have the same performance is rejected if the returned $p$-value is less than the specified significance level of 5%. The results are recorded in Tab.2 and Tab.3, in which $∙$, $\circ$, and $⊙$ denote that AssoRep is significantly better than, tied with, or significantly worse than the compared method, respectively, according to the paired $t$-test at the 5% significance level.
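
As a rough illustration of the test described above, the following Python sketch computes $\tau_{t}$ from two lists of per-run metric values and maps the outcome to a better/tied/worse label. The function names, the example scores, and the use of a tabulated critical value in place of a $p$-value are assumptions made for illustration. The sample standard deviation (divisor $k - 1$) is also an assumption, since the text does not state which estimator is used.

```python
import math
import statistics


def paired_t_statistic(scores_a, scores_b):
    """Return tau_t = |sqrt(k) * mu / sigma| for paired metric values."""
    if len(scores_a) != len(scores_b):
        raise ValueError("both algorithms must be run the same number of times")
    k = len(scores_a)
    if k < 2:
        raise ValueError("at least two paired runs are required")
    deltas = [a - b for a, b in zip(scores_a, scores_b)]
    mu = statistics.mean(deltas)
    sigma = statistics.stdev(deltas)  # assumption: sample std (divisor k - 1)
    if sigma == 0.0:
        # All differences are identical: no evidence if they are zero,
        # unbounded evidence otherwise.
        return 0.0 if mu == 0.0 else math.inf
    return abs(math.sqrt(k) * mu / sigma)


def compare(scores_new, scores_base, critical_value):
    """Label the new method as 'better', 'tied', or 'worse' than the baseline.

    critical_value is the two-sided 5% critical value of the t distribution
    with k - 1 degrees of freedom, looked up from a table by the caller.
    """
    tau = paired_t_statistic(scores_new, scores_base)
    if tau <= critical_value:
        return "tied"
    better = statistics.mean(scores_new) > statistics.mean(scores_base)
    return "better" if better else "worse"


# Hypothetical accuracies from k = 5 paired runs (not taken from the paper).
new_scores = [0.80, 0.82, 0.81, 0.83, 0.79]
base_scores = [0.70, 0.71, 0.72, 0.70, 0.69]
T_CRIT_DF4 = 2.776  # two-sided 5% critical value for 4 degrees of freedom
verdict = compare(new_scores, base_scores, T_CRIT_DF4)
```

Comparing $\tau_{t}$ with the critical value is equivalent to checking whether the two-sided $p$-value falls below 5%, so the sketch avoids needing a $t$-distribution routine.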

As shown in Tab.2 and Tab.3, LR $(D^{'})$ is significantly better than LR $(D)$ on 40, 41, 45, 46, and 45 of the 60 datasets in terms of accuracy, precision, recall, F1, and kappa, respectively, while there is no case in which LR $(D)$ is significantly better than LR $(D^{'})$ at significance level $\alpha = 5\%$. The results validate that enhancing the representation with the association among features is indeed effective on datasets with larger sample sizes.

4.2.2 Results on Group 2

This experiment aims to show the behavior of AssoRep on data with smaller sample sizes. To this end, we use the 60 datasets shown in Tab.4, which lists the detailed characteristics of each dataset, including the number of examples ($n$), the number of features ($d$), and the number of class labels ($L$). As shown in Tab.4, the sample size $n$ varies from 10 to 690. The experimental settings are the same as those for Group 1. The experimental results are reported in Tab.5 and Tab.6.

Tab.4 Characteristics of the second group of datasets, whose sample sizes are smaller than 700 (Group 2)

ID	Dataset	$n$	$d$	$L$	ID	Dataset	$n$	$d$	$L$	ID	Dataset	$n$	$d$	$L$
S1	ac-inflam	120	6	2	S2	acute-nephritis	120	6	2	S3	arrhythmia	452	262	13
S4	audiology-std	226	59	18	S5	balance-scale	625	4	3	S6	balloons	16	4	2
S7	breast-cancer	286	9	2	S8	conn-bench-sonar	208	60	2	S9	conn-bench-vowel	528	11	11
S10	credit-approval	690	15	2	S11	cylinder-bands	512	35	2	S12	dermatology	366	34	6
S13	echocardiogram	131	10	2	S14	ecoli	336	7	8	S15	fertility	100	9	2
S16	flag	194	28	8	S17	glass	214	9	6	S18	haberman-survival	306	3	2
S19	hayes-roth	132	3	3	S20	heart-cleveland	303	13	5	S21	heart-hungarian	294	12	2
S22	heart-switzerland	123	12	2	S23	heart-va	200	12	5	S24	hepatitis	155	19	2
S25	hill-valley	606	100	2	S26	horse-colic	300	25	2	S27	ilpd-indian-liver	583	9	2
S28	image-segmentation	210	19	7	S29	ionosphere	351	33	2	S30	iris	150	4	3
S31	lenses	24	4	3	S32	low-res-spect	531	100	9	S33	lung-cancer	32	56	3
S34	lymphography	148	18	4	S35	molec-biol-promoter	106	57	2	S36	monks-1	124	6	2
S37	monks-2	169	6	2	S38	musk-1	476	166	2	S39	parkinsons	195	22	2
S40	pb-MATERIAL	106	4	3	S41	pb-REL-L	103	4	3	S42	pb-SPAN	92	4	3
S43	pb-T-OR-D	102	4	3	S44	pb-TYPE	105	4	3	S45	planning	182	12	2
S46	post-operative	90	8	3	S47	primary-tumor	330	17	15	S48	seeds	210	7	3
S49	soybean	307	35	18	S50	spect	80	22	2	S51	spectf	80	44	2
S52	st-australian-credit	690	14	2	S53	st-heart	270	13	2	S54	synthetic-control	600	60	6
S55	teaching	151	5	3	S56	trains	10	28	2	S57	vc-2classes	310	6	2
S58	vc-3classes	310	6	3	S59	wine	179	13	3	S60	zoo	101	16	7

Tab.5 Classification performance comparison between LR $(D)$ and LR $(D^{'})$ on benchmark datasets S1-S40

Data	Accuracy		Precision		Recall		F1		Kappa
Data	LR $(D)$	LR $(D^{'})$	LR $(D)$	LR $(D^{'})$	LR $(D)$	LR $(D^{'})$	LR $(D)$	LR $(D^{'})$	LR $(D)$	LR $(D^{'})$
S1	1.000±0.000	1.000±0.000 $\circ$	1.000±0.000	1.000±0.000 $\circ$	1.000±0.000	1.000±0.000 $\circ$	1.000±0.000	1.000±0.000 $\circ$	1.000±0.000	1.000±0.000 $\circ$
S2	1.000±0.000	1.000±0.000 $\circ$	1.000±0.000	1.000±0.000 $\circ$	1.000±0.000	1.000±0.000 $\circ$	1.000±0.000	1.000±0.000 $\circ$	1.000±0.000	1.000±0.000 $\circ$
S3	0.661±0.048	0.695±0.037 $\circ$	0.427±0.084	0.452±0.071 $\circ$	0.384±0.083	0.415±0.058 $\circ$	0.389±0.080	0.414±0.054 $\circ$	0.457±0.082	0.494±0.066 $\circ$
S4	0.696±0.032	0.789±0.029 $∙$	0.493±0.047	0.542±0.061 $\circ$	0.490±0.067	0.582±0.082 $\circ$	0.473±0.045	0.549±0.067 $\circ$	0.648±0.038	0.750±0.035 $∙$
S5	0.862±0.028	0.922±0.005 $∙$	0.580±0.016	0.615±0.003 $∙$	0.624±0.021	0.667±0.000 $∙$	0.599±0.019	0.640±0.002 $∙$	0.745±0.052	0.855±0.008 $∙$
S6	0.619±0.048	0.730±0.159 $\circ$	0.480±0.195	0.601±0.315 $\circ$	0.588±0.088	0.688±0.188 $\circ$	0.515±0.152	0.623±0.260 $\circ$	0.171±0.171	0.385±0.385 $\circ$
S7	0.711±0.036	0.727±0.031 $\circ$	0.630±0.090	0.671±0.057 $\circ$	0.575±0.049	0.613±0.034 $∙$	0.569±0.067	0.619±0.039 $\circ$	0.175±0.116	0.257±0.076 $\circ$
S8	0.773±0.027	0.788±0.023 $\circ$	0.775±0.027	0.792±0.025 $\circ$	0.770±0.027	0.785±0.023 $\circ$	0.770±0.027	0.786±0.023 $\circ$	0.542±0.054	0.573±0.046 $\circ$
S9	0.557±0.046	0.801±0.019 $∙$	0.552±0.066	0.816±0.013 $∙$	0.556±0.050	0.801±0.019 $∙$	0.537±0.058	0.797±0.019 $∙$	0.512±0.051	0.781±0.021 $∙$
S10	0.858±0.041	0.861±0.040 $\circ$	0.861±0.038	0.864±0.037 $\circ$	0.863±0.038	0.867±0.038 $\circ$	0.857±0.041	0.861±0.039 $\circ$	0.717±0.080	0.723±0.078 $\circ$
S11	0.733±0.056	0.748±0.053 $\circ$	0.730±0.068	0.742±0.058 $\circ$	0.702±0.061	0.727±0.057 $\circ$	0.705±0.066	0.729±0.057 $\circ$	0.418±0.125	0.462±0.113 $\circ$
S12	0.978±0.027	0.978±0.027 $\circ$	0.981±0.022	0.981±0.022 $\circ$	0.976±0.030	0.976±0.030 $\circ$	0.975±0.030	0.975±0.030 $\circ$	0.972±0.034	0.972±0.034 $\circ$
S13	0.812±0.063	0.818±0.052 $\circ$	0.819±0.097	0.828±0.079 $\circ$	0.749±0.069	0.755±0.058 $\circ$	0.766±0.075	0.773±0.063 $\circ$	0.540±0.149	0.5353±0.124 $\circ$
S14	0.868±0.015	0.872±0.016 $\circ$	0.642±0.030	0.646±0.029 $\circ$	0.637±0.019	0.643±0.018 $\circ$	0.633±0.032	0.639±0.032 $\circ$	0.817±0.021	0.822±0.023 $\circ$
S15	0.854±0.047	0.850±0.053 $\circ$	0.438±0.012	0.438±0.013 $\circ$	0.485±0.026	0.483±0.029 $\circ$	0.460±0.014	0.459±0.016 $\circ$	−0.030±0.048	−0.035±0.052 $\circ$
S16	0.487±0.061	0.530±0.085 $\circ$	0.289±0.042	0.340±0.078 $\circ$	0.310±0.051	0.363±0.076 $\circ$	0.290±0.042	0.341±0.076 $\circ$	0.351±0.078	0.412±0.104 $\circ$
S17	0.620±0.054	0.660±0.051 $\circ$	0.484±0.093	0.532±0.084 $\circ$	0.483±0.075	0.542±0.078 $\circ$	0.472±0.076	0.525±0.076 $\circ$	0.462±0.076	0.520±0.069 $\circ$
S18	0.737±0.016	0.751±0.033 $\circ$	0.679±0.105	0.687±0.081 $\circ$	0.548±0.029	0.597±0.032 $\circ$	0.527±0.050	0.603±0.040 $\circ$	0.120±0.065	0.233±0.078 $\circ$
S19	0.544±0.062	0.844±0.032 $∙$	0.554±0.061	0.881±0.027 $∙$	0.584±0.069	0.856±0.035 $∙$	0.546±0.060	0.860±0.032 $∙$	0.302±0.099	0.759±0.050 $∙$
S20	0.583±0.028	0.589±0.030 $\circ$	0.303±0.054	0.329±0.078 $\circ$	0.310±0.042	0.318±0.040 $\circ$	0.301±0.046	0.314±0.048 $\circ$	0.306±0.049	0.311±0.054 $\circ$
S21	0.824±0.039	0.839±0.034 $\circ$	0.813±0.041	0.829±0.035 $\circ$	0.800±0.048	0.819±0.043 $\circ$	0.804±0.045	0.822±0.040 $\circ$	0.610±0.090	0.645±0.079 $\circ$
S22	0.371±0.050	0.392±0.048 $\circ$	0.231±0.021	0.232±0.032 $\circ$	0.241±0.030	0.241±0.030 $\circ$	0.232±0.026	0.226±0.031 $\circ$	0.090±0.067	0.094±0.069 $\circ$
S23	0.326±0.058	0.336±0.071 $\circ$	0.255±0.064	0.303±0.082 $\circ$	0.272±0.059	0.303±0.067 $\circ$	0.256±0.058	0.294±0.071 $\circ$	0.111±0.078	0.127±0.093 $\circ$
S24	0.810±0.039	0.840±0.032 $\circ$	0.711±0.066	0.765±0.053 $\circ$	0.723±0.085	0.728±0.069 $\circ$	0.713±0.073	0.736±0.067 $\circ$	0.428±0.146	0.476±0.125 $\circ$
S25	0.660±0.032	0.700±0.030 $∙$	0.775±0.011	0.787±0.029 $\circ$	0.656±0.032	0.696±0.030 $∙$	0.615±0.050	0.672±0.040 $∙$	0.314±0.065	0.395±0.061 $∙$
S26	0.798±0.035	0.827±0.024 $∙$	0.786±0.044	0.822±0.028 $∙$	0.777±0.034	0.800±0.031 $∙$	0.780±0.036	0.807±0.029 $∙$	0.560±0.073	0.616±0.057 $∙$
S27	0.716±0.011	0.725±0.010 $\circ$	0.626±0.028	0.653±0.034 $∙$	0.563±0.018	0.558±0.015 $\circ$	0.557±0.024	0.545±0.023 $\circ$	0.154±0.040	0.147±0.037 $\circ$
S28	0.864±0.016	0.872±0.029 $\circ$	0.872±0.022	0.875±0.030 $\circ$	0.864±0.016	0.872±0.029 $\circ$	0.860±0.018	0.870±0.030 $\circ$	0.841±0.019	0.851±0.034 $\circ$
S29	0.880±0.046	0.920±0.042 $∙$	0.891±0.048	0.935±0.040 $∙$	0.851±0.055	0.894±0.052 $∙$	0.863±0.053	0.908±0.049 $∙$	0.729±0.104	0.818±0.096 $∙$
S30	0.907±0.053	0.973±0.033 $∙$	0.924±0.041	0.978±0.027 $∙$	0.907±0.053	0.973±0.033 $∙$	0.904±0.058	0.973±0.033 $∙$	0.860±0.080	0.960±0.049 $∙$
S31	0.764±0.057	0.792±0.080 $\circ$	0.717±0.125	0.782±0.085 $\circ$	0.716±0.116	0.774±0.107 $\circ$	0.671±0.108	0.743±0.094 $\circ$	0.574±0.103	0.637±0.131 $\circ$
S32	0.712±0.034	0.737±0.023 $\circ$	0.735±0.039	0.775±0.021 $∙$	0.712±0.034	0.737±0.023 $\circ$	0.701±0.037	0.731±0.025 $\circ$	0.691±0.037	0.718±0.025 $\circ$
S33	0.434±0.121	0.488±0.081 $\circ$	0.446±0.140	0.538±0.103 $∙$	0.455±0.115	0.496±0.082 $\circ$	0.430±0.123	0.489±0.081 $\circ$	0.157±0.172	0.219±0.122 $\circ$
S34	0.823±0.039	0.849±0.025 $\circ$	0.677±0.079	0.676±0.012 $\circ$	0.675±0.081	0.674±0.015 $\circ$	0.672±0.078	0.673±0.014 $\circ$	0.661±0.076	0.707±0.050 $\circ$
S35	0.781±0.042	0.834±0.034 $∙$	0.786±0.044	0.836±0.035 $∙$	0.781±0.043	0.834±0.033 $∙$	0.780±0.043	0.834±0.033 $∙$	0.562±0.085	0.668±0.067 $∙$
S36	0.669±0.070	0.726±0.068 $\circ$	0.674±0.074	0.742±0.070 $\circ$	0.669±0.071	0.725±0.070 $\circ$	0.666±0.071	0.718±0.080 $\circ$	0.337±0.141	0.450±0.141 $\circ$
S37	0.550±0.072	0.561±0.096 $\circ$	0.407±0.133	0.448±0.160 $\circ$	0.459±0.066	0.481±0.100 $\circ$	0.403±0.073	0.445±0.118 $\circ$	−0.089±0.145	−0.043±0.219 $\circ$
S38	0.857±0.088	0.891±0.039 $\circ$	0.858±0.052	0.891±0.039 $\circ$	0.857±0.088	0.894±0.038 $\circ$	0.855±0.055	0.890±0.039 $\circ$	0.711±0.108	0.780±0.078 $\circ$
S39	0.852±0.055	0.882±0.056 $∙$	0.822±0.080	0.855±0.076 $∙$	0.780±0.061	0.830±0.066 $∙$	0.794±0.068	0.839±0.069 $∙$	0.591±0.136	0.678±0.137 $∙$
S40	0.853±0.039	0.859±0.035 $\circ$	0.556±0.097	0.542±0.055 $\circ$	0.597±0.070	0.619±0.049 $\circ$	0.566±0.070	0.573±0.049 $\circ$	0.610±0.090	0.625±0.077 $\circ$

Tab.6 Classification performance comparison between LR $(D)$ and LR $(D^{'})$ on benchmark datasets S41-S60

Data	Accuracy		Precision		Recall		F1		Kappa
Data	LR $(D)$	LR $(D^{'})$	LR $(D)$	LR $(D^{'})$	LR $(D)$	LR $(D^{'})$	LR $(D)$	LR $(D^{'})$	LR $(D)$	LR $(D^{'})$
S41	0.652±0.075	0.675±0.081 $\circ$	0.485±0.089	0.475±0.066 $\circ$	0.508±0.071	0.516±0.068 $\circ$	0.487±0.077	0.486±0.067 $\circ$	0.371±0.138	0.402±0.151 $\circ$
S42	0.693±0.063	0.713±0.050 $\circ$	0.730±0.091	0.725±0.073 $\circ$	0.645±0.044	0.653±0.068 $\circ$	0.652±0.053	0.664±0.067 $\circ$	0.481±0.086	0.506±0.094 $\circ$
S43	0.868±0.037	0.882±0.052 $\circ$	0.683±0.202	0.759±0.170 $\circ$	0.615±0.100	0.706±0.140 $\circ$	0.625±0.124	0.712±0.137 $\circ$	0.270±0.230	0.431±0.269 $\circ$
S44	0.590±0.037	0.644±0.035 $∙$	0.414±0.075	0.579±0.055 $∙$	0.432±0.043	0.530±0.043 $∙$	0.404±0.046	0.524±0.039 $∙$	0.422±0.048	0.513±0.048 $∙$
S45	0.709±0.021	0.715±0.015 $\circ$	0.356±0.008	0.357±0.008 $\circ$	0.496±0.012	0.500±0.000 $\circ$	0.415±0.007	0.417±0.005 $\circ$	−0.010±0.031	0.000±0.000 $\circ$
S46	0.680±0.079	0.680±0.079 $\circ$	0.334±0.078	0.334±0.078 $\circ$	0.450±0.081	0.450±0.081 $\circ$	0.382±0.077	0.382±0.077 $\circ$	−0.040±0.110	−0.040±0.110 $\circ$
S47	0.503±0.100	0.509±0.041 $\circ$	0.341±0.118	0.331±0.081 $\circ$	0.370±0.117	0.383±0.081 $\circ$	0.335±0.114	0.337±0.082 $\circ$	0.428±0.118	0.439±0.076 $\circ$
S48	0.933±0.044	0.971±0.032 $∙$	0.939±0.042	0.976±0.026 $∙$	0.933±0.044	0.971±0.032 $∙$	0.932±0.044	0.971±0.032 $∙$	0.900±0.065	0.957±0.047 $∙$
S49	0.892±0.065	0.892±0.056 $\circ$	0.907±0.078	0.900±0.048 $\circ$	0.913±0.064	0.906±0.048 $\circ$	0.901±0.072	0.895±0.050 $\circ$	0.881±0.072	0.881±0.061 $\circ$
S50	0.645±0.057	0.686±0.043 $∙$	0.581±0.073	0.636±0.064 $∙$	0.571±0.064	0.592±0.042 $\circ$	0.570±0.070	0.589±0.052 $∙$	0.149±0.134	0.204±0.093 $∙$
S51	0.752±0.055	0.785±0.064 $\circ$	0.766±0.057	0.801±0.070 $\circ$	0.751±0.056	0.784±0.065 $\circ$	0.748±0.058	0.782±0.066 $\circ$	0.503±0.111	0.569±0.129 $\circ$
S52	0.667±0.011	0.668±0.012 $\circ$	0.458±0.085	0.562±0.044 $\circ$	0.497±0.008	0.522±0.015 $\circ$	0.418±0.012	0.484±0.025 $\circ$	−0.018±0.021	0.054±0.038 $\circ$
S53	0.839±0.054	0.850±0.051 $\circ$	0.842±0.057	0.854±0.051 $\circ$	0.836±0.055	0.845±0.052 $\circ$	0.836±0.056	0.847±0.053 $\circ$	0.673±0.111	0.694±0.104 $\circ$
S54	0.940±0.024	0.960±0.021 $\circ$	0.946±0.022	0.965±0.019 $\circ$	0.940±0.024	0.960±0.021 $\circ$	0.938±0.025	0.960±0.021 $\circ$	0.928±0.029	0.952±0.026 $\circ$
S55	0.506±0.048	0.511±0.025 $\circ$	0.508±0.049	0.518±0.027 $\circ$	0.509±0.049	0.513±0.025 $\circ$	0.497±0.047	0.509±0.024 $\circ$	0.261±0.072	0.268±0.038 $\circ$
S56	0.774±0.180	0.900±0.134 $\circ$	0.775±0.210	0.933±0.088 $\circ$	0.742±0.188	0.900±0.128 $\circ$	0.716±0.204	0.891±0.144 $\circ$	0.470±0.383	0.799±0.258 $\circ$
S57	0.842±0.042	0.848±0.058 $\circ$	0.824±0.049	0.829±0.064 $\circ$	0.823±0.058	0.828±0.078 $\circ$	0.819±0.050	0.825±0.071 $\circ$	0.639±0.101	0.651±0.141 $\circ$
S58	0.852±0.050	0.858±0.050 $\circ$	0.815±0.071	0.826±0.071 $\circ$	0.803±0.069	0.813±0.069 $\circ$	0.804±0.070	0.814±0.070 $\circ$	0.761±0.082	0.772±0.082 $\circ$
S59	0.983±0.026	0.994±0.017 $\circ$	0.983±0.026	0.994±0.017 $\circ$	0.986±0.021	0.995±0.014 $\circ$	0.983±0.025	0.994±0.017 $\circ$	0.974±0.039	0.992±0.025 $\circ$
S60	0.951±0.022	0.954±0.015 $\circ$	0.940±0.034	0.916±0.052 $\circ$	0.891±0.041	0.901±0.038 $\circ$	0.896±0.045	0.892±0.044 $\circ$	0.935±0.029	0.940±0.020 $\circ$

From Tab.5 and Tab.6, we have the following observations:

1. For each performance metric, LR $(D^{'})$ tends to obtain better performance, which is consistent with the results on Group 1. For example, on dataset S9, the accuracy, precision, recall, F1, and kappa increase from 0.557, 0.552, 0.556, 0.537, and 0.512 to 0.801, 0.816, 0.801, 0.797, and 0.781, i.e., absolute improvements of 24.4%, 26.4%, 24.5%, 26.0%, and 26.9%, respectively. Overall, LR $(D^{'})$ wins 258 times, ties 21 times, and loses 21 times in the 300 experimental configurations (5 metrics $\times$ 60 datasets).

2. LR $(D^{'})$ tends to have smaller standard deviations than LR $(D)$, which suggests that LR $(D^{'})$ is more robust for small-scale data classification.

3. AssoRep often brings larger gains when the original data representation performs poorly. For example, when the representation of dataset S19 is enhanced via AssoRep, its accuracy improves from 0.544 to 0.844. This suggests that the association information between features is useful auxiliary information for representation learning.
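
The win/tie/loss tally quoted in the first observation compares the mean score of LR $(D^{'})$ with that of LR $(D)$ for each dataset and metric. A minimal Python sketch of such a tally follows. The function name and the rule of comparing means after rounding to three decimals are assumptions, and the four example pairs are accuracy means copied from Tab.5.

```python
from collections import Counter


def tally(pairs, ndigits=3):
    """Count wins, ties, and losses of the enhanced representation.

    pairs is an iterable of (baseline_mean, enhanced_mean) values, one per
    dataset/metric configuration. Means are compared after rounding to
    ndigits decimals (an assumption matching the precision of the tables).
    """
    counts = Counter(win=0, tie=0, loss=0)
    for baseline, enhanced in pairs:
        b, e = round(baseline, ndigits), round(enhanced, ndigits)
        if e > b:
            counts["win"] += 1
        elif e < b:
            counts["loss"] += 1
        else:
            counts["tie"] += 1
    return counts


# Accuracy means (LR(D), LR(D')) for S1, S9, S12, and S15 in Tab.5.
accuracy_pairs = [(1.000, 1.000), (0.557, 0.801), (0.978, 0.978), (0.854, 0.850)]
result = tally(accuracy_pairs)
```

Applying the same function to all 5 metrics of the 60 datasets would give the 300 configurations mentioned above.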

Furthermore, we test whether LR $(D^{'})$ performs significantly better than LR $(D)$ via the paired $t$-test. As shown in Tab.5 and Tab.6, LR $(D^{'})$ is significantly better than LR $(D)$ on 13, 14, 12, 12, and 13 datasets in terms of accuracy, precision, recall, F1, and kappa, respectively, at significance level $\alpha = 5\%$. Compared with the results on Group 1, the number of datasets on which LR $(D^{'})$ is significantly better than LR $(D)$ is clearly smaller. This is because the degree of association between some features may be inaccurately estimated from fewer samples. It is worth pointing out that there is no case in which LR $(D)$ is significantly better than LR $(D^{'})$ at significance level $\alpha = 5\%$. The results suggest that the proposed association-based representation is also effective on datasets with smaller sample sizes; in particular, a classification algorithm coupled with AssoRep is more robust for small-scale data classification.

In summary, the proposed AssoRep algorithm has been shown to be effective on datasets of different sample sizes via Group 1 and Group 2. This indicates that AssoRep is robust to the sample size of the dataset; hence, it can be safely applied to a variety of tasks.

4.3 Experimental results on different classifiers

In this section, we evaluate the performance of AssoRep by combining it with five different classifiers: support vector machine (SVM) [51], k-nearest neighbors (kNN) [52], random forest (RF) [53], perceptron [54], and Gaussian naive Bayes (GaussianNB), i.e., $L \in \{\mathrm{SVM}, \mathrm{kNN}, \mathrm{RF}, \mathrm{Percept}, \mathrm{GaussianNB}\}$. The experimental results are reported in Tab.7, where $L (D)$ and $L (D^{'})$ denote that classifier $L$ learns from the original data representation $D$ and the AssoRep data representation $D^{'}$, respectively. For each metric on each dataset, the better result between $L (D)$ and $L (D^{'})$ for the same classifier is marked in bold, and the best result across all classifiers is underlined.

Tab.7 Classification performance comparison between original and association-based enhancement representation using different classifiers

Data	Accuracy		Precision		Recall		F1		Kappa
Data	SVM $(D)$	SVM $(D^{'})$	SVM $(D)$	SVM $(D^{'})$	SVM $(D)$	SVM $(D^{'})$	SVM $(D)$	SVM $(D^{'})$	SVM $(D)$	SVM $(D^{'})$
Iris	0.967±0.054	0.980±0.031	0.972±0.047	0.983±0.025	0.967±0.054	0.980±0.031	0.966±0.055	0.980±0.031	0.950±0.081	0.970±0.046
oocMer4D	0.787±0.033	0.832±0.028	0.787±0.062	0.821±0.036	0.718±0.032	0.803±0.029	0.734±0.036	0.806±0.029	0.476±0.073	0.615±0.057
Contrac	0.519±0.030	0.557±0.024	0.505±0.036	0.545±0.030	0.494±0.036	0.530±0.026	0.494±0.037	0.531±0.028	0.249±0.049	0.309±0.037
Abalone	0.642±0.025	0.654±0.023	0.644±0.029	0.653±0.030	0.640±0.025	0.651±0.023	0.635±0.026	0.647±0.026	0.464±0.038	0.481±0.034
Magic	0.792±0.005	0.852±0.005	0.781±0.005	0.851±0.006	0.748±0.007	0.818±0.006	0.759±0.006	0.830±0.005	0.520±0.012	0.662±0.011
Mean values	0.741	0.775 ( $↑$ 3.4%)	0.738	0.771 ( $↑$ 3.3%)	0.713	0.756 ( $↑$ 4.3%)	0.718	0.759 ( $↑$ 4.1%)	0.532	0.607 ( $↑$ 7.5%)
Data	Accuracy		Precision		Recall		F1		Kappa
Data	kNN $(D)$	kNN $(D^{'})$	kNN $(D)$	kNN $(D^{'})$	kNN $(D)$	kNN $(D^{'})$	kNN $(D)$	kNN $(D^{'})$	kNN $(D)$	kNN $(D^{'})$
Iris	0.953±0.052	0.960±0.044	0.960±0.045	0.964±0.042	0.953±0.052	0.960±0.044	0.953±0.053	0.960±0.044	0.930±0.078	0.940±0.066
oocMer4D	0.739±0.055	0.793±0.038	0.734±0.036	0.806±0.029	0.773±0.050	0.728±0.048	0.698±0.058	0.768±0.046	0.399±0.114	0.537±0.093
Contrac	0.489±0.024	0.501±0.023	0.470±0.028	0.485±0.025	0.467±0.026	0.485±0.025	0.465±0.026	0.482±0.024	0.203±0.035	0.227±0.035
Abalone	0.601±0.027	0.616±0.023	0.598±0.030	0.616±0.031	0.599±0.027	0.615±0.024	0.595±0.030	0.611±0.027	0.402±0.040	0.425±0.035
Magic	0.840±0.008	0.851±0.008	0.846±0.011	0.860±0.008	0.798±0.009	0.810±0.010	0.814±0.009	0.827±0.010	0.630±0.018	0.656±0.019
Mean values	0.724	0.744 ( $↑$ 2.0%)	0.722	0.746 ( $↑$ 2.4%)	0.718	0.720 ( $↑$ 0.2%)	0.705	0.730 ( $↑$ 2.5%)	0.513	0.557 ( $↑$ 4.4%)
Data	Accuracy		Precision		Recall		F1		Kappa
Data	RF $(D)$	RF $(D^{'})$	RF $(D)$	RF $(D^{'})$	RF $(D)$	RF $(D^{'})$	RF $(D)$	RF $(D^{'})$	RF $(D)$	RF $(D^{'})$
Iris	0.947±0.058	0.953±0.052	0.953±0.056	0.964±0.038	0.947±0.058	0.953±0.052	0.946±0.059	0.953±0.052	0.920±0.087	0.930±0.078
oocMer4D	0.761±0.034	0.787±0.032	0.730±0.039	0.764±0.040	0.728±0.048	0.747±0.037	0.728±0.043	0.753±0.036	0.456±0.086	0.507±0.073
Contrac	0.511±0.016	0.517±0.036	0.489±0.022	0.500±0.036	0.481±0.020	0.491±0.034	0.480±0.021	0.491±0.035	0.233±0.026	0.243±0.058
Abalone	0.604±0.027	0.624±0.028	0.603±0.032	0.625±0.032	0.602±0.028	0.622±0.028	0.600±0.030	0.619±0.029	0.406±0.041	0.436±0.042
Magic	0.870±0.005	0.860±0.007	0.871±0.007	0.860±0.010	0.840±0.007	0.828±0.008	0.852±0.006	0.840±0.008	0.705±0.013	0.681±0.016
Mean values	0.739	0.748 ( $↑$ 0.9%)	0.729	0.743 ( $↑$ 1.4%)	0.720	0.728 ( $↑$ 0.8%)	0.721	0.731 ( $↑$ 1.0%)	0.544	0.559 ( $↑$ 1.5%)
Data	Accuracy		Precision		Recall		F1		Kappa
Data	Percept $(D)$	Percept $(D^{'})$	Percept $(D)$	Percept $(D^{'})$	Percept $(D)$	Percept $(D^{'})$	Percept $(D)$	Percept $(D^{'})$	Percept $(D)$	Percept $(D^{'})$
Iris	0.873±0.081	0.973±0.033	0.910±0.059	0.978±0.027	0.873±0.081	0.973±0.033	0.865±0.089	0.973±0.033	0.810±0.122	0.960±0.049
oocMer4D	0.751±0.050	0.784±0.042	0.736±0.082	0.767±0.045	0.694±0.044	0.759±0.041	0.703±0.048	0.755±0.041	0.411±0.100	0.515±0.081
Contrac	0.452±0.034	0.517±0.040	0.434±0.051	0.502±0.051	0.424±0.039	0.487±0.039	0.407±0.047	0.483±0.044	0.142±0.056	0.244±0.063
Abalone	0.604±0.054	0.594±0.042	0.589±0.078	0.596±0.050	0.598±0.056	0.592±0.040	0.574±0.072	0.564±0.050	0.404±0.083	0.392±0.062
Magic	0.745±0.023	0.776±0.019	0.735±0.026	0.757±0.022	0.700±0.022	0.746±0.017	0.704±0.022	0.749±0.018	0.418±0.038	0.500±0.036
Mean values	0.685	0.729 ( $↑$ 4.4%)	0.681	0.720 ( $↑$ 3.9%)	0.658	0.711 ( $↑$ 5.3%)	0.651	0.705 ( $↑$ 5.4%)	0.437	0.522 ( $↑$ 8.5%)
Data	Accuracy		Precision		Recall		F1		Kappa
Data	GNB $(D)$	GNB $(D^{'})$	GNB $(D)$	GNB $(D^{'})$	GNB $(D)$	GNB $(D^{'})$	GNB $(D)$	GNB $(D^{'})$	GNB $(D)$	GNB $(D^{'})$
Iris	0.953±0.043	0.940±0.036	0.963±0.033	0.952±0.027	0.953±0.043	0.940±0.036	0.952±0.044	0.939±0.037	0.930±0.064	0.910±0.054
oocMer4D	0.593±0.052	0.675±0.080	0.599±0.040	0.680±0.060	0.610±0.045	0.696±0.070	0.580±0.049	0.663±0.076	0.193±0.083	0.353±0.133
Contrac	0.466±0.036	0.539±0.023	0.486±0.030	0.535±0.019	0.490±0.037	0.535±0.024	0.463±0.035	0.529±0.021	0.214±0.048	0.299±0.036
Abalone	0.572±0.062	0.603±0.033	0.566±0.068	0.626±0.034	0.568±0.060	0.604±0.031	0.558±0.063	0.601±0.034	0.357±0.092	0.407±0.048
Magic	0.727±0.006	0.763±0.009	0.721±0.010	0.750±0.011	0.647±0.007	0.709±0.012	0.653±0.008	0.719±0.012	0.329±0.014	0.445±0.023
Mean values	0.662	0.704 ( $↑$ 4.2%)	0.667	0.709 ( $↑$ 4.2%)	0.654	0.697 ( $↑$ 4.3%)	0.641	0.690 ( $↑$ 4.9%)	0.405	0.483 ( $↑$ 7.8%)
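
The "Mean values" rows in Tab.7 are consistent with plain averages over the five datasets, with the arrow giving the absolute difference in percentage points. The short Python check below reproduces the SVM accuracy entry from the table values. The helper name is hypothetical.

```python
from statistics import fmean


def mean_and_gain(baseline, enhanced):
    """Return (mean baseline, mean enhanced, gain in percentage points)."""
    mean_base = fmean(baseline)
    mean_enh = fmean(enhanced)
    return mean_base, mean_enh, 100.0 * (mean_enh - mean_base)


# SVM accuracy on Iris, oocMer4D, Contrac, Abalone, and Magic (Tab.7).
svm_d = [0.967, 0.787, 0.519, 0.642, 0.792]
svm_d_prime = [0.980, 0.832, 0.557, 0.654, 0.852]

mean_d, mean_d_prime, gain = mean_and_gain(svm_d, svm_d_prime)
print(f"{mean_d:.3f} -> {mean_d_prime:.3f} (gain {gain:.1f} points)")
```

The same helper can be applied to any other classifier and metric column of the table.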

Based on Tab.7, the following conclusions can be drawn. (1) For each classifier $L$, the mean value of $L (D^{'})$ surpasses that of its counterpart $L (D)$ on all evaluation metrics. In particular, for the mean value of kappa, which is a more appropriate metric for evaluating a classifier on complex datasets such as imbalanced ones, SVM $(D^{'})$, Perceptron $(D^{'})$, and GaussianNB $(D^{'})$ achieve improvements of 7.56%, 8.52%, and 7.82% over SVM $(D)$, Perceptron $(D)$, and GaussianNB $(D)$, respectively. (2) $L (D^{'})$ wins in 109 out of 125 experimental configurations (5 datasets $\times$ 5 methods $\times$ 5 metrics). (3) $L (D^{'})$ achieves the best or a comparable result on each dataset.

In summary, the above results imply that association among features is indeed able to improve the discrimination ability of the original data.
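To illustrate why the kappa metric is emphasised for imbalanced data, the following minimal sketch contrasts accuracy with Cohen's kappa on a toy problem. The function names and the toy labels are assumptions made for illustration only; they are not taken from the paper's experiments.

```python
from collections import Counter


def accuracy(y_true, y_pred):
    """Fraction of examples whose predicted label equals the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def cohen_kappa(y_true, y_pred):
    """Cohen's kappa: observed agreement corrected for chance agreement."""
    n = len(y_true)
    p_o = accuracy(y_true, y_pred)
    true_counts = Counter(y_true)
    pred_counts = Counter(y_pred)
    # Chance agreement: probability that labels coincide if they were
    # drawn independently from the two marginal label distributions.
    p_e = sum(true_counts[c] * pred_counts[c] for c in true_counts) / (n * n)
    if p_e == 1.0:
        return 0.0  # degenerate case: only one class on both sides
    return (p_o - p_e) / (1.0 - p_e)


# Toy imbalanced problem (hypothetical): 90 negatives, 10 positives,
# and a trivial classifier that always predicts the majority class.
y_true = [0] * 90 + [1] * 10
y_pred = [0] * 100

acc = accuracy(y_true, y_pred)
kappa = cohen_kappa(y_true, y_pred)
print(f"accuracy = {acc:.3f}, kappa = {kappa:.3f}")
```

In this toy case the trivial classifier looks strong under accuracy but receives no credit under kappa, which is why larger kappa gains are a more informative signal on imbalanced datasets.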

4.4 Classification performance comparison with other feature enhancement methods

In this section, we compare AssoRep with six feature enhancement methods: AF [9], AF$_{X}$, CRAM$_{c}$ (discrete version of CRAM) [10], CRAM$_{d}$ (continuous version of CRAM) [10], FS$_{MI}$ [34], and FS$_{LR}$ [34]. Specifically, we first obtain enhanced features with each of these methods, and then compare their classification performance by feeding them into the same classifier (logistic regression is used here).

Benchmark denotes that the features are not enhanced by any method. AF is the original association data reconstruction method proposed in [9], and it uses pCor as its association measure. AF$_{X}$ is an enhanced version of AF that concatenates the AF result with the original features $X$, as CRAM$_{c}$ and CRAM$_{d}$ do. CRAM$_{c}$ and CRAM$_{d}$ enhance the representation ability of data with extra information, including counting statistics on the class membership of neighboring examples as well as distance information between examples and their $k$ nearest neighbors. The hyper-parameter $k$ in CRAM$_{c}$ and CRAM$_{d}$ is set to 8, as recommended in [10]. FS$_{MI}$ and FS$_{LR}$ are two feature enhancement methods based on a feature selection strategy. FS$_{MI}$ selects important features according to the mutual information between each feature vector and the label vector, and the number of selected features is taken from $\{0.1m, 0.2m, \dots, 0.9m\}$, where $m$ is the number of features of the original data $X$. FS$_{LR}$ achieves the same purpose with a logistic regression algorithm, and its selection strategy adopts the default settings in the sklearn library. The experimental results are reported in Tab.8, in which the best result on each dataset is marked in bold.

Tab.8 Accuracy comparison between AssoRep and other feature enhancement methods

Data	Benchmark	AF	AF $_{X}$	CRAM $_{c}$	CRAM $_{d}$	FS $_{M I}$	FS $_{L R}$	AssoRep
Iris	0.907±0.053	0.927±0.055	0.953±0.043	0.953±0.043	0.953±0.043	0.947±0.050	0.940±0.055	0.973±0.033
oocMer4D	0.796±0.036	0.751±0.023	0.811±0.035	0.820±0.028	0.822±0.028	0.797±0.028	0.800±0.035	0.837±0.020
Contrac	0.507±0.042	0.568±0.052	0.566±0.058	0.519±0.035	0.517±0.035	0.507±0.030	0.519±0.041	0.568±0.055
Abalone	0.647±0.020	0.640±0.023	0.659±0.022	0.650±0.015	0.651±0.017	0.647±0.019	0.635±0.012	0.662±0.021
Magic	0.791±0.006	0.837±0.007	0.844±0.008	0.845±0.008	0.844±0.008	0.791±0.007	0.787±0.008	0.850±0.008
Annealing	0.873±0.027	0.893±0.017	0.910±0.024	0.910±0.024	0.911±0.021	0.880±0.024	0.863±0.017	0.951±0.014
ctg-10classes	0.768±0.032	0.802±0.030	0.800±0.026	0.817±0.023	0.813±0.023	0.771±0.027	0.751±0.030	0.834±0.027
oocTris2F	0.797±0.030	0.815±0.036	0.815±0.031	0.828±0.043	0.829±0.040	0.795±0.031	0.785±0.031	0.836±0.030
Mean values	0.7608 ( $↑$ 5.31%)	0.7791 ( $↑$ 3.48%)	0.7948 ( $↑$ 1.91%)	0.7927 ( $↑$ 2.12%)	0.7925 ( $↑$ 2.14%)	0.7669 ( $↑$ 4.70%)	0.7600 ( $↑$ 5.39%)	0.8139
Avg. rank	6.813	5.250	3.563	3.125	3.063	6.188	6.938	1.063

It is easy to see from Tab.8 that: 1) All feature enhancement methods except FS$_{LR}$ achieve higher accuracy than the benchmark, which highlights the importance of the feature enhancement strategy. 2) AssoRep obtains the highest accuracy on all datasets. 3) AssoRep improves the mean accuracy by 3.48% over AF, which indicates that the quality of the association matrix plays an important role. 4) The mean accuracy of AssoRep is 2.14% higher than that of CRAM$_{d}$, which ranks first among the seven baseline methods in terms of average rank. It is noteworthy that CRAM$_{d}$ uses discriminative information from the output space (label information), while the proposed AssoRep only uses information from the input (feature) space. Moreover, the new representations of CRAM$_{c}$ and CRAM$_{d}$ contain the original representation, which is helpful for performance improvement. This is supported by the result that AF$_{X}$ performs better than AF. 5) Compared with FS$_{MI}$ and FS$_{LR}$, the methods AF, AF$_{X}$, CRAM$_{c}$, and CRAM$_{d}$ obtain better accuracy. This suggests that enhancing the features by mining new information from the original data may be more effective than only removing weaker features. These results indicate that association-based representation learning is worth further study.

To further assess the significance of the differences among the eight algorithms in terms of classification accuracy, we employ the Friedman test [55], which is a favorable choice for comparing multiple algorithms over many datasets. The Friedman statistic $F_{F}$ follows a Fisher distribution with $k - 1$ numerator degrees of freedom and $(k - 1)(N - 1)$ denominator degrees of freedom, and is defined as:

$$F_{F} = \frac{(N - 1) \chi_{F}^{2}}{N (k - 1) - \chi_{F}^{2}},$$

where

$$\chi_{F}^{2} = \frac{12 N}{k (k + 1)} \left( \sum_{i = 1}^{k} R_{i}^{2} - \frac{k (k + 1)^{2}}{4} \right),$$

and $k$ and $N$ denote the numbers of compared algorithms and datasets, respectively. $R_{i}$ is the average rank of algorithm $i$ over all datasets. The smaller the average rank, the better the corresponding algorithm. The null hypothesis is rejected if the returned $F_{F}$ is higher than the specified critical value.
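As a worked check of the Friedman statistic, the following minimal sketch evaluates $\chi_{F}^{2}$ and $F_{F}$ from the average ranks printed in Tab.8, with $k = 8$ algorithms and $N = 8$ datasets. The function name is hypothetical, and the ranks are the three-decimal values from the table, so the result is only expected to approximate the reported statistic.

```python
def friedman_statistics(avg_ranks, n_datasets):
    """Return (chi2_F, F_F) computed from the average rank of each algorithm."""
    k = len(avg_ranks)
    n = n_datasets
    sum_sq = sum(r * r for r in avg_ranks)
    # Friedman chi-square statistic.
    chi2 = 12.0 * n / (k * (k + 1)) * (sum_sq - k * (k + 1) ** 2 / 4.0)
    # F-distributed variant with (k - 1) and (k - 1)(N - 1) degrees of freedom.
    f_f = (n - 1) * chi2 / (n * (k - 1) - chi2)
    return chi2, f_f


# Average ranks copied from the "Avg. rank" row of Tab.8, in column order:
# Benchmark, AF, AF_X, CRAM_c, CRAM_d, FS_MI, FS_LR, AssoRep.
avg_ranks = [6.813, 5.250, 3.563, 3.125, 3.063, 6.188, 6.938, 1.063]

chi2, f_f = friedman_statistics(avg_ranks, n_datasets=8)
print(f"chi2_F = {chi2:.3f}, F_F = {f_f:.3f}")
```

The value obtained this way is close to the reported $F_{F}$ of 20.610 and far above the critical value 2.203; a small deviation is expected because the ranks in Tab.8 are rounded.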

As shown in Tab.9, the $F_{F}$ value of 20.610 is higher than the critical value 2.203 at significance level $\alpha = 0.05$, so the null hypothesis that all algorithms have equivalent accuracy is clearly rejected. This indicates that the classification performance of the eight algorithms is significantly different. Hence, we need to further study the relative performance among the compared algorithms. To this end, the Nemenyi post hoc test, which compares classifiers in a pairwise manner, is adopted. In the Nemenyi test, the performance of two algorithms is considered significantly different if the difference between their average ranks exceeds the following critical distance:

$$CD = q_{\alpha} \sqrt{\frac{k (k + 1)}{6 N}}, \tag{22}$$

where $q_{\alpha = 0.05} = 3.031$ when $k = 8$.

Tab.9 Summary of the Friedman statistics $F_{F}$

Evaluation metric	$F_{F}$	Critical value $(α = 0.05)$
Accuracy	20.610	2.203

The CD diagram is often used to illustrate the rank relations among the compared algorithms. In a CD diagram, the average rank of each algorithm is marked along the axis (the smaller the better). As shown in Fig.1, AssoRep ranks first. It is significantly better than AF, FS$_{MI}$ and FS$_{LR}$, while CRAM$_{d}$ does not differ significantly from them. This further validates the advantage of the proposed AssoRep.
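The Nemenyi criterion can be sketched numerically as follows. The snippet computes the critical distance for $k = 8$, $N = 8$ and $q_{\alpha} = 3.031$, and lists the baselines whose average rank in Tab.8 differs from that of AssoRep by more than CD. The function and variable names are hypothetical, and the ranks are the rounded values from Tab.8, so this is an illustrative check rather than a reproduction of Fig.1.

```python
import math


def nemenyi_cd(q_alpha, k, n):
    """Critical distance of the Nemenyi post hoc test."""
    return q_alpha * math.sqrt(k * (k + 1) / (6.0 * n))


# Average ranks copied from the "Avg. rank" row of Tab.8.
ranks = {
    "Benchmark": 6.813,
    "AF": 5.250,
    "AF_X": 3.563,
    "CRAM_c": 3.125,
    "CRAM_d": 3.063,
    "FS_MI": 6.188,
    "FS_LR": 6.938,
    "AssoRep": 1.063,
}

cd = nemenyi_cd(q_alpha=3.031, k=len(ranks), n=8)

# Baselines whose rank gap to AssoRep exceeds the critical distance.
significant = sorted(
    name
    for name, rank in ranks.items()
    if name != "AssoRep" and rank - ranks["AssoRep"] > cd
)
print(f"CD = {cd:.3f}")
print("significantly worse than AssoRep:", significant)
```

A rank gap is compared against a single threshold for every pair, which is why the CD diagram can show the outcome of all pairwise comparisons on one axis.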

Fig.1 Comparison of A and B (the control algorithms; A denotes AssoRep and B denotes CRAM$_{d}$, the best-performing baseline algorithm, marked with a red star and a blue star, respectively) against the other compared algorithms with the Nemenyi test. Algorithms that are not connected with A (red line) or B (blue line) in the CD diagram are considered to have significantly different performance from the corresponding control algorithm (significance level $α = 0.05$)


4.5 Efficiency analysis

This experiment investigates the efficiency of the AssoRep algorithm by replacing dCor with Pearson’s correlation coefficient (pCor), normalized mutual information (NMI), the maximal information coefficient (MIC) [18] and its improved version MIC$_{e}$ [56]. NMI, MIC and MIC$_{e}$ have very high computational complexity, which poses a challenge for the comparison experiment. In this paper, we use the minepy Python library, which provides an efficient implementation of MIC and MIC$_{e}$, while the sklearn toolkit is used for NMI. It is worth pointing out that the maximal neighborhood coefficient (MNC) [19] is not used due to its even higher computational complexity. The results are shown in Tab.10, in which the computation time on AWArgsift-hist is recorded when $L$ takes 1, while the computation time on the other datasets is recorded when $L$ takes 10.

Tab.10 Computation time (s) of the different association mining methods

Data	pCor	dCor	NMI	MIC	MIC $_{e}$
Iris	0.16	0.05	1.01	0.30	0.29
oocMer4D	0.36	5.90	75.37	71.23	70.37
Contrac	0.22	0.55	3.84	10.05	7.16
Abalone	0.26	1.24	3.71	10.82	11.08
Magic	1.02	7.50	7.07	58.84	59.11
AWArgsift-hist	178.90	618.08	3124.94	9956.50	15541.76

According to Tab.10, we can observe that: 1) pCor costs the least time on all datasets except Iris, but its classification accuracy is lower than that of dCor shown in Tab.10; 2) Compared with NMI, MIC and MIC$_{e}$, which are extremely time-consuming, the computation time of dCor is acceptable. For example, on the AWArgsift-hist dataset, dCor takes about 618 seconds to calculate the associations of 1999000 feature pairs and to train the logistic regression model, whereas NMI needs about 3124 seconds and MIC$_{e}$ costs about 4.32 hours, which are about 5 times and 25 times the computation time of dCor, respectively. These results suggest that taking dCor as the association mining method is an appropriate choice that balances effectiveness and efficiency well.
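To make the cost discussed above concrete, the following is a minimal pure-Python sketch of the sample distance correlation of [17] between two scalar features, together with the number of feature pairs that must be evaluated. It is not the authors' implementation (the paper does not state which implementation was used), and the function names are hypothetical. Each pair costs on the order of $n^{2}$ operations for $n$ samples, and $m$ features give $m(m-1)/2$ pairs.

```python
import math


def centered_distance_matrix(v):
    """Pairwise |v_i - v_j| distances, double-centered by row/column/grand means."""
    n = len(v)
    d = [[abs(v[i] - v[j]) for j in range(n)] for i in range(n)]
    row_mean = [sum(row) / n for row in d]
    grand_mean = sum(row_mean) / n
    # The distance matrix is symmetric, so column means equal row means.
    return [
        [d[i][j] - row_mean[i] - row_mean[j] + grand_mean for j in range(n)]
        for i in range(n)
    ]


def mean_product(a, b):
    """Mean of the element-wise product of two n x n matrices."""
    n = len(a)
    return sum(a[i][j] * b[i][j] for i in range(n) for j in range(n)) / (n * n)


def distance_correlation(x, y):
    """Sample distance correlation between two equally long scalar sequences."""
    a = centered_distance_matrix(x)
    b = centered_distance_matrix(y)
    dcov2 = mean_product(a, b)
    denom = math.sqrt(mean_product(a, a) * mean_product(b, b))
    if denom == 0.0:
        return 0.0  # at least one feature is constant
    return math.sqrt(max(dcov2, 0.0) / denom)


def n_feature_pairs(m):
    """Number of unordered feature pairs among m features."""
    return m * (m - 1) // 2


x = [i / 10 for i in range(-10, 11)]
linear = [2 * t + 1 for t in x]
quadratic = [t * t for t in x]

print("dCor(x, 2x + 1):", distance_correlation(x, linear))
print("dCor(x, x^2):   ", distance_correlation(x, quadratic))
print("pairs for 2000 features:", n_feature_pairs(2000))
```

The quadratic example is a case where Pearson's correlation vanishes by symmetry while the distance correlation remains clearly positive, and the pair count for 2000 features matches the 1999000 pairs quoted for AWArgsift-hist.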

5 Conclusion

We have proposed an association-based representation improvement method (AssoRep), which balances effectiveness and efficiency well. Moreover, AssoRep has better interpretability than existing feature enhancement methods such as the multilayer perceptron and attention, because the working mechanism of each of its steps is transparent. The effectiveness of AssoRep has been validated by extensive experimental results on classification tasks.

Although this work further perfects and enriches the association data reconstruction domain, AssoRep, like AF [9], only provides a vector-like improved representation. As a result, it cannot fit models that take tensor-like data as input, such as convolutional neural networks. Hence, tensorizing the association-based representation is worth studying in the future. Moreover, AssoRep treats the relationship between paired features equally, so it is worthwhile to generalize AssoRep with cause and effect among features. Like MIC and MIC$_{e}$, dCor overestimates the strength of association between two features when the true relationship is very weak. Hence, a solution that eliminates this bias of dCor urgently needs to be studied.

Xinyan Liang received the PhD degree in computer science and technology from Shanxi University, China in 2022. He is currently a Lecturer at the Institute of Big Data Science and Industry, Shanxi University, China. He was a visiting scholar at The University of Hong Kong, China in 2018. His main research interests include multi-modal machine learning, evolutionary intelligence, and their applications. He has published several journal papers in his research fields, including IEEE TPAMI, IEEE TEVC, etc.

Yuhua Qian received the MS and PhD degrees in computers with applications from Shanxi University, China in 2005 and 2011, respectively. He is currently a Professor with the Key Laboratory of Computational Intelligence and Chinese Information Processing of Ministry of Education, Shanxi University, China. He is best known for multigranulation rough sets in learning from categorical data and granular computing. He is involved in research on machine learning, pattern recognition, feature selection, granular computing, and artificial intelligence. He has authored over 150 articles on these topics in international journals. He served on the Editorial Board of the International Journal of Knowledge-Based Organizations and Artificial Intelligence Research.

Qian Guo received the PhD degree in computer science and technology from Shanxi University, China in 2022. She is currently a Lecturer at the School of Computer Science and Technology, Taiyuan University of Science and Technology, China. She was a visiting scholar at The University of Hong Kong, China in 2018. Her current research interests include logic learning, abstract reasoning, deep learning and their applications.

Keyin Zheng received the BS degree in information and computing science and the Master’s degree in pattern recognition and intelligent systems from the School of Mathematical Sciences, Shanxi University, China in 2012 and 2015, respectively. She is a PhD candidate at the Institute of Big Data Science and Industry, Shanxi University, China. Her research interests include concept learning and machine learning.

References


[1]	Zhu Y, Geng Y, Li Y, Qiang J, Wu X. Representation learning: serial-autoencoder for personalized recommendation. Frontiers of Computer Science, 2024, 18(4): 184316

[2]	Bengio Y, Courville A, Vincent P. Representation learning: a review and new perspectives. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2013, 35(8): 1798–1828

[3]	Jia B B, Liu J Y, Hang J Y, Zhang M L. Learning label-specific features for decomposition-based multi-class classification. Frontiers of Computer Science, 2023, 17(6): 176348

[4]	Zhang M L, Fang J P, Wang Y B. BiLabel-specific features for multi-label classification. ACM Transactions on Knowledge Discovery from Data, 2021, 16(1): 18

[5]	Yang M, Liu Q, Sun X, Shi N, Xue H. Towards kernelizing the classifier for hyperbolic data. Frontiers of Computer Science, 2024, 18(1): 181301

[6]	Dong X, Luo T, Fan R, Zhuge W, Hou C. Active label distribution learning via kernel maximum mean discrepancy. Frontiers of Computer Science, 2023, 17(4): 174327

[7]	Zhang Y, Jiang L, Li C. Attribute augmentation-based label integration for crowdsourcing. Frontiers of Computer Science, 2023, 17(5): 175331

[8]	Troncoso-García A R, Martínez-Ballesteros M, Martínez-Álvarez F, Troncoso A. A new approach based on association rules to add explainability to time series forecasting models. Information Fusion, 2023, 94: 169–180

[9]	Liang X, Qian Y, Guo Q, Cheng H, Liang J. AF: an association-based fusion method for multi-modal classification. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2022, 44(12): 9236–9254

[10]	Jia B B, Zhang M L. Multi-dimensional classification via kNN feature augmentation. Pattern Recognition, 2020, 106: 107423

[11]	Deng M, Yang W, Chen C, Liu C. Exploring associations between streetscape factors and crime behaviors using Google Street View images. Frontiers of Computer Science, 2022, 16(4): 164316

[12]	Guo Q, Qian Y, Liang X. GLRM: logical pattern mining in the case of inconsistent data distribution based on multigranulation strategy. International Journal of Approximate Reasoning, 2022, 143: 78–101

[13]	Guo Q, Qian Y, Liang X, She Y, Li D, Liang J. Logic could be learned from images. International Journal of Machine Learning and Cybernetics, 2021, 12(12): 3397–3414

[14]	Kuzma J. Basic Statistics for the Health Sciences. Palo Alto: Mayfield Publishing Company, 1984, 158–169

[15]	Spearman C. The proof and measurement of association between two things. The American Journal of Psychology, 1904, 15(1): 72–101

[16]	Kendall M G. A new measure of rank correlation. Biometrika, 1938, 30(1-2): 81–93

[17]	Székely G J, Rizzo M L, Bakirov N K. Measuring and testing dependence by correlation of distances. The Annals of Statistics, 2007, 35(6): 2769–2794

[18]	Reshef D N, Reshef Y A, Finucane H K, Grossman S R, Mcvean G, Turnbaugh P J, Lander E S, Mitzenmacher M, Sabeti P C. Detecting novel associations in large data sets. Science, 2011, 334(6062): 1518–1524

[19]	Cheng H, Qian Y, Hu Z, Liang J. Association mining method based on neighborhood perspective. SCIENTIA SINICA Informationis, 2020, 50(6): 824–844

[20]	Zhu Y, Kwok J T, Zhou Z H. Multi-label learning with global and local label correlation. IEEE Transactions on Knowledge and Data Engineering, 2018, 30(6): 1081–1094

[21]	Xu N, Shu J, Zheng R, Geng X, Meng D, Zhang M L. Variational label enhancement. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2023, 45(5): 6537–6551

[22]	Zhang M L, Zhou Z H. A review on multi-label learning algorithms. IEEE Transactions on Knowledge and Data Engineering, 2014, 26(8): 1819–1837

[23]	Zhang M L, Li Y K, Liu X Y, Geng X. Binary relevance for multi-label learning: an overview. Frontiers of Computer Science, 2018, 12(2): 191–202

[24]	Kou Y, Lin G, Qian Y, Liao S. A novel multi-label feature selection method with association rules and rough set. Information Sciences, 2023, 624: 299–323

[25]	Zhang Y, Zhu H, Song Z, Koniusz P, King I. Spectral feature augmentation for graph contrastive learning and beyond. In: Proceedings of the 37th AAAI Conference on Artificial Intelligence. 2023, 11289–11297

[26]	Gao Z, Wu Y, Jia Y, Harandi M. Hyperbolic feature augmentation via distribution estimation and infinite sampling on manifolds. In: Proceedings of the 36th Conference on Neural Information Processing Systems. 2022, 34421–34435

[27]	Zhang M L, Wu L. LIFT: multi-label learning with label-specific features. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2015, 37(1): 107–120

[28]	Zheng S, Yuan W, Guan D. Heterogeneous information network embedding with incomplete multi-view fusion. Frontiers of Computer Science, 2022, 16(5): 165611

[29]	Wang B, Li H, Wei B, Kang Z, Li C. Nighttime image dehazing using color cast removal and dual path multi-scale fusion strategy. Frontiers of Computer Science, 2022, 16(4): 164706

[30]	Wang Z, Li L, Xue Y, Jiang C, Wang J, Sun K, Ma H. FeNet: feature enhancement network for lightweight remote-sensing image super-resolution. IEEE Transactions on Geoscience and Remote Sensing, 2022, 60: 5622112

[31]	Wang W, Zhang M L. Partial label learning with discrimination augmentation. In: Proceedings of the 28th ACM SIGKDD Conference on Knowledge Discovery and Data Mining. 2022, 1920–1928

[32]	Gong C, Wang D, Li M, Chandra V, Liu Q. KeepAugment: a simple information-preserving data augmentation approach. In: Proceedings of 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2021, 1055–1064

[33]	Wang M, Han H, Huang Z, Xie J. Unsupervised spectral feature selection algorithms for high dimensional data. Frontiers of Computer Science, 2023, 17(5): 175330

[34]	Liu J, Chai C, Luo Y, Lou Y, Feng J, Tang N. Feature augmentation with reinforcement learning. In: Proceedings of the 38th IEEE International Conference on Data Engineering. 2022, 3360–3372

[35]	Li H, Xu C, Ma L, Bo H, Zhang D. MODENN: a shallow broad neural network model based on multi-order Descartes expansion. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2022, 44(12): 9417–9433

[36]	Taylor R. Interpretation of the correlation coefficient: a basic review. Journal of Diagnostic Medical Sonography, 1990, 6(1): 35–39

[37]	Spearman C. The proof and measurement of association between two things. The American Journal of Psychology, 1987, 100(3-4): 441–471

[38]	Spearman C. The proof and measurement of association between two things. International Journal of Epidemiology, 2010, 39(5): 1137–1150

[39]	Puth M T, Neuhäuser M, Ruxton G D. Effective use of Spearman’s and Kendall’s correlation coefficients for association between two measured traits. Animal Behaviour, 2015, 102: 77–84

[40]	Shannon C E. A mathematical theory of communication. The Bell System Technical Journal, 1948, 27(3): 379–423

[41]	Cheng H, Qian Y, Guo Y, Zheng K, Zhang Q. Neighborhood information-based method for multivariate association mining. IEEE Transactions on Knowledge and Data Engineering, 2023, 35(6): 6126–6135

[42]	Vaswani A, Shazeer N, Parmar N, Uszkoreit J, Jones L, Gomez A N, Kaiser Ł, Polosukhin I. Attention is all you need. In: Proceedings of the 31st International Conference on Neural Information Processing Systems. 2017, 6000–6010

[43]	Shen W X, Zeng X, Zhu F, Wang Y L, Qin C, Tan Y, Jiang Y Y, Chen Y Z. Out-of-the-box deep learning prediction of pharmaceutical properties by broadly learned knowledge-based molecular representations. Nature Machine Intelligence, 2021, 3(4): 334–343

[44]	Liang X, Guo Q, Qian Y, Ding W, Zhang Q. Evolutionary deep fusion method and its application in chemical structure recognition. IEEE Transactions on Evolutionary Computation, 2021, 25(5): 883–893

[45]	Gretton A, Bousquet O, Smola A, Schölkopf B. Measuring statistical dependence with Hilbert-Schmidt norms. In: Proceedings of the 16th International Conference on Algorithmic Learning Theory. 2005, 63–77

[46]	Fernández-Delgado M, Cernadas E, Barro S, Amorim D. Do we need hundreds of classifiers to solve real world classification problems? The Journal of Machine Learning Research, 2014, 15(1): 3133–3181

[47]	Lampert C H, Nickisch H, Harmeling S. Attribute-based classification for zero-shot visual object categorization. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2014, 36(3): 453–465

[48]	Arevalo J, Solorio T, Montes-y-Gómez M, Gonzalez F A. Gated multimodal networks. Neural Computing and Applications, 2020, 32(14): 10209–10228

[49]	Zhang Y, Cao C, Cheng J, Lu H. EgoGesture: a new dataset and benchmark for egocentric hand gesture recognition. IEEE Transactions on Multimedia, 2018, 20(5): 1038–1050

[50]	Pedregosa F, Varoquaux G, Gramfort A, Michel V, Thirion B, Grisel O, Blondel M, Prettenhofer P, Weiss R, Dubourg V, Vanderplas J, Passos A, Cournapeau D, Brucher M, Perrot M, Duchesnay É. Scikit-learn: machine learning in Python. The Journal of Machine Learning Research, 2011, 12: 2825–2830

[51]	Cortes C, Vapnik V. Support-vector networks. Machine Learning, 1995, 20(3): 273–297

[52]	Cover T, Hart P. Nearest neighbor pattern classification. IEEE Transactions on Information Theory, 1967, 13(1): 21–27

[53]	Breiman L. Random forests. Machine Learning, 2001, 45(1): 5–32

[54]	Freund Y, Schapire R E. Large margin classification using the perceptron algorithm. Machine Learning, 1999, 37(3): 277–296

[55]	Demšar J. Statistical comparisons of classifiers over multiple data sets. The Journal of Machine Learning Research, 2006, 7: 1–30

[56]	Reshef Y A, Reshef D N, Finucane H K, Sabeti P C, Mitzenmacher M. Measuring dependence powerfully and equitably. The Journal of Machine Learning Research, 2016, 17(1): 7406–7468

Acknowledgements

This work was supported by the National Key R&D Program of China (No. 2021ZD0112400), the National Natural Science Foundation of China (Grant Nos. 62306171, 62136005, 61976129, 62106132, 61906114, 61906115), the Science and Technology Major Project of Shanxi (No. 202201020101006), the Young Scientists Fund of the Natural Science Foundation of Shanxi (Nos. 202203021222183, 20210302124549), the Open Project Foundation of Intelligent Information Processing Key Laboratory of Shanxi Province (Nos. CICIP2023005, CICIP202205), the Science and Technology Innovation Plan for Colleges and Universities of Shanxi Province (2022L296), and Taiyuan University of Science and Technology Doctoral Research Start-up Fund Project (20222106).

Competing interests

The authors declare that they have no competing interests or financial conflicts to disclose.

RIGHTS & PERMISSIONS

2025 Higher Education Press


Supplementary files

FCS-23396-OF-XL_suppl_1 (253 KB)


Received: 10 May 2023
Accepted: 30 Oct 2023
Just Accepted Date: 01 Nov 2023
Issue Date: 12 Mar 2024
Published: 15 Jan 2025
