A data representation method using distance correlation

Xinyan LIANG, Yuhua QIAN, Qian GUO, Keyin ZHENG

Front. Comput. Sci., 2025, Vol. 19, Issue (1): 191303. DOI: 10.1007/s11704-023-3396-y

Excellent Young Computer Scientists Forum
RESEARCH ARTICLE


Abstract

Association between features has been shown to improve the representation ability of data. However, the original association-based data reconstruction method faces two issues: the dimension of the reconstructed data is necessarily higher than that of the original data, and the adopted association measure does not balance effectiveness and efficiency well. To address these two issues, this paper proposes a novel association-based representation improvement method, named AssoRep. AssoRep first obtains the association between features via distance correlation, which has several advantages over Pearson’s correlation coefficient. Then an enhancement matrix is formed by stacking the association values of all pairs of features. Next, an improved feature representation is obtained by aggregating the original features with the enhancement matrix. Finally, the improved feature representation is mapped to a low-dimensional space via principal component analysis. The effectiveness of AssoRep is validated on 120 datasets, and the results further refine our previous work on association-based data reconstruction.


Keywords

association / representation / distance correlation / classification

Cite this article

Xinyan LIANG, Yuhua QIAN, Qian GUO, Keyin ZHENG. A data representation method using distance correlation. Front. Comput. Sci., 2025, 19(1): 191303 https://doi.org/10.1007/s11704-023-3396-y

1 Introduction

The success of deep learning [1, 2], multi-label learning [3, 4], and kernel learning [5, 6] shows that learning with enhanced features instead of the original features may be more effective. For example, multilayer perceptrons and attention mechanisms enhance the representation ability of data implicitly and thereby improve the performance of machine learning models [7]. However, their poor interpretability strongly limits their application in domains that require trust. In this article, interpretability denotes the transparency of a model, specifically humans’ ability to understand it [8]. Hence, it is necessary to develop an interpretable representation enhancement method. Recently, some researchers have attempted to enhance the representation ability of data by fully mining and utilizing latent information in data with transparent techniques [9, 10].
The association information that characterizes the relationships among features/variables is an important kind of latent information in data. The datasets to be analyzed are mostly collected from real applications, so they often contain rich and important association relationships [11–13]. However, most researchers in machine learning prefer to obtain an independent feature representation by imposing an orthogonality constraint on a new feature space, for reasons such as feature decoupling and simplicity of modeling. This strategy removes the association among features, which not only wastes information but may also be a poor strategy for learning on associated data. Our recent work [9] (the method is named AF) applies the association among features, calculated using Pearson’s correlation coefficient (pCor), to data reconstruction, and finds that association between features can improve the representation ability of data. However, AF has two limitations:
1. The data representation obtained by AF is high-dimensional or sparse. AF consists of a feature boosting process and an association-based fusion process. To model high-order information of features and improve the nonlinear representation ability of the original data, the feature boosting process adopts a simple but effective approach: it adds powers of each feature value to the original feature space. This process achieves its goal, but it also causes a tricky problem: the dimension of the new representation must be higher than that of the original representation. For example, if the dimension of a given dataset is 100, the dimension of the new representation will be 100 × 10 = 1000 when the parameter L (the maximal power order) is 10. The curse of dimensionality thus limits the application of AF to high-dimensional data. Hence, it is desirable to develop an association-based data reconstruction method that generates a lower-dimensional data representation.
2. The pCor used by AF to capture the association between features does not balance effectiveness and efficiency well. One core task of AF is to measure the degree of association between two feature vectors. Some association measures, such as pCor, are computationally efficient but have inherent limitations. For example, a pCor value of zero does not reveal whether two features are independent; moreover, pCor is only appropriate for two variables of the same dimension. Others, such as MIC and MNC, can mine more kinds of relationships, but they are computationally inefficient. Overall, the association computed by simple measures is inaccurate, while advanced measures are computationally inefficient. Hence, it is necessary to explore a more practical association measure that balances effectiveness and efficiency for the association-based data reconstruction task.
Based on the above analysis, our aim is to develop a novel association-based data reconstruction method that balances efficiency and effectiveness well by using a more suitable association measure and a low-dimensional embedding technique. To this end, we develop an association-based representation enhancement method, abbreviated as AssoRep. AssoRep first obtains the association between features via distance correlation, which has several advantages over Pearson’s correlation coefficient. Then an enhancement matrix is formed by stacking the association values of all pairs of features. Next, an improved feature representation is obtained by aggregating the original features with the enhancement matrix. Finally, the improved feature representation is mapped to a low-dimensional space via principal component analysis. Note that every step of AssoRep works in a transparent manner.
The contributions of this work are as follows:
1. We introduce a fresh perspective on data representation improvement through the association between features, which complements relationship-based learning (such as graph neural networks and spectral clustering) that mainly focuses on relationships among samples.
2. A novel distance correlation-based data representation method is proposed, which balances effectiveness and efficiency better than its counterpart AF [9].
3. Experimental results on 120 benchmark datasets show that the proposed AssoRep outperforms the other methods in most cases in terms of five evaluation metrics widely used for classification.
The remainder of this paper is organized as follows: Section 2 reviews related work, including learning with association and feature augmentation. Section 3 details AssoRep, a representation framework for associated data. Section 4 describes the experimental setup and the results on the classification task. Section 5 presents the conclusions and future work.

2 Related work

Our work relates to three areas: association mining, learning based on association, and feature enhancement. To position our work, we briefly review them below.
Association mining: Researchers have proposed many methods to measure the association among variables. For example, the well-known Pearson correlation coefficient was designed to measure the strength of a linear trend between two variables [14]; Spearman’s rho [15] and Kendall’s tau [16] were developed to measure the degree of a monotonic trend between two variables. To identify complex relationships among variables, such as trigonometric and inverse trigonometric functions, advanced methods have been developed, such as distance correlation (DC) [17], the maximal information coefficient (MIC) [18], and the maximal neighborhood coefficient (MNC) [19].
Learning with association: Association has proven to be effective latent information for improving performance, or for other aims, in several machine learning tasks, especially multi-label learning [20–22]. For example, to equip binary relevance with the ability to exploit label correlations, researchers have proposed the chaining structure, the stacking structure, and the controlling structure, based on three assumptions: random label correlations, full-order label correlations, and pruned label correlations, respectively [23]. Recently, association has also been applied to other tasks. For example, Kou et al. [24] developed a label association rule mining method that automatically mines mixed-order correlations among labels, and then applied these correlations to multi-label feature selection. Troncoso et al. [8] explained time series forecasting models with the help of numeric association rules. Although the above methods succeed for different aims in various tasks, most of them come from multi-label learning and consider the association among labels.
Feature augmentation: Feature augmentation generally serves two purposes: producing new samples and boosting the representation ability. The former generates more diverse and discriminative features by noise injection [25] or by sampling from a hyperbolic normal distribution [26], where each generated feature corresponds to a new example. Similar to our work, the latter aims to re-represent examples based on the original features and extra information, such as distance information [27], multi-view features [28], or multi-scale information [29]. For example, Jia et al. [10] improved multi-dimensional classification on an augmented feature space that consists of counting statistics on the class membership of neighboring examples, as well as distance information between examples and their k nearest neighbors obtained via kNN techniques. Wang et al. [30] induced an enhanced feature representation by fusing multi-scale discriminative information from different layers of a convolutional neural network into a single feature vector. Wang et al. [31] enriched the feature space with confidence-rated class prototype features to replenish discriminative characteristics of the underlying ground-truth labels for partial label training examples. The benefits of feature augmentation have been demonstrated in many applications, such as multi-label learning [10], multi-modal classification [9], and multi-camera tracking [32]. Another kind of feature augmentation method is feature selection, which achieves this purpose by removing unimportant features [33]. For example, Liu et al. [34] removed weaker features from multiple candidate sets using reinforcement learning with an exploration-exploitation strategy. It is worth noting that some existing feature augmentation methods, such as CRAMc, use discriminative information from the output space (label information).
In this paper, we introduce a fresh perspective on data representation improvement that uses only association information from the input (feature) space.

3 The AssoRep method

This article proposes a framework that enhances representation via association, named AssoRep. AssoRep consists of three steps: (1) relationship boosting, (2) association mining, and (3) association embedding.
Let X be a set of n examples and Y its corresponding label set. Then a dataset can be represented as
$$D=(X,Y),$$
where $X=\{x_1,x_2,\ldots,x_n\}\in\mathbb{R}^{n\times m}$, in which $x_i=\{x_{i1},x_{i2},\ldots,x_{im}\}\in\mathbb{R}^m$ denotes the $i$th example, and $n$ and $m$ are the numbers of examples and features, respectively; $Y=\{y_1,y_2,\ldots,y_n\}\in\mathbb{R}^n$, where $y_i$ is the label of $x_i$.
Let F be the set of feature vectors of the dataset D. Then it is written as
$$F=\{f_1,f_2,\ldots,f_m\},$$
where $f_i=\{x_{1i},x_{2i},\ldots,x_{ni}\}\in\mathbb{R}^n$ denotes the $i$th feature vector (column) of X.

3.1 Relationship boosting

This step enriches the features by adding terms produced by different transform functions, and it can be viewed as the first enhancement of X. In this article, power functions of different integer orders are used to this end. The effectiveness of boosting relationships with power functions has been validated by works such as [9, 35].
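Let $B\in\mathbb{R}^{n\times mL}$ be the relationship boosting data representation of X. Given a set of power functions $\phi=\{\phi_1(x),\phi_2(x),\ldots,\phi_L(x)\}$, where $L$ is the maximal order and $\phi_t(x)=x^t$, $t\in\{1,2,\ldots,L\}$, we obtain B as follows:
1. For each feature vector $f_i\in F$, compute its transform values using the power functions $\phi$ and represent them in the following matrix form:
$$B_i=[\phi_1(f_i),\phi_2(f_i),\ldots,\phi_L(f_i)]\in\mathbb{R}^{n\times L}.$$
2. Concatenate the transform values of the m feature vectors in F as follows:
$$B=[B_1,B_2,\ldots,B_m],\tag{2}$$
where
$$B_i=\begin{bmatrix}\phi_1(x_{1i}) & \phi_2(x_{1i}) & \cdots & \phi_L(x_{1i})\\ \phi_1(x_{2i}) & \phi_2(x_{2i}) & \cdots & \phi_L(x_{2i})\\ \vdots & \vdots & \ddots & \vdots\\ \phi_1(x_{ni}) & \phi_2(x_{ni}) & \cdots & \phi_L(x_{ni})\end{bmatrix}.$$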
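The boosting step is simple enough to sketch directly. The NumPy snippet below is a minimal illustration under our own assumptions: X is stored as an n × m float array, and the function name `relationship_boosting` is hypothetical, not taken from the paper. It orders the columns of B feature by feature, so column j holds φ_t(f_i) with j = L(i−1)+t (written 0-based in the code).

```python
import numpy as np


def relationship_boosting(X, L):
    """Hypothetical helper: build B = [B_1, ..., B_m] from X (n x m) with phi_t(x) = x**t.

    0-based column L*i + (t - 1) of B holds the t-th power of feature i, i.e. the
    L powers of each original feature occupy one contiguous block of columns.
    """
    X = np.asarray(X, dtype=float)
    n, m = X.shape
    powers = np.arange(1, L + 1)            # t = 1, ..., L
    # Broadcasting (n, m, 1) ** (L,) gives shape (n, m, L): entry [k, i, t-1] = x_ki ** t
    B = X[:, :, None] ** powers
    return B.reshape(n, m * L)              # row-major reshape keeps each feature's block together


X = np.array([[1.0, 2.0],
              [3.0, -1.0],
              [0.5, 4.0]])
B = relationship_boosting(X, L=3)
print(B.shape)   # (3, 6): m * L columns
print(B[0])      # [1. 1. 1. 2. 4. 8.]
```

With m = 100 and L = 10 the output has 1000 columns, which is the dimensional growth discussed for AF in the introduction.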

3.2 Association mining

The purpose of this article is to enhance the representation ability of given datasets via association information between feature vectors. Hence, one core task is to measure the degree of association between two feature vectors, which makes the choice of association mining method important.

3.2.1 Choice of association mining method

If every feature is viewed as a random variable, then the methods used in correlation analysis to measure correlation coefficients can be adopted to mine the association among features. The basics of correlation analysis can be found in [36]. In the following, we briefly introduce several correlation analysis methods and detail distance correlation, which is used in our work.
The widely used Pearson correlation coefficient (pCor), also called the Pearson product-moment correlation coefficient, gives the strength of a linear trend between two variables [14]. Spearman’s rho [15], which has been reprinted and revisited more than once (see [37, 38]), and Kendall’s tau [16] are two rank-order correlation coefficients. Both are often used to measure the degree of a monotonic trend between two variables. A comparison between Spearman’s rho and Kendall’s tau can be found in [39].
Mutual information, a frequently used information-theoretic quantity, is often used to construct association measures [40]. For example, in 2011, Reshef et al. argued that if a relationship exists between two variables, then a grid can be drawn on the scatter plot of the two variables that partitions the data so as to encapsulate that relationship. Based on this idea, they proposed the maximal information coefficient (MIC), which applies such grid partitions to estimate mutual information [18]. Inspired by MIC, Cheng et al. developed effective bivariate and multivariate association mining techniques from the perspective of neighborhood information, replacing each example with its neighboring points [19, 41]. These methods show a strong ability to capture various kinds of functional relationships.
The methods mentioned above either have their own shortcomings (e.g., pCor) or are computationally intensive (e.g., MIC, MNC). Trading off measurement effectiveness against computational complexity, we choose distance correlation (dCor) [17], a correlation analysis method based on characteristic functions, as the association mining tool. Given two random vectors $X\in\mathbb{R}^p$ and $Y\in\mathbb{R}^q$, where p and q are their dimensions, the distance correlation between them, built on the distance covariance $\mathcal{V}$ [17], is defined by
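$$\mathcal{R}^2(X,Y)=\begin{cases}\dfrac{\mathcal{V}^2(X,Y)}{\sqrt{\mathcal{V}^2(X)\,\mathcal{V}^2(Y)}}, & \mathcal{V}^2(X)\,\mathcal{V}^2(Y)>0;\\[4pt] 0, & \mathcal{V}^2(X)\,\mathcal{V}^2(Y)=0,\end{cases}$$
where
$$\mathcal{V}^2(X,Y)=\|f_{X,Y}(t,s)-f_X(t)f_Y(s)\|^2=\frac{1}{c_pc_q}\int_{\mathbb{R}^{p+q}}\frac{|f_{X,Y}(t,s)-f_X(t)f_Y(s)|^2}{|t|_p^{1+p}\,|s|_q^{1+q}}\,dt\,ds,$$
$$\mathcal{V}^2(X)=\|f_{X,X}(t,s)-f_X(t)f_X(s)\|^2,$$
$$\mathcal{V}^2(Y)=\|f_{Y,Y}(t,s)-f_Y(t)f_Y(s)\|^2,$$
where $f_X$, $f_Y$, and $f_{X,Y}$ denote the characteristic functions of X and Y and their joint characteristic function, respectively, $|\cdot|_p$ is the Euclidean norm in $\mathbb{R}^p$, and $c_p=\pi^{(1+p)/2}/\Gamma\big((1+p)/2\big)$ (likewise $c_q$) is a normalizing constant.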
The distance correlation has the following properties:
● $0\le\mathcal{R}(X,Y)\le 1$;
● $\mathcal{R}(X,Y)$ is defined for X and Y of arbitrary dimensions, whereas the widely used Pearson’s correlation coefficient (pCor) requires the two dimensions to be the same; that is, the constraint $p=q$ must be met for pCor but not for dCor;
● $\mathcal{R}(X,Y)=0$ if and only if X and Y are independent (for X and Y with finite first moments), whereas a zero pCor does not imply independence;
● Compared with MIC, MNC, etc., it is computationally efficient.
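To make the second and third properties concrete, the sketch below compares pCor and the empirical dCor on synthetic data. It is our own illustration, not code from the paper: the helper names are hypothetical, the data are drawn at random, and dCor is computed from the double-centered distance matrices of the samples (the same construction as Eqs. (9)–(16), here stated for samples in $\mathbb{R}^{n\times p}$ and $\mathbb{R}^{n\times q}$).

```python
import numpy as np


def centered_distances(Z):
    """Hypothetical helper: double-centered Euclidean distance matrix of the n rows of Z (n x p)."""
    Z = np.asarray(Z, dtype=float)
    if Z.ndim == 1:
        Z = Z[:, None]                     # a 1-D sample is treated as p = 1
    D = np.linalg.norm(Z[:, None, :] - Z[None, :, :], axis=-1)
    return D - D.mean(axis=0) - D.mean(axis=1)[:, None] + D.mean()


def distance_correlation(X, Y):
    """Empirical distance correlation R_n(X, Y) (not squared) for X (n x p) and Y (n x q)."""
    A, B = centered_distances(X), centered_distances(Y)
    v2_xy = max((A * B).mean(), 0.0)       # clip tiny negative round-off
    denom = np.sqrt((A * A).mean() * (B * B).mean())
    return 0.0 if denom == 0 else float(np.sqrt(v2_xy / denom))


rng = np.random.default_rng(0)
x = rng.uniform(-1, 1, 500)
y = x ** 2                                 # non-monotonic, purely nonlinear dependence
z = rng.uniform(-1, 1, 500)                # independent of x

pcor_xy = np.corrcoef(x, y)[0, 1]          # near 0: Pearson misses the dependence
dcor_xy = distance_correlation(x, y)       # clearly positive
dcor_xz = distance_correlation(x, z)       # near 0 (small positive finite-sample bias)
# p != q: a 2-D sample against a 1-D sample is still well defined for dCor
dcor_2d = distance_correlation(np.column_stack([x, z]), y)
print(pcor_xy, dcor_xy, dcor_xz, dcor_2d)
```

For y = x² with x uniform on [−1, 1], the population pCor is exactly 0 while the population dCor is about 0.49 by our own calculation; the estimate for the independent pair (x, z) is small but not exactly 0 because the empirical statistic is positively biased in finite samples.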

3.2.2 Computing the association in-between features

This step obtains an association matrix, used as the enhancement matrix, by stacking the association values of all pairs of feature vectors, where each association is computed with distance correlation.
To measure the association between any two feature vectors in a given dataset, the empirical distance correlation (dCor) [17] is adopted because of the good properties described above, especially its advantages over Pearson’s correlation coefficient.
Let $F_\phi$ be the set of feature vectors of B, the relationship boosting data representation of X. Then it can be denoted as
$$F_\phi=\{h_1,h_2,\ldots,h_{mL}\},$$
where $h_j=\{\phi_t(x_{1i}),\phi_t(x_{2i}),\ldots,\phi_t(x_{ni})\}\in\mathbb{R}^n$ denotes the $j$th feature vector (column) of B shown in Eq. (2), with $j=L(i-1)+t$.
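Given two feature vectors $h_i=\{x_{1i},x_{2i},\ldots,x_{ni}\}\in F_\phi$ and $h_j=\{x_{1j},x_{2j},\ldots,x_{nj}\}\in F_\phi$, where n is the number of examples, the empirical distance covariance $\mathcal{V}_n(h_i,h_j)$ of the two feature vectors is defined by
$$\mathcal{V}_n^2(h_i,h_j)=\frac{1}{n^2}\sum_{k,l=1}^{n}A_{kl}B_{kl},\tag{9}$$
where $A_{kl}=a_{kl}-\bar{a}_{k\cdot}-\bar{a}_{\cdot l}+\bar{a}_{\cdot\cdot}$ and $B_{kl}=b_{kl}-\bar{b}_{k\cdot}-\bar{b}_{\cdot l}+\bar{b}_{\cdot\cdot}$, and each of their terms is computed as follows:
$$a_{kl}=|x_{ki}-x_{li}|_p,\qquad b_{kl}=|x_{kj}-x_{lj}|_p,\tag{10}$$
$$\bar{a}_{k\cdot}=\frac{1}{n}\sum_{l=1}^{n}a_{kl},\qquad \bar{b}_{k\cdot}=\frac{1}{n}\sum_{l=1}^{n}b_{kl},\tag{11}$$
$$\bar{a}_{\cdot l}=\frac{1}{n}\sum_{k=1}^{n}a_{kl},\qquad \bar{b}_{\cdot l}=\frac{1}{n}\sum_{k=1}^{n}b_{kl},\tag{12}$$
$$\bar{a}_{\cdot\cdot}=\frac{1}{n^2}\sum_{k,l=1}^{n}a_{kl},\qquad \bar{b}_{\cdot\cdot}=\frac{1}{n^2}\sum_{k,l=1}^{n}b_{kl}.\tag{13}$$
Similarly, $\mathcal{V}_n(h_i)$ and $\mathcal{V}_n(h_j)$ are defined by
$$\mathcal{V}_n^2(h_i)=\frac{1}{n^2}\sum_{k,l=1}^{n}A_{kl}^2,\tag{14}$$
$$\mathcal{V}_n^2(h_j)=\frac{1}{n^2}\sum_{k,l=1}^{n}B_{kl}^2.\tag{15}$$
Based on Eqs. (9), (14), and (15), the empirical distance correlation $\mathcal{R}_n(h_i,h_j)$ of the two feature vectors is obtained by Eq. (16):
$$\mathcal{R}_n^2(h_i,h_j)=\begin{cases}\dfrac{\mathcal{V}_n^2(h_i,h_j)}{\sqrt{\mathcal{V}_n^2(h_i)\,\mathcal{V}_n^2(h_j)}}, & \mathcal{V}_n^2(h_i)\,\mathcal{V}_n^2(h_j)>0;\\[4pt] 0, & \mathcal{V}_n^2(h_i)\,\mathcal{V}_n^2(h_j)=0.\end{cases}\tag{16}$$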
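The following sketch transcribes Eqs. (9)–(16) for two scalar feature vectors term by term, so each intermediate quantity can be checked against the formulas. It is our own illustration (the function name `empirical_dcor_sq` is hypothetical), and it returns the squared value $\mathcal{R}_n^2$, which is what the enhancement matrix stores.

```python
import numpy as np


def empirical_dcor_sq(h_i, h_j):
    """Hypothetical helper: R_n^2(h_i, h_j) for two scalar feature vectors of length n."""
    h_i = np.asarray(h_i, dtype=float)
    h_j = np.asarray(h_j, dtype=float)
    n = h_i.shape[0]

    # a_kl = |x_ki - x_li| and b_kl = |x_kj - x_lj|; entries are scalars, so p = 1
    a = np.abs(h_i[:, None] - h_i[None, :])
    b = np.abs(h_j[:, None] - h_j[None, :])

    # A_kl = a_kl - row mean (bar a_k.) - column mean (bar a_.l) + grand mean (bar a_..)
    A = a - a.mean(axis=1, keepdims=True) - a.mean(axis=0, keepdims=True) + a.mean()
    B = b - b.mean(axis=1, keepdims=True) - b.mean(axis=0, keepdims=True) + b.mean()

    v2_ij = (A * B).sum() / n**2      # Eq. (9)
    v2_i = (A * A).sum() / n**2       # Eq. (14)
    v2_j = (B * B).sum() / n**2       # Eq. (15)

    denom = np.sqrt(v2_i * v2_j)
    return 0.0 if denom == 0 else float(v2_ij / denom)   # Eq. (16), zero branch included


rng = np.random.default_rng(1)
h = rng.normal(size=200)
print(empirical_dcor_sq(h, h))           # 1 (up to round-off): identical features
print(empirical_dcor_sq(h, 2 * h + 3))   # 1 (up to round-off): invariant to shift and scaling
print(empirical_dcor_sq(h, h ** 2))      # between 0 and 1: nonlinear dependence
```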
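With Eq. (16), the enhancement matrix can be obtained and represented as
$$R=\begin{bmatrix}\mathcal{R}_n^2(h_1,h_1) & \mathcal{R}_n^2(h_1,h_2) & \cdots & \mathcal{R}_n^2(h_1,h_{mL})\\ \mathcal{R}_n^2(h_2,h_1) & \mathcal{R}_n^2(h_2,h_2) & \cdots & \mathcal{R}_n^2(h_2,h_{mL})\\ \vdots & \vdots & \ddots & \vdots\\ \mathcal{R}_n^2(h_{mL},h_1) & \mathcal{R}_n^2(h_{mL},h_2) & \cdots & \mathcal{R}_n^2(h_{mL},h_{mL})\end{bmatrix}.\tag{17}$$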
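Filling in Eq. (17) naively means computing mL(mL+1)/2 pairwise dCor values. The sketch below shows one possible vectorization, which is our own design choice and not necessarily how the authors implemented it: each column's double-centered distance matrix is computed once, and every distance covariance then comes out of a single Gram-matrix product. The function name `enhancement_matrix` is hypothetical.

```python
import numpy as np


def enhancement_matrix(B):
    """Hypothetical helper: R with R[i, j] = R_n^2(h_i, h_j) for all columns h of B (n x d)."""
    B = np.asarray(B, dtype=float)
    n, d = B.shape
    # Pairwise distances within every column at once: dist[c, k, l] = |B[k, c] - B[l, c]|
    dist = np.abs(B.T[:, :, None] - B.T[:, None, :])
    # Double centering per column: subtract row means and column means, add the grand mean
    A = (dist - dist.mean(axis=2, keepdims=True) - dist.mean(axis=1, keepdims=True)
         + dist.mean(axis=(1, 2), keepdims=True))
    # G[i, j] = V_n^2(h_i, h_j) as in Eq. (9); the diagonal holds V_n^2(h_i) as in Eq. (14)
    flat = A.reshape(d, n * n)
    G = flat @ flat.T / n**2
    scale = np.sqrt(np.outer(np.diag(G), np.diag(G)))
    # Eq. (16): R_n^2 is set to 0 whenever one of the two distance variances is 0
    return np.divide(G, scale, out=np.zeros_like(G), where=scale > 0)


rng = np.random.default_rng(2)
X = rng.uniform(-1, 1, size=(50, 3))
# Boosted columns with L = 2, ordered feature by feature: [x_1, x_1^2, x_2, x_2^2, x_3, x_3^2]
B = np.stack([X, X ** 2], axis=2).reshape(50, -1)
R = enhancement_matrix(B)
print(R.shape)            # (6, 6)
print(np.round(R, 2))
```

The price of this vectorization is O(mL·n²) memory, so for large n one would instead loop over column pairs or work on a subsample; the sketch does not address that.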
3.3 Association embedding

This step further enhances the representation of X by aggregating the first enhancement result B with the enhancement matrix R.
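Let $B_{ij}$ and $R_{ij}$ denote the elements in the $i$th row and $j$th column of the matrices B and R, respectively, and let $\tilde{X}$ be the final enhanced data representation. Then the element $\tilde{X}_{ij}$ in the $i$th row and $j$th column of $\tilde{X}$ is computed by
$$\begin{aligned}\tilde{X}_{ij}&=\sum_{m'=0}^{m-1}\frac{1}{1!}R_{k_1j}B_{ik_1}+\sum_{m'=0}^{m-1}\frac{1}{2!}R_{k_2j}B_{ik_2}+\cdots+\sum_{m'=0}^{m-1}\frac{1}{L!}R_{k_Lj}B_{ik_L}+\cdots\\&=\sum_{k=1}^{\infty}w_kR_{kj}B_{ik}=\sum_{k=1}^{mL}w_kR_{kj}B_{ik}+\epsilon\approx\sum_{k=1}^{mL}w_kR_{kj}B_{ik},\end{aligned}\tag{18}$$
where $\mathbf{w}=[w_1,w_2,\ldots,w_{mL}]=[1/1!,1/2!,\ldots,1/L!]_m\in\mathbb{R}^{mL}$ denotes the block $[1/1!,\ldots,1/L!]$ repeated m times, $k_l=m'L+l$, and $\epsilon$ is an infinitesimal.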
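Further, let $R_j$ denote the $j$th column of R, $B_i$ denote the $i$th row of B, and $\odot$ denote the element-wise product. Then $\tilde{X}_{ij}$ can be computed as a vector inner product:
$$\tilde{X}_{ij}\approx B_i(\mathbf{w}^T\odot R_j).$$
Let $W=[\mathbf{w}^T,\mathbf{w}^T,\ldots,\mathbf{w}^T]\in\mathbb{R}^{mL\times mL}$ be the matrix whose every column is $\mathbf{w}^T$, so that $(W\odot R)_{kj}=w_kR_{kj}$. Then $\tilde{X}$ can be computed as a matrix multiplication:
$$\tilde{X}\approx B(W\odot R).$$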
The behavior of Eq. (18) is similar to that of the self-attention mechanism [42]. Specifically, the association matrix R in Eq. (18) corresponds to the similarity matrix A between the query matrix Q and the key matrix K in self-attention, i.e., $A=QK^T$. $A_{ij}$ denotes the similarity between features i and j, and a similarity based on the inner product of vectors can be thought of as a measure of linear relationship, whereas $R_{ij}$ denotes the association between features i and j, whose values can reflect more complex relationships when an advanced association mining technique is used. AV corresponds to BR, where V denotes the value matrix in self-attention. Note that the power functions in the relationship boosting process can change the scale of feature values dramatically. Inspired by Taylor’s formula, a reweighting strategy, i.e., $W\odot R$, is used to relieve this problem. The wide success of self-attention in various tasks has demonstrated the effectiveness of this mechanism.
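A minimal sketch of the Taylor-weighted aggregation, under two assumptions we make explicit: the columns of B are ordered feature by feature with powers 1, …, L inside each block (j = L(i−1)+t), so the weights tile as [1/1!, …, 1/L!] once per feature; and $W\odot R$ is read as $(W\odot R)_{kj}=w_kR_{kj}$, which is the reading the element-wise form $\sum_k w_kR_{kj}B_{ik}$ requires. Under that reading $W\odot R=\mathrm{diag}(\mathbf{w})R$, so the code scales the rows of R instead of materializing W. The function names are hypothetical, and R here is a random symmetric stand-in rather than a real dCor matrix.

```python
import numpy as np
from math import factorial


def taylor_weights(m, L):
    """Hypothetical helper: w = [1/1!, ..., 1/L!] repeated m times (one block per feature)."""
    block = np.array([1.0 / factorial(t) for t in range(1, L + 1)])
    return np.tile(block, m)


def association_embedding(B, R, m, L):
    """Hypothetical helper: X_tilde ~ B (W ⊙ R) with (W ⊙ R)[k, j] = w[k] * R[k, j]."""
    w = taylor_weights(m, L)
    return B @ (w[:, None] * R)        # scale row k of R by w_k, then multiply by B


# Toy check against the element-wise form sum_k w_k R_kj B_ik
rng = np.random.default_rng(3)
m, L, n = 2, 3, 4
B = rng.normal(size=(n, m * L))
R = rng.uniform(size=(m * L, m * L))
R = (R + R.T) / 2                      # symmetric stand-in for an association matrix
X_tilde = association_embedding(B, R, m, L)
w = taylor_weights(m, L)
X_loop = np.array([[sum(w[k] * R[k, j] * B[i, k] for k in range(m * L))
                    for j in range(m * L)] for i in range(n)])
print(np.allclose(X_tilde, X_loop))    # True
```

Because w decays as 1/t!, the contributions of high-order power columns are damped, which is the rescaling effect the reweighting strategy is meant to provide.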
AssoRep is only a representation method, and its output is $\tilde{X}$. Therefore, to accomplish downstream tasks such as classification and clustering, AssoRep must be combined with existing machine learning algorithms. The combination is simple and requires no modification of the existing algorithms. In the following, we give the steps in the context of supervised learning.
For a supervised learning task, we first combine the enhanced representation $\tilde{X}$ with the label set Y to obtain a new dataset $\tilde{D}$, represented as
$$\tilde{D}=(\tilde{X},Y).$$
Let $\mathcal{L}(D)$ denote a supervised machine learning model trained with D as input. Feeding $\tilde{D}$ to the same model instead, i.e., $\mathcal{L}(\tilde{D})$, completes the combination of AssoRep with the supervised algorithm $\mathcal{L}$. $\mathcal{L}$ can be instantiated with different classifiers, such as logistic regression, support vector machines, and random forests.
Note that the relationship boosting process in AssoRep increases the dimension of the representation obtained by AssoRep from m to mL. To address this issue, principal component analysis (PCA) is used to map the enhanced representation to a low-dimensional space.
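Since the combination step needs no change to the downstream learner, it maps naturally onto a scikit-learn pipeline. The sketch below packages boosting, the dCor matrix, and the reweighted aggregation into a hypothetical transformer (`AssoRepSketch` is our name, not the authors') and chains it with PCA and logistic regression. Several choices are assumptions not stated in the text: R is estimated on the training data only and reused for new data, features are min-max scaled before boosting so that x^t stays bounded, and the number of PCA components is set arbitrarily to the original dimension.

```python
import numpy as np
from math import factorial
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import MinMaxScaler


class AssoRepSketch(BaseEstimator, TransformerMixin):
    """Hypothetical AssoRep-style transformer: power boosting -> dCor matrix R -> B (W ⊙ R).

    R (and hence W ⊙ R) is estimated once on the training data in fit() and reused
    unchanged in transform(), so test data never influence R.
    """

    def __init__(self, L=3):
        self.L = L

    def _boost(self, X):
        X = np.asarray(X, dtype=float)
        powers = np.arange(1, self.L + 1)
        return (X[:, :, None] ** powers).reshape(X.shape[0], -1)   # 0-based column L*i + (t-1)

    @staticmethod
    def _dcor_matrix(B):
        n, d = B.shape
        dist = np.abs(B.T[:, :, None] - B.T[:, None, :])           # (d, n, n) distances
        A = (dist - dist.mean(axis=2, keepdims=True) - dist.mean(axis=1, keepdims=True)
             + dist.mean(axis=(1, 2), keepdims=True)).reshape(d, n * n)
        G = A @ A.T / n**2                                           # distance covariances
        scale = np.sqrt(np.outer(np.diag(G), np.diag(G)))
        return np.divide(G, scale, out=np.zeros_like(G), where=scale > 0)   # R_n^2

    def fit(self, X, y=None):
        m = np.asarray(X).shape[1]
        self.R_ = self._dcor_matrix(self._boost(X))
        w = np.tile([1.0 / factorial(t) for t in range(1, self.L + 1)], m)
        self.WR_ = w[:, None] * self.R_                             # (W ⊙ R)[k, j] = w_k R_kj
        return self

    def transform(self, X):
        return self._boost(X) @ self.WR_


X, y = make_classification(n_samples=200, n_features=5, n_informative=3, random_state=0)
model = make_pipeline(
    MinMaxScaler(),                    # assumption: keep x**t bounded before boosting
    AssoRepSketch(L=3),
    PCA(n_components=5),               # assumption: project back to the original dimension
    LogisticRegression(max_iter=1000),
)
model.fit(X, y)
print(model.score(X, y))               # training accuracy on the PCA-reduced representation
```

Fitting R inside `fit()` keeps the transformer compatible with cross validation, where each training fold should define its own association matrix.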
In summary, the efficiency of AssoRep comes from two aspects. First, dCor is more efficient than NMI, MIC, and MICe. Second, the dimension of the new representation is reduced with PCA. With these advantages, AssoRep has many potential applications, such as drug property prediction and recommender systems. Taking drug property prediction as an example, complex relationships exist among different types of structure descriptors [43], and this relationship information can be fully exploited via AssoRep to improve molecular representations.

4 Experiment

This section validates the effectiveness of AssoRep on the classification task from four perspectives: comparison on datasets with different sample sizes, generality when coupled with existing classification algorithms, comparison with other feature enhancement methods, and efficiency of different association mining methods. For most datasets, 10-fold cross validation is adopted for all approaches to compute the mean of each performance metric. For a few datasets on which the classification algorithms are very unstable under 10-fold cross validation, 2×5-fold or 5×2-fold cross validation is adopted instead.
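A sketch of the 10-fold protocol with scikit-learn's `cross_validate`, reporting the mean and standard deviation of the five metrics defined in Section 4.1. Stratified, shuffled folds and macro averaging for precision, recall, and F1 are our assumptions, since the text does not specify them; the data are synthetic and the helper name `evaluate` is hypothetical.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import cohen_kappa_score, make_scorer
from sklearn.model_selection import StratifiedKFold, cross_validate

# Macro averaging for PE, RE and F1 is an assumption; the text does not state it.
SCORING = {
    "AC": "accuracy",
    "PE": "precision_macro",
    "RE": "recall_macro",
    "F1": "f1_macro",
    "K": make_scorer(cohen_kappa_score),
}


def evaluate(model, X, y, n_splits=10, seed=0):
    """Hypothetical helper: mean and std of the five metrics over stratified k-fold CV."""
    cv = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=seed)
    scores = cross_validate(model, X, y, cv=cv, scoring=SCORING)
    return {name: (float(np.mean(scores[f"test_{name}"])), float(np.std(scores[f"test_{name}"])))
            for name in SCORING}


X, y = make_classification(n_samples=300, n_features=8, random_state=0)
results = evaluate(LogisticRegression(max_iter=1000), X, y)
for name, (mean, std) in results.items():
    print(f"{name}: {mean:.3f}±{std:.3f}")    # same "mean±std" layout as Tab.2 and Tab.3
```

The same `evaluate` call can be run once on D and once on an AssoRep-transformed version of it to obtain paired LR(D) / LR(D̃) entries.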

4.1 Evaluation metrics

To measure the performance of a classification result, we employ five frequently used metrics [44]: accuracy (AC), precision (PE), recall (RE), F1 score, and kappa (K). Larger values of these five measures indicate better classification performance. They are defined as follows:
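$$\begin{cases}AC=\dfrac{TP+TN}{n},\\[4pt] PE=\dfrac{TP}{TP+FP},\\[4pt] RE=\dfrac{TP}{TP+FN},\\[4pt] F1=\dfrac{2\times PE\times RE}{PE+RE},\\[4pt] K=\dfrac{p_o-p_e}{1-p_e},\end{cases}$$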
where
TP denotes the number of true positives;
TN denotes the number of true negatives;
FP denotes the number of false positives;
FN denotes the number of false negatives;
n=TP+TN+FN+FP;
$p_o=AC$ is the empirical probability of agreement on the label assigned to any sample (the observed agreement ratio), and $p_e=\dfrac{(TP+FN)\times(TP+FP)}{n^2}+\dfrac{(FP+TN)\times(FN+TN)}{n^2}$ is the expected probability of agreement when both annotators assign labels randomly.
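As a quick arithmetic check of these definitions, the plain-Python sketch below computes the five metrics from the four confusion counts (the function name and the counts are our own, chosen for illustration). With TP = 40, TN = 45, FP = 5, FN = 10 we get n = 100, p_o = 0.85, and, using Cohen's chance agreement normalized by n², p_e = (50×45 + 50×55)/100² = 0.5, hence K = (0.85 − 0.5)/(1 − 0.5) = 0.7.

```python
def classification_metrics(tp, tn, fp, fn):
    """Hypothetical helper: AC, PE, RE, F1 and Cohen's kappa K from binary confusion counts."""
    n = tp + tn + fp + fn
    ac = (tp + tn) / n
    prec = tp / (tp + fp)
    rec = tp / (tp + fn)
    f1 = 2 * prec * rec / (prec + rec)
    p_o = ac                                                        # observed agreement
    p_e = ((tp + fn) * (tp + fp) + (fp + tn) * (fn + tn)) / n ** 2  # chance agreement
    kappa = (p_o - p_e) / (1 - p_e)
    return {"AC": ac, "PE": prec, "RE": rec, "F1": f1, "K": kappa}


scores = classification_metrics(tp=40, tn=45, fp=5, fn=10)
# Rounded: AC = 0.85, PE = 0.889, RE = 0.8, F1 = 0.842, K = 0.7
print({name: round(value, 3) for name, value in scores.items()})
```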

4.2 Experimental results on 120 benchmark datasets with different sample size

The quality of the association matrix in Eq. (17) is key to the performance of AssoRep. Given two random variables, the more observations (i.e., the larger the sample size) are available, the more accurately an association mining method can measure their degree of association [45]. Hence, the sample size of a dataset is a main factor influencing the performance of AssoRep. To comprehensively show the behavior of AssoRep on datasets with different sample sizes, we report its results on 120 datasets whose sample sizes vary from 10 to 67557. Based on sample size, these datasets are divided into two equal groups:
Group 1: the number of samples n is larger than 700;
Group 2: the number of samples n is smaller than 700.
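The sample-size argument can be illustrated for dCor with a small simulation of our own (synthetic data; the helper name `dcor` is hypothetical). For independent variables the population dCor is exactly 0, but the empirical estimate is always nonnegative and therefore positively biased; the bias shrinks as n grows. Averaging over 20 repetitions per sample size makes the trend visible.

```python
import numpy as np


def dcor(x, y):
    """Hypothetical helper: empirical distance correlation R_n (not squared) of two 1-D samples."""
    def centered(z):
        d = np.abs(z[:, None] - z[None, :])
        return d - d.mean(axis=0) - d.mean(axis=1)[:, None] + d.mean()

    A = centered(np.asarray(x, dtype=float))
    B = centered(np.asarray(y, dtype=float))
    v2_xy = max((A * B).mean(), 0.0)               # clip tiny negative round-off
    return float(np.sqrt(v2_xy / np.sqrt((A * A).mean() * (B * B).mean())))


rng = np.random.default_rng(42)
mean_dcor = {}
for n in (20, 100, 500, 1000):
    # x and y are independent, so the population dCor is exactly 0
    estimates = [dcor(rng.normal(size=n), rng.normal(size=n)) for _ in range(20)]
    mean_dcor[n] = float(np.mean(estimates))
    print(n, round(mean_dcor[n], 3))
```

Small datasets therefore yield noticeably inflated association values even between unrelated features, which is one way to read why Group 1 and Group 2 are analyzed separately.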
For a fair comparison, 115 of the 120 datasets are the pre-processed versions provided by Fernandez et al. [46]. AWArgsift-hist [47], MM-IMDB-T [48], MM-IMDB-I [48], and Gesture-R [49] are represented as feature vectors so that the logistic regression algorithm can be applied to them.
All experiments are carried out in Python 3.6 on a server with an AMD EPYC 7542 32-core processor and 755 GB of RAM. The combined algorithms are taken from the scikit-learn Python library [50].

4.2.1 Results on the Group 1

In this experiment, we validate the effectiveness of AssoRep on the 60 datasets with larger sample sizes. Tab.1 lists the characteristics of each dataset, including the number of examples (n), the number of features (d), and the number of class labels (L). As shown in Tab.1, the sample size n varies from 748 to 67557. Specifically, let $\mathcal{L}(D)$ and $\mathcal{L}(\tilde{D})$ be the algorithms that learn from the original data representation and the AssoRep data representation, respectively. Instantiating $\mathcal{L}$ with the logistic regression algorithm (LR), we compare LR(D) with LR($\tilde{D}$) on the 60 benchmark datasets. The experimental results are shown in Tab.2 and Tab.3, where LR(D) and LR($\tilde{D}$) denote that classifier LR learns from the original data representation D and the AssoRep data representation $\tilde{D}$, respectively. For each metric on each dataset, the better result of LR(D) and LR($\tilde{D}$) is marked in bold.
Tab.1 Characteristics of the first group of datasets whose sample sizes are larger than 700 (Group 1)
ID Dataset n d L ID Dataset n d L ID Dataset n d L
L1 abalone 4177 8 3 L2 adult 48842 14 2 L3 annealing 798 38 6
L4 bank 4521 17 2 L5 blood 748 4 2 L6 car 1728 6 4
L7 ctg-10classes 2126 21 10 L8 ctg-3classes 2126 21 3 L9 chess-krvk 28056 6 18
L10 chess-krvkp 3196 36 2 L11 connect-4 67557 42 2 L12 contrac 1473 9 3
L13 energy-y1 768 8 3 L14 wav-mfcc 15352 80 1215 L15 led-display 1000 7 10
L16 letter 20000 16 26 L17 magic 19020 10 2 L18 mammographic 961 5 2
L19 molec-biol-splice 3190 60 3 L20 monks-3 3190 6 2 L21 mushroom 8124 21 2
L22 musk-2 6598 166 2 L23 nursery 12960 8 5 L24 oocMerl2F 1022 25 3
L25 oocMerl4D 1022 41 2 L26 oocTris2F 912 25 2 L27 oocTris5B 912 32 3
L28 optical 3823 62 10 L29 ozone 2536 72 2 L30 page-blocks 5473 10 5
L31 pendigits 7494 16 10 L32 pima 768 5 2 L33 plant-margin 1600 64 100
L34 plant-shape 1600 64 100 L35 plant-texture 1600 36 100 L36 ringnorm 7400 20 2
L37 semeion 1593 256 10 L38 spambase 4601 57 2 L39 st-german-credit 1000 24 2
L40 st-image 2310 18 7 L41 st-landsat 4435 36 6 L42 st-shuttle 43500 9 7
L43 st-vehicle 846 18 4 L44 steel-plates 1941 27 7 L45 thyroid 3772 21 3
L46 tic-tac-toe 958 9 2 L47 titanic 2201 3 2 L48 twonorm 7400 20 2
L49 wall-following 5456 24 4 L50 waveform 5000 21 3 L51 wine-quality-red 1599 11 6
L52 wine-quality-white 4898 11 7 L53 yeast 1484 8 10 L54 robotnavigation 5456 25 4
L55 AWArgsift-hist 3048 2000 10 L56 UJIndoorLoc 21048 520 5 L57 MM-IMDB-T 7799 600 2
L58 MM-IMDB-I 7799 2048 2 L59 YouTubeFaces4 5074 838 31 L60 Gesture-R 4977 2048 83
Tab.2 Classification performance comparison between LR(D) and LR(D̃) on benchmark datasets L1-L40
Data  Accuracy (LR(D), LR(D̃))  Precision (LR(D), LR(D̃))  Recall (LR(D), LR(D̃))  F1 (LR(D), LR(D̃))  Kappa (LR(D), LR(D̃))
L1 0.647±0.020 0.662±0.021 0.636±0.021 0.652±0.022 0.642±0.020 0.658±0.021 0.636±0.021 0.653±0.021 0.469±0.031 0.492±0.031
L2 0.843±0.007 0.852±0.007 0.796±0.010 0.809±0.012 0.738±0.016 0.757±0.014 0.759±0.014 0.777±0.013 0.521±0.028 0.557±0.025
L3 0.873±0.027 0.951±0.014 0.792±0.116 0.924±0.077 0.641±0.111 0.888±0.080 0.678±0.113 0.901±0.077 0.620±0.087 0.871±0.040
L4 0.895±0.007 0.897±0.005 0.761±0.040 0.760±0.022 0.612±0.030 0.641±0.029 0.644±0.035 0.675±0.027 0.301±0.066 0.357±0.052
L5 0.772±0.015 0.786±0.019 0.705±0.099 0.713±0.054 0.549±0.023 0.608±0.033 0.535±0.036 0.621±0.040 0.135±0.059 0.267±0.074
L6 0.794±0.027 0.881±0.026 0.577±0.123 0.809±0.090 0.444±0.062 0.630±0.068 0.470±0.079 0.667±0.075 0.498±0.072 0.737±0.060
L7 0.768±0.032 0.834±0.027 0.762±0.054 0.837±0.041 0.639±0.039 0.782±0.039 0.668±0.042 0.796±0.035 0.720±0.039 0.802±0.033
L8 0.894±0.018 0.912±0.018 0.827±0.048 0.861±0.044 0.775±0.046 0.827±0.035 0.796±0.042 0.841±0.035 0.701±0.054 0.754±0.053
L9 0.282±0.008 0.351±0.010 0.242±0.038 0.337±0.041 0.203±0.011 0.299±0.018 0.188±0.010 0.294±0.021 0.179±0.009 0.264±0.012
L10 0.970±0.012 0.971±0.014 0.970±0.013 0.971±0.014 0.970±0.012 0.971±0.014 0.970±0.012 0.971±0.014 0.940±0.025 0.941±0.027
L11 0.754±0.000 0.830±0.004 0.720±0.091 0.784±0.006 0.502±0.001 0.728±0.006 0.434±0.002 0.748±0.006 0.005±0.002 0.499±0.012
L12 0.507±0.042 0.568±0.055 0.491±0.053 0.550±0.066 0.472±0.044 0.531±0.055 0.474±0.047 0.533±0.058 0.221±0.067 0.318±0.085
L13 0.874±0.013 0.881±0.012 0.847±0.026 0.862±0.022 0.786±0.020 0.796±0.020 0.795±0.023 0.807±0.024 0.792±0.021 0.804±0.020
L14 0.231±0.011 0.281±0.007 0.137±0.009 0.164±0.007 0.166±0.010 0.209±0.008 0.142±0.009 0.174±0.007 0.229±0.011 0.280±0.007
L15 0.735±0.040 0.735±0.040 0.745±0.039 0.745±0.039 0.736±0.040 0.736±0.040 0.731±0.038 0.731±0.038 0.705±0.045 0.705±0.045
L16 0.723±0.013 0.846±0.009 0.725±0.013 0.849±0.009 0.721±0.013 0.845±0.009 0.720±0.013 0.846±0.009 0.712±0.014 0.840±0.010
L17 0.791±0.006 0.850±0.008 0.782±0.009 0.845±0.009 0.745±0.007 0.820±0.011 0.756±0.007 0.829±0.010 0.517±0.014 0.660±0.019
L18 0.823±0.035 0.832±0.035 0.825±0.035 0.834±0.034 0.823±0.035 0.831±0.034 0.822±0.035 0.831±0.035 0.645±0.070 0.663±0.069
L19 0.835±0.018 0.951±0.013 0.819±0.020 0.942±0.014 0.831±0.021 0.949±0.012 0.824±0.020 0.945±0.013 0.735±0.029 0.920±0.021
L20 0.761±0.123 0.930±0.067 0.777±0.127 0.937±0.063 0.761±0.124 0.930±0.067 0.757±0.125 0.929±0.068 0.521±0.246 0.859±0.135
L21 0.947±0.009 1.000±0.000 0.947±0.009 1.000±0.000 0.946±0.009 1.000±0.000 0.947±0.009 1.000±0.000 0.893±0.018 1.000±0.000
L22 0.949±0.005 0.945±0.005 0.921±0.011 0.921±0.012 0.878±0.015 0.858±0.016 0.898±0.011 0.885±0.012 0.795±0.021 0.771±0.023
L23 0.899±0.007 0.916±0.007 0.649±0.056 0.660±0.056 0.664±0.057 0.676±0.057 0.656±0.056 0.668±0.056 0.851±0.010 0.876±0.010
L24 0.918±0.021 0.930±0.021 0.881±0.046 0.923±0.034 0.893±0.054 0.919±0.038 0.883±0.045 0.919±0.032 0.823±0.045 0.847±0.047
L25 0.796±0.036 0.837±0.020 0.788±0.051 0.819±0.038 0.731±0.045 0.803±0.028 0.746±0.047 0.809±0.030 0.499±0.092 0.619±0.061
L26 0.797±0.030 0.836±0.030 0.800±0.033 0.834±0.029 0.787±0.031 0.829±0.035 0.789±0.031 0.830±0.032 0.580±0.061 0.661±0.064
L27 0.924±0.021 0.930±0.024 0.866±0.151 0.915±0.109 0.828±0.140 0.897±0.109 0.840±0.141 0.900±0.107 0.846±0.044 0.858±0.050
L28 0.964±0.016 0.968±0.013 0.965±0.016 0.969±0.013 0.964±0.016 0.968±0.013 0.964±0.016 0.968±0.013 0.960±0.018 0.965±0.015
L29 0.969±0.008 0.966±0.011 0.570±0.173 0.743±0.176 0.533±0.074 0.584±0.045 0.542±0.104 0.611±0.060 0.092±0.205 0.226±0.120
L30 0.954±0.003 0.959±0.004 0.862±0.043 0.842±0.049 0.659±0.039 0.701±0.029 0.725±0.044 0.753±0.030 0.720±0.023 0.763±0.024
L31 0.943±0.010 0.983±0.004 0.943±0.010 0.983±0.005 0.943±0.010 0.983±0.004 0.942±0.010 0.983±0.005 0.937±0.011 0.981±0.005
L32 0.779±0.029 0.779±0.029 0.768±0.039 0.768±0.039 0.734±0.029 0.734±0.029 0.743±0.031 0.743±0.031 0.490±0.062 0.490±0.062
L33 0.747±0.025 0.798±0.022 0.724±0.025 0.779±0.019 0.750±0.025 0.796±0.022 0.714±0.023 0.767±0.020 0.745±0.026 0.796±0.023
L34 0.509±0.032 0.564±0.038 0.444±0.033 0.502±0.045 0.518±0.030 0.569±0.035 0.446±0.033 0.501±0.042 0.504±0.032 0.560±0.038
L35 0.809±0.018 0.839±0.028 0.789±0.028 0.823±0.047 0.810±0.022 0.841±0.033 0.776±0.025 0.811±0.040 0.807±0.018 0.837±0.028
L36 0.760±0.016 0.986±0.005 0.763±0.015 0.986±0.005 0.760±0.016 0.986±0.005 0.759±0.016 0.986±0.005 0.520±0.032 0.972±0.010
L37 0.890±0.031 0.927±0.019 0.896±0.032 0.932±0.018 0.889±0.031 0.927±0.019 0.889±0.032 0.927±0.019 0.878±0.035 0.919±0.021
L38 0.925±0.011 0.932±0.008 0.925±0.012 0.931±0.010 0.919±0.012 0.927±0.008 0.921±0.012 0.929±0.009 0.843±0.023 0.857±0.018
L39 0.761±0.040 0.771±0.040 0.716±0.053 0.733±0.054 0.684±0.060 0.685±0.062 0.691±0.062 0.695±0.065 0.389±0.117 0.400±0.122
L40 0.913±0.019 0.929±0.014 0.917±0.016 0.930±0.013 0.913±0.019 0.929±0.014 0.913±0.018 0.928±0.013 0.898±0.022 0.917±0.016
Tab.3 Classification performance comparison between LR(D) and LR(D′) (D′: AssoRep representation) on benchmark datasets L41-L60
Data Accuracy Precision Recall F1 Kappa
LR(D) LR(D′) LR(D) LR(D′) LR(D) LR(D′) LR(D) LR(D′) LR(D) LR(D′)
L41 0.838±0.009 0.887±0.009 0.803±0.029 0.867±0.012 0.757±0.009 0.858±0.008 0.751±0.011 0.861±0.009 0.797±0.011 0.860±0.011
L42 0.930±0.002 0.992±0.001 0.522±0.079 0.780±0.116 0.488±0.078 0.633±0.093 0.501±0.078 0.653±0.087 0.783±0.008 0.978±0.003
L43 0.792±0.024 0.819±0.024 0.790±0.029 0.821±0.024 0.794±0.025 0.821±0.024 0.786±0.027 0.819±0.025 0.723±0.032 0.759±0.032
L44 0.706±0.026 0.744±0.023 0.731±0.049 0.773±0.028 0.695±0.042 0.756±0.043 0.702±0.048 0.758±0.033 0.619±0.033 0.671±0.029
L45 0.950±0.004 0.960±0.010 0.876±0.063 0.897±0.074 0.664±0.041 0.705±0.088 0.672±0.025 0.763±0.083 0.528±0.005 0.644±0.099
L46 0.983±0.016 0.983±0.016 0.988±0.012 0.988±0.012 0.976±0.023 0.976±0.023 0.981±0.018 0.981±0.018 0.962±0.037 0.962±0.037
L47 0.776±0.019 0.778±0.019 0.760±0.025 0.763±0.025 0.700±0.029 0.703±0.028 0.714±0.030 0.718±0.030 0.437±0.056 0.444±0.055
L48 0.979±0.006 0.979±0.006 0.979±0.006 0.979±0.006 0.979±0.006 0.979±0.006 0.979±0.006 0.979±0.006 0.957±0.012 0.957±0.012
L49 0.688±0.013 0.922±0.011 0.690±0.036 0.921±0.016 0.593±0.023 0.918±0.019 0.622±0.027 0.919±0.016 0.514±0.021 0.882±0.017
L50 0.869±0.015 0.869±0.015 0.869±0.015 0.869±0.015 0.869±0.015 0.869±0.015 0.868±0.015 0.868±0.015 0.803±0.023 0.803±0.023
L51 0.592±0.030 0.604±0.044 0.277±0.043 0.296±0.034 0.253±0.017 0.280±0.028 0.246±0.022 0.281±0.031 0.316±0.052 0.351±0.073
L52 0.537±0.014 0.541±0.018 0.289±0.047 0.365±0.157 0.228±0.018 0.257±0.049 0.221±0.018 0.264±0.064 0.234±0.026 0.260±0.029
L53 0.588±0.044 0.611±0.030 0.568±0.092 0.552±0.065 0.485±0.059 0.533±0.050 0.499±0.064 0.529±0.054 0.458±0.059 0.493±0.041
L54 0.688±0.013 0.900±0.014 0.690±0.036 0.903±0.021 0.593±0.023 0.893±0.019 0.622±0.027 0.897±0.018 0.514±0.021 0.849±0.022
L55 0.137±0.011 0.192±0.019 0.109±0.014 0.154±0.017 0.109±0.010 0.157±0.017 0.103±0.009 0.149±0.015 0.113±0.012 0.170±0.020
L56 0.930±0.005 0.981±0.002 0.933±0.005 0.983±0.003 0.932±0.007 0.982±0.003 0.932±0.005 0.982±0.003 0.909±0.007 0.976±0.003
L57 0.709±0.021 0.725±0.018 0.708±0.022 0.725±0.018 0.707±0.022 0.722±0.019 0.707±0.022 0.722±0.019 0.415±0.043 0.445±0.037
L58 0.612±0.021 0.644±0.014 0.610±0.021 0.645±0.014 0.608±0.021 0.638±0.014 0.608±0.021 0.637±0.014 0.218±0.042 0.279±0.028
L59 0.470±0.026 0.496±0.020 0.492±0.032 0.515±0.024 0.443±0.031 0.479±0.018 0.453±0.030 0.486±0.018 0.412±0.030 0.441±0.020
L60 0.928±0.005 0.936±0.007 0.937±0.005 0.943±0.006 0.928±0.006 0.936±0.007 0.928±0.005 0.935±0.007 0.928±0.006 0.935±0.007
The following observations can be made from Tab.2 and Tab.3:
1. LR(D′) outperforms LR(D) on every performance metric. Across these 60 datasets, LR(D′) obtains higher accuracy, precision, recall, F1 and kappa on 55, 53, 55, 55, and 55 datasets, respectively, whereas LR(D) is better on only 2, 2, 1, 1, and 1 datasets, respectively. Even where LR(D) is better, its performance is very close to that of LR(D′). Moreover, LR(D′) clearly improves every metric on most of the datasets. For example, on dataset L36, LR(D′) improves accuracy, precision, recall, F1 and kappa by 0.986−0.760=0.226, 0.986−0.763=0.223, 0.986−0.760=0.226, 0.986−0.759=0.227, and 0.972−0.520=0.452, respectively. In particular, with the representation produced by AssoRep on dataset L21, all performance metrics of LR increase from 0.947, 0.947, 0.946, 0.947, and 0.893, respectively, to 1.
2. Moreover, AssoRep tends to bring larger gains when the original representation performs poorly. For example, when dataset L9 is enhanced via AssoRep, its accuracy increases markedly from 0.282 to 0.351, while AssoRep brings no improvement on datasets L10 (accuracy 0.970) and L48 (accuracy 0.979).
Furthermore, we apply the paired t-test to assess whether LR(D′) performs significantly better than LR(D). Specifically, given two compared algorithms a and b and an evaluation metric m, we run each algorithm k times, so that algorithm a obtains k values $m_1^a, m_2^a, \ldots, m_k^a$ of metric m and algorithm b obtains $m_1^b, m_2^b, \ldots, m_k^b$. Let $\Delta_i = m_i^a - m_i^b$, and denote the mean and standard deviation of $\Delta_1, \Delta_2, \ldots, \Delta_k$ by μ and σ, respectively. The statistic
$$\tau_t = \left| \frac{\sqrt{k}\,\mu}{\sigma} \right|$$
follows a t distribution with k−1 degrees of freedom.
In this paper, the null hypothesis that algorithms a and b perform equally is rejected if the returned p-value is less than the significance level of 5%. The results are recorded in Tab.2 and Tab.3, where the markers attached to the entries denote that AssoRep is better than, tied with, or worse than the corresponding method under the paired t-test at the 5% significance level.
As shown in Tab.2 and Tab.3, LR(D′) is significantly better than LR(D) on 40, 41, 45, 46, and 45 of the 60 datasets in terms of accuracy, precision, recall, F1 and kappa, respectively, while LR(D) is never significantly better than LR(D′) at significance level α=5%. These results validate that enhancing the representation with the association among features is effective on datasets with larger sample sizes.
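The paired t-test described above can be sketched in a few lines of Python (standard library only). The function follows the τ_t formula, taking σ as the sample standard deviation of the differences Δ_i. The helper names, the assumed k = 10 runs, and the per-run accuracies are hypothetical illustration values, not data from Tab.2 or Tab.3; 2.262 is the standard two-sided 5% critical value of Student's t with 9 degrees of freedom.

```python
import math
import statistics


def paired_t_statistic(scores_a, scores_b):
    """tau_t = |sqrt(k) * mu / sigma| for k paired runs of algorithms a and b."""
    if len(scores_a) != len(scores_b) or len(scores_a) < 2:
        raise ValueError("need two equally long score lists with k >= 2")
    k = len(scores_a)
    diffs = [a - b for a, b in zip(scores_a, scores_b)]  # Delta_i = m_i^a - m_i^b
    mu = statistics.mean(diffs)
    sigma = statistics.stdev(diffs)  # sample standard deviation (divides by k - 1)
    if sigma == 0:
        # Identical differences in every run: there is no variance to normalize by.
        return math.inf if mu != 0 else 0.0
    return abs(math.sqrt(k) * mu / sigma)


# Two-sided critical value of Student's t with k - 1 = 9 degrees of freedom at
# alpha = 0.05; it only applies to the assumed k = 10 runs used below.
T_CRIT_K10 = 2.262

# Hypothetical per-run accuracies for illustration; NOT values from Tab.2 or Tab.3.
enhanced = [0.81, 0.79, 0.83, 0.80, 0.82, 0.78, 0.84, 0.80, 0.81, 0.82]
original = [0.77, 0.78, 0.79, 0.76, 0.80, 0.77, 0.79, 0.78, 0.76, 0.79]

tau = paired_t_statistic(enhanced, original)
print(f"tau_t = {tau:.3f}, reject H0 at 5%: {tau > T_CRIT_K10}")
```

If SciPy is available, `scipy.stats.ttest_rel` computes the same (signed) statistic and also returns the p-value, which avoids hard-coding a critical value for each k.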

4.2.2 Results on Group 2

This experiment examines the behavior of AssoRep on datasets with smaller sample sizes. To this end, we use the 60 datasets listed in Tab.4, which gives the number of examples (n), number of features (d), and number of class labels (L) of each dataset. As shown in Tab.4, the sample size n ranges from 10 to 690. The experimental settings are the same as those for Group 1. The experimental results are reported in Tab.5 and Tab.6.
Tab.4 Characteristics of the second group of datasets, whose sample sizes are smaller than 700 (Group 2)
ID Dataset n d L ID Dataset n d L ID Dataset n d L
S1 ac-inflam 120 6 2 S2 acute-nephritis 120 6 2 S3 arrhythmia 452 262 13
S4 audiology-std 226 59 18 S5 balance-scale 625 4 3 S6 balloons 16 4 2
S7 breast-cancer 286 9 2 S8 conn-bench-sonar 208 60 2 S9 conn-bench-vowel 528 11 11
S10 credit-approval 690 15 2 S11 cylinder-bands 512 35 2 S12 dermatology 366 34 6
S13 echocardiogram 131 10 2 S14 ecoli 336 7 8 S15 fertility 100 9 2
S16 flag 194 28 8 S17 glass 214 9 6 S18 haberman-survival 306 3 2
S19 hayes-roth 132 3 3 S20 heart-cleveland 303 13 5 S21 heart-hungarian 294 12 2
S22 heart-switzerland 123 12 2 S23 heart-va 200 12 5 S24 hepatitis 155 19 2
S25 hill-valley 606 100 2 S26 horse-colic 300 25 2 S27 ilpd-indian-liver 583 9 2
S28 image-segmentation 210 19 7 S29 ionosphere 351 33 2 S30 iris 150 4 3
S31 lenses 24 4 3 S32 low-res-spect 531 100 9 S33 lung-cancer 32 56 3
S34 lymphography 148 18 4 S35 molec-biol-promoter 106 57 2 S36 monks-1 124 6 2
S37 monks-2 169 6 2 S38 musk-1 476 166 2 S39 parkinsons 195 22 2
S40 pb-MATERIAL 106 4 3 S41 pb-REL-L 103 4 3 S42 pb-SPAN 92 4 3
S43 pb-T-OR-D 102 4 3 S44 pb-TYPE 105 4 3 S45 planning 182 12 2
S46 post-operative 90 8 3 S47 primary-tumor 330 17 15 S48 seeds 210 7 3
S49 soybean 307 35 18 S50 spect 80 22 2 S51 spectf 80 44 2
S52 st-australian-credit 690 14 2 S53 st-heart 270 13 2 S54 synthetic-control 600 60 6
S55 teaching 151 5 3 S56 trains 10 28 2 S57 vc-2classes 310 6 2
S58 vc-3classes 310 6 3 S59 wine 179 13 3 S60 zoo 101 16 7
Tab.5 Classification performance comparison between LR(D) and LR(D′) (D′: AssoRep representation) on benchmark datasets S1-S40
Data Accuracy Precision Recall F1 Kappa
LR(D) LR(D′) LR(D) LR(D′) LR(D) LR(D′) LR(D) LR(D′) LR(D) LR(D′)
S1 1.000±0.000 1.000±0.000 1.000±0.000 1.000±0.000 1.000±0.000 1.000±0.000 1.000±0.000 1.000±0.000 1.000±0.000 1.000±0.000
S2 1.000±0.000 1.000±0.000 1.000±0.000 1.000±0.000 1.000±0.000 1.000±0.000 1.000±0.000 1.000±0.000 1.000±0.000 1.000±0.000
S3 0.661±0.048 0.695±0.037 0.427±0.084 0.452±0.071 0.384±0.083 0.415±0.058 0.389±0.080 0.414±0.054 0.457±0.082 0.494±0.066
S4 0.696±0.032 0.789±0.029 0.493±0.047 0.542±0.061 0.490±0.067 0.582±0.082 0.473±0.045 0.549±0.067 0.648±0.038 0.750±0.035
S5 0.862±0.028 0.922±0.005 0.580±0.016 0.615±0.003 0.624±0.021 0.667±0.000 0.599±0.019 0.640±0.002 0.745±0.052 0.855±0.008
S6 0.619±0.048 0.730±0.159 0.480±0.195 0.601±0.315 0.588±0.088 0.688±0.188 0.515±0.152 0.623±0.260 0.171±0.171 0.385±0.385
S7 0.711±0.036 0.727±0.031 0.630±0.090 0.671±0.057 0.575±0.049 0.613±0.034 0.569±0.067 0.619±0.039 0.175±0.116 0.257±0.076
S8 0.773±0.027 0.788±0.023 0.775±0.027 0.792±0.025 0.770±0.027 0.785±0.023 0.770±0.027 0.786±0.023 0.542±0.054 0.573±0.046
S9 0.557±0.046 0.801±0.019 0.552±0.066 0.816±0.013 0.556±0.050 0.801±0.019 0.537±0.058 0.797±0.019 0.512±0.051 0.781±0.021
S10 0.858±0.041 0.861±0.040 0.861±0.038 0.864±0.037 0.863±0.038 0.867±0.038 0.857±0.041 0.861±0.039 0.717±0.080 0.723±0.078
S11 0.733±0.056 0.748±0.053 0.730±0.068 0.742±0.058 0.702±0.061 0.727±0.057 0.705±0.066 0.729±0.057 0.418±0.125 0.462±0.113
S12 0.978±0.027 0.978±0.027 0.981±0.022 0.981±0.022 0.976±0.030 0.976±0.030 0.975±0.030 0.975±0.030 0.972±0.034 0.972±0.034
S13 0.812±0.063 0.818±0.052 0.819±0.097 0.828±0.079 0.749±0.069 0.755±0.058 0.766±0.075 0.773±0.063 0.540±0.149 0.535±0.124
S14 0.868±0.015 0.872±0.016 0.642±0.030 0.646±0.029 0.637±0.019 0.643±0.018 0.633±0.032 0.639±0.032 0.817±0.021 0.822±0.023
S15 0.854±0.047 0.850±0.053 0.438±0.012 0.438±0.013 0.485±0.026 0.483±0.029 0.460±0.014 0.459±0.016 −0.030±0.048 −0.035±0.052
S16 0.487±0.061 0.530±0.085 0.289±0.042 0.340±0.078 0.310±0.051 0.363±0.076 0.290±0.042 0.341±0.076 0.351±0.078 0.412±0.104
S17 0.620±0.054 0.660±0.051 0.484±0.093 0.532±0.084 0.483±0.075 0.542±0.078 0.472±0.076 0.525±0.076 0.462±0.076 0.520±0.069
S18 0.737±0.016 0.751±0.033 0.679±0.105 0.687±0.081 0.548±0.029 0.597±0.032 0.527±0.050 0.603±0.040 0.120±0.065 0.233±0.078
S19 0.544±0.062 0.844±0.032 0.554±0.061 0.881±0.027 0.584±0.069 0.856±0.035 0.546±0.060 0.860±0.032 0.302±0.099 0.759±0.050
S20 0.583±0.028 0.589±0.030 0.303±0.054 0.329±0.078 0.310±0.042 0.318±0.040 0.301±0.046 0.314±0.048 0.306±0.049 0.311±0.054
S21 0.824±0.039 0.839±0.034 0.813±0.041 0.829±0.035 0.800±0.048 0.819±0.043 0.804±0.045 0.822±0.040 0.610±0.090 0.645±0.079
S22 0.371±0.050 0.392±0.048 0.231±0.021 0.232±0.032 0.241±0.030 0.241±0.030 0.232±0.026 0.226±0.031 0.090±0.067 0.094±0.069
S23 0.326±0.058 0.336±0.071 0.255±0.064 0.303±0.082 0.272±0.059 0.303±0.067 0.256±0.058 0.294±0.071 0.111±0.078 0.127±0.093
S24 0.810±0.039 0.840±0.032 0.711±0.066 0.765±0.053 0.723±0.085 0.728±0.069 0.713±0.073 0.736±0.067 0.428±0.146 0.476±0.125
S25 0.660±0.032 0.700±0.030 0.775±0.011 0.787±0.029 0.656±0.032 0.696±0.030 0.615±0.050 0.672±0.040 0.314±0.065 0.395±0.061
S26 0.798±0.035 0.827±0.024 0.786±0.044 0.822±0.028 0.777±0.034 0.800±0.031 0.780±0.036 0.807±0.029 0.560±0.073 0.616±0.057
S27 0.716±0.011 0.725±0.010 0.626±0.028 0.653±0.034 0.563±0.018 0.558±0.015 0.557±0.024 0.545±0.023 0.154±0.040 0.147±0.037
S28 0.864±0.016 0.872±0.029 0.872±0.022 0.875±0.030 0.864±0.016 0.872±0.029 0.860±0.018 0.870±0.030 0.841±0.019 0.851±0.034
S29 0.880±0.046 0.920±0.042 0.891±0.048 0.935±0.040 0.851±0.055 0.894±0.052 0.863±0.053 0.908±0.049 0.729±0.104 0.818±0.096
S30 0.907±0.053 0.973±0.033 0.924±0.041 0.978±0.027 0.907±0.053 0.973±0.033 0.904±0.058 0.973±0.033 0.860±0.080 0.960±0.049
S31 0.764±0.057 0.792±0.080 0.717±0.125 0.782±0.085 0.716±0.116 0.774±0.107 0.671±0.108 0.743±0.094 0.574±0.103 0.637±0.131
S32 0.712±0.034 0.737±0.023 0.735±0.039 0.775±0.021 0.712±0.034 0.737±0.023 0.701±0.037 0.731±0.025 0.691±0.037 0.718±0.025
S33 0.434±0.121 0.488±0.081 0.446±0.140 0.538±0.103 0.455±0.115 0.496±0.082 0.430±0.123 0.489±0.081 0.157±0.172 0.219±0.122
S34 0.823±0.039 0.849±0.025 0.677±0.079 0.676±0.012 0.675±0.081 0.674±0.015 0.672±0.078 0.673±0.014 0.661±0.076 0.707±0.050
S35 0.781±0.042 0.834±0.034 0.786±0.044 0.836±0.035 0.781±0.043 0.834±0.033 0.780±0.043 0.834±0.033 0.562±0.085 0.668±0.067
S36 0.669±0.070 0.726±0.068 0.674±0.074 0.742±0.070 0.669±0.071 0.725±0.070 0.666±0.071 0.718±0.080 0.337±0.141 0.450±0.141
S37 0.550±0.072 0.561±0.096 0.407±0.133 0.448±0.160 0.459±0.066 0.481±0.100 0.403±0.073 0.445±0.118 −0.089±0.145 −0.043±0.219
S38 0.857±0.088 0.891±0.039 0.858±0.052 0.891±0.039 0.857±0.088 0.894±0.038 0.855±0.055 0.890±0.039 0.711±0.108 0.780±0.078
S39 0.852±0.055 0.882±0.056 0.822±0.080 0.855±0.076 0.780±0.061 0.830±0.066 0.794±0.068 0.839±0.069 0.591±0.136 0.678±0.137
S40 0.853±0.039 0.859±0.035 0.556±0.097 0.542±0.055 0.597±0.070 0.619±0.049 0.566±0.070 0.573±0.049 0.610±0.090 0.625±0.077
Tab.6 Classification performance comparison between LR(D) and LR(D′) (D′: AssoRep representation) on benchmark datasets S41-S60
Data Accuracy Precision Recall F1 Kappa
LR(D) LR(D′) LR(D) LR(D′) LR(D) LR(D′) LR(D) LR(D′) LR(D) LR(D′)
S41 0.652±0.075 0.675±0.081 0.485±0.089 0.475±0.066 0.508±0.071 0.516±0.068 0.487±0.077 0.486±0.067 0.371±0.138 0.402±0.151
S42 0.693±0.063 0.713±0.050 0.730±0.091 0.725±0.073 0.645±0.044 0.653±0.068 0.652±0.053 0.664±0.067 0.481±0.086 0.506±0.094
S43 0.868±0.037 0.882±0.052 0.683±0.202 0.759±0.170 0.615±0.100 0.706±0.140 0.625±0.124 0.712±0.137 0.270±0.230 0.431±0.269
S44 0.590±0.037 0.644±0.035 0.414±0.075 0.579±0.055 0.432±0.043 0.530±0.043 0.404±0.046 0.524±0.039 0.422±0.048 0.513±0.048
S45 0.709±0.021 0.715±0.015 0.356±0.008 0.357±0.008 0.496±0.012 0.500±0.000 0.415±0.007 0.417±0.005 −0.010±0.031 0.000±0.000
S46 0.680±0.079 0.680±0.079 0.334±0.078 0.334±0.078 0.450±0.081 0.450±0.081 0.382±0.077 0.382±0.077 −0.040±0.110 −0.040±0.110
S47 0.503±0.100 0.509±0.041 0.341±0.118 0.331±0.081 0.370±0.117 0.383±0.081 0.335±0.114 0.337±0.082 0.428±0.118 0.439±0.076
S48 0.933±0.044 0.971±0.032 0.939±0.042 0.976±0.026 0.933±0.044 0.971±0.032 0.932±0.044 0.971±0.032 0.900±0.065 0.957±0.047
S49 0.892±0.065 0.892±0.056 0.907±0.078 0.900±0.048 0.913±0.064 0.906±0.048 0.901±0.072 0.895±0.050 0.881±0.072 0.881±0.061
S50 0.645±0.057 0.686±0.043 0.581±0.073 0.636±0.064 0.571±0.064 0.592±0.042 0.570±0.070 0.589±0.052 0.149±0.134 0.204±0.093
S51 0.752±0.055 0.785±0.064 0.766±0.057 0.801±0.070 0.751±0.056 0.784±0.065 0.748±0.058 0.782±0.066 0.503±0.111 0.569±0.129
S52 0.667±0.011 0.668±0.012 0.458±0.085 0.562±0.044 0.497±0.008 0.522±0.015 0.418±0.012 0.484±0.025 −0.018±0.021 0.054±0.038
S53 0.839±0.054 0.850±0.051 0.842±0.057 0.854±0.051 0.836±0.055 0.845±0.052 0.836±0.056 0.847±0.053 0.673±0.111 0.694±0.104
S54 0.940±0.024 0.960±0.021 0.946±0.022 0.965±0.019 0.940±0.024 0.960±0.021 0.938±0.025 0.960±0.021 0.928±0.029 0.952±0.026
S55 0.506±0.048 0.511±0.025 0.508±0.049 0.518±0.027 0.509±0.049 0.513±0.025 0.497±0.047 0.509±0.024 0.261±0.072 0.268±0.038
S56 0.774±0.180 0.900±0.134 0.775±0.210 0.933±0.088 0.742±0.188 0.900±0.128 0.716±0.204 0.891±0.144 0.470±0.383 0.799±0.258
S57 0.842±0.042 0.848±0.058 0.824±0.049 0.829±0.064 0.823±0.058 0.828±0.078 0.819±0.050 0.825±0.071 0.639±0.101 0.651±0.141
S58 0.852±0.050 0.858±0.050 0.815±0.071 0.826±0.071 0.803±0.069 0.813±0.069 0.804±0.070 0.814±0.070 0.761±0.082 0.772±0.082
S59 0.983±0.026 0.994±0.017 0.983±0.026 0.994±0.017 0.986±0.021 0.995±0.014 0.983±0.025 0.994±0.017 0.974±0.039 0.992±0.025
S60 0.951±0.022 0.954±0.015 0.940±0.034 0.916±0.052 0.891±0.041 0.901±0.038 0.896±0.045 0.892±0.044 0.935±0.029 0.940±0.020
From Tab.5 and Tab.6, we have the following observations:
1. For every performance metric, LR(D′) tends to perform better, which is consistent with the results on Group 1. For example, on dataset S9, accuracy, precision, recall, F1 and kappa increase from 0.557, 0.552, 0.556, 0.537, and 0.512 to 0.801, 0.816, 0.801, 0.797, and 0.781, respectively, i.e., gains of 24.4, 26.4, 24.5, 26.0, and 26.9 percentage points. Overall, LR(D′) wins 258 times, ties 21 times, and loses 21 times in the 300 experimental configurations (5 metrics × 60 datasets).
2. LR(D′) also tends to have smaller standard deviations than LR(D) (e.g., on S5, S9 and S19), which suggests that LR(D′) is more robust for small-scale data classification.
3. AssoRep often brings larger gains when the original representation performs poorly. For example, when dataset S19 is enhanced via AssoRep, its accuracy improves from 0.544 to 0.844. This suggests that the association between features is useful auxiliary information for representation learning.
Furthermore, we test whether LR(D′) performs significantly better than LR(D) via the paired t-test. As shown in Tab.5 and Tab.6, LR(D′) is significantly better than LR(D) on 13, 14, 12, 12, and 13 datasets in terms of accuracy, precision, recall, F1 and kappa, respectively, at significance level α=5%. This is clearly fewer than on Group 1, possibly because the association between some features cannot be estimated accurately from few samples. It is worth pointing out that LR(D) is never significantly better than LR(D′) at significance level α=5%. These results suggest that the proposed association-based representation is also effective on datasets with smaller sample sizes; in particular, a classifier coupled with AssoRep is more robust for small-scale data classification.
In summary, the results on Group 1 and Group 2 demonstrate that AssoRep is effective on datasets of different sample sizes. This indicates that AssoRep is robust to sample size and can therefore be applied in a variety of tasks.

4.3 Experimental results on different classifiers

In this section, we evaluate AssoRep by combining it with five different classifiers: support vector machine (SVM) [51], k-nearest neighbors (kNN) [52], random forest (RF) [53], perceptron [54], and Gaussian naive Bayes (GaussianNB), i.e., L ∈ {SVM, kNN, RF, Percept, GaussianNB}. The experimental results are reported in Tab.7, where L(D) and L(D′) denote that classifier L learns from the original data representation D and from the AssoRep data representation D′, respectively. For each metric on each dataset, the better result of L(D) and L(D′) for the same classifier is marked in bold, and the best result across all classifiers is underlined.
Tab.7 Classification performance comparison between original and association-based enhancement representation using different classifiers
Data Accuracy Precision Recall F1 Kappa
SVM(D) SVM(D′) SVM(D) SVM(D′) SVM(D) SVM(D′) SVM(D) SVM(D′) SVM(D) SVM(D′)
Iris 0.967±0.054 0.980±0.031 0.972±0.047 0.983±0.025 0.967±0.054 0.980±0.031 0.966±0.055 0.980±0.031 0.950±0.081 0.970±0.046
oocMer4D 0.787±0.033 0.832±0.028 0.787±0.062 0.821±0.036 0.718±0.032 0.803±0.029 0.734±0.036 0.806±0.029 0.476±0.073 0.615±0.057
Contrac 0.519±0.030 0.557±0.024 0.505±0.036 0.545±0.030 0.494±0.036 0.530±0.026 0.494±0.037 0.531±0.028 0.249±0.049 0.309±0.037
Abalone 0.642±0.025 0.654±0.023 0.644±0.029 0.653±0.030 0.640±0.025 0.651±0.023 0.635±0.026 0.647±0.026 0.464±0.038 0.481±0.034
Magic 0.792±0.005 0.852±0.005 0.781±0.005 0.851±0.006 0.748±0.007 0.818±0.006 0.759±0.006 0.830±0.005 0.520±0.012 0.662±0.011
Mean values 0.741 0.775 (3.4%) 0.738 0.771 (3.3%) 0.713 0.756 (4.3%) 0.718 0.759 (4.1%) 0.532 0.607 (7.5%)
Data Accuracy Precision Recall F1 Kappa
kNN(D) kNN(D′) kNN(D) kNN(D′) kNN(D) kNN(D′) kNN(D) kNN(D′) kNN(D) kNN(D′)
Iris 0.953±0.052 0.960±0.044 0.960±0.045 0.964±0.042 0.953±0.052 0.960±0.044 0.953±0.053 0.960±0.044 0.930±0.078 0.940±0.066
oocMer4D 0.739±0.055 0.793±0.038 0.734±0.036 0.806±0.029 0.773±0.050 0.728±0.048 0.698±0.058 0.768±0.046 0.399±0.114 0.537±0.093
Contrac 0.489±0.024 0.501±0.023 0.470±0.028 0.485±0.025 0.467±0.026 0.485±0.025 0.465±0.026 0.482±0.024 0.203±0.035 0.227±0.035
Abalone 0.601±0.027 0.616±0.023 0.598±0.030 0.616±0.031 0.599±0.027 0.615±0.024 0.595±0.030 0.611±0.027 0.402±0.040 0.425±0.035
Magic 0.840±0.008 0.851±0.008 0.846±0.011 0.860±0.008 0.798±0.009 0.810±0.010 0.814±0.009 0.827±0.010 0.630±0.018 0.656±0.019
Mean values 0.724 0.744 (2.0%) 0.722 0.746 (2.4%) 0.718 0.720 (0.2%) 0.705 0.730 (2.5%) 0.513 0.557 (4.4%)
Data Accuracy Precision Recall F1 Kappa
RF(D) RF(D′) RF(D) RF(D′) RF(D) RF(D′) RF(D) RF(D′) RF(D) RF(D′)
Iris 0.947±0.058 0.953±0.052 0.953±0.056 0.964±0.038 0.947±0.058 0.953±0.052 0.946±0.059 0.953±0.052 0.920±0.087 0.930±0.078
oocMer4D 0.761±0.034 0.787±0.032 0.730±0.039 0.764±0.040 0.728±0.048 0.747±0.037 0.728±0.043 0.753±0.036 0.456±0.086 0.507±0.073
Contrac 0.511±0.016 0.517±0.036 0.489±0.022 0.500±0.036 0.481±0.020 0.491±0.034 0.480±0.021 0.491±0.035 0.233±0.026 0.243±0.058
Abalone 0.604±0.027 0.624±0.028 0.603±0.032 0.625±0.032 0.602±0.028 0.622±0.028 0.600±0.030 0.619±0.029 0.406±0.041 0.436±0.042
Magic 0.870±0.005 0.860±0.007 0.871±0.007 0.860±0.010 0.840±0.007 0.828±0.008 0.852±0.006 0.840±0.008 0.705±0.013 0.681±0.016
Mean values 0.739 0.748 (0.9%) 0.729 0.743 (1.4%) 0.720 0.728 (0.8%) 0.721 0.731 (1.0%) 0.544 0.559 (1.5%)
Data Accuracy Precision Recall F1 Kappa
Percept(D) Percept(D′) Percept(D) Percept(D′) Percept(D) Percept(D′) Percept(D) Percept(D′) Percept(D) Percept(D′)
Iris 0.873±0.081 0.973±0.033 0.910±0.059 0.978±0.027 0.873±0.081 0.973±0.033 0.865±0.089 0.973±0.033 0.810±0.122 0.960±0.049
oocMer4D 0.751±0.050 0.784±0.042 0.736±0.082 0.767±0.045 0.694±0.044 0.759±0.041 0.703±0.048 0.755±0.041 0.411±0.100 0.515±0.081
Contrac 0.452±0.034 0.517±0.040 0.434±0.051 0.502±0.051 0.424±0.039 0.487±0.039 0.407±0.047 0.483±0.044 0.142±0.056 0.244±0.063
Abalone 0.604±0.054 0.594±0.042 0.589±0.078 0.596±0.050 0.598±0.056 0.592±0.040 0.574±0.072 0.564±0.050 0.404±0.083 0.392±0.062
Magic 0.745±0.023 0.776±0.019 0.735±0.026 0.757±0.022 0.700±0.022 0.746±0.017 0.704±0.022 0.749±0.018 0.418±0.038 0.500±0.036
Mean values 0.685 0.729 (4.4%) 0.681 0.720 (3.9%) 0.658 0.711 (5.3%) 0.651 0.705 (5.4%) 0.437 0.522 (8.5%)
Data Accuracy Precision Recall F1 Kappa
GNB(D) GNB(D′) GNB(D) GNB(D′) GNB(D) GNB(D′) GNB(D) GNB(D′) GNB(D) GNB(D′)
Iris 0.953±0.043 0.940±0.036 0.963±0.033 0.952±0.027 0.953±0.043 0.940±0.036 0.952±0.044 0.939±0.037 0.930±0.064 0.910±0.054
oocMer4D 0.593±0.052 0.675±0.080 0.599±0.040 0.680±0.060 0.610±0.045 0.696±0.070 0.580±0.049 0.663±0.076 0.193±0.083 0.353±0.133
Contrac 0.466±0.036 0.539±0.023 0.486±0.030 0.535±0.019 0.490±0.037 0.535±0.024 0.463±0.035 0.529±0.021 0.214±0.048 0.299±0.036
Abalone 0.572±0.062 0.603±0.033 0.566±0.068 0.626±0.034 0.568±0.060 0.604±0.031 0.558±0.063 0.601±0.034 0.357±0.092 0.407±0.048
Magic 0.727±0.006 0.763±0.009 0.721±0.010 0.750±0.011 0.647±0.007 0.709±0.012 0.653±0.008 0.719±0.012 0.329±0.014 0.445±0.023
Mean values 0.662 0.704 (4.2%) 0.667 0.709 (4.2%) 0.654 0.697 (4.3%) 0.641 0.690 (4.9%) 0.405 0.483 (7.8%)
Based on Tab.7, the following conclusions can be drawn. (1) For each classifier L, the mean value of L(D′) surpasses that of L(D) on all evaluation metrics. In particular, for the mean kappa, which better reflects a classifier's ability to handle complex datasets such as imbalanced ones, SVM(D′), Perceptron(D′) and GaussianNB(D′) improve on SVM(D), Perceptron(D) and GaussianNB(D) by 7.56, 8.52, and 7.82 percentage points, respectively. (2) L(D′) wins 109 out of 125 experimental configurations (5 datasets × 5 classifiers × 5 metrics). (3) L(D′) achieves the best or a comparable result on each dataset.
In summary, the above results imply that the association among features can indeed improve the discriminative ability of the original data.
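The kappa gains quoted in conclusion (1) are absolute differences between mean kappa values, i.e., percentage points. The short sketch below recomputes them from the per-dataset kappa values transcribed from Tab.7; the dictionary layout and helper names are our own.

```python
# Kappa values transcribed from Tab.7 as (original D, AssoRep D') pairs.
# Dataset order: Iris, oocMer4D, Contrac, Abalone, Magic.
KAPPA = {
    "SVM": ([0.950, 0.476, 0.249, 0.464, 0.520], [0.970, 0.615, 0.309, 0.481, 0.662]),
    "Percept": ([0.810, 0.411, 0.142, 0.404, 0.418], [0.960, 0.515, 0.244, 0.392, 0.500]),
    "GaussianNB": ([0.930, 0.193, 0.214, 0.357, 0.329], [0.910, 0.353, 0.299, 0.407, 0.445]),
}


def mean(values):
    return sum(values) / len(values)


def mean_gain_points(original, enhanced):
    """Difference of the mean scores, expressed in percentage points."""
    return 100 * (mean(enhanced) - mean(original))


for name, (orig, enh) in KAPPA.items():
    gain = mean_gain_points(orig, enh)
    print(f"{name}: {mean(orig):.4f} -> {mean(enh):.4f} (+{gain:.2f} points)")
```

Working from the per-dataset values rather than the rounded "Mean values" row explains why the text reports 7.56, 8.52 and 7.82 while Tab.7 shows 7.5%, 8.5% and 7.8%.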

4.4 Classification performance comparison with other feature enhancement methods

In this section, we compare AssoRep with six feature enhancement methods: AF [9], AFX, CRAMc (discrete version of CRAM) [10], CRAMd (continuous version of CRAM) [10], FSMI [34], and FSLR [34]. Specifically, we first obtain enhanced features with each of these methods and then compare their classification performance by feeding them into the same classifier (logistic regression).
Benchmark denotes the original features without any enhancement. AF is the original association data reconstruction method proposed in [9], which uses pCor as the association measure. AFX is an enhanced version of AF that, like CRAMc and CRAMd, concatenates the reconstructed features with the original features X. CRAMc and CRAMd enhance the representation ability of data with extra information, including counting statistics on the class membership of neighboring examples as well as distance information between examples and their k nearest neighbors. The hyperparameter k in CRAMc and CRAMd is set to 8, as recommended in [10]. FSMI and FSLR are two feature enhancement methods based on feature selection. FSMI selects important features according to the mutual information between each feature vector and the label vector, and the number of selected features is chosen from {0.1m, 0.2m, …, 0.9m}, where m is the number of features of the original data X. FSLR instead selects features with a logistic regression model, using the default selection settings of the sklearn library. The experimental results are reported in Tab.8, in which the best result on each dataset is marked in bold.
Tab.8 Accuracy comparison between AssoRep with other feature enhancement methods
Data Benchmark AF AFX CRAMc CRAMd FSMI FSLR AssoRep
Iris 0.907±0.053 0.927±0.055 0.953±0.043 0.953±0.043 0.953±0.043 0.947±0.050 0.940±0.055 0.973±0.033
oocMer4D 0.796±0.036 0.751±0.023 0.811±0.035 0.820±0.028 0.822±0.028 0.797±0.028 0.800±0.035 0.837±0.020
Contrac 0.507±0.042 0.568±0.052 0.566±0.058 0.519±0.035 0.517±0.035 0.507±0.030 0.519±0.041 0.568±0.055
Abalone 0.647±0.020 0.640±0.023 0.659±0.022 0.650±0.015 0.651±0.017 0.647±0.019 0.635±0.012 0.662±0.021
Magic 0.791±0.006 0.837±0.007 0.844±0.008 0.845±0.008 0.844±0.008 0.791±0.007 0.787±0.008 0.850±0.008
Annealing 0.873±0.027 0.893±0.017 0.910±0.024 0.910±0.024 0.911±0.021 0.880±0.024 0.863±0.017 0.951±0.014
ctg-10classes 0.768±0.032 0.802±0.030 0.800±0.026 0.817±0.023 0.813±0.023 0.771±0.027 0.751±0.030 0.834±0.027
oocTris2F 0.797±0.030 0.815±0.036 0.815±0.031 0.828±0.043 0.829±0.040 0.795±0.031 0.785±0.031 0.836±0.030
Mean values 0.7608 (5.31%) 0.7791 (3.48%) 0.7948 (1.91%) 0.7927 (2.12%) 0.7925 (2.14%) 0.7669 (4.70%) 0.7600 (5.39%) 0.8139
Avg. rank 6.813 5.250 3.563 3.125 3.063 6.188 6.938 1.063
It is easy to see from Tab.8 that: 1) All feature enhancement methods except FSLR achieve higher accuracy than the benchmark, which highlights the importance of feature enhancement. 2) AssoRep obtains the highest accuracy on every dataset (on Contrac it ties with AF). 3) AssoRep improves mean accuracy over AF by 3.48 percentage points, which indicates that the quality of the association matrix plays an important role. 4) The mean accuracy of AssoRep is 2.14 percentage points higher than that of CRAMd, which ranks first among the seven baseline methods. It is noteworthy that CRAMd uses discriminative information from the output space (label information), while the proposed AssoRep only uses information from the input (feature) space. Moreover, the new representations of CRAMc and CRAMd contain the original representation, which helps improve performance; this is supported by the observation that AFX outperforms AF. 5) Compared with FSMI and FSLR, AF, AFX, CRAMc, and CRAMd achieve better accuracy. This suggests that enhancing features by mining new information from the original data may be more effective than merely removing weaker features. These results indicate that association-based representation learning is worth further study.
To further assess the significance of the differences among the eight algorithms in terms of classification accuracy, we employ the Friedman test [55], a favorable choice for comparing multiple algorithms over many datasets. The statistic follows an F-distribution with k−1 numerator degrees of freedom and (k−1)(N−1) denominator degrees of freedom, and is defined as
$$F_F = \frac{(N-1)\,\chi_F^2}{N(k-1) - \chi_F^2},$$
where
$$\chi_F^2 = \frac{12N}{k(k+1)}\left(\sum_{i=1}^{k} R_i^2 - \frac{k(k+1)^2}{4}\right),$$
where k and N denote the numbers of compared algorithms and datasets, respectively, and $R_i$ is the average rank of algorithm i over all datasets. The smaller the average rank, the better the algorithm. The null hypothesis is rejected if $F_F$ exceeds the critical value.
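The Friedman statistic can be checked directly from Tab.8. The Python sketch below (standard library only) ranks the eight methods on each dataset by the accuracies transcribed from Tab.8, averaging ranks over ties (e.g., AF and AssoRep both reach 0.568 on Contrac), and then evaluates $\chi_F^2$ and $F_F$. The tie rule and the use of the rounded table values are our assumptions; with them, the average ranks match the Avg. rank row of Tab.8 up to rounding, and $F_F$ comes out at about 20.61.

```python
METHODS = ["Benchmark", "AF", "AFX", "CRAMc", "CRAMd", "FSMI", "FSLR", "AssoRep"]
# Mean accuracies from Tab.8. Rows: Iris, oocMer4D, Contrac, Abalone, Magic,
# Annealing, ctg-10classes, oocTris2F; columns follow METHODS.
ACC = [
    [0.907, 0.927, 0.953, 0.953, 0.953, 0.947, 0.940, 0.973],
    [0.796, 0.751, 0.811, 0.820, 0.822, 0.797, 0.800, 0.837],
    [0.507, 0.568, 0.566, 0.519, 0.517, 0.507, 0.519, 0.568],
    [0.647, 0.640, 0.659, 0.650, 0.651, 0.647, 0.635, 0.662],
    [0.791, 0.837, 0.844, 0.845, 0.844, 0.791, 0.787, 0.850],
    [0.873, 0.893, 0.910, 0.910, 0.911, 0.880, 0.863, 0.951],
    [0.768, 0.802, 0.800, 0.817, 0.813, 0.771, 0.751, 0.834],
    [0.797, 0.815, 0.815, 0.828, 0.829, 0.795, 0.785, 0.836],
]


def rank_row(scores):
    """Rank scores in descending order (1 = best); tied scores share the average rank."""
    order = sorted(range(len(scores)), key=lambda j: -scores[j])
    ranks = [0.0] * len(scores)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # sorted positions i..j share ranks i+1..j+1
        for pos in range(i, j + 1):
            ranks[order[pos]] = avg
        i = j + 1
    return ranks


def friedman(acc):
    """Return (average ranks, chi2_F, F_F) for an N x k table of scores."""
    n_data, k = len(acc), len(acc[0])
    rank_rows = [rank_row(row) for row in acc]
    avg_ranks = [sum(r[j] for r in rank_rows) / n_data for j in range(k)]
    chi2 = 12 * n_data / (k * (k + 1)) * (
        sum(r * r for r in avg_ranks) - k * (k + 1) ** 2 / 4
    )
    f_f = (n_data - 1) * chi2 / (n_data * (k - 1) - chi2)
    return avg_ranks, chi2, f_f


avg_ranks, chi2, f_f = friedman(ACC)
for name, r in zip(METHODS, avg_ranks):
    print(f"{name:9s} {r:.4f}")
print(f"chi2_F = {chi2:.3f}, F_F = {f_f:.3f}")
```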
As shown in Tab.9, $F_F$ = 20.610 is higher than the critical value 2.203 at significance level α=0.05, so the null hypothesis that all algorithms have equivalent accuracy is clearly rejected. This indicates that the classification performance of the eight algorithms differs significantly, and we therefore further study the relative performance of the compared algorithms. To this end, we adopt the Nemenyi post hoc test, which compares classifiers in a pairwise manner. In the Nemenyi test, the performance of two algorithms is considered significantly different if the difference between their average ranks exceeds the following critical distance:
Tab.9 Summary of the Friedman statistic $F_F$
Evaluation metric $F_F$ Critical value (α=0.05)
Accuracy 20.610 2.203
$$CD = q_\alpha \sqrt{\frac{k(k+1)}{6N}},$$
where $q_{0.05} = 3.031$ when k=8.
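A similar sketch evaluates the Nemenyi critical distance with the average ranks reported in Tab.8 and $q_{0.05}=3.031$ from the text, for k = N = 8. It lists the methods whose average rank differs from that of AssoRep by more than CD; this is only the arithmetic behind a CD diagram, not the code used to draw Fig.1, and the helper name is our own.

```python
import math

# Average ranks reported in Tab.8 (k = 8 algorithms, N = 8 datasets).
AVG_RANK = {
    "Benchmark": 6.813, "AF": 5.250, "AFX": 3.563, "CRAMc": 3.125,
    "CRAMd": 3.063, "FSMI": 6.188, "FSLR": 6.938, "AssoRep": 1.063,
}
Q_ALPHA = 3.031  # Nemenyi q_{0.05} for k = 8, as given in the text


def critical_difference(q_alpha, k, n_datasets):
    """Nemenyi critical distance CD = q_alpha * sqrt(k(k+1) / (6N))."""
    return q_alpha * math.sqrt(k * (k + 1) / (6 * n_datasets))


cd = critical_difference(Q_ALPHA, k=len(AVG_RANK), n_datasets=8)
control = AVG_RANK["AssoRep"]
significant = [
    name for name, rank in AVG_RANK.items()
    if name != "AssoRep" and abs(rank - control) > cd
]
print(f"CD = {cd:.3f}; significantly different from AssoRep: {significant}")
```

With these numbers CD is about 3.71, and the rank gaps of Benchmark, AF, FSMI and FSLR from AssoRep exceed it, while those of AFX, CRAMc and CRAMd do not.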
The CD diagram is often used to illustrate the rank relations among the compared algorithms. In a CD diagram, the average rank of each algorithm is marked along the axis (smaller is better). As shown in Fig.1, AssoRep ranks first. It is significantly better than AF, FSMI and FSLR, whereas CRAMd does not differ significantly from them. This further validates the advantage of the proposed AssoRep.
Fig.1 Comparison of A and B (control algorithms; A and B denote AssoRep and CRAMd, the best-performing baseline, marked with a red star and a blue star, respectively) against the other compared algorithms with the Nemenyi test. Algorithms not connected with A (red line) or B (blue line) in the CD diagram are considered to perform significantly differently from the corresponding control algorithm (significance level α=0.05)


4.5 Efficiency analysis

This experiment investigates the efficiency of AssoRep by replacing dCor with Pearson's correlation coefficient (pCor), normalized mutual information (NMI), the maximal information coefficient (MIC) [18], and its improved version MICe [56]. NMI, MIC and MICe have very high computational complexity, which makes the comparison experiment challenging. In this paper, we use the minepy Python library, which provides efficient implementations of MIC and MICe, and the sklearn toolkit for NMI. The maximal neighborhood coefficient (MNC) [19] is not used because of its even higher computational complexity. The results are shown in Tab.10, where the computation time on AWArgsift-hist is recorded with L = 1, and that on the other datasets with L = 10.
Tab.10 Computation time (s) of the different association mining methods
Data pCor dCor NMI MIC MICe
Iris 0.16 0.05 1.01 0.30 0.29
oocMer4D 0.36 5.90 75.37 71.23 70.37
Contrac 0.22 0.55 3.84 10.05 7.16
Abalone 0.26 1.24 3.71 10.82 11.08
Magic 1.02 7.50 7.07 58.84 59.11
AWArgsift-hist 178.90 618.08 3124.94 9956.50 15541.76
According to Tab.10, we observe that 1) pCor costs the least time in Tab.10, but its classification accuracy is lower than that of dCor; 2) compared with NMI, MIC and MICe, which are extremely time-consuming, dCor has an acceptable computation time. For example, on the dataset AWArgsift-hist, dCor takes about 618 seconds to compute the association of 1,999,000 feature pairs and train the logistic regression model, whereas NMI needs about 3124 seconds and MICe about 4.32 hours, i.e., about five and 25 times the computation time of dCor, respectively. These results suggest that using dCor for association mining is an appropriate choice that balances effectiveness and efficiency well.
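To make the cost gap between pCor and dCor in Tab.10 concrete, here is a minimal NumPy sketch of the sample distance correlation (the standard V-statistic estimator built from double-centered distance matrices) and of a hypothetical helper `pairwise_dcor` that evaluates it for all d(d−1)/2 feature pairs. The estimator choice is our assumption: this section does not say whether AssoRep uses this form, a bias-corrected variant, or one of the faster O(n log n) algorithms available for univariate variables. In this naive version each pair costs O(n²) time and memory, compared with O(n) for Pearson's correlation.

```python
import numpy as np


def _double_centered_distances(v):
    """Pairwise |v_i - v_j| matrix with row, column and grand means removed."""
    v = np.asarray(v, dtype=float)
    d = np.abs(v[:, None] - v[None, :])  # n x n: O(n^2) time and memory per call
    return d - d.mean(axis=0, keepdims=True) - d.mean(axis=1, keepdims=True) + d.mean()


def distance_correlation(x, y):
    """Sample distance correlation (V-statistic form) of two 1-D samples."""
    a = _double_centered_distances(x)
    b = _double_centered_distances(y)
    dcov2 = (a * b).mean()                   # squared distance covariance
    dvar2 = (a * a).mean() * (b * b).mean()  # product of squared distance variances
    if dvar2 <= 0:  # a constant variable: define dCor as 0
        return 0.0
    return float(np.sqrt(max(dcov2, 0.0) / np.sqrt(dvar2)))


def pairwise_dcor(X):
    """Hypothetical helper: dCor for every feature pair of X (n samples x d features)."""
    X = np.asarray(X, dtype=float)
    d = X.shape[1]
    M = np.eye(d)  # diagonal: dCor of a non-constant feature with itself is 1
    for i in range(d):
        for j in range(i + 1, d):  # d(d-1)/2 evaluations
            M[i, j] = M[j, i] = distance_correlation(X[:, i], X[:, j])
    return M


x = np.array([-2.0, -1.0, 0.0, 1.0, 2.0])
y = x ** 2  # deterministic but non-monotonic dependence
print("pCor:", np.corrcoef(x, y)[0, 1])
print("dCor:", distance_correlation(x, y))
print(pairwise_dcor(np.column_stack([x, y, 2 * x + 1])).round(3))
```

On this symmetric example Pearson's coefficient is 0 while dCor is about 0.52, which illustrates why dCor can capture non-monotonic associations that pCor misses, at the price of the quadratic per-pair cost.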

5 Conclusion

We have proposed an association-based representation improvement method (AssoRep) that balances effectiveness and efficiency well. Moreover, AssoRep is more interpretable than existing feature enhancement methods such as the multilayer perceptron and attention, because the working mechanism of each of its steps is transparent. The effectiveness of AssoRep has been validated by extensive experiments on classification tasks.
This work further perfects and enriches the field of association data reconstruction. However, AssoRep, like AF [9], only provides a vector-like improved representation. As a result, it cannot be directly fed to models that take tensor-like data as input, such as convolutional neural networks. Hence, tensorizing association-based representations is worth studying in the future. Moreover, AssoRep treats the relationship between paired features symmetrically and ignores its direction. It is therefore worthwhile to generalize AssoRep to account for cause and effect among features. Finally, like MIC and MICe, dCor overestimates the strength of association between two features when the true relationship is very weak. Hence, a method for eliminating the bias of dCor is urgently needed.
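The upward bias of dCor under weak dependence is easy to reproduce. The NumPy sketch below is an illustration, not part of AssoRep. It draws two independent standard-normal features, whose population dCor is 0, and averages the sample dCor (V-statistic form of [17]) over repeated draws. The sample sizes, number of trials and random seed are arbitrary assumptions, and the function names are hypothetical.

```python
import numpy as np


def distance_cor(x, y):
    """Sample distance correlation (V-statistic form of Székely et al. [17])."""
    def centred(v):
        d = np.abs(v[:, None] - v[None, :])  # pairwise distances |v_i - v_j|
        return d - d.mean(axis=0, keepdims=True) - d.mean(axis=1, keepdims=True) + d.mean()

    a = centred(np.asarray(x, float))
    b = centred(np.asarray(y, float))
    denom = np.sqrt((a * a).mean() * (b * b).mean())
    return float(np.sqrt(max((a * b).mean(), 0.0) / denom)) if denom > 0 else 0.0


def mean_dcor_independent(n, trials=50, seed=0):
    """Average sample dCor of two independent standard-normal features of length n.

    The population dCor is exactly 0 here, so any positive average is bias.
    """
    rng = np.random.default_rng(seed)
    values = [distance_cor(rng.standard_normal(n), rng.standard_normal(n)) for _ in range(trials)]
    return float(np.mean(values))


for n in (20, 100, 500):
    print(n, round(mean_dcor_independent(n), 3))
```

The averages stay strictly positive even though the features are independent, and they shrink as n grows. This is the small-sample overestimation that the paragraph above identifies as a problem to be solved.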

Xinyan Liang received the PhD degree in computer science and technology from Shanxi University, China in 2022. He is currently a Lecturer at the Institute of Big Data Science and Industry, Shanxi University, China. He was a visiting scholar at The University of Hong Kong, China in 2018. His main research interests include multi-modal machine learning, evolutionary intelligence, and their applications. He has published several papers in his research fields in journals including IEEE TPAMI and IEEE TEVC.

Yuhua Qian received the MS and PhD degrees in computers with applications from Shanxi University, China in 2005 and 2011, respectively. He is currently a Professor with the Key Laboratory of Computational Intelligence and Chinese Information Processing of Ministry of Education, Shanxi University, China. He is best known for multigranulation rough sets in learning from categorical data and granular computing. He is involved in research on machine learning, pattern recognition, feature selection, granular computing, and artificial intelligence. He has authored over 150 articles on these topics in international journals. He served on the Editorial Board of the International Journal of Knowledge-Based Organizations and Artificial Intelligence Research.

Qian Guo received the PhD degree in computer science and technology from Shanxi University, China in 2022. She is currently a Lecturer at the School of Computer Science and Technology, Taiyuan University of Science and Technology, China. She was a visiting scholar at The University of Hong Kong, China in 2018. Her current research interests include logic learning, abstract reasoning, deep learning and their applications.

Keyin Zheng received the BS degree in information and computing science and the Master’s degree in pattern recognition and intelligent systems from the School of Mathematical Sciences, Shanxi University, China in 2012 and 2015, respectively. She is a PhD candidate at the Institute of Big Data Science and Industry, Shanxi University, China. Her research interests include concept learning and machine learning.

References

[1] Zhu Y, Geng Y, Li Y, Qiang J, Wu X. Representation learning: serial-autoencoder for personalized recommendation. Frontiers of Computer Science, 2024, 18(4): 184316
[2] Bengio Y, Courville A, Vincent P. Representation learning: a review and new perspectives. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2013, 35(8): 1798–1828
[3] Jia B B, Liu J Y, Hang J Y, Zhang M L. Learning label-specific features for decomposition-based multi-class classification. Frontiers of Computer Science, 2023, 17(6): 176348
[4] Zhang M L, Fang J P, Wang Y B. BiLabel-specific features for multi-label classification. ACM Transactions on Knowledge Discovery from Data, 2021, 16(1): 18
[5] Yang M, Liu Q, Sun X, Shi N, Xue H. Towards kernelizing the classifier for hyperbolic data. Frontiers of Computer Science, 2024, 18(1): 181301
[6] Dong X, Luo T, Fan R, Zhuge W, Hou C. Active label distribution learning via kernel maximum mean discrepancy. Frontiers of Computer Science, 2023, 17(4): 174327
[7] Zhang Y, Jiang L, Li C. Attribute augmentation-based label integration for crowdsourcing. Frontiers of Computer Science, 2023, 17(5): 175331
[8] Troncoso-García A R, Martínez-Ballesteros M, Martínez-Álvarez F, Troncoso A. A new approach based on association rules to add explainability to time series forecasting models. Information Fusion, 2023, 94: 169–180
[9] Liang X, Qian Y, Guo Q, Cheng H, Liang J. AF: an association-based fusion method for multi-modal classification. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2022, 44(12): 9236–9254
[10] Jia B B, Zhang M L. Multi-dimensional classification via kNN feature augmentation. Pattern Recognition, 2020, 106: 107423
[11] Deng M, Yang W, Chen C, Liu C. Exploring associations between streetscape factors and crime behaviors using Google Street View images. Frontiers of Computer Science, 2022, 16(4): 164316
[12] Guo Q, Qian Y, Liang X. GLRM: logical pattern mining in the case of inconsistent data distribution based on multigranulation strategy. International Journal of Approximate Reasoning, 2022, 143: 78–101
[13] Guo Q, Qian Y, Liang X, She Y, Li D, Liang J. Logic could be learned from images. International Journal of Machine Learning and Cybernetics, 2021, 12(12): 3397–3414
[14] Kuzma J. Basic Statistics for the Health Sciences. Palo Alto: Mayfield Publishing Company, 1984, 158–169
[15] Spearman C. The proof and measurement of association between two things. The American Journal of Psychology, 1904, 15(1): 72–101
[16] Kendall M G. A new measure of rank correlation. Biometrika, 1938, 30(1-2): 81–93
[17] Székely G J, Rizzo M L, Bakirov N K. Measuring and testing dependence by correlation of distances. The Annals of Statistics, 2007, 35(6): 2769–2794
[18] Reshef D N, Reshef Y A, Finucane H K, Grossman S R, McVean G, Turnbaugh P J, Lander E S, Mitzenmacher M, Sabeti P C. Detecting novel associations in large data sets. Science, 2011, 334(6062): 1518–1524
[19] Cheng H, Qian Y, Hu Z, Liang J. Association mining method based on neighborhood perspective. SCIENTIA SINICA Informationis, 2020, 50(6): 824–844
[20] Zhu Y, Kwok J T, Zhou Z H. Multi-label learning with global and local label correlation. IEEE Transactions on Knowledge and Data Engineering, 2018, 30(6): 1081–1094
[21] Xu N, Shu J, Zheng R, Geng X, Meng D, Zhang M L. Variational label enhancement. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2023, 45(5): 6537–6551
[22] Zhang M L, Zhou Z H. A review on multi-label learning algorithms. IEEE Transactions on Knowledge and Data Engineering, 2014, 26(8): 1819–1837
[23] Zhang M L, Li Y K, Liu X Y, Geng X. Binary relevance for multi-label learning: an overview. Frontiers of Computer Science, 2018, 12(2): 191–202
[24] Kou Y, Lin G, Qian Y, Liao S. A novel multi-label feature selection method with association rules and rough set. Information Sciences, 2023, 624: 299–323
[25] Zhang Y, Zhu H, Song Z, Koniusz P, King I. Spectral feature augmentation for graph contrastive learning and beyond. In: Proceedings of the 37th AAAI Conference on Artificial Intelligence. 2023, 11289–11297
[26] Gao Z, Wu Y, Jia Y, Harandi M. Hyperbolic feature augmentation via distribution estimation and infinite sampling on manifolds. In: Proceedings of the 36th Conference on Neural Information Processing Systems. 2022, 34421–34435
[27] Zhang M L, Wu L. LIFT: multi-label learning with label-specific features. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2015, 37(1): 107–120
[28] Zheng S, Yuan W, Guan D. Heterogeneous information network embedding with incomplete multi-view fusion. Frontiers of Computer Science, 2022, 16(5): 165611
[29] Wang B, Li H, Wei B, Kang Z, Li C. Nighttime image dehazing using color cast removal and dual path multi-scale fusion strategy. Frontiers of Computer Science, 2022, 16(4): 164706
[30] Wang Z, Li L, Xue Y, Jiang C, Wang J, Sun K, Ma H. FeNet: feature enhancement network for lightweight remote-sensing image super-resolution. IEEE Transactions on Geoscience and Remote Sensing, 2022, 60: 5622112
[31] Wang W, Zhang M L. Partial label learning with discrimination augmentation. In: Proceedings of the 28th ACM SIGKDD Conference on Knowledge Discovery and Data Mining. 2022, 1920–1928
[32] Gong C, Wang D, Li M, Chandra V, Liu Q. KeepAugment: a simple information-preserving data augmentation approach. In: Proceedings of 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2021, 1055–1064
[33] Wang M, Han H, Huang Z, Xie J. Unsupervised spectral feature selection algorithms for high dimensional data. Frontiers of Computer Science, 2023, 17(5): 175330
[34] Liu J, Chai C, Luo Y, Lou Y, Feng J, Tang N. Feature augmentation with reinforcement learning. In: Proceedings of the 38th IEEE International Conference on Data Engineering. 2022, 3360–3372
[35] Li H, Xu C, Ma L, Bo H, Zhang D. MODENN: a shallow broad neural network model based on multi-order descartes expansion. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2022, 44(12): 9417–9433
[36] Taylor R. Interpretation of the correlation coefficient: a basic review. Journal of Diagnostic Medical Sonography, 1990, 6(1): 35–39
[37] Spearman C. The proof and measurement of association between two things. The American Journal of Psychology, 1987, 100(3-4): 441–471
[38] Spearman C. The proof and measurement of association between two things. International Journal of Epidemiology, 2010, 39(5): 1137–1150
[39] Puth M T, Neuhäuser M, Ruxton G D. Effective use of Spearman’s and Kendall’s correlation coefficients for association between two measured traits. Animal Behaviour, 2015, 102: 77–84
[40] Shannon C E. A mathematical theory of communication. The Bell System Technical Journal, 1948, 27(3): 379–423
[41] Cheng H, Qian Y, Guo Y, Zheng K, Zhang Q. Neighborhood information-based method for multivariate association mining. IEEE Transactions on Knowledge and Data Engineering, 2023, 35(6): 6126–6135
[42] Vaswani A, Shazeer N, Parmar N, Uszkoreit J, Jones L, Gomez A N, Kaiser Ł, Polosukhin I. Attention is all you need. In: Proceedings of the 31st International Conference on Neural Information Processing Systems. 2017, 6000–6010
[43] Shen W X, Zeng X, Zhu F, Wang Y L, Qin C, Tan Y, Jiang Y Y, Chen Y Z. Out-of-the-box deep learning prediction of pharmaceutical properties by broadly learned knowledge-based molecular representations. Nature Machine Intelligence, 2021, 3(4): 334–343
[44] Liang X, Guo Q, Qian Y, Ding W, Zhang Q. Evolutionary deep fusion method and its application in chemical structure recognition. IEEE Transactions on Evolutionary Computation, 2021, 25(5): 883–893
[45] Gretton A, Bousquet O, Smola A, Schölkopf B. Measuring statistical dependence with Hilbert-Schmidt norms. In: Proceedings of the 16th International Conference on Algorithmic Learning Theory. 2005, 63–77
[46] Fernández-Delgado M, Cernadas E, Barro S, Amorim D. Do we need hundreds of classifiers to solve real world classification problems? The Journal of Machine Learning Research, 2014, 15(1): 3133–3181
[47] Lampert C H, Nickisch H, Harmeling S. Attribute-based classification for zero-shot visual object categorization. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2014, 36(3): 453–465
[48] Arevalo J, Solorio T, Montes-y-Gómez M, Gonzalez F A. Gated multimodal networks. Neural Computing and Applications, 2020, 32(14): 10209–10228
[49] Zhang Y, Cao C, Cheng J, Lu H. EgoGesture: a new dataset and benchmark for egocentric hand gesture recognition. IEEE Transactions on Multimedia, 2018, 20(5): 1038–1050
[50] Pedregosa F, Varoquaux G, Gramfort A, Michel V, Thirion B, Grisel O, Blondel M, Prettenhofer P, Weiss R, Dubourg V, Vanderplas J, Passos A, Cournapeau D, Brucher M, Perrot M, Duchesnay É. Scikit-learn: machine learning in Python. The Journal of Machine Learning Research, 2011, 12: 2825–2830
[51] Cortes C, Vapnik V. Support-vector networks. Machine Learning, 1995, 20(3): 273–297
[52] Cover T M, Hart P E. Nearest neighbor pattern classification. IEEE Transactions on Information Theory, 1967, 13(1): 21–27
[53] Breiman L. Random forests. Machine Learning, 2001, 45(1): 5–32
[54] Freund Y, Schapire R E. Large margin classification using the perceptron algorithm. Machine Learning, 1999, 37(3): 277–296
[55] Demšar J. Statistical comparisons of classifiers over multiple data sets. The Journal of Machine Learning Research, 2006, 7: 1–30
[56] Reshef Y A, Reshef D N, Finucane H K, Sabeti P C, Mitzenmacher M. Measuring dependence powerfully and equitably. The Journal of Machine Learning Research, 2016, 17(1): 7406–7468

Acknowledgements

This work was supported by the National Key R&D Program of China (No. 2021ZD0112400), the National Natural Science Foundation of China (Grant Nos. 62306171, 62136005, 61976129, 62106132, 61906114, 61906115), the Science and Technology Major Project of Shanxi (No. 202201020101006), the Young Scientists Fund of the Natural Science Foundation of Shanxi (Nos. 202203021222183, 20210302124549), the Open Project Foundation of Intelligent Information Processing Key Laboratory of Shanxi Province (Nos. CICIP2023005, CICIP202205), the Science and Technology Innovation Plan for Colleges and Universities of Shanxi Province (2022L296), and Taiyuan University of Science and Technology Doctoral Research Start-up Fund Project (20222106).

Competing interests

The authors declare that they have no competing interests or financial conflicts to disclose.

RIGHTS & PERMISSIONS

2025 Higher Education Press

Supplementary files

FCS-23396-OF-XL_suppl_1 (253 KB)
