1 Introduction
The success of deep learning [1, 2], multi-label learning [3, 4], and kernel learning [5, 6] shows that learning with enhanced features instead of the original features may be more effective. For example, the multilayer perceptron and attention mechanisms enhance the representation ability of data implicitly, which improves the performance of machine learning models [7]. However, their poor interpretability strongly limits their application in domains that require trust. In this article, interpretability denotes the transparency of a model, specifically humans' ability to understand it [8]. Hence, it is necessary to develop an interpretable representation enhancement method. Recently, some researchers have attempted to enhance the representation ability of data by fully mining and utilizing the latent information in data with transparent techniques [9, 10].
Association information, which characterizes the relationships among features/variables, is an important kind of latent information in data. The datasets to be analyzed are mostly collected from real applications, and they often contain important and rich forms of association [11–13]. However, most researchers in the machine learning domain prefer to obtain independent feature representations by imposing an orthogonality constraint on a new feature space, for reasons such as feature decoupling and simplicity of modeling. This strategy removes the association among features, which not only wastes information but may also be a poor strategy for learning on associated data. Our recent work [9] (the method is named AF) applies the associations among features, calculated with Pearson's correlation coefficient (pCor), to data reconstruction and finds that associations between features can improve the representation ability of data. However, AF has two limitations:
1. The data representation obtained by AF is high-dimensional or sparse. AF consists of a feature boosting process and an association-based fusion process. To model the high-order information of features and improve the nonlinear representation ability of the original data, the feature boosting process adopts a simple but effective approach: adding powers of each feature value to the original feature space. This process achieves its goal, but it also causes a tricky problem: the dimension of the new representation is necessarily higher than that of the original representation. For example, if a data set has 100 features, the new representation has 1000 dimensions when the maximal power order is set to 10. This curse of dimensionality limits the application of AF to high-dimensional data. Hence, it is desirable to develop an association-based data reconstruction method that generates a lower-dimensional data representation.
2. pCor, which AF uses to capture the association between features, does not balance effectiveness and efficiency well. One core task of AF is to measure the degree of association between two feature vectors. Some association measures, such as pCor, are computationally efficient but have limitations of their own. For example, a pCor value of zero does not imply that two features are independent; moreover, pCor is only appropriate for calculating the association between two feature vectors with the same dimension. Others, such as MIC and MNC, can mine more kinds of relationships but are computationally inefficient. Overall, the associations computed by simple measures are inaccurate, while advanced measures are computationally inefficient. Hence, it is necessary to explore a more practical association measure that balances effectiveness and efficiency for the association-based data reconstruction task.
Based on the above analysis, our aim is to develop a novel association-based data reconstruction method that balances efficiency and effectiveness well by using a more suitable association measure and low-dimensional embedding techniques. To this end, we develop an association-based representation enhancement method, abbreviated as AssoRep. AssoRep first obtains the association between features via distance correlation, which has several advantages over Pearson's correlation coefficient. Then an enhancement matrix is formed by stacking the association values of every pair of features. Next, an improved feature representation is obtained by aggregating the original features with the enhancement matrix. Finally, the improved feature representation is mapped to a low-dimensional space via principal component analysis. Note that the working mechanism of each step of AssoRep is transparent.
The contributions of this work are as follows:
1. We introduce a fresh perspective on improving data representation through the associations among features, which complements relationship-based learning methods, such as graph neural networks and spectral clustering, that mainly focus on relationships among samples.
2. A novel distance correlation-based data representation method is proposed; it balances effectiveness and efficiency better than its counterpart AF [9].
3. Experimental results on 120 benchmark datasets show that the proposed AssoRep outperforms the compared methods in most cases in terms of five evaluation metrics widely used for classification.
The remainder of this paper is organized as follows: Section 2 reviews related work, including learning with association and feature augmentation. Section 3 details AssoRep, a representation framework for associated data. Section 4 describes the experimental setup and the results on the classification task. Section 5 presents the conclusions and future work.
2 Related work
Our work is related to three lines of research: association mining, learning with association, and feature augmentation. To position our work, we briefly review each of them below.
Association mining: To measure the association among variables, researchers have proposed many methods. For example, the well-known Pearson correlation coefficient was designed to measure the strength of the linear trend between two variables [14]; Spearman's rho [15] and Kendall's tau [16] were developed to measure the degree of monotonic trend between two variables. To identify complex association relationships among variables, such as trigonometric and inverse trigonometric functions, some advanced methods have been developed, such as distance correlation (dCor) [17], the maximal information coefficient (MIC) [18], and the maximal neighborhood coefficient (MNC) [19].
Learning with association: Association has been proven to be an effective kind of latent information for improving performance or achieving other aims in some machine learning tasks, especially multi-label learning [20–22]. For example, to equip binary relevance with the ability to exploit label correlations, researchers have proposed the chaining structure, the stacking structure, and the controlling structure, based on three assumptions: random label correlations, full-order label correlations, and pruned label correlations, respectively [23]. Recently, association has also been applied to other tasks. For example, Kou et al. [24] developed a label association rule mining method that automatically mines mixed-order correlations among labels, and then applied these correlations to multi-label feature selection. Troncoso et al. [8] explained time series forecasting models with the help of numeric association rules. Although the above methods succeed for different aims in various tasks, most of them come from multi-label learning and consider the association among labels.
Feature augmentation: Feature augmentation generally serves two purposes: producing new samples and boosting representation ability. The former generates more diverse and discriminative features by noise injection [25] or by sampling from a hyperbolic normal distribution [26], where each generated feature corresponds to a new example. Similar to our work, the latter aims to re-represent the examples based on the original features and extra information such as distance information [27], multi-view features [28], or multi-scale information [29]. For example, Jia et al. [10] improved multi-dimensional classification performance on an augmented feature space consisting of counting statistics on the class membership of neighboring examples, as well as distance information between examples and their k nearest neighbors obtained via kNN techniques. Wang et al. [30] induced an enhanced feature representation by fusing multi-scale discriminative information from different layers of a convolutional neural network into a single feature vector. Wang et al. [31] enriched the feature space with confidence-rated class prototype features to replenish the discriminative characteristics of the underlying ground-truth labels of partial-label training examples. The benefits of such feature augmentation have been demonstrated in many applications, such as multi-label learning [10], multi-modal classification [9], and multi-camera tracking [32]. Another kind of feature augmentation method is feature selection, which removes unimportant features to achieve the same purpose [33]. For example, Liu et al. [34] removed weaker features from multiple candidate sets using reinforcement learning with an exploration-exploitation strategy. It is worth noting that some existing feature augmentation methods, such as CRAM, use discriminative information from the output space (label information). In this paper, we introduce a fresh perspective on data representation improvement that uses only association information from the input (feature) space.
3 The AssoRep method
This article proposes AssoRep, a framework that enhances data representation via the associations among features. AssoRep consists of three steps: (1) relationship boosting, (2) association mining, and (3) association embedding.
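Let $X$ be a set with $n$ examples and $Y$ be its corresponding label set. Then a dataset can be represented as
$$D = \{(x_i, y_i)\}_{i=1}^{n},$$
where $X = [x_1, x_2, \ldots, x_n]^\top \in \mathbb{R}^{n \times m}$, $x_i \in \mathbb{R}^{m}$ denotes the $i$th example, and $n$ and $m$ are the numbers of examples and features, respectively; $Y = \{y_1, y_2, \ldots, y_n\}$, where $y_i$ is the label of $x_i$.
Let $F$ be the feature vector set of the data set $X$. Then it is written as follows
$$F = \{f_1, f_2, \ldots, f_m\},$$
where $f_j \in \mathbb{R}^{n}$ denotes the $j$th feature vector (the $j$th column) of $X$.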
3.1 Relationship boosting
This step aims to enrich the features by adding terms produced by different transform functions. It can be viewed as the first enhancement of the original data. In this article, power functions with different integer orders are used for this purpose. The effectiveness of boosting relationships with power functions has been validated by several works, such as [9, 35].
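Let $\tilde{X}$ be the relationship boosting data representation of $X$. Given a set of power functions $\{g_1, g_2, \ldots, g_r\}$, where $r$ is the maximal order and $g_k(z) = z^k$, $k = 1, 2, \ldots, r$, we obtain $\tilde{X}$ as follows:
1. For each feature vector $f_j$, compute its transform values using the power functions (applied element-wise) and represent these transform values in the following matrix form
$$T_j = [g_1(f_j), g_2(f_j), \ldots, g_r(f_j)] \in \mathbb{R}^{n \times r}.$$
2. Concatenate the transform values of all $m$ feature vectors of $X$ as follows,
$$\tilde{X} = [T_1, T_2, \ldots, T_m] \in \mathbb{R}^{n \times d},$$
where $d = mr$.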
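The two steps above can be sketched in NumPy as follows. This is a minimal illustration, assuming the power functions are applied element-wise with orders 1 to r (the parameter `max_order` plays the role of r); the function name `relationship_boosting` is hypothetical and not taken from the authors' code. With 100 features and r = 10 it yields 1000 columns, which is the dimension growth discussed in the Introduction.

```python
import numpy as np


def relationship_boosting(X: np.ndarray, max_order: int) -> np.ndarray:
    """Hypothetical sketch of power-function relationship boosting.

    X has shape (n, m). For each feature column f_j, the block
    T_j = [f_j**1, ..., f_j**max_order] (shape n x max_order) is built,
    and the blocks T_1, ..., T_m are concatenated column-wise.
    """
    X = np.asarray(X, dtype=float)
    n, m = X.shape
    blocks = []
    for j in range(m):
        f_j = X[:, j]
        # Element-wise powers of one feature: the block T_j.
        T_j = np.column_stack([f_j ** k for k in range(1, max_order + 1)])
        blocks.append(T_j)
    # Result has shape (n, m * max_order).
    return np.hstack(blocks)


if __name__ == "__main__":
    X = np.random.default_rng(0).normal(size=(5, 100))
    print(relationship_boosting(X, max_order=10).shape)  # (5, 1000)
    # Columns 2, 4, 8 come from the first feature, 3, 9, 27 from the second.
    print(relationship_boosting(np.array([[2.0, 3.0]]), max_order=3))
```

Because every original column is kept as the order-1 term, the boosted matrix always contains the original features.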
3.2 Association mining
The purpose of this article is to enhance the representation ability of given datasets via the association information between feature vectors. Hence, one core task is to measure the degree of association between two feature vectors, and the choice of association mining method is important.
3.2.1 Choice of association mining method
If we view every feature as a variable in statistics, then the methods used in correlation analysis to measure correlation coefficients can be adopted for mining the association among features. The basics of correlation analysis can be found in [36]. In the following, we briefly introduce some correlation analysis methods and detail the distance correlation used in our work.
The widely used Pearson correlation coefficient (pCor), also known as the Pearson product-moment correlation coefficient, gives the strength of the linear trend between two variables [14]. Spearman's rho [15], which has been reprinted and revisited more than once (see [37, 38]), and Kendall's tau [16] are two rank-order correlation coefficients. Both are often used to measure the degree of monotonic trend between two variables. A comparative analysis of Spearman's rho and Kendall's tau can be found in [39].
Mutual information, a widely used information-theoretic quantity, is often used to construct association measures [40]. For example, in 2011, Reshef et al. argued that if a relationship exists between two variables, then a grid can be drawn on the scatter plot of the two variables that partitions the data so as to encapsulate that relationship. Based on this idea, they proposed the maximal information coefficient (MIC), in which such grid partitions are used to estimate mutual information [18]. Inspired by MIC, Cheng et al. developed effective bivariate and multivariate association mining techniques from the perspective of neighborhood information, replacing each example with its neighbor points [19, 41]. These methods show a powerful ability to capture various kinds of functional relationships.
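The methods mentioned above either have their own shortcomings (e.g., pCor) or are typically computationally intensive (e.g., MIC and MNC). Considering the trade-off between measurement effectiveness and computational complexity, we choose distance correlation (dCor) [17], a correlation analysis method based on characteristic functions, as the tool for mining association information. Given two random vectors $\xi \in \mathbb{R}^{p}$ and $\eta \in \mathbb{R}^{q}$, where $p$ and $q$ are their dimensions, the distance correlation [17] between them is defined by
$$\mathrm{dCor}(\xi, \eta) = \begin{cases} \dfrac{\mathrm{dCov}(\xi, \eta)}{\sqrt{\mathrm{dCov}(\xi, \xi)\,\mathrm{dCov}(\eta, \eta)}}, & \mathrm{dCov}(\xi, \xi)\,\mathrm{dCov}(\eta, \eta) > 0,\\[2ex] 0, & \text{otherwise}, \end{cases}$$
where the distance covariance $\mathrm{dCov}(\xi, \eta) \ge 0$ is given by
$$\mathrm{dCov}^2(\xi, \eta) = \frac{1}{c_p c_q}\int_{\mathbb{R}^{p+q}} \frac{\left|\phi_{\xi,\eta}(t, s) - \phi_{\xi}(t)\,\phi_{\eta}(s)\right|^2}{|t|_p^{1+p}\,|s|_q^{1+q}}\,dt\,ds, \qquad c_p = \frac{\pi^{(1+p)/2}}{\Gamma\big((1+p)/2\big)},$$
with $c_q$ defined analogously, where $\phi_{\xi}$, $\phi_{\eta}$, and $\phi_{\xi,\eta}$ denote the characteristic functions of $\xi$ and $\eta$ and their joint characteristic function, respectively, and $|\cdot|_p$ is the Euclidean norm in $\mathbb{R}^{p}$.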
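The distance correlation possesses the following properties:
● $0 \le \mathrm{dCor}(\xi, \eta) \le 1$;
● $\mathrm{dCor}(\xi, \eta)$ is defined for $\xi$ and $\eta$ in arbitrary dimensions, whereas the widely used Pearson correlation coefficient (pCor) requires the two variables to have the same dimension; that is, the constraint $p = q$ must be met for pCor but not for dCor;
● $\mathrm{dCor}(\xi, \eta) = 0$ if and only if $\xi$ and $\eta$ are independent (assuming finite first moments), whereas a zero pCor does not imply independence;
● compared with MIC, MNC, and similar measures, it is computationally efficient.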
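The independence property can be illustrated numerically. The sketch below is a minimal NumPy implementation of the empirical (V-statistic) distance correlation detailed in Section 3.2.2; the helper names `_double_centered` and `dcor` are hypothetical. It compares pCor and dCor on y = x² with x drawn symmetrically around zero: the dependence is deterministic but not monotonic, so pCor stays close to 0 while dCor is clearly positive.

```python
import numpy as np


def _double_centered(x: np.ndarray) -> np.ndarray:
    """Pairwise distances |x_k - x_l| with row, column and grand means removed."""
    a = np.abs(x[:, None] - x[None, :])
    return a - a.mean(axis=1, keepdims=True) - a.mean(axis=0, keepdims=True) + a.mean()


def dcor(x, y) -> float:
    """Empirical distance correlation of two 1-D samples of equal length."""
    A = _double_centered(np.asarray(x, dtype=float))
    B = _double_centered(np.asarray(y, dtype=float))
    dcov2 = max((A * B).mean(), 0.0)  # guard against tiny negative round-off
    denom = np.sqrt((A * A).mean() * (B * B).mean())
    return float(np.sqrt(dcov2 / denom)) if denom > 0 else 0.0


rng = np.random.default_rng(0)
x = rng.uniform(-1.0, 1.0, size=500)
y = x ** 2                                      # dependent, but not linearly
print("pCor:", np.corrcoef(x, y)[0, 1])         # near 0
print("dCor:", dcor(x, y))                      # clearly above 0
print("dCor(x, 3x + 1):", dcor(x, 3 * x + 1))   # 1 up to rounding
```

The last line also shows that dCor, like pCor, is invariant to affine rescaling of a variable.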
3.2.2 Computing the association in-between features
This step aims to obtain an association matrix, used as the enhancement matrix, by stacking the association values of every pair of feature vectors, where the association between features is computed with distance correlation.
To measure the association between any two feature vectors in a given data set, the empirical distance correlation (dCor) [17] is used because of its good properties described above, especially its advantages over Pearson's correlation coefficient.
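Let $\tilde{F}$ be the feature vector set of $\tilde{X}$, the relationship boosting data representation of $X$ obtained in Section 3.1. Then it can be denoted as
$$\tilde{F} = \{\tilde{f}_1, \tilde{f}_2, \ldots, \tilde{f}_d\},$$
where $\tilde{f}_j \in \mathbb{R}^{n}$ denotes the $j$th feature vector (column) of $\tilde{X}$, $j = 1, 2, \ldots, d$, and $d$ is the dimension of $\tilde{X}$.
Given two feature vectors $u = (u_1, \ldots, u_n)^\top$ and $v = (v_1, \ldots, v_n)^\top$, where $n$ is the number of examples, the empirical distance covariance of the two feature vectors is defined by
$$\mathrm{dCov}_n^2(u, v) = \frac{1}{n^2}\sum_{k=1}^{n}\sum_{l=1}^{n} A_{kl}B_{kl}, \tag{9}$$
where $A_{kl} = a_{kl} - \bar{a}_{k\cdot} - \bar{a}_{\cdot l} + \bar{a}_{\cdot\cdot}$ and $B_{kl} = b_{kl} - \bar{b}_{k\cdot} - \bar{b}_{\cdot l} + \bar{b}_{\cdot\cdot}$, and each of their terms is computed as follows (the $b$ terms are defined analogously from $v$):
$$a_{kl} = |u_k - u_l|, \tag{10}$$
$$\bar{a}_{k\cdot} = \frac{1}{n}\sum_{l=1}^{n} a_{kl}, \tag{11}$$
$$\bar{a}_{\cdot l} = \frac{1}{n}\sum_{k=1}^{n} a_{kl}, \tag{12}$$
$$\bar{a}_{\cdot\cdot} = \frac{1}{n^2}\sum_{k=1}^{n}\sum_{l=1}^{n} a_{kl}. \tag{13}$$
Similarly, the empirical distance variances of $u$ and $v$ can be defined as
$$\mathrm{dVar}_n^2(u) = \mathrm{dCov}_n^2(u, u) = \frac{1}{n^2}\sum_{k=1}^{n}\sum_{l=1}^{n} A_{kl}^2, \tag{14}$$
$$\mathrm{dVar}_n^2(v) = \mathrm{dCov}_n^2(v, v) = \frac{1}{n^2}\sum_{k=1}^{n}\sum_{l=1}^{n} B_{kl}^2. \tag{15}$$
Based on Eqs. (9), (14), and (15), the empirical distance correlation of the two feature vectors is obtained with Eq. (16):
$$\mathrm{dCor}_n(u, v) = \begin{cases} \sqrt{\dfrac{\mathrm{dCov}_n^2(u, v)}{\sqrt{\mathrm{dVar}_n^2(u)\,\mathrm{dVar}_n^2(v)}}}, & \mathrm{dVar}_n^2(u)\,\mathrm{dVar}_n^2(v) > 0,\\[2ex] 0, & \text{otherwise}. \end{cases} \tag{16}$$
With Eq. (16), the enhancement matrix $M \in \mathbb{R}^{d \times d}$ can be obtained and represented as
$$M = \begin{bmatrix} \mathrm{dCor}_n(\tilde{f}_1, \tilde{f}_1) & \cdots & \mathrm{dCor}_n(\tilde{f}_1, \tilde{f}_d) \\ \vdots & \ddots & \vdots \\ \mathrm{dCor}_n(\tilde{f}_d, \tilde{f}_1) & \cdots & \mathrm{dCor}_n(\tilde{f}_d, \tilde{f}_d) \end{bmatrix}. \tag{17}$$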
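Eqs. (9)-(17) translate directly into code. The following NumPy sketch (the helper names `_double_centered` and `enhancement_matrix` are hypothetical) caches each column's double-centred distance matrix so that every pairwise dCor costs a single element-wise product. It stores d matrices of size n × n, so as written it is only practical for small n; large datasets would need subsampling or faster univariate estimators, which are not shown here.

```python
import numpy as np


def _double_centered(f: np.ndarray) -> np.ndarray:
    # a_kl = |f_k - f_l| minus row mean, column mean, plus grand mean (Eqs. (10)-(13)).
    a = np.abs(f[:, None] - f[None, :])
    return a - a.mean(axis=1, keepdims=True) - a.mean(axis=0, keepdims=True) + a.mean()


def enhancement_matrix(X_boost: np.ndarray) -> np.ndarray:
    """Hypothetical sketch: d x d matrix M of empirical dCor between columns."""
    X_boost = np.asarray(X_boost, dtype=float)
    d = X_boost.shape[1]
    centered = [_double_centered(X_boost[:, j]) for j in range(d)]
    dvar2 = np.array([(C * C).mean() for C in centered])  # Eqs. (14)-(15)
    M = np.zeros((d, d))
    for i in range(d):
        for j in range(i, d):  # M is symmetric, so fill both triangles at once
            dcov2 = max((centered[i] * centered[j]).mean(), 0.0)  # Eq. (9)
            denom = np.sqrt(dvar2[i] * dvar2[j])
            # Eq. (16); a constant column gets association 0.
            M[i, j] = M[j, i] = np.sqrt(dcov2 / denom) if denom > 0 else 0.0
    return M


rng = np.random.default_rng(0)
f1 = rng.normal(size=200)
# Column 1 is an affine function of column 0; column 2 is independent noise.
X_boost = np.column_stack([f1, 2.0 * f1 - 1.0, rng.normal(size=200)])
print(np.round(enhancement_matrix(X_boost), 3))
```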
3.3 Association embedding
This step aims to further enhance the feature representation by aggregating the first enhancement result (the relationship-boosted representation) with the enhancement matrix.
Let and denote the th row and th column of the matrix and respectively, be the final enhanced data representation. Then the element of the th row and th column of the matrix can be computed by
where , , is an infinitesimal.
Further, let denote the th column of the matrix , denote the th row of the matrix , and denote the element-wise product. Then can be computed in the form of a vector inner product by
Let , then can be computed in the form of matrix multiplication by
The behavior of Eq. (18) is similar to that of the self-attention mechanism [42]. Specifically, the association (enhancement) matrix $M$ in Eq. (18) corresponds to the similarity $QK^\top$ between the query matrix $Q$ and the key matrix $K$ in the self-attention mechanism, i.e., $M \leftrightarrow QK^\top$. The entry $(QK^\top)_{ij}$ denotes the similarity between features $i$ and $j$, and a similarity based on the inner product of vectors can be regarded as a measure of linear relationship; in contrast, $M_{ij}$ denotes the association between features $i$ and $j$, and its values can reflect more complex relationships when an advanced association mining technique is used. The relationship-boosted representation corresponds to the value matrix $V$ in the self-attention mechanism. Note that the power functions in the relationship boosting process can change the magnitudes of feature values dramatically: values whose absolute value exceeds 1 grow rapidly with the power order, while those below 1 shrink toward 0. Inspired by Taylor's formula, a reweighting strategy is used to relieve this problem. The great success of self-attention in various tasks has demonstrated the effectiveness of this mechanism.
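The element-wise, inner-product, and matrix-multiplication forms of the aggregation described above compute the same quantity. The sketch below checks this numerically for a core weighted sum Z_ij = Σ_k X̃_ik M_kj. It is an assumption-laden illustration: random stand-ins replace the boosted representation and the dCor matrix, and the small constant ε and the Taylor-inspired reweighting of the original formulation are deliberately omitted because their exact form is not reproduced here.

```python
import numpy as np

rng = np.random.default_rng(1)
n, d = 4, 3
X_boost = rng.normal(size=(n, d))   # stand-in for the boosted representation
M = rng.uniform(size=(d, d))
M = (M + M.T) / 2                   # symmetric stand-in for the dCor matrix

# 1) Element-wise form: Z[i, j] = sum_k X_boost[i, k] * M[k, j]
Z_loop = np.zeros((n, d))
for i in range(n):
    for j in range(d):
        Z_loop[i, j] = sum(X_boost[i, k] * M[k, j] for k in range(d))

# 2) Inner-product form: element-wise product of row i and column j, then sum
Z_inner = np.array([[np.sum(X_boost[i, :] * M[:, j]) for j in range(d)]
                    for i in range(n)])

# 3) Matrix-multiplication form, which is what an implementation would use
Z_mat = X_boost @ M

print(np.allclose(Z_loop, Z_inner), np.allclose(Z_inner, Z_mat))  # True True
```

In the self-attention reading, M plays the role of the (unnormalized) attention scores over features, and each new feature is a weighted combination of the boosted features.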
AssoRep is only a representation method, and its output is the enhanced data representation. Therefore, to accomplish downstream tasks such as classification and clustering, AssoRep must be combined with existing machine learning algorithms. The combination is very simple and requires no modification of the existing algorithms. In the following, we give the steps in the context of supervised learning.
For a supervised learning task, we first combine the enhanced representation $Z$ produced by AssoRep with the label set $Y$ to obtain a new data set, which can be represented as
$$D_Z = \{(z_i, y_i)\}_{i=1}^{n},$$
where $z_i$ is the enhanced representation of the $i$th example.
Let $h$ be a supervised machine learning model to be combined, which originally takes $X$ as input. We simply let $h$ take $Z$ as input instead, i.e., $h(Z)$; this completes the combination of AssoRep with the supervised algorithm. The model $h$ can be instantiated with different classifiers, such as logistic regression, support vector machine, and random forest.
Note that the relationship boosting process in AssoRep increases the dimension of the representation it produces, by a factor equal to the maximal power order. To address this issue, principal component analysis (PCA) is used to map the enhanced representation to a low-dimensional space.
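As an illustration of the combination step, the sketch below wraps an AssoRep-style transform as a scikit-learn transformer and chains it with PCA and logistic regression under 10-fold cross-validation. Everything about the transformer is an assumption made for illustration, not the authors' implementation: the class name `AssoRepSketch`, the maximal order of 3, standardization before boosting, the number of PCA components, and the aggregation Z = X̃M (which omits the ε and reweighting terms). The Iris data set is used only as a small stand-in, not one of the benchmarks above. Estimating M inside `fit` keeps it computed from the training folds only.

```python
import numpy as np
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler


def _dcor_matrix(F: np.ndarray) -> np.ndarray:
    """Empirical distance-correlation matrix between the columns of F."""
    cents = []
    for j in range(F.shape[1]):
        a = np.abs(F[:, j][:, None] - F[:, j][None, :])
        cents.append(a - a.mean(axis=1, keepdims=True)
                     - a.mean(axis=0, keepdims=True) + a.mean())
    dvar2 = np.array([(C * C).mean() for C in cents])
    d = F.shape[1]
    M = np.zeros((d, d))
    for i in range(d):
        for j in range(i, d):
            denom = np.sqrt(dvar2[i] * dvar2[j])
            dcov2 = max((cents[i] * cents[j]).mean(), 0.0)
            M[i, j] = M[j, i] = np.sqrt(dcov2 / denom) if denom > 0 else 0.0
    return M


class AssoRepSketch(BaseEstimator, TransformerMixin):
    """Hypothetical AssoRep-style transformer (illustration only).

    fit: boost features with powers 1..max_order and estimate the dCor
    matrix M on the training data. transform: boost new data, aggregate with M.
    """

    def __init__(self, max_order=3):
        self.max_order = max_order

    def _boost(self, X):
        X = np.asarray(X, dtype=float)
        return np.hstack([
            np.column_stack([X[:, j] ** k for k in range(1, self.max_order + 1)])
            for j in range(X.shape[1])
        ])

    def fit(self, X, y=None):
        self.M_ = _dcor_matrix(self._boost(X))
        return self

    def transform(self, X):
        # Assumed aggregation Z = X_boost @ M (no epsilon or reweighting terms).
        return self._boost(X) @ self.M_


X, y = load_iris(return_X_y=True)
model = make_pipeline(
    StandardScaler(),            # keep powers of feature values in a moderate range
    AssoRepSketch(max_order=3),  # 4 features -> 12 boosted and aggregated features
    PCA(n_components=4),         # reduce the enlarged representation
    LogisticRegression(max_iter=2000),
)
scores = cross_val_score(model, X, y, cv=10)
print(f"mean 10-fold accuracy: {scores.mean():.3f}")
```

Because the transformer follows the fit/transform convention, the classifier at the end of the pipeline needs no modification, which mirrors the combination step described above.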
In summary, the efficiency of AssoRep comes from two aspects. First, dCor is more efficient than NMI, MIC, and MNC. Second, the dimension of the new representation is reduced with PCA. With these advantages, AssoRep has many potential applications, such as drug property prediction and recommender systems. Taking drug property prediction as an example, there are complex relationships among different types of structure descriptors [43], and this relationship information can be fully used to improve molecular representations via AssoRep.
4 Experiment
This section validates the effectiveness of AssoRep on the classification task from four perspectives: comparison on datasets with different sample sizes, generality when coupled with existing classification algorithms, comparison with other feature enhancement methods, and efficiency analysis of different association mining methods. For most datasets, 10-fold cross-validation is adopted for all approaches, and the mean of each performance metric is reported. For a few datasets on which the classification algorithms are very unstable under 10-fold cross-validation, a different number of folds is used as needed.
4.1 Evaluation metrics
To measure the performance of a classification result, we employ five frequently used metrics [44]: accuracy ($\mathrm{Acc}$), precision ($P$), recall ($R$), $F_1$ score, and kappa ($\kappa$). Larger values of these five measures indicate better classification performance. They are defined as follows.
$$\mathrm{Acc} = \frac{TP + TN}{TP + TN + FP + FN},$$
$$P = \frac{TP}{TP + FP},$$
$$R = \frac{TP}{TP + FN},$$
$$F_1 = \frac{2 \cdot P \cdot R}{P + R},$$
$$\kappa = \frac{p_o - p_e}{1 - p_e},$$
where
● $TP$ denotes the number of true positives;
● $TN$ denotes the number of true negatives;
● $FP$ denotes the number of false positives;
● $FN$ denotes the number of false negatives;
● $p_o$ is the empirical probability of agreement on the label assigned to any sample (the observed agreement ratio), and $p_e$ is the expected agreement when both annotators assign labels randomly.
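For a binary problem, the five formulas above can be evaluated directly from confusion-matrix counts. The function below is a minimal plain-Python sketch; the name `binary_metrics` is hypothetical. For multi-class data sets, precision, recall, and F1 must be averaged over classes; the text does not state which averaging was used, so macro averaging (for example, scikit-learn's `precision_score` with `average="macro"`) would be only one possible choice. scikit-learn's `cohen_kappa_score` computes the same kappa.

```python
def binary_metrics(y_true, y_pred):
    """Accuracy, precision, recall, F1 and Cohen's kappa for 0/1 labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    n = tp + tn + fp + fn

    acc = (tp + tn) / n
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0

    # Kappa: observed agreement p_o versus chance agreement p_e from the marginals.
    p_o = acc
    p_e = ((tp + fp) * (tp + fn) + (tn + fn) * (tn + fp)) / n ** 2
    kappa = (p_o - p_e) / (1 - p_e) if p_e != 1 else 0.0
    return {"accuracy": acc, "precision": precision, "recall": recall,
            "f1": f1, "kappa": kappa}


print(binary_metrics([1, 1, 1, 0, 0, 0, 0, 1], [1, 0, 1, 0, 0, 1, 0, 1]))
# {'accuracy': 0.75, 'precision': 0.75, 'recall': 0.75, 'f1': 0.75, 'kappa': 0.5}
```

In the example, TP = TN = 3 and FP = FN = 1, so p_o = 0.75 and p_e = 0.5, giving κ = 0.5: kappa discounts the half of the agreement that random labeling with the same marginals would already achieve.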
4.2 Experimental results on 120 benchmark datasets with different sample size
The quality of the association matrix in Eq. (17) is key to the performance of AssoRep. For any two random variables, the more observations (the larger the sample size) are available, the more accurately an association mining method can measure their degree of association [45]. Hence, the sample size of a dataset is a main factor influencing the performance of AssoRep. To comprehensively show the behavior of AssoRep on datasets with different sample sizes, we report its results on 120 datasets whose sample sizes vary from 10 to 67557. Based on the sample size, these datasets are divided equally into two groups:
● Group 1: datasets with more than 700 samples;
● Group 2: datasets with fewer than 700 samples.
For a fair comparison, 115 of the 120 datasets are used directly in the versions pre-processed by Fernandez et al. [46]. AWArgsift-hist [47], MM-IMDB-T [48], MM-IMDB-I [48], and Gesture-R [49] are represented as feature vectors so that the logistic regression algorithm can be applied to them.
All experiments are carried out in Python 3.6 on a server with an AMD EPYC 7542 32-Core Processor and 755 GB of RAM. The combined algorithms are taken from the scikit-learn Python library [50].
4.2.1 Results on the Group 1
In this experiment, we validate the effectiveness of AssoRep on the 60 datasets with larger sample sizes. Tab.1 lists the characteristics of each dataset, including the number of examples, the number of features, and the number of class labels. As shown in Tab.1, the sample size varies from 748 to 67557. Let LR denote the logistic regression algorithm trained on the original data representation, and let LR+AssoRep denote logistic regression trained on the AssoRep data representation. We compare LR with LR+AssoRep on these 60 benchmark datasets; the results are shown in Tab.2 and Tab.3. For each metric on each data set, the better result of LR and LR+AssoRep is marked in bold font.
Tab.1 Characteristics of the first group of datasets, whose sample sizes are larger than 700 (Group 1)
ID | Dataset | #Examples | #Features | #Classes | ID | Dataset | #Examples | #Features | #Classes | ID | Dataset | #Examples | #Features | #Classes |
L1 | abalone | 4177 | 8 | 3 | L2 | adult | 48842 | 14 | 2 | L3 | annealing | 798 | 38 | 6 |
L4 | bank | 4521 | 17 | 2 | L5 | blood | 748 | 4 | 2 | L6 | car | 1728 | 6 | 4 |
L7 | ctg-10classes | 2126 | 21 | 10 | L8 | ctg-3classes | 2126 | 21 | 3 | L9 | chess-krvk | 28056 | 6 | 18 |
L10 | chess-krvkp | 3196 | 36 | 2 | L11 | connect-4 | 67557 | 42 | 2 | L12 | contrac | 1473 | 9 | 3 |
L13 | energy-y1 | 768 | 8 | 3 | L14 | wav-mfcc | 15352 | 80 | 1215 | L15 | led-display | 1000 | 7 | 10 |
L16 | letter | 20000 | 16 | 26 | L17 | magic | 19020 | 10 | 2 | L18 | mammographic | 961 | 5 | 2 |
L19 | molec-biol-splice | 3190 | 60 | 3 | L20 | monks-3 | 3190 | 6 | 2 | L21 | mushroom | 8124 | 21 | 2 |
L22 | musk-2 | 6598 | 166 | 2 | L23 | nursery | 12960 | 8 | 5 | L24 | oocMerl2F | 1022 | 25 | 3 |
L25 | oocMerl4D | 1022 | 41 | 2 | L26 | oocTris2F | 912 | 25 | 2 | L27 | oocTris5B | 912 | 32 | 3 |
L28 | optical | 3823 | 62 | 10 | L29 | ozone | 2536 | 72 | 2 | L30 | page-blocks | 5473 | 10 | 5 |
L31 | pendigits | 7494 | 16 | 10 | L32 | pima | 768 | 5 | 2 | L33 | plant-margin | 1600 | 64 | 100 |
L34 | plant-shape | 1600 | 64 | 100 | L35 | plant-texture | 1600 | 36 | 100 | L36 | ringnorm | 7400 | 20 | 2 |
L37 | semeion | 1593 | 256 | 10 | L38 | spambase | 4601 | 57 | 2 | L39 | st-german-credit | 1000 | 24 | 2 |
L40 | st-image | 2310 | 18 | 7 | L41 | st-landsat | 4435 | 36 | 6 | L42 | st-shuttle | 43500 | 9 | 7 |
L43 | st-vehicle | 846 | 18 | 4 | L44 | steel-plates | 1941 | 27 | 7 | L45 | thyroid | 3772 | 21 | 3 |
L46 | tic-tac-toe | 958 | 9 | 2 | L47 | titanic | 2201 | 3 | 2 | L48 | twonorm | 7400 | 20 | 2 |
L49 | wall-following | 5456 | 24 | 4 | L50 | waveform | 5000 | 21 | 3 | L51 | wine-quality-red | 1599 | 11 | 6 |
L52 | wine-quality-white | 4898 | 11 | 7 | L53 | yeast | 1484 | 8 | 10 | L54 | robotnavigation | 5456 | 25 | 4 |
L55 | AWArgsift-hist | 3048 | 2000 | 10 | L56 | UJIndoorLoc | 21048 | 520 | 5 | L57 | MM-IMDB-T | 7799 | 600 | 2 |
L58 | MM-IMDB-I | 7799 | 2048 | 2 | L59 | YouTubeFaces4 | 5074 | 838 | 31 | L60 | Gesture-R | 4977 | 2048 | 83 |
Tab.2 Classification performance comparison between LR and LR+AssoRep on benchmark datasets L1-L40
Data | Accuracy (LR) | Accuracy (LR+AssoRep) | | Precision (LR) | Precision (LR+AssoRep) | | Recall (LR) | Recall (LR+AssoRep) | | F1 (LR) | F1 (LR+AssoRep) | | Kappa (LR) | Kappa (LR+AssoRep) |
L1 | 0.647±0.020 | 0.662±0.021 | | 0.636±0.021 | 0.652±0.022 | | 0.642±0.020 | 0.658±0.021 | | 0.636±0.021 | 0.653±0.021 | | 0.469±0.031 | 0.492±0.031 |
L2 | 0.843±0.007 | 0.852±0.007 | | 0.796±0.010 | 0.809±0.012 | | 0.738±0.016 | 0.757±0.014 | | 0.759±0.014 | 0.777±0.013 | | 0.521±0.028 | 0.557±0.025 |
L3 | 0.873±0.027 | 0.951±0.014 | | 0.792±0.116 | 0.924±0.077 | | 0.641±0.111 | 0.888±0.080 | | 0.678±0.113 | 0.901±0.077 | | 0.620±0.087 | 0.871±0.040 |
L4 | 0.895±0.007 | 0.897±0.005 | | 0.761±0.040 | 0.760±0.022 | | 0.612±0.030 | 0.641±0.029 | | 0.644±0.035 | 0.675±0.027 | | 0.301±0.066 | 0.357±0.052 |
L5 | 0.772±0.015 | 0.786±0.019 | | 0.705±0.099 | 0.713±0.054 | | 0.549±0.023 | 0.608±0.033 | | 0.535±0.036 | 0.621±0.040 | | 0.135±0.059 | 0.267±0.074 |
L6 | 0.794±0.027 | 0.881±0.026 | | 0.577±0.123 | 0.809±0.090 | | 0.444±0.062 | 0.630±0.068 | | 0.470±0.079 | 0.667±0.075 | | 0.498±0.072 | 0.737±0.060 |
L7 | 0.768±0.032 | 0.834±0.027 | | 0.762±0.054 | 0.837±0.041 | | 0.639±0.039 | 0.782±0.039 | | 0.668±0.042 | 0.796±0.035 | | 0.720±0.039 | 0.802±0.033 |
L8 | 0.894±0.018 | 0.912±0.018 | | 0.827±0.048 | 0.861±0.044 | | 0.775±0.046 | 0.827±0.035 | | 0.796±0.042 | 0.841±0.035 | | 0.701±0.054 | 0.754±0.053 |
L9 | 0.282±0.008 | 0.351±0.010 | | 0.242±0.038 | 0.337±0.041 | | 0.203±0.011 | 0.299±0.018 | | 0.188±0.010 | 0.294±0.021 | | 0.179±0.009 | 0.264±0.012 |
L10 | 0.970±0.012 | 0.971±0.014 | | 0.970±0.013 | 0.971±0.014 | | 0.970±0.012 | 0.971±0.014 | | 0.970±0.012 | 0.971±0.014 | | 0.940±0.025 | 0.941±0.027 |
L11 | 0.754±0.000 | 0.830±0.004 | | 0.720±0.091 | 0.784±0.006 | | 0.502±0.001 | 0.728±0.006 | | 0.434±0.002 | 0.748±0.006 | | 0.005±0.002 | 0.499±0.012 |
L12 | 0.507±0.042 | 0.568±0.055 | | 0.491±0.053 | 0.550±0.066 | | 0.472±0.044 | 0.531±0.055 | | 0.474±0.047 | 0.533±0.058 | | 0.221±0.067 | 0.318±0.085 |
L13 | 0.874±0.013 | 0.881±0.012 | | 0.847±0.026 | 0.862±0.022 | | 0.786±0.020 | 0.796±0.020 | | 0.795±0.023 | 0.807±0.024 | | 0.792±0.021 | 0.804±0.020 |
L14 | 0.231±0.011 | 0.281±0.007 | | 0.137±0.009 | 0.164±0.007 | | 0.166±0.010 | 0.209±0.008 | | 0.142±0.009 | 0.174±0.007 | | 0.229±0.011 | 0.280±0.007 |
L15 | 0.735±0.040 | 0.735±0.040 | | 0.745±0.039 | 0.745±0.039 | | 0.736±0.040 | 0.736±0.040 | | 0.731±0.038 | 0.731±0.038 | | 0.705±0.045 | 0.705±0.045 |
L16 | 0.723±0.013 | 0.846±0.009 | | 0.725±0.013 | 0.849±0.009 | | 0.721±0.013 | 0.845±0.009 | | 0.720±0.013 | 0.846±0.009 | | 0.712±0.014 | 0.840±0.010 |
L17 | 0.791±0.006 | 0.850±0.008 | | 0.782±0.009 | 0.845±0.009 | | 0.745±0.007 | 0.820±0.011 | | 0.756±0.007 | 0.829±0.010 | | 0.517±0.014 | 0.660±0.019 |
L18 | 0.823±0.035 | 0.832±0.035 | | 0.825±0.035 | 0.834±0.034 | | 0.823±0.035 | 0.831±0.034 | | 0.822±0.035 | 0.831±0.035 | | 0.645±0.070 | 0.663±0.069 |
L19 | 0.835±0.018 | 0.951±0.013 | | 0.819±0.020 | 0.942±0.014 | | 0.831±0.021 | 0.949±0.012 | | 0.824±0.020 | 0.945±0.013 | | 0.735±0.029 | 0.920±0.021 |
L20 | 0.761±0.123 | 0.930±0.067 | | 0.777±0.127 | 0.937±0.063 | | 0.761±0.124 | 0.930±0.067 | | 0.757±0.125 | 0.929±0.068 | | 0.521±0.246 | 0.859±0.135 |
L21 | 0.947±0.009 | 1.000±0.000 | | 0.947±0.009 | 1.000±0.000 | | 0.946±0.009 | 1.000±0.000 | | 0.947±0.009 | 1.000±0.000 | | 0.893±0.018 | 1.000±0.000 |
L22 | 0.949±0.005 | 0.945±0.005 | | 0.921±0.011 | 0.921±0.012 | | 0.878±0.015 | 0.858±0.016 | | 0.898±0.011 | 0.885±0.012 | | 0.795±0.021 | 0.771±0.023 |
L23 | 0.899±0.007 | 0.916±0.007 | | 0.649±0.056 | 0.660±0.056 | | 0.664±0.057 | 0.676±0.057 | | 0.656±0.056 | 0.668±0.056 | | 0.851±0.010 | 0.876±0.010 |
L24 | 0.918±0.021 | 0.930±0.021 | | 0.881±0.046 | 0.923±0.034 | | 0.893±0.054 | 0.919±0.038 | | 0.883±0.045 | 0.919±0.032 | | 0.823±0.045 | 0.847±0.047 |
L25 | 0.796±0.036 | 0.837±0.020 | | 0.788±0.051 | 0.819±0.038 | | 0.731±0.045 | 0.803±0.028 | | 0.746±0.047 | 0.809±0.030 | | 0.499±0.092 | 0.619±0.061 |
L26 | 0.797±0.030 | 0.836±0.030 | | 0.800±0.033 | 0.834±0.029 | | 0.787±0.031 | 0.829±0.035 | | 0.789±0.031 | 0.830±0.032 | | 0.580±0.061 | 0.661±0.064 |
L27 | 0.924±0.021 | 0.930±0.024 | | 0.866±0.151 | 0.915±0.109 | | 0.828±0.140 | 0.897±0.109 | | 0.840±0.141 | 0.900±0.107 | | 0.846±0.044 | 0.858±0.050 |
L28 | 0.964±0.016 | 0.968±0.013 | | 0.965±0.016 | 0.969±0.013 | | 0.964±0.016 | 0.968±0.013 | | 0.964±0.016 | 0.968±0.013 | | 0.960±0.018 | 0.965±0.015 |
L29 | 0.969±0.008 | 0.966±0.011 | | 0.570±0.173 | 0.743±0.176 | | 0.533±0.074 | 0.584±0.045 | | 0.542±0.104 | 0.611±0.060 | | 0.092±0.205 | 0.226±0.120 |
L30 | 0.954±0.003 | 0.959±0.004 | | 0.862±0.043 | 0.842±0.049 | | 0.659±0.039 | 0.701±0.029 | | 0.725±0.044 | 0.753±0.030 | | 0.720±0.023 | 0.763±0.024 |
L31 | 0.943±0.010 | 0.983±0.004 | | 0.943±0.010 | 0.983±0.005 | | 0.943±0.010 | 0.983±0.004 | | 0.942±0.010 | 0.983±0.005 | | 0.937±0.011 | 0.981±0.005 |
L32 | 0.779±0.029 | 0.779±0.029 | | 0.768±0.039 | 0.768±0.039 | | 0.734±0.029 | 0.734±0.029 | | 0.743±0.031 | 0.743±0.031 | | 0.490±0.062 | 0.490±0.062 |
L33 | 0.747±0.025 | 0.798±0.022 | | 0.724±0.025 | 0.779±0.019 | | 0.750±0.025 | 0.796±0.022 | | 0.714±0.023 | 0.767±0.020 | | 0.745±0.026 | 0.796±0.023 |
L34 | 0.509±0.032 | 0.564±0.038 | | 0.444±0.033 | 0.502±0.045 | | 0.518±0.030 | 0.569±0.035 | | 0.446±0.033 | 0.501±0.042 | | 0.504±0.032 | 0.560±0.038 |
L35 | 0.809±0.018 | 0.839±0.028 | | 0.789±0.028 | 0.823±0.047 | | 0.810±0.022 | 0.841±0.033 | | 0.776±0.025 | 0.811±0.040 | | 0.807±0.018 | 0.837±0.028 |
L36 | 0.760±0.016 | 0.986±0.005 | | 0.763±0.015 | 0.986±0.005 | | 0.760±0.016 | 0.986±0.005 | | 0.759±0.016 | 0.986±0.005 | | 0.520±0.032 | 0.972±0.010 |
L37 | 0.890±0.031 | 0.927±0.019 | | 0.896±0.032 | 0.932±0.018 | | 0.889±0.031 | 0.927±0.019 | | 0.889±0.032 | 0.927±0.019 | | 0.878±0.035 | 0.919±0.021 |
L38 | 0.925±0.011 | 0.932±0.008 | | 0.925±0.012 | 0.931±0.010 | | 0.919±0.012 | 0.927±0.008 | | 0.921±0.012 | 0.929±0.009 | | 0.843±0.023 | 0.857±0.018 |
L39 | 0.761±0.040 | 0.771±0.040 | | 0.716±0.053 | 0.733±0.054 | | 0.684±0.060 | 0.685±0.062 | | 0.691±0.062 | 0.695±0.065 | | 0.389±0.117 | 0.400±0.122 |
L40 | 0.913±0.019 | 0.929±0.014 | | 0.917±0.016 | 0.930±0.013 | | 0.913±0.019 | 0.929±0.014 | | 0.913±0.018 | 0.928±0.013 | | 0.898±0.022 | 0.917±0.016 |
Tab.3 Classification performance comparison between LR and LR+AssoRep on benchmark datasets L41-L60
Data | Accuracy (LR) | Accuracy (LR+AssoRep) | | Precision (LR) | Precision (LR+AssoRep) | | Recall (LR) | Recall (LR+AssoRep) | | F1 (LR) | F1 (LR+AssoRep) | | Kappa (LR) | Kappa (LR+AssoRep) |
L41 | 0.838±0.009 | 0.887±0.009 | | 0.803±0.029 | 0.867±0.012 | | 0.757±0.009 | 0.858±0.008 | | 0.751±0.011 | 0.861±0.009 | | 0.797±0.011 | 0.860±0.011 |
L42 | 0.930±0.002 | 0.992±0.001 | | 0.522±0.079 | 0.780±0.116 | | 0.488±0.078 | 0.633±0.093 | | 0.501±0.078 | 0.653±0.087 | | 0.783±0.008 | 0.978±0.003 |
L43 | 0.792±0.024 | 0.819±0.024 | | 0.790±0.029 | 0.821±0.024 | | 0.794±0.025 | 0.821±0.024 | | 0.786±0.027 | 0.819±0.025 | | 0.723±0.032 | 0.759±0.032 |
L44 | 0.706±0.026 | 0.744±0.023 | | 0.731±0.049 | 0.773±0.028 | | 0.695±0.042 | 0.756±0.043 | | 0.702±0.048 | 0.758±0.033 | | 0.619±0.033 | 0.671±0.029 |
L45 | 0.950±0.004 | 0.960±0.010 | | 0.876±0.063 | 0.897±0.074 | | 0.664±0.041 | 0.705±0.088 | | 0.672±0.025 | 0.763±0.083 | | 0.528±0.005 | 0.644±0.099 |
L46 | 0.983±0.016 | 0.983±0.016 | | 0.988±0.012 | 0.988±0.012 | | 0.976±0.023 | 0.976±0.023 | | 0.981±0.018 | 0.981±0.018 | | 0.962±0.037 | 0.962±0.037 |
L47 | 0.776±0.019 | 0.778±0.019 | | 0.760±0.025 | 0.763±0.025 | | 0.700±0.029 | 0.703±0.028 | | 0.714±0.030 | 0.718±0.030 | | 0.437±0.056 | 0.444±0.055 |
L48 | 0.979±0.006 | 0.979±0.006 | | 0.979±0.006 | 0.979±0.006 | | 0.979±0.006 | 0.979±0.006 | | 0.979±0.006 | 0.979±0.006 | | 0.957±0.012 | 0.957±0.012 |
L49 | 0.688±0.013 | 0.922±0.011 | | 0.690±0.036 | 0.921±0.016 | | 0.593±0.023 | 0.918±0.019 | | 0.622±0.027 | 0.919±0.016 | | 0.514±0.021 | 0.882±0.017 |
L50 | 0.869±0.015 | 0.869±0.015 | | 0.869±0.015 | 0.869±0.015 | | 0.869±0.015 | 0.869±0.015 | | 0.868±0.015 | 0.868±0.015 | | 0.803±0.023 | 0.803±0.023 |
L51 | 0.592±0.030 | 0.604±0.044 | | 0.277±0.043 | 0.296±0.034 | | 0.253±0.017 | 0.280±0.028 | | 0.246±0.022 | 0.281±0.031 | | 0.316±0.052 | 0.351±0.073 |
L52 | 0.537±0.014 | 0.541±0.018 | | 0.289±0.047 | 0.365±0.157 | | 0.228±0.018 | 0.257±0.049 | | 0.221±0.018 | 0.264±0.064 | | 0.234±0.026 | 0.260±0.029 |
L53 | 0.588±0.044 | 0.611±0.030 | | 0.568±0.092 | 0.552±0.065 | | 0.485±0.059 | 0.533±0.050 | | 0.499±0.064 | 0.529±0.054 | | 0.458±0.059 | 0.493±0.041 |
L54 | 0.688±0.013 | 0.900±0.014 | | 0.690±0.036 | 0.903±0.021 | | 0.593±0.023 | 0.893±0.019 | | 0.622±0.027 | 0.897±0.018 | | 0.514±0.021 | 0.849±0.022 |
L55 | 0.137±0.011 | 0.192±0.019 | | 0.109±0.014 | 0.154±0.017 | | 0.109±0.010 | 0.157±0.017 | | 0.103±0.009 | 0.149±0.015 | | 0.113±0.012 | 0.170±0.020 |
L56 | 0.930±0.005 | 0.981±0.002 | | 0.933±0.005 | 0.983±0.003 | | 0.932±0.007 | 0.982±0.003 | | 0.932±0.005 | 0.982±0.003 | | 0.909±0.007 | 0.976±0.003 |
L57 | 0.709±0.021 | 0.725±0.018 | | 0.708±0.022 | 0.725±0.018 | | 0.707±0.022 | 0.722±0.019 | | 0.707±0.022 | 0.722±0.019 | | 0.415±0.043 | 0.445±0.037 |
L58 | 0.612±0.021 | 0.644±0.014 | | 0.610±0.021 | 0.645±0.014 | | 0.608±0.021 | 0.638±0.014 | | 0.608±0.021 | 0.637±0.014 | | 0.218±0.042 | 0.279±0.028 |
L59 | 0.470±0.026 | 0.496±0.020 | | 0.492±0.032 | 0.515±0.024 | | 0.443±0.031 | 0.479±0.018 | | 0.453±0.030 | 0.486±0.018 | | 0.412±0.030 | 0.441±0.020 |
L60 | 0.928±0.005 | 0.936±0.007 | | 0.937±0.005 | 0.943±0.006 | | 0.928±0.006 | 0.936±0.007 | | 0.928±0.005 | 0.935±0.007 | | 0.928±0.006 | 0.935±0.007 |
The following observations can be made from Tab.2 and Tab.3:
1. LR (AssoRep) outperforms LR (original) on every performance metric. Across the 60 datasets, LR (AssoRep) obtains higher accuracy, precision, recall, F1 and kappa on 55, 53, 55, 55, and 55 datasets, respectively, whereas LR (original) is best on only 2, 2, 1, 1, and 1 datasets, respectively. Even in those cases, LR (AssoRep) performs very close to LR (original). On most datasets the improvement is clear. For example, on dataset L36, LR (AssoRep) improves accuracy, precision, recall, F1 and kappa by 0.986−0.760=0.226, 0.986−0.763=0.223, 0.986−0.760=0.226, 0.986−0.759=0.227, and 0.972−0.520=0.452, respectively. Notably, with the representation produced by AssoRep on dataset L21, all five metrics of LR rise from 0.947, 0.947, 0.946, 0.947, and 0.893 to 1.
2. AssoRep tends to bring larger gains when the original representation performs poorly. For example, after AssoRep enhances the representation of dataset L9, accuracy rises markedly from 0.282 to 0.351, whereas AssoRep yields no improvement on datasets L10 (accuracy 0.970) and L48 (accuracy 0.979).
Furthermore, we apply the paired t-test to assess whether LR (AssoRep) performs significantly better than LR (original). Specifically, given two compared algorithms $A$ and $B$ and an evaluation metric $M$, we run each algorithm $k$ times; algorithm $A$ obtains the metric values $a_1, \dots, a_k$ and algorithm $B$ obtains $b_1, \dots, b_k$ in terms of $M$. Let $d_i = a_i - b_i$ ($i = 1, \dots, k$), and denote the mean and standard deviation of $d_1, \dots, d_k$ by $\bar{d}$ and $s_d$, respectively. The test statistic follows a t distribution with $k-1$ degrees of freedom and is defined as

$$t = \frac{\bar{d}}{s_d/\sqrt{k}}, \qquad \bar{d} = \frac{1}{k}\sum_{i=1}^{k} d_i, \qquad s_d = \sqrt{\frac{1}{k-1}\sum_{i=1}^{k}\left(d_i - \bar{d}\right)^2}.$$

The null hypothesis that algorithms $A$ and $B$ have the same performance is rejected if the returned p-value is below the significance level of 5%. The results are recorded in Tab.2 and Tab.3, where markers indicate whether AssoRep is better than, tied with, or worse than the corresponding method under the paired t-test at the 5% significance level.
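The paired t statistic can be computed directly from the per-run metric values, with d_i = a_i − b_i over k runs. The following is a minimal standard-library sketch; the function name `paired_t_statistic` and the three-run input are hypothetical and only illustrate the formula, they are not results from this paper.

```python
import math
import statistics
from typing import Sequence


def paired_t_statistic(a: Sequence[float], b: Sequence[float]) -> float:
    """Paired t statistic t = mean(d) / (s_d / sqrt(k)), where d_i = a_i - b_i.

    `a` and `b` hold the k per-run values of one metric for algorithms A and B,
    in the same run order (e.g. one value per cross-validation fold).
    """
    if len(a) != len(b) or len(a) < 2:
        raise ValueError("need two sequences of equal length with at least 2 runs")
    d = [ai - bi for ai, bi in zip(a, b)]
    d_bar = statistics.mean(d)
    s_d = statistics.stdev(d)  # sample standard deviation, divides by k - 1
    if s_d == 0:
        raise ValueError("all differences are identical, so t is undefined")
    return d_bar / (s_d / math.sqrt(len(d)))


# Hypothetical per-run accuracies (k = 3), for illustration only.
t = paired_t_statistic([0.90, 0.80, 0.85], [0.80, 0.75, 0.80])
print(round(t, 6))  # prints 4.0; compare against a t distribution with k - 1 = 2 df
```

Turning t into a p-value requires the CDF of the t distribution with k − 1 degrees of freedom; statistics packages such as SciPy (`scipy.stats.ttest_rel`) provide the complete test, which is outside this sketch.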
As shown in Tab.2 and Tab.3, LR (AssoRep) is significantly better than LR (original) on 40, 41, 45, 46, and 45 of the 60 datasets in terms of accuracy, precision, recall, F1, and kappa, respectively, while LR (original) is never significantly better than LR (AssoRep) at the significance level α = 0.05. These results confirm that enhancing the representation with the association among features is effective on datasets with larger sample sizes.
4.2.2 Results on Group 2
This experiment examines the behavior of AssoRep on datasets with smaller sample sizes. To this end, we use the 60 datasets listed in Tab.4, which reports the number of examples, the number of features, and the number of class labels of each dataset. As shown in Tab.4, the sample size ranges from 10 to 690. The experimental settings are the same as those for Group 1, and the results are reported in Tab.5 and Tab.6.
Tab.4 Characteristics of the second group of datasets, whose sample sizes are smaller than 700 (Group 2) |
ID | Dataset | #Examples | #Features | #Classes | ID | Dataset | #Examples | #Features | #Classes | ID | Dataset | #Examples | #Features | #Classes |
S1 | ac-inflam | 120 | 6 | 2 | S2 | acute-nephritis | 120 | 6 | 2 | S3 | arrhythmia | 452 | 262 | 13 |
S4 | audiology-std | 226 | 59 | 18 | S5 | balance-scale | 625 | 4 | 3 | S6 | balloons | 16 | 4 | 2 |
S7 | breast-cancer | 286 | 9 | 2 | S8 | conn-bench-sonar | 208 | 60 | 2 | S9 | conn-bench-vowel | 528 | 11 | 11 |
S10 | credit-approval | 690 | 15 | 2 | S11 | cylinder-bands | 512 | 35 | 2 | S12 | dermatology | 366 | 34 | 6 |
S13 | echocardiogram | 131 | 10 | 2 | S14 | ecoli | 336 | 7 | 8 | S15 | fertility | 100 | 9 | 2 |
S16 | flag | 194 | 28 | 8 | S17 | glass | 214 | 9 | 6 | S18 | haberman-survival | 306 | 3 | 2 |
S19 | hayes-roth | 132 | 3 | 3 | S20 | heart-cleveland | 303 | 13 | 5 | S21 | heart-hungarian | 294 | 12 | 2 |
S22 | heart-switzerland | 123 | 12 | 2 | S23 | heart-va | 200 | 12 | 5 | S24 | hepatitis | 155 | 19 | 2 |
S25 | hill-valley | 606 | 100 | 2 | S26 | horse-colic | 300 | 25 | 2 | S27 | ilpd-indian-liver | 583 | 9 | 2 |
S28 | image-segmentation | 210 | 19 | 7 | S29 | ionosphere | 351 | 33 | 2 | S30 | iris | 150 | 4 | 3 |
S31 | lenses | 24 | 4 | 3 | S32 | low-res-spect | 531 | 100 | 9 | S33 | lung-cancer | 32 | 56 | 3 |
S34 | lymphography | 148 | 18 | 4 | S35 | molec-biol-promoter | 106 | 57 | 2 | S36 | monks-1 | 124 | 6 | 2 |
S37 | monks-2 | 169 | 6 | 2 | S38 | musk-1 | 476 | 166 | 2 | S39 | parkinsons | 195 | 22 | 2 |
S40 | pb-MATERIAL | 106 | 4 | 3 | S41 | pb-REL-L | 103 | 4 | 3 | S42 | pb-SPAN | 92 | 4 | 3 |
S43 | pb-T-OR-D | 102 | 4 | 3 | S44 | pb-TYPE | 105 | 4 | 3 | S45 | planning | 182 | 12 | 2 |
S46 | post-operative | 90 | 8 | 3 | S47 | primary-tumor | 330 | 17 | 15 | S48 | seeds | 210 | 7 | 3 |
S49 | soybean | 307 | 35 | 18 | S50 | spect | 80 | 22 | 2 | S51 | spectf | 80 | 44 | 2 |
S52 | st-australian-credit | 690 | 14 | 2 | S53 | st-heart | 270 | 13 | 2 | S54 | synthetic-control | 600 | 60 | 6 |
S55 | teaching | 151 | 5 | 3 | S56 | trains | 10 | 28 | 2 | S57 | vc-2classes | 310 | 6 | 2 |
S58 | vc-3classes | 310 | 6 | 3 | S59 | wine | 179 | 13 | 3 | S60 | zoo | 101 | 16 | 7 |
Tab.5 Classification performance comparison between LR (original) and LR (AssoRep) on benchmark datasets S1-S40 |
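Metric | Accuracy | | | Precision | | | Recall | | | F1 | | | Kappa | |
Data | LR (original) | LR (AssoRep) | | LR (original) | LR (AssoRep) | | LR (original) | LR (AssoRep) | | LR (original) | LR (AssoRep) | | LR (original) | LR (AssoRep) |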
S1 | 1.000±0.000 | 1.000±0.000 | | 1.000±0.000 | 1.000±0.000 | | 1.000±0.000 | 1.000±0.000 | | 1.000±0.000 | 1.000±0.000 | | 1.000±0.000 | 1.000±0.000 |
S2 | 1.000±0.000 | 1.000±0.000 | | 1.000±0.000 | 1.000±0.000 | | 1.000±0.000 | 1.000±0.000 | | 1.000±0.000 | 1.000±0.000 | | 1.000±0.000 | 1.000±0.000 |
S3 | 0.661±0.048 | 0.695±0.037 | | 0.427±0.084 | 0.452±0.071 | | 0.384±0.083 | 0.415±0.058 | | 0.389±0.080 | 0.414±0.054 | | 0.457±0.082 | 0.494±0.066 |
S4 | 0.696±0.032 | 0.789±0.029 | | 0.493±0.047 | 0.542±0.061 | | 0.490±0.067 | 0.582±0.082 | | 0.473±0.045 | 0.549±0.067 | | 0.648±0.038 | 0.750±0.035 |
S5 | 0.862±0.028 | 0.922±0.005 | | 0.580±0.016 | 0.615±0.003 | | 0.624±0.021 | 0.667±0.000 | | 0.599±0.019 | 0.640±0.002 | | 0.745±0.052 | 0.855±0.008 |
S6 | 0.619±0.048 | 0.730±0.159 | | 0.480±0.195 | 0.601±0.315 | | 0.588±0.088 | 0.688±0.188 | | 0.515±0.152 | 0.623±0.260 | | 0.171±0.171 | 0.385±0.385 |
S7 | 0.711±0.036 | 0.727±0.031 | | 0.630±0.090 | 0.671±0.057 | | 0.575±0.049 | 0.613±0.034 | | 0.569±0.067 | 0.619±0.039 | | 0.175±0.116 | 0.257±0.076 |
S8 | 0.773±0.027 | 0.788±0.023 | | 0.775±0.027 | 0.792±0.025 | | 0.770±0.027 | 0.785±0.023 | | 0.770±0.027 | 0.786±0.023 | | 0.542±0.054 | 0.573±0.046 |
S9 | 0.557±0.046 | 0.801±0.019 | | 0.552±0.066 | 0.816±0.013 | | 0.556±0.050 | 0.801±0.019 | | 0.537±0.058 | 0.797±0.019 | | 0.512±0.051 | 0.781±0.021 |
S10 | 0.858±0.041 | 0.861±0.040 | | 0.861±0.038 | 0.864±0.037 | | 0.863±0.038 | 0.867±0.038 | | 0.857±0.041 | 0.861±0.039 | | 0.717±0.080 | 0.723±0.078 |
S11 | 0.733±0.056 | 0.748±0.053 | | 0.730±0.068 | 0.742±0.058 | | 0.702±0.061 | 0.727±0.057 | | 0.705±0.066 | 0.729±0.057 | | 0.418±0.125 | 0.462±0.113 |
S12 | 0.978±0.027 | 0.978±0.027 | | 0.981±0.022 | 0.981±0.022 | | 0.976±0.030 | 0.976±0.030 | | 0.975±0.030 | 0.975±0.030 | | 0.972±0.034 | 0.972±0.034 |
S13 | 0.812±0.063 | 0.818±0.052 | | 0.819±0.097 | 0.828±0.079 | | 0.749±0.069 | 0.755±0.058 | | 0.766±0.075 | 0.773±0.063 | | 0.540±0.149 | 0.535±0.124 |
S14 | 0.868±0.015 | 0.872±0.016 | | 0.642±0.030 | 0.646±0.029 | | 0.637±0.019 | 0.643±0.018 | | 0.633±0.032 | 0.639±0.032 | | 0.817±0.021 | 0.822±0.023 |
S15 | 0.854±0.047 | 0.850±0.053 | | 0.438±0.012 | 0.438±0.013 | | 0.485±0.026 | 0.483±0.029 | | 0.460±0.014 | 0.459±0.016 | | −0.030±0.048 | −0.035±0.052 |
S16 | 0.487±0.061 | 0.530±0.085 | | 0.289±0.042 | 0.340±0.078 | | 0.310±0.051 | 0.363±0.076 | | 0.290±0.042 | 0.341±0.076 | | 0.351±0.078 | 0.412±0.104 |
S17 | 0.620±0.054 | 0.660±0.051 | | 0.484±0.093 | 0.532±0.084 | | 0.483±0.075 | 0.542±0.078 | | 0.472±0.076 | 0.525±0.076 | | 0.462±0.076 | 0.520±0.069 |
S18 | 0.737±0.016 | 0.751±0.033 | | 0.679±0.105 | 0.687±0.081 | | 0.548±0.029 | 0.597±0.032 | | 0.527±0.050 | 0.603±0.040 | | 0.120±0.065 | 0.233±0.078 |
S19 | 0.544±0.062 | 0.844±0.032 | | 0.554±0.061 | 0.881±0.027 | | 0.584±0.069 | 0.856±0.035 | | 0.546±0.060 | 0.860±0.032 | | 0.302±0.099 | 0.759±0.050 |
S20 | 0.583±0.028 | 0.589±0.030 | | 0.303±0.054 | 0.329±0.078 | | 0.310±0.042 | 0.318±0.040 | | 0.301±0.046 | 0.314±0.048 | | 0.306±0.049 | 0.311±0.054 |
S21 | 0.824±0.039 | 0.839±0.034 | | 0.813±0.041 | 0.829±0.035 | | 0.800±0.048 | 0.819±0.043 | | 0.804±0.045 | 0.822±0.040 | | 0.610±0.090 | 0.645±0.079 |
S22 | 0.371±0.050 | 0.392±0.048 | | 0.231±0.021 | 0.232±0.032 | | 0.241±0.030 | 0.241±0.030 | | 0.232±0.026 | 0.226±0.031 | | 0.090±0.067 | 0.094±0.069 |
S23 | 0.326±0.058 | 0.336±0.071 | | 0.255±0.064 | 0.303±0.082 | | 0.272±0.059 | 0.303±0.067 | | 0.256±0.058 | 0.294±0.071 | | 0.111±0.078 | 0.127±0.093 |
S24 | 0.810±0.039 | 0.840±0.032 | | 0.711±0.066 | 0.765±0.053 | | 0.723±0.085 | 0.728±0.069 | | 0.713±0.073 | 0.736±0.067 | | 0.428±0.146 | 0.476±0.125 |
S25 | 0.660±0.032 | 0.700±0.030 | | 0.775±0.011 | 0.787±0.029 | | 0.656±0.032 | 0.696±0.030 | | 0.615±0.050 | 0.672±0.040 | | 0.314±0.065 | 0.395±0.061 |
S26 | 0.798±0.035 | 0.827±0.024 | | 0.786±0.044 | 0.822±0.028 | | 0.777±0.034 | 0.800±0.031 | | 0.780±0.036 | 0.807±0.029 | | 0.560±0.073 | 0.616±0.057 |
S27 | 0.716±0.011 | 0.725±0.010 | | 0.626±0.028 | 0.653±0.034 | | 0.563±0.018 | 0.558±0.015 | | 0.557±0.024 | 0.545±0.023 | | 0.154±0.040 | 0.147±0.037 |
S28 | 0.864±0.016 | 0.872±0.029 | | 0.872±0.022 | 0.875±0.030 | | 0.864±0.016 | 0.872±0.029 | | 0.860±0.018 | 0.870±0.030 | | 0.841±0.019 | 0.851±0.034 |
S29 | 0.880±0.046 | 0.920±0.042 | | 0.891±0.048 | 0.935±0.040 | | 0.851±0.055 | 0.894±0.052 | | 0.863±0.053 | 0.908±0.049 | | 0.729±0.104 | 0.818±0.096 |
S30 | 0.907±0.053 | 0.973±0.033 | | 0.924±0.041 | 0.978±0.027 | | 0.907±0.053 | 0.973±0.033 | | 0.904±0.058 | 0.973±0.033 | | 0.860±0.080 | 0.960±0.049 |
S31 | 0.764±0.057 | 0.792±0.080 | | 0.717±0.125 | 0.782±0.085 | | 0.716±0.116 | 0.774±0.107 | | 0.671±0.108 | 0.743±0.094 | | 0.574±0.103 | 0.637±0.131 |
S32 | 0.712±0.034 | 0.737±0.023 | | 0.735±0.039 | 0.775±0.021 | | 0.712±0.034 | 0.737±0.023 | | 0.701±0.037 | 0.731±0.025 | | 0.691±0.037 | 0.718±0.025 |
S33 | 0.434±0.121 | 0.488±0.081 | | 0.446±0.140 | 0.538±0.103 | | 0.455±0.115 | 0.496±0.082 | | 0.430±0.123 | 0.489±0.081 | | 0.157±0.172 | 0.219±0.122 |
S34 | 0.823±0.039 | 0.849±0.025 | | 0.677±0.079 | 0.676±0.012 | | 0.675±0.081 | 0.674±0.015 | | 0.672±0.078 | 0.673±0.014 | | 0.661±0.076 | 0.707±0.050 |
S35 | 0.781±0.042 | 0.834±0.034 | | 0.786±0.044 | 0.836±0.035 | | 0.781±0.043 | 0.834±0.033 | | 0.780±0.043 | 0.834±0.033 | | 0.562±0.085 | 0.668±0.067 |
S36 | 0.669±0.070 | 0.726±0.068 | | 0.674±0.074 | 0.742±0.070 | | 0.669±0.071 | 0.725±0.070 | | 0.666±0.071 | 0.718±0.080 | | 0.337±0.141 | 0.450±0.141 |
S37 | 0.550±0.072 | 0.561±0.096 | | 0.407±0.133 | 0.448±0.160 | | 0.459±0.066 | 0.481±0.100 | | 0.403±0.073 | 0.445±0.118 | | −0.089±0.145 | −0.043±0.219 |
S38 | 0.857±0.088 | 0.891±0.039 | | 0.858±0.052 | 0.891±0.039 | | 0.857±0.088 | 0.894±0.038 | | 0.855±0.055 | 0.890±0.039 | | 0.711±0.108 | 0.780±0.078 |
S39 | 0.852±0.055 | 0.882±0.056 | | 0.822±0.080 | 0.855±0.076 | | 0.780±0.061 | 0.830±0.066 | | 0.794±0.068 | 0.839±0.069 | | 0.591±0.136 | 0.678±0.137 |
S40 | 0.853±0.039 | 0.859±0.035 | | 0.556±0.097 | 0.542±0.055 | | 0.597±0.070 | 0.619±0.049 | | 0.566±0.070 | 0.573±0.049 | | 0.610±0.090 | 0.625±0.077 |
Tab.6 Classification performance comparison between LR (original) and LR (AssoRep) on benchmark datasets S41-S60 |
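Metric | Accuracy | | | Precision | | | Recall | | | F1 | | | Kappa | |
Data | LR (original) | LR (AssoRep) | | LR (original) | LR (AssoRep) | | LR (original) | LR (AssoRep) | | LR (original) | LR (AssoRep) | | LR (original) | LR (AssoRep) |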
S41 | 0.652±0.075 | 0.675±0.081 | | 0.485±0.089 | 0.475±0.066 | | 0.508±0.071 | 0.516±0.068 | | 0.487±0.077 | 0.486±0.067 | | 0.371±0.138 | 0.402±0.151 |
S42 | 0.693±0.063 | 0.713±0.050 | | 0.730±0.091 | 0.725±0.073 | | 0.645±0.044 | 0.653±0.068 | | 0.652±0.053 | 0.664±0.067 | | 0.481±0.086 | 0.506±0.094 |
S43 | 0.868±0.037 | 0.882±0.052 | | 0.683±0.202 | 0.759±0.170 | | 0.615±0.100 | 0.706±0.140 | | 0.625±0.124 | 0.712±0.137 | | 0.270±0.230 | 0.431±0.269 |
S44 | 0.590±0.037 | 0.644±0.035 | | 0.414±0.075 | 0.579±0.055 | | 0.432±0.043 | 0.530±0.043 | | 0.404±0.046 | 0.524±0.039 | | 0.422±0.048 | 0.513±0.048 |
S45 | 0.709±0.021 | 0.715±0.015 | | 0.356±0.008 | 0.357±0.008 | | 0.496±0.012 | 0.500±0.000 | | 0.415±0.007 | 0.417±0.005 | | −0.010±0.031 | 0.000±0.000 |
S46 | 0.680±0.079 | 0.680±0.079 | | 0.334±0.078 | 0.334±0.078 | | 0.450±0.081 | 0.450±0.081 | | 0.382±0.077 | 0.382±0.077 | | −0.040±0.110 | −0.040±0.110 |
S47 | 0.503±0.100 | 0.509±0.041 | | 0.341±0.118 | 0.331±0.081 | | 0.370±0.117 | 0.383±0.081 | | 0.335±0.114 | 0.337±0.082 | | 0.428±0.118 | 0.439±0.076 |
S48 | 0.933±0.044 | 0.971±0.032 | | 0.939±0.042 | 0.976±0.026 | | 0.933±0.044 | 0.971±0.032 | | 0.932±0.044 | 0.971±0.032 | | 0.900±0.065 | 0.957±0.047 |
S49 | 0.892±0.065 | 0.892±0.056 | | 0.907±0.078 | 0.900±0.048 | | 0.913±0.064 | 0.906±0.048 | | 0.901±0.072 | 0.895±0.050 | | 0.881±0.072 | 0.881±0.061 |
S50 | 0.645±0.057 | 0.686±0.043 | | 0.581±0.073 | 0.636±0.064 | | 0.571±0.064 | 0.592±0.042 | | 0.570±0.070 | 0.589±0.052 | | 0.149±0.134 | 0.204±0.093 |
S51 | 0.752±0.055 | 0.785±0.064 | | 0.766±0.057 | 0.801±0.070 | | 0.751±0.056 | 0.784±0.065 | | 0.748±0.058 | 0.782±0.066 | | 0.503±0.111 | 0.569±0.129 |
S52 | 0.667±0.011 | 0.668±0.012 | | 0.458±0.085 | 0.562±0.044 | | 0.497±0.008 | 0.522±0.015 | | 0.418±0.012 | 0.484±0.025 | | −0.018±0.021 | 0.054±0.038 |
S53 | 0.839±0.054 | 0.850±0.051 | | 0.842±0.057 | 0.854±0.051 | | 0.836±0.055 | 0.845±0.052 | | 0.836±0.056 | 0.847±0.053 | | 0.673±0.111 | 0.694±0.104 |
S54 | 0.940±0.024 | 0.960±0.021 | | 0.946±0.022 | 0.965±0.019 | | 0.940±0.024 | 0.960±0.021 | | 0.938±0.025 | 0.960±0.021 | | 0.928±0.029 | 0.952±0.026 |
S55 | 0.506±0.048 | 0.511±0.025 | | 0.508±0.049 | 0.518±0.027 | | 0.509±0.049 | 0.513±0.025 | | 0.497±0.047 | 0.509±0.024 | | 0.261±0.072 | 0.268±0.038 |
S56 | 0.774±0.180 | 0.900±0.134 | | 0.775±0.210 | 0.933±0.088 | | 0.742±0.188 | 0.900±0.128 | | 0.716±0.204 | 0.891±0.144 | | 0.470±0.383 | 0.799±0.258 |
S57 | 0.842±0.042 | 0.848±0.058 | | 0.824±0.049 | 0.829±0.064 | | 0.823±0.058 | 0.828±0.078 | | 0.819±0.050 | 0.825±0.071 | | 0.639±0.101 | 0.651±0.141 |
S58 | 0.852±0.050 | 0.858±0.050 | | 0.815±0.071 | 0.826±0.071 | | 0.803±0.069 | 0.813±0.069 | | 0.804±0.070 | 0.814±0.070 | | 0.761±0.082 | 0.772±0.082 |
S59 | 0.983±0.026 | 0.994±0.017 | | 0.983±0.026 | 0.994±0.017 | | 0.986±0.021 | 0.995±0.014 | | 0.983±0.025 | 0.994±0.017 | | 0.974±0.039 | 0.992±0.025 |
S60 | 0.951±0.022 | 0.954±0.015 | | 0.940±0.034 | 0.916±0.052 | | 0.891±0.041 | 0.901±0.038 | | 0.896±0.045 | 0.892±0.044 | | 0.935±0.029 | 0.940±0.020 |
From Tab.5 and Tab.6, we have the following observations:
1. For each performance metric, LR (AssoRep) tends to perform better, which is consistent with the results on Group 1. For example, on dataset S9 the accuracy, precision, recall, F1 and kappa increase from 0.557, 0.552, 0.556, 0.537, and 0.512 to 0.801, 0.816, 0.801, 0.797, and 0.781, i.e., absolute gains of 24.4, 26.4, 24.5, 26.0, and 26.9 percentage points, respectively. Overall, LR (AssoRep) wins 258 times, ties 21 times, and loses 21 times in the 300 experimental configurations (5 metrics × 60 datasets).
2. LR (AssoRep) tends to have smaller standard deviations than LR (original), which suggests that it is more robust for small-scale data classification.
3. AssoRep often brings larger gains when the original representation performs poorly. For example, after AssoRep enhances the representation of dataset S19, accuracy improves from 0.544 to 0.844. This suggests that the association information between features is useful auxiliary information for representation learning.
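The win/tie/loss tally and the percentage-point gains quoted for Group 2 can be recomputed from the mean values in Tab.5 and Tab.6. The sketch below is a minimal illustration that includes only two rows (S1 and S9, copied from Tab.5); it assumes that a win means a strictly higher mean for the AssoRep column, a tie means equal means at the reported three-decimal precision, and a loss a strictly lower mean, since the exact tie rule is not spelled out. The helper names are hypothetical.

```python
from typing import Dict, List, Tuple

# Mean values copied from Tab.5 as (original, AssoRep) pairs, in the order
# accuracy, precision, recall, F1, kappa.
ROWS: Dict[str, List[Tuple[float, float]]] = {
    "S1": [(1.000, 1.000)] * 5,
    "S9": [(0.557, 0.801), (0.552, 0.816), (0.556, 0.801),
           (0.537, 0.797), (0.512, 0.781)],
}


def tally(rows: Dict[str, List[Tuple[float, float]]]) -> Dict[str, int]:
    """Count wins/ties/losses of the AssoRep column over all (dataset, metric) pairs."""
    counts = {"win": 0, "tie": 0, "loss": 0}
    for pairs in rows.values():
        for original, assorep in pairs:
            # Compare at the three-decimal precision used in the tables.
            o, a = round(original, 3), round(assorep, 3)
            if a > o:
                counts["win"] += 1
            elif a == o:
                counts["tie"] += 1
            else:
                counts["loss"] += 1
    return counts


def gains_in_points(pairs: List[Tuple[float, float]]) -> List[float]:
    """Absolute gain (AssoRep minus original) in percentage points."""
    return [round(100 * (a - o), 1) for o, a in pairs]


print(tally(ROWS))                  # {'win': 5, 'tie': 5, 'loss': 0}
print(gains_in_points(ROWS["S9"]))  # [24.4, 26.4, 24.5, 26.0, 26.9]
```

Filling `ROWS` with all 60 rows of Tab.5 and Tab.6 gives the full 300-configuration tally under the stated tie rule.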
Furthermore, we use the paired t-test to check whether LR (AssoRep) performs significantly better than LR (original). As shown in Tab.5 and Tab.6, LR (AssoRep) is significantly better than LR (original) on 13, 14, 12, 12, and 13 datasets in terms of accuracy, precision, recall, F1, and kappa, respectively, at the significance level α = 0.05. This is clearly fewer than on Group 1, likely because the association degree between some features cannot be estimated accurately from few samples. It is worth pointing out that LR (original) is never significantly better than LR (AssoRep) at α = 0.05. These results suggest that the proposed association-based representation is also effective on datasets with small sample sizes; in particular, the classifier coupled with AssoRep is more robust for small-scale data classification.
In summary, the experiments on Group 1 and Group 2 demonstrate that the proposed AssoRep algorithm is effective for datasets of different sample sizes. This indicates that AssoRep is robust to sample size, and hence it can be safely applied to a variety of tasks.
4.3 Experimental results on different classifiers
In this section, we evaluate the performance of AssoRep by combining it with five different classifiers including support vector machine (SVM) [
51], k-nearest neighbors (kNN) [
52], random forest (RF) [
53], perceptron [
54], gaussian naive bayes (GaussianNB), i.e.,
. The experimental results are reported in Tab.7 where
and
denote that classifier
learns from the original data representation
and AssoRep data representation
, respectively; For each metric of each dataset, the best result of
and
on same algorithm and all algorithms are marked with bold font and underline, respectively.
Tab.7 Classification performance comparison between original and association-based enhancement representation using different classifiers |
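Metric | Accuracy | | | Precision | | | Recall | | | F1 | | | Kappa | |
Data | SVM (original) | SVM (AssoRep) | | SVM (original) | SVM (AssoRep) | | SVM (original) | SVM (AssoRep) | | SVM (original) | SVM (AssoRep) | | SVM (original) | SVM (AssoRep) |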
Iris | 0.967±0.054 | 0.980±0.031 | | 0.972±0.047 | 0.983±0.025 | | 0.967±0.054 | 0.980±0.031 | | 0.966±0.055 | 0.980±0.031 | | 0.950±0.081 | 0.970±0.046 |
oocMer4D | 0.787±0.033 | 0.832±0.028 | | 0.787±0.062 | 0.821±0.036 | | 0.718±0.032 | 0.803±0.029 | | 0.734±0.036 | 0.806±0.029 | | 0.476±0.073 | 0.615±0.057 |
Contrac | 0.519±0.030 | 0.557±0.024 | | 0.505±0.036 | 0.545±0.030 | | 0.494±0.036 | 0.530±0.026 | | 0.494±0.037 | 0.531±0.028 | | 0.249±0.049 | 0.309±0.037 |
Abalone | 0.642±0.025 | 0.654±0.023 | | 0.644±0.029 | 0.653±0.030 | | 0.640±0.025 | 0.651±0.023 | | 0.635±0.026 | 0.647±0.026 | | 0.464±0.038 | 0.481±0.034 |
Magic | 0.792±0.005 | 0.852±0.005 | | 0.781±0.005 | 0.851±0.006 | | 0.748±0.007 | 0.818±0.006 | | 0.759±0.006 | 0.830±0.005 | | 0.520±0.012 | 0.662±0.011 |
Mean values | 0.741 | 0.775 (3.4%) | | 0.738 | 0.771 (3.3%) | | 0.713 | 0.756 (4.3%) | | 0.718 | 0.759 (4.1%) | | 0.532 | 0.607 (7.5%) |
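Metric | Accuracy | | | Precision | | | Recall | | | F1 | | | Kappa | |
Data | kNN (original) | kNN (AssoRep) | | kNN (original) | kNN (AssoRep) | | kNN (original) | kNN (AssoRep) | | kNN (original) | kNN (AssoRep) | | kNN (original) | kNN (AssoRep) |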
Iris | 0.953±0.052 | 0.960±0.044 | | 0.960±0.045 | 0.964±0.042 | | 0.953±0.052 | 0.960±0.044 | | 0.953±0.053 | 0.960±0.044 | | 0.930±0.078 | 0.940±0.066 |
oocMer4D | 0.739±0.055 | 0.793±0.038 | | 0.734±0.036 | 0.806±0.029 | | 0.773±0.050 | 0.728±0.048 | | 0.698±0.058 | 0.768±0.046 | | 0.399±0.114 | 0.537±0.093 |
Contrac | 0.489±0.024 | 0.501±0.023 | | 0.470±0.028 | 0.485±0.025 | | 0.467±0.026 | 0.485±0.025 | | 0.465±0.026 | 0.482±0.024 | | 0.203±0.035 | 0.227±0.035 |
Abalone | 0.601±0.027 | 0.616±0.023 | | 0.598±0.030 | 0.616±0.031 | | 0.599±0.027 | 0.615±0.024 | | 0.595±0.030 | 0.611±0.027 | | 0.402±0.040 | 0.425±0.035 |
Magic | 0.840±0.008 | 0.851±0.008 | | 0.846±0.011 | 0.860±0.008 | | 0.798±0.009 | 0.810±0.010 | | 0.814±0.009 | 0.827±0.010 | | 0.630±0.018 | 0.656±0.019 |
Mean values | 0.724 | 0.744 (2.0%) | | 0.722 | 0.746 (2.4%) | | 0.718 | 0.720 (0.2%) | | 0.705 | 0.730 (2.5%) | | 0.513 | 0.557 (4.4%) |
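Metric | Accuracy | | | Precision | | | Recall | | | F1 | | | Kappa | |
Data | RF (original) | RF (AssoRep) | | RF (original) | RF (AssoRep) | | RF (original) | RF (AssoRep) | | RF (original) | RF (AssoRep) | | RF (original) | RF (AssoRep) |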
Iris | 0.947±0.058 | 0.953±0.052 | | 0.953±0.056 | 0.964±0.038 | | 0.947±0.058 | 0.953±0.052 | | 0.946±0.059 | 0.953±0.052 | | 0.920±0.087 | 0.930±0.078 |
oocMer4D | 0.761±0.034 | 0.787±0.032 | | 0.730±0.039 | 0.764±0.040 | | 0.728±0.048 | 0.747±0.037 | | 0.728±0.043 | 0.753±0.036 | | 0.456±0.086 | 0.507±0.073 |
Contrac | 0.511±0.016 | 0.517±0.036 | | 0.489±0.022 | 0.500±0.036 | | 0.481±0.020 | 0.491±0.034 | | 0.480±0.021 | 0.491±0.035 | | 0.233±0.026 | 0.243±0.058 |
Abalone | 0.604±0.027 | 0.624±0.028 | | 0.603±0.032 | 0.625±0.032 | | 0.602±0.028 | 0.622±0.028 | | 0.600±0.030 | 0.619±0.029 | | 0.406±0.041 | 0.436±0.042 |
Magic | 0.870±0.005 | 0.860±0.007 | | 0.871±0.007 | 0.860±0.010 | | 0.840±0.007 | 0.828±0.008 | | 0.852±0.006 | 0.840±0.008 | | 0.705±0.013 | 0.681±0.016 |
Mean values | 0.739 | 0.748 (0.9%) | | 0.729 | 0.743 (1.4%) | | 0.720 | 0.728 (0.8%) | | 0.721 | 0.731 (1.0%) | | 0.544 | 0.559 (1.5%) |
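Metric | Accuracy | | | Precision | | | Recall | | | F1 | | | Kappa | |
Data | Percept (original) | Percept (AssoRep) | | Percept (original) | Percept (AssoRep) | | Percept (original) | Percept (AssoRep) | | Percept (original) | Percept (AssoRep) | | Percept (original) | Percept (AssoRep) |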
Iris | 0.873±0.081 | 0.973±0.033 | | 0.910±0.059 | 0.978±0.027 | | 0.873±0.081 | 0.973±0.033 | | 0.865±0.089 | 0.973±0.033 | | 0.810±0.122 | 0.960±0.049 |
oocMer4D | 0.751±0.050 | 0.784±0.042 | | 0.736±0.082 | 0.767±0.045 | | 0.694±0.044 | 0.759±0.041 | | 0.703±0.048 | 0.755±0.041 | | 0.411±0.100 | 0.515±0.081 |
Contrac | 0.452±0.034 | 0.517±0.040 | | 0.434±0.051 | 0.502±0.051 | | 0.424±0.039 | 0.487±0.039 | | 0.407±0.047 | 0.483±0.044 | | 0.142±0.056 | 0.244±0.063 |
Abalone | 0.604±0.054 | 0.594±0.042 | | 0.589±0.078 | 0.596±0.050 | | 0.598±0.056 | 0.592±0.040 | | 0.574±0.072 | 0.564±0.050 | | 0.404±0.083 | 0.392±0.062 |
Magic | 0.745±0.023 | 0.776±0.019 | | 0.735±0.026 | 0.757±0.022 | | 0.700±0.022 | 0.746±0.017 | | 0.704±0.022 | 0.749±0.018 | | 0.418±0.038 | 0.500±0.036 |
Mean values | 0.685 | 0.729 (4.4%) | | 0.681 | 0.720 (3.9%) | | 0.658 | 0.711 (5.3%) | | 0.651 | 0.705 (5.4%) | | 0.437 | 0.522 (8.5%) |
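Metric | Accuracy | | | Precision | | | Recall | | | F1 | | | Kappa | |
Data | GNB (original) | GNB (AssoRep) | | GNB (original) | GNB (AssoRep) | | GNB (original) | GNB (AssoRep) | | GNB (original) | GNB (AssoRep) | | GNB (original) | GNB (AssoRep) |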
Iris | 0.953±0.043 | 0.940±0.036 | | 0.963±0.033 | 0.952±0.027 | | 0.953±0.043 | 0.940±0.036 | | 0.952±0.044 | 0.939±0.037 | | 0.930±0.064 | 0.910±0.054 |
oocMer4D | 0.593±0.052 | 0.675±0.080 | | 0.599±0.040 | 0.680±0.060 | | 0.610±0.045 | 0.696±0.070 | | 0.580±0.049 | 0.663±0.076 | | 0.193±0.083 | 0.353±0.133 |
Contrac | 0.466±0.036 | 0.539±0.023 | | 0.486±0.030 | 0.535±0.019 | | 0.490±0.037 | 0.535±0.024 | | 0.463±0.035 | 0.529±0.021 | | 0.214±0.048 | 0.299±0.036 |
Abalone | 0.572±0.062 | 0.603±0.033 | | 0.566±0.068 | 0.626±0.034 | | 0.568±0.060 | 0.604±0.031 | | 0.558±0.063 | 0.601±0.034 | | 0.357±0.092 | 0.407±0.048 |
Magic | 0.727±0.006 | 0.763±0.009 | | 0.721±0.010 | 0.750±0.011 | | 0.647±0.007 | 0.709±0.012 | | 0.653±0.008 | 0.719±0.012 | | 0.329±0.014 | 0.445±0.023 |
Mean values | 0.662 | 0.704 (4.2%) | | 0.667 | 0.709 (4.2%) | | 0.654 | 0.697 (4.3%) | | 0.641 | 0.690 (4.9%) | | 0.405 | 0.483 (7.8%) |
Based on Tab.7, the following conclusions can be drawn. (1) For each classifier C, the mean value of C (AssoRep) surpasses that of C (original) on all evaluation metrics. In particular, for the mean kappa, which better reflects a classifier's ability to handle complex datasets such as imbalanced ones, SVM, Perceptron and GaussianNB with AssoRep improve on their original counterparts by 7.56, 8.52, and 7.82 percentage points, respectively. (2) C (AssoRep) wins 109 out of 125 experimental configurations (5 datasets × 5 classifiers × 5 metrics). (3) C (AssoRep) achieves the best or a comparable result on each dataset.
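The "Mean values" rows of Tab.7 and the kappa gains for SVM, Perceptron and GaussianNB can be reproduced by averaging each column over the five datasets and taking the difference. The sketch below copies the kappa means from Tab.7; the helper name `mean_gain_points` is hypothetical. Because it averages the unrounded column values, it recovers the two-decimal gains rather than the one-decimal figures printed in the table.

```python
from statistics import mean

# Kappa means copied from Tab.7: (original, AssoRep) columns for
# Iris, oocMer4D, Contrac, Abalone, Magic.
KAPPA = {
    "SVM": ([0.950, 0.476, 0.249, 0.464, 0.520], [0.970, 0.615, 0.309, 0.481, 0.662]),
    "Perceptron": ([0.810, 0.411, 0.142, 0.404, 0.418], [0.960, 0.515, 0.244, 0.392, 0.500]),
    "GaussianNB": ([0.930, 0.193, 0.214, 0.357, 0.329], [0.910, 0.353, 0.299, 0.407, 0.445]),
}


def mean_gain_points(original, assorep):
    """Difference of the dataset-averaged metric, in percentage points."""
    return round(100 * (mean(assorep) - mean(original)), 2)


for clf, (orig, asso) in KAPPA.items():
    # Mean over datasets for each representation, then the gain in points.
    print(clf, round(mean(orig), 3), round(mean(asso), 3), mean_gain_points(orig, asso))
```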
In summary, the above results imply that association among features is indeed able to improve the discrimination ability of the original data.
4.4 Classification performance comparison with other feature enhancement methods
In this section, we compare AssoRep with six feature enhancement methods: AF [9], AF (concatenated), CRAM (discrete) (the discrete version of CRAM) [10], CRAM (continuous) (the continuous version of CRAM) [10], FS (MI) [34], and FS (LR) [34]. Specifically, we first obtain enhanced features with each of the above methods and then compare their classification performance by feeding them into the same classifier (logistic regression).
Benchmark denotes that the features are not enhanced by any method. AF is the original association-based data reconstruction method proposed in [9], which uses pCor as the association measure. AF (concatenated) is an enhanced version of AF that concatenates its result with the original features, as CRAM (discrete) and CRAM (continuous) also do. CRAM (discrete) and CRAM (continuous) enhance the representation ability of data with extra information, including recounting statistics on the class membership of neighbors and distance information between examples and their nearest neighbors. Their hyper-parameter is set to 8, as recommended in [10]. FS (MI) and FS (LR) are two feature enhancement methods based on feature selection. FS (MI) selects important features according to the mutual information between each feature vector and the label vector, with the number of selected features chosen from a candidate set defined in terms of the number of features of the original data. FS (LR) achieves the same purpose with the logistic regression algorithm, using the default selection settings of the sklearn library. The experimental results are reported in Tab.8, in which the best result on each dataset is marked in bold.
Tab.8 Accuracy comparison between AssoRep with other feature enhancement methods |
Data | Benchmark | AF | AF (concatenated) | CRAM (discrete) | CRAM (continuous) | FS (MI) | FS (LR) | AssoRep |
Iris | 0.907±0.053 | 0.927±0.055 | 0.953±0.043 | 0.953±0.043 | 0.953±0.043 | 0.947±0.050 | 0.940±0.055 | 0.973±0.033 |
oocMer4D | 0.796±0.036 | 0.751±0.023 | 0.811±0.035 | 0.820±0.028 | 0.822±0.028 | 0.797±0.028 | 0.800±0.035 | 0.837±0.020 |
Contrac | 0.507±0.042 | 0.568±0.052 | 0.566±0.058 | 0.519±0.035 | 0.517±0.035 | 0.507±0.030 | 0.519±0.041 | 0.568±0.055 |
Abalone | 0.647±0.020 | 0.640±0.023 | 0.659±0.022 | 0.650±0.015 | 0.651±0.017 | 0.647±0.019 | 0.635±0.012 | 0.662±0.021 |
Magic | 0.791±0.006 | 0.837±0.007 | 0.844±0.008 | 0.845±0.008 | 0.844±0.008 | 0.791±0.007 | 0.787±0.008 | 0.850±0.008 |
Annealing | 0.873±0.027 | 0.893±0.017 | 0.910±0.024 | 0.910±0.024 | 0.911±0.021 | 0.880±0.024 | 0.863±0.017 | 0.951±0.014 |
ctg-10classes | 0.768±0.032 | 0.802±0.030 | 0.800±0.026 | 0.817±0.023 | 0.813±0.023 | 0.771±0.027 | 0.751±0.030 | 0.834±0.027 |
oocTris2F | 0.797±0.030 | 0.815±0.036 | 0.815±0.031 | 0.828±0.043 | 0.829±0.040 | 0.795±0.031 | 0.785±0.031 | 0.836±0.030 |
Mean values | 0.7608 (5.31%) | 0.7791 (3.48%) | 0.7948 (1.91%) | 0.7927 (2.12%) | 0.7925 (2.14%) | 0.7669 (4.70%) | 0.7600 (5.39%) | 0.8139 |
Avg. rank | 6.813 | 5.250 | 3.563 | 3.125 | 3.063 | 6.188 | 6.938 | 1.063 |
It is easy to see from Tab.8 that: 1) All feature enhancement methods except FS (LR) achieve higher mean accuracy than the benchmark, which highlights the importance of feature enhancement. 2) AssoRep achieves the highest accuracy on every dataset (on Contrac it ties with AF). 3) AssoRep improves the mean accuracy by 3.48 percentage points over AF, which indicates that the quality of the association matrix plays an important role. 4) The mean accuracy of AssoRep is 2.14 percentage points higher than that of CRAM (continuous), which ranks first among the seven baseline methods. Notably, CRAM uses discriminative information from the output space (label information), whereas the proposed AssoRep only uses information from the input (feature) space. Moreover, the new representations of CRAM (discrete) and CRAM (continuous) contain the original representation, which helps improve performance; this is supported by the fact that AF (concatenated) outperforms AF. 5) AF, AF (concatenated), CRAM (discrete), and CRAM (continuous) all achieve better accuracy than FS (MI) and FS (LR). This suggests that enhancing features by mining new information from the original data may be more effective than merely removing weaker features. These results indicate that association-based representation learning is worth further study.
To further assess whether the eight algorithms differ significantly in classification accuracy, we employ the Friedman test [55], a favorable choice for comparing multiple algorithms over many datasets. Its statistic $F_F$ follows a Fisher distribution with $k-1$ numerator degrees of freedom and $(k-1)(N-1)$ denominator degrees of freedom, and is defined as

$$F_F = \frac{(N-1)\chi_F^2}{N(k-1) - \chi_F^2}, \qquad \chi_F^2 = \frac{12N}{k(k+1)}\left[\sum_{j=1}^{k} R_j^2 - \frac{k(k+1)^2}{4}\right],$$

where $k$ and $N$ denote the number of compared algorithms and datasets, respectively, and $R_j$ is the average rank of algorithm $j$ over all datasets. The smaller the average rank, the better the corresponding algorithm. The null hypothesis is rejected if $F_F$ is higher than the critical value.
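The average ranks in Tab.8 and the statistic in Tab.9 can be recomputed from the accuracy values. The sketch below ranks the eight algorithms on each dataset (rank 1 is the highest accuracy, and tied values share the mean of their ranks), averages the ranks over datasets, and evaluates the F-distributed (Iman-Davenport) form of the Friedman statistic given above. It is a standard-library illustration; the function names and the column labels in `ALGS` are hypothetical, following the column order of Tab.8.

```python
from typing import List

ALGS = ["Benchmark", "AF", "AF (concatenated)", "CRAM (discrete)",
        "CRAM (continuous)", "FS (MI)", "FS (LR)", "AssoRep"]

# Mean accuracies from Tab.8 (one row per dataset, columns in ALGS order).
ACC = [
    [0.907, 0.927, 0.953, 0.953, 0.953, 0.947, 0.940, 0.973],  # Iris
    [0.796, 0.751, 0.811, 0.820, 0.822, 0.797, 0.800, 0.837],  # oocMer4D
    [0.507, 0.568, 0.566, 0.519, 0.517, 0.507, 0.519, 0.568],  # Contrac
    [0.647, 0.640, 0.659, 0.650, 0.651, 0.647, 0.635, 0.662],  # Abalone
    [0.791, 0.837, 0.844, 0.845, 0.844, 0.791, 0.787, 0.850],  # Magic
    [0.873, 0.893, 0.910, 0.910, 0.911, 0.880, 0.863, 0.951],  # Annealing
    [0.768, 0.802, 0.800, 0.817, 0.813, 0.771, 0.751, 0.834],  # ctg-10classes
    [0.797, 0.815, 0.815, 0.828, 0.829, 0.795, 0.785, 0.836],  # oocTris2F
]


def ranks_with_ties(values: List[float]) -> List[float]:
    """Rank in descending order (1 = best); tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: -values[i])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        shared = (pos + end) / 2 + 1  # mean of the 1-based positions pos..end
        for idx in order[pos:end + 1]:
            ranks[idx] = shared
        pos = end + 1
    return ranks


def average_ranks(acc: List[List[float]]) -> List[float]:
    per_dataset = [ranks_with_ties(row) for row in acc]
    return [sum(r[j] for r in per_dataset) / len(acc) for j in range(len(acc[0]))]


def friedman_ff(avg_ranks: List[float], n_datasets: int) -> float:
    """F_F with (k - 1, (k - 1)(N - 1)) degrees of freedom."""
    k, n = len(avg_ranks), n_datasets
    chi2 = 12 * n / (k * (k + 1)) * (sum(r * r for r in avg_ranks) - k * (k + 1) ** 2 / 4)
    return (n - 1) * chi2 / (n * (k - 1) - chi2)


R = average_ranks(ACC)
for name, r in zip(ALGS, R):
    print(f"{name:18s} {r:.4f}")
print("F_F =", round(friedman_ff(R, len(ACC)), 3))
```

Applied to the Tab.8 accuracies, this yields the average ranks listed there (up to rounding) and F_F ≈ 20.61 with k = 8 and N = 8, consistent with Tab.9.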
As shown in Tab.9, $F_F = 20.610$ is higher than the critical value 2.203 at the significance level α = 0.05, so the null hypothesis that all algorithms have equivalent accuracy is clearly rejected. This indicates that the classification performance of the eight algorithms differs significantly, so we further study the relative performance among the compared algorithms. To this end, we adopt the Nemenyi post hoc test, which compares algorithms in a pairwise manner. In the Nemenyi test, the performance of two algorithms is considered significantly different if their average ranks differ by more than the critical distance

$$CD = q_\alpha\sqrt{\frac{k(k+1)}{6N}},$$

where $q_\alpha = 3.031$ when $k = 8$ and α = 0.05.

Tab.9 Summary of the Friedman statistics |
Evaluation metric | $F_F$ | Critical value (α = 0.05) |
Accuracy | 20.610 | 2.203 |
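The Nemenyi decision rule can be applied directly to the average ranks in Tab.8. The sketch below assumes q_α = 3.031 (α = 0.05, k = 8 algorithms) and N = 8 datasets, computes the critical distance, and lists the algorithms whose average rank differs from a chosen control algorithm by more than CD. The dictionary labels follow the column order of Tab.8 and, like the function names, are hypothetical.

```python
import math

# Average ranks from Tab.8 (k = 8 algorithms, N = 8 datasets).
AVG_RANK = {
    "Benchmark": 6.813, "AF": 5.250, "AF (concatenated)": 3.563,
    "CRAM (discrete)": 3.125, "CRAM (continuous)": 3.063,
    "FS (MI)": 6.188, "FS (LR)": 6.938, "AssoRep": 1.063,
}


def critical_distance(q_alpha: float, k: int, n: int) -> float:
    """Nemenyi critical distance CD = q_alpha * sqrt(k(k + 1) / (6N))."""
    return q_alpha * math.sqrt(k * (k + 1) / (6 * n))


def significantly_different(control: str, ranks: dict, cd: float) -> list:
    """Algorithms whose average rank differs from the control's by more than CD."""
    return sorted(name for name, r in ranks.items()
                  if name != control and abs(r - ranks[control]) > cd)


CD = critical_distance(3.031, k=8, n=8)  # assumed q_0.05 = 3.031 for k = 8
print(round(CD, 3))  # 3.712
print(significantly_different("AssoRep", AVG_RANK, CD))
print(significantly_different("CRAM (continuous)", AVG_RANK, CD))
```

With these inputs, AssoRep is separated from the benchmark, AF, and both FS variants, but not from AF (concatenated) or either CRAM variant.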
The CD diagram is commonly used to illustrate the rank relations among the compared algorithms. In a CD diagram, the average rank of each algorithm is marked along the axis (the smaller, the better). As shown in Fig.1, AssoRep ranks first. It is significantly better than the benchmark, AF, FS (MI), and FS (LR), whereas CRAM (continuous) does not differ significantly from AF and FS (MI). This further validates the advantage of the proposed AssoRep.
Fig.1 Comparison of A and B (control algorithms; A and B denote AssoRep and CRAM (continuous), the best-performing baseline, marked with a red star and a blue star, respectively) against the other compared algorithms with the Nemenyi test. Algorithms not connected with A (red line) or B (blue line) in the CD diagram are considered to perform significantly differently from the corresponding control algorithm (significance level α = 0.05) |
4.5 Efficiency analysis
This experiment investigates the efficiency of AssoRep by replacing dCor with Pearson’s correlation coefficient (pCor), normalized mutual information (NMI), the maximal information coefficient (MIC) [18] and its improved version [56]. NMI, MIC and its improved version all have very high computational complexity, which makes the comparison experiment challenging. In this paper, we use the minepy Python library, which provides efficient implementations of MIC and its improved version, and the scikit-learn toolkit to compute NMI. Note that the maximal neighborhood coefficient (MNC) [19] is not included because of its even higher computational complexity. The results are shown in Tab.10. The computation time on AWArgsift-hist is recorded with the parameter set to 1, while the computation time on the other datasets is recorded with the parameter set to 10.
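The following minimal NumPy sketch illustrates why a distance-based measure can capture associations that pCor misses. It implements the sample distance correlation of Székely et al. in its biased V-statistic form, directly from double-centred distance matrices. It is an assumption-level illustration, not necessarily the implementation used for AssoRep. Computed this way, dCor costs O(n²) time and memory per feature pair, whereas pCor costs O(n).

```python
import numpy as np

def _double_centered(x):
    """Double-centred matrix of pairwise distances |x_i - x_j| for a 1-D sample."""
    x = np.asarray(x, dtype=float).reshape(-1, 1)
    d = np.abs(x - x.T)
    return d - d.mean(axis=0, keepdims=True) - d.mean(axis=1, keepdims=True) + d.mean()

def distance_correlation(x, y):
    """Sample distance correlation (V-statistic form); O(n^2) time and memory."""
    a, b = _double_centered(x), _double_centered(y)
    dcov2_xy = (a * b).mean()          # squared distance covariance
    dvar2_x = (a * a).mean()           # squared distance variance of x
    dvar2_y = (b * b).mean()           # squared distance variance of y
    denom = np.sqrt(dvar2_x * dvar2_y)
    if denom == 0.0:                   # a constant feature carries no association
        return 0.0
    return float(np.sqrt(max(dcov2_xy, 0.0) / denom))

x = np.linspace(-1.0, 1.0, 201)
y = x ** 2                                  # purely nonlinear, symmetric dependence
print(np.corrcoef(x, y)[0, 1])              # close to 0: pCor misses the relationship
print(distance_correlation(x, y))           # clearly positive: dCor detects it
print(distance_correlation(x, 3 * x + 1))   # 1.0 up to rounding for a linear map
```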
Tab.10 Computation time (s) of the different association mining methods |
Dataset | pCor | dCor | NMI | MIC | MIC (improved version) |
Iris | 0.16 | 0.05 | 1.01 | 0.30 | 0.29 |
oocMer4D | 0.36 | 5.90 | 75.37 | 71.23 | 70.37 |
Contrac | 0.22 | 0.55 | 3.84 | 10.05 | 7.16 |
Abalone | 0.26 | 1.24 | 3.71 | 10.82 | 11.08 |
Magic | 1.02 | 7.50 | 7.07 | 58.84 | 59.11 |
AWArgsift-hist | 178.90 | 618.08 | 3124.94 | 9956.50 | 15541.76 |
According to Tab.10, we observe two points.

1) pCor costs the least time on every dataset except Iris, where dCor is faster, but its classification accuracy is lower than that obtained with dCor.

2) Compared with NMI, MIC and its improved version, which are extremely time-consuming, dCor has an acceptable computation time. For example, on AWArgsift-hist, dCor takes about 618 seconds to calculate the association relationships of 1,999,000 feature pairs and to train the logistic regression model. In contrast, NMI needs about 3125 seconds, and MIC and its improved version need about 9957 seconds (about 2.77 hours) and 15542 seconds (about 4.32 hours), respectively. These are about 5, 16 and 25 times the computation time of dCor.

These results suggest that dCor is an appropriate choice of association measure, as it balances effectiveness and efficiency well.
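The figures quoted for AWArgsift-hist can be checked with a few lines of Python. The times are copied from Tab.10. The feature count of 2,000 is inferred from the stated 1,999,000 feature pairs, since $\binom{2000}{2} = 1{,}999{,}000$; it is not given explicitly in this section.

```python
from math import comb

# Computation times (s) on AWArgsift-hist, copied from Tab.10
times = {"pCor": 178.90, "dCor": 618.08, "NMI": 3124.94,
         "MIC": 9956.50, "MIC (improved version)": 15541.76}

# Number of unordered feature pairs for d features: d(d-1)/2
n_features = 2000                      # inferred from the 1,999,000 pairs in the text
print(comb(n_features, 2))             # 1999000

# Cost of each measure relative to dCor, and wall-clock hours
for name, seconds in times.items():
    print(f"{name}: {seconds / times['dCor']:.1f}x dCor, {seconds / 3600:.2f} h")
```

The ratios come out at about 5.1, 16.1 and 25.1 for NMI, MIC and its improved version. The number of pairs grows quadratically with the number of features, which is why the per-pair cost of the association measure dominates on high-dimensional data.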
5 Conclusion
We have proposed an association-based representation improvement method (AssoRep) that balances effectiveness and efficiency well. Moreover, AssoRep offers better interpretability than existing feature-enhancement methods such as multilayer perceptrons and attention, because the working mechanism of each of its processes is transparent. The effectiveness of AssoRep has been validated by extensive experiments on classification tasks.
Although this work further perfects and enriches the domain of association-based data reconstruction, AssoRep, like AF [9], only provides a vector-like improved representation. As a result, it cannot be fed directly to models that take tensor-like data as input, such as convolutional neural networks. Hence, tensorizing the association-based representation is worth studying in the future. Moreover, AssoRep treats the relationship between paired features symmetrically, so it would be worthwhile to generalize AssoRep to account for cause-and-effect relationships among features. Like MIC and its improved version, dCor overestimates the strength of association between two features when the true relationship is very weak. Hence, a solution that eliminates the bias of dCor is urgently needed.