Instance reduction for supervised learning using input-output clustering method

Anusorn Yodjaiphet, Nipon Theera-Umpon, Sansanee Auephanwiriyakul

Journal of Central South University, 2015, Vol. 22, Issue 12: 4740-4748. DOI: 10.1007/s11771-015-3026-4

Abstract

A method is proposed that uses input-output clustering to reduce the number of samples in large data sets. The method first clusters the output data into groups and then clusters the input data according to those output groups. A set of prototypes is then selected from the clustered input data, and the remaining inessential data can be discarded from the data set. Because only the prototypes are used, the method can also reduce the effect of outliers. The method is applied to data-set reduction in regression problems and is evaluated on two standard synthetic data sets and three standard real-world data sets. The root-mean-square errors of support vector regression models trained on the original data sets are compared with those of models trained on the corresponding instance-reduced data sets. In the experiments, the proposed method gives good results on both the reduction and the reconstruction of the synthetic and real-world data sets. The numbers of instances in the synthetic data sets are decreased by 25%-69%. The reduction rates for the real-world automobile miles-per-gallon and 1990 California (CA) census data sets are 46% and 57%, respectively. The reduction rate for the electrocardiogram (ECG) data set is much higher, at 96%, because of the redundant and periodic nature of ECG signals. For all of the data sets, the regression results obtained with the reduced data are similar to those obtained with the corresponding original data sets. The proposed method therefore preserves regression performance while requiring only a fraction of the data for training.
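The pipeline described in the abstract (cluster the outputs, cluster the inputs within each output group, keep a set of prototypes) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation. The function names `fuzzy_c_means` and `reduce_instances`, the cluster counts, the hard assignment of samples to output groups, and the rule of keeping the instance nearest to each input-cluster center are all assumptions made for illustration; the abstract does not specify how prototypes are chosen.

```python
import numpy as np


def fuzzy_c_means(data, n_clusters, m=2.0, n_iter=100, tol=1e-6, seed=0):
    """Minimal fuzzy c-means (illustrative); returns (centers, memberships)."""
    rng = np.random.default_rng(seed)
    # Random initial membership matrix whose rows sum to 1.
    u = rng.random((data.shape[0], n_clusters))
    u /= u.sum(axis=1, keepdims=True)
    for _ in range(n_iter):
        w = u ** m
        centers = (w.T @ data) / w.sum(axis=0)[:, None]
        # Distance from every sample to every center (floored to avoid /0).
        dist = np.linalg.norm(data[:, None, :] - centers[None, :, :], axis=2)
        dist = np.maximum(dist, 1e-12)
        inv = dist ** (-2.0 / (m - 1.0))
        new_u = inv / inv.sum(axis=1, keepdims=True)
        converged = np.abs(new_u - u).max() < tol
        u = new_u
        if converged:
            break
    return centers, u


def reduce_instances(x, y, n_out_clusters=3, n_in_clusters=5):
    """Return sorted indices of the instances kept as prototypes.

    Assumption: each sample joins the output cluster with the highest
    membership, and the prototype of an input cluster is the real
    instance closest to that cluster's center.
    """
    x = np.asarray(x, dtype=float).reshape(len(x), -1)
    y = np.asarray(y, dtype=float).reshape(len(y), -1)

    # Step 1: cluster the output data into groups.
    _, u_out = fuzzy_c_means(y, n_out_clusters)
    out_labels = u_out.argmax(axis=1)

    keep = set()
    for group in range(n_out_clusters):
        idx = np.flatnonzero(out_labels == group)
        if idx.size == 0:
            continue
        # Step 2: cluster the inputs that belong to this output group.
        k = min(n_in_clusters, idx.size)
        centers, _ = fuzzy_c_means(x[idx], k)
        # Step 3: keep the instance nearest to each input-cluster center.
        dist = np.linalg.norm(x[idx][:, None, :] - centers[None, :, :], axis=2)
        keep.update(idx[dist.argmin(axis=0)].tolist())
    return np.array(sorted(keep))


# Hypothetical usage on a noisy sine curve (not one of the paper's data sets).
rng = np.random.default_rng(0)
x = np.linspace(0.0, 2.0 * np.pi, 200)
y = np.sin(x) + 0.05 * rng.standard_normal(200)

kept = reduce_instances(x, y)
reduction_rate = 1.0 - kept.size / x.size  # fraction of instances discarded
print(f"kept {kept.size} of {x.size} instances, reduction rate {reduction_rate:.0%}")
```

With these assumed settings at most `n_out_clusters * n_in_clusters` instances survive, so the reduction rate here is set by the chosen cluster counts rather than by the data. The reduced pairs `x[kept]`, `y[kept]` could then be used to train a regression model such as support vector regression and compared by root-mean-square error against a model trained on the full set, as the abstract describes.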

Keywords

instance reduction / input-output clustering / fuzzy c-means clustering / support vector regression / supervised learning

Cite this article

Anusorn Yodjaiphet, Nipon Theera-Umpon, Sansanee Auephanwiriyakul. Instance reduction for supervised learning using input-output clustering method. Journal of Central South University, 2015, 22(12): 4740-4748. DOI: 10.1007/s11771-015-3026-4
