An ensemble method for data stream classification in the presence of concept drift

Omid ABBASZADEH; Ali AMIRI; Ali Reza KHANTEYMOORI

doi:10.1631/FITEE.1400398

Front. Inform. Technol. Electron. Eng., 2015, Vol. 16, Issue 12: 1059-1068. DOI: 10.1631/FITEE.1400398

Abstract

Data stream management and processing is a recent area of interest in computer science. By ‘data stream’, we refer to continuous and rapidly generated packages of data. Data streams are distinguished from standard types of data by their immense volume, high production rate, limited processing time, and concept drift. One central task in data stream processing is the classification of input data. This paper proposes a novel ensemble classifier. The classifier weights its base classifiers using two weighting functions, applied under different data input conditions. In addition, a new method is used to detect drift, which emphasizes the precision of the algorithm. The proposed method also removes a variable number of base classifiers according to their quality. Weighting the base classifiers at the decision-making stage is another advantage of the algorithm: it facilitates adaptation when drifts take place, which leads to more efficient classifiers. The proposed method is tested on a set of standard data, and the results confirm higher accuracy than available ensemble classifiers and single classifiers. In some cases, the proposed classifier is also faster and needs less storage space.
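To make the general idea of a weighted ensemble with quality-based pruning concrete, the following is a minimal sketch only. It is not the algorithm proposed in the paper: the abstract does not specify the two weighting functions or the drift test, so this sketch swaps in a simple multiplicative penalty for wrong votes and a fixed pruning threshold. The class name `WeightedEnsemble` and the parameters `beta` and `min_weight` are hypothetical.

```python
from collections import defaultdict


class WeightedEnsemble:
    """Toy weighted-vote ensemble (illustrative; not the paper's method)."""

    def __init__(self, members, beta=0.5, min_weight=0.2):
        # members: callables mapping an instance x to a class label
        # (hypothetical stand-ins for trained base classifiers).
        self.members = [[clf, 1.0] for clf in members]
        self.beta = beta              # assumed penalty factor for a wrong vote
        self.min_weight = min_weight  # assumed pruning threshold

    def predict(self, x):
        # Weighted majority vote at the decision-making stage.
        votes = defaultdict(float)
        for clf, weight in self.members:
            votes[clf(x)] += weight
        return max(votes, key=votes.get)

    def update(self, x, y):
        # Penalize every member that misclassifies the labeled instance.
        for member in self.members:
            if member[0](x) != y:
                member[1] *= self.beta
        # Remove low-quality members; the number removed varies per step.
        survivors = [m for m in self.members if m[1] >= self.min_weight]
        if survivors:  # never empty the ensemble completely
            self.members = survivors


if __name__ == "__main__":
    ensemble = WeightedEnsemble(
        [lambda x: 0, lambda x: 1, lambda x: int(x > 0.5)]
    )
    for x, y in [(0.9, 1), (0.8, 1), (0.7, 1), (0.1, 0)]:
        print(ensemble.predict(x), y)
        ensemble.update(x, y)
```

In this sketch a member that keeps voting wrongly loses influence and is eventually dropped, which is one simple way an ensemble can adapt after a drift. A real stream classifier would also train and add new base classifiers on recent data.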

Keywords

Data stream / Classification / Ensemble classifiers / Concept drift

Cite this article

Omid ABBASZADEH, Ali AMIRI, Ali Reza KHANTEYMOORI. An ensemble method for data stream classification in the presence of concept drift. Front. Inform. Technol. Electron. Eng, 2015, 16(12): 1059‒1068. https://doi.org/10.1631/FITEE.1400398
