Fast correlation coefficient estimation algorithm for HBase-based massive time series data

Wen LIU, Tuqian ZHANG, Yanming SHEN, Peng WANG

Front. Comput. Sci. ›› 2019, Vol. 13 ›› Issue (4) : 864-878. DOI: 10.1007/s11704-018-6308-9
RESEARCH ARTICLE

Abstract

In recent years, the rapid development of the Internet of Things and sensor networks has led to explosive growth in time series data. OpenTSDB and other emerging systems have begun to use Hadoop and HBase to store massive time series data, and how to query and mine time series data on these platforms has become an active research topic. As a typical distance measure for time series, the correlation coefficient is widely used in various applications. However, computing the correlation coefficient of long time series on HBase in real time requires a large amount of I/O and network transmission, and therefore cannot support interactive queries. To address this problem, we present two methods for estimating the correlation coefficient of two sequences on HBase. We first propose DCE, a fast algorithm for estimating the upper and lower bounds of the correlation coefficient. To further reduce I/O cost, we extend DCE into the ADCE algorithm, which estimates the correlation coefficient quickly in an iterative manner. Experiments show that the proposed algorithms can quickly compute the correlation coefficient of long time series.
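
For orientation, the sketch below shows the exact quantity that the abstract's methods aim to approximate: the Pearson correlation coefficient of two equal-length sequences. It is not the DCE or ADCE algorithm, whose details the abstract does not give. It only illustrates, under our own assumptions, how the exact value can be assembled from per-block sums, so that each storage block contributes six numbers instead of its raw points. The names `BlockStats`, `summarize`, and `pearson` are hypothetical and do not come from the paper or from HBase.

```python
import math
from dataclasses import dataclass


@dataclass
class BlockStats:
    """Sufficient statistics for one block of paired samples (hypothetical helper)."""
    n: int = 0
    sx: float = 0.0   # sum of x
    sy: float = 0.0   # sum of y
    sxx: float = 0.0  # sum of x*x
    syy: float = 0.0  # sum of y*y
    sxy: float = 0.0  # sum of x*y

    def merge(self, other: "BlockStats") -> "BlockStats":
        # Sums are additive, so blocks can be combined in any order.
        return BlockStats(
            self.n + other.n,
            self.sx + other.sx,
            self.sy + other.sy,
            self.sxx + other.sxx,
            self.syy + other.syy,
            self.sxy + other.sxy,
        )


def summarize(xs, ys) -> BlockStats:
    """Reduce one block of aligned samples to its six sums."""
    if len(xs) != len(ys):
        raise ValueError("sequences must have the same length")
    stats = BlockStats()
    for x, y in zip(xs, ys):
        stats = stats.merge(BlockStats(1, x, y, x * x, y * y, x * y))
    return stats


def pearson(stats: BlockStats) -> float:
    """Pearson correlation coefficient computed from merged sums."""
    n = stats.n
    cov = n * stats.sxy - stats.sx * stats.sy
    var_x = n * stats.sxx - stats.sx ** 2
    var_y = n * stats.syy - stats.sy ** 2
    if var_x <= 0 or var_y <= 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


# Two blocks of each series, summarized separately and then merged.
total = summarize([1.0, 2.0], [2.0, 4.0]).merge(summarize([3.0, 4.0], [6.0, 8.0]))
r = pearson(total)
```

Even in this form every raw point must still be read once to build the sums, which is the I/O cost that motivates estimating bounds instead. The sum-of-squares formula can also lose precision on long series with large means, so a production implementation would likely prefer a numerically stable variant.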

Keywords

time series / HBase / correlation coefficient / fast estimation

Cite this article

Wen LIU, Tuqian ZHANG, Yanming SHEN, Peng WANG. Fast correlation coefficient estimation algorithm for HBase-based massive time series data. Front. Comput. Sci., 2019, 13(4): 864‒878 https://doi.org/10.1007/s11704-018-6308-9

RIGHTS & PERMISSIONS

2018 Higher Education Press and Springer-Verlag GmbH Germany, part of Springer Nature