Multi-stream join answering for mining significant cross-stream correlations

Robert GWADERA

Front. Comput. Sci. ›› 0

PDF(684 KB)
PDF(684 KB)
Front. Comput. Sci. ›› DOI: 10.1007/s11704-012-2862-8
RESEARCH ARTICLE

Multi-stream join answering for mining significant cross-stream correlations

Author information +
History +

Abstract

Sliding-window multi-stream join (SWMJ) is a fundamental operation for correlating information from different streams. We provide a solution to the problem of assessing significance of the SWMJ result by focusing on the relative frequency of windows satisfying a given equijoin predicate as the most important parameter of the SWMJ result. In particular, we derive a formula for computing the expected relative frequency of windows satisfying a given equijoin predicate that can be evaluated in quadratic time in the window size given a proposed probabilistic model of the multi-stream. In experiments conducted on a daily rainfall data set we demonstrate the remarkable accuracy of our method, which confirms our theoretical analysis.

Keywords

probabilistic data streams / stream summarization / stream sketch / window aggregate queries

Cite this article

Download citation ▾
Robert GWADERA. Multi-stream join answering for mining significant cross-stream correlations. Front Comput Sci, https://doi.org/10.1007/s11704-012-2862-8

References

[1]
Xie J, Yang J. A survey of join processing in data streams. In: Aggarwal C C, eds. Data Streams–Models and Algorithms. Advances in Database Systems, Vol 31. Berlin: Springer, 2007, 209-306
[2]
Golab L, Özsu M T. Processing sliding window multi-joins in continuous queries over data streams. In: Proceedings of the 29th International Conference on Very Large Data Bases. 2003, 500-511
[3]
Gu X, Wen Z, Lin C, Yu P. ViCo: an adaptive distributed video correlation system. In: Proceedings of the 14th ACM International Conference on Multimedia. 2006, 559-568
CrossRef Google scholar
[4]
Hammad M A, Aref W G, Elmagarmid A K. Query processing of multi-way stream window joins. VLDB Journal, 2008, 17(3): 469-488
CrossRef Google scholar
[5]
Gwadera R. MDL-based segmentation of multi-attribute sequences. In: Proceedings of 2011 IEEE International Conference on Spatial Data Mining and Geographical Knowledge Services. 2011, 106-111
[6]
Iwanuma K, Ishihara R, Takano Y, Nabeshima H. Extracting frequent subsequences from a single long data sequence: a novel anti-monotonic measure and a simple on-line algorithm. In: Proceedings of the 5th IEEE International Conference on Data Mining. 2005, 186-193
CrossRef Google scholar
[7]
Wilschut A N, Apers P M G. Dataflow query execution in a parallel main-memory. In: Proceedings of the 1st International Conference on Parallel and Distributed Information Systems. 1991, 68-77
[8]
Oates T, Cohen P R. Searching for structure in multiple streams of data. In: Proceedings of the 13th International Conference on Machine Learning. 1996, 346-354
[9]
Srivastava U, Widom J. Memory-limited execution of windowed stream joins. In: Proceedings of the 30th International Conference on Very Large Data Bases. 2004, 324-335
[10]
Gwadera R, Gionis A, Mannila H. Optimal segmentation using tree models. Knowledge and Information Systems, 2008, 15(3): 259-283
CrossRef Google scholar
[11]
Gwadera R, Atallah M J, Szpankowski W. Markov models for identification of significant episodes. In: Proceedings of 2005 SIAM International Data Mining Conference. 2005, 404-414
[12]
Gwadera R, Atallah M J, Szpankowski W. Reliable detection of episodes in event sequences. Knowledge and Information Systems, 2005, 7(4): 415-437
CrossRef Google scholar
[13]
Gwadera R, Crestani F. Discovering significant patterns in multistream sequences. In: Proceedings of the 8th IEEE International Conference on Data Mining. 2008, 827-832
CrossRef Google scholar
[14]
Atallah M J, Gwadera R, Szpankowski W. Detection of significant sets of episodes in event sequences. In: Proceedings of the 4th IEEE International Conference on Data Mining. 2004, 3-10
CrossRef Google scholar
[15]
Hughes J P, Guttorp P, Charles S P. A non-homogeneous hidden Markov model for precipitation occurrence. Journal of the Royal Statistical Society: Series C (Applied Statistics), 1999, 48(1): 15-30
CrossRef Google scholar

RIGHTS & PERMISSIONS

2014 Higher Education Press and Springer-Verlag Berlin Heidelberg
AI Summary AI Mindmap
PDF(684 KB)

Accesses

Citations

Detail

Sections
Recommended

/