
Multi-stream join answering for mining significant cross-stream correlations
Robert GWADERA
Multi-stream join answering for mining significant cross-stream correlations
Sliding-window multi-stream join (SWMJ) is a fundamental operation for correlating information from different streams. We provide a solution to the problem of assessing significance of the SWMJ result by focusing on the relative frequency of windows satisfying a given equijoin predicate as the most important parameter of the SWMJ result. In particular, we derive a formula for computing the expected relative frequency of windows satisfying a given equijoin predicate that can be evaluated in quadratic time in the window size given a proposed probabilistic model of the multi-stream. In experiments conducted on a daily rainfall data set we demonstrate the remarkable accuracy of our method, which confirms our theoretical analysis.
probabilistic data streams / stream summarization / stream sketch / window aggregate queries
[1] |
Xie J, Yang J. A survey of join processing in data streams. In: Aggarwal C C, eds. Data Streams–Models and Algorithms. Advances in Database Systems, Vol 31. Berlin: Springer, 2007, 209-306
|
[2] |
Golab L, Özsu M T. Processing sliding window multi-joins in continuous queries over data streams. In: Proceedings of the 29th International Conference on Very Large Data Bases. 2003, 500-511
|
[3] |
Gu X, Wen Z, Lin C, Yu P. ViCo: an adaptive distributed video correlation system. In: Proceedings of the 14th ACM International Conference on Multimedia. 2006, 559-568
CrossRef
Google scholar
|
[4] |
Hammad M A, Aref W G, Elmagarmid A K. Query processing of multi-way stream window joins. VLDB Journal, 2008, 17(3): 469-488
CrossRef
Google scholar
|
[5] |
Gwadera R. MDL-based segmentation of multi-attribute sequences. In: Proceedings of 2011 IEEE International Conference on Spatial Data Mining and Geographical Knowledge Services. 2011, 106-111
|
[6] |
Iwanuma K, Ishihara R, Takano Y, Nabeshima H. Extracting frequent subsequences from a single long data sequence: a novel anti-monotonic measure and a simple on-line algorithm. In: Proceedings of the 5th IEEE International Conference on Data Mining. 2005, 186-193
CrossRef
Google scholar
|
[7] |
Wilschut A N, Apers P M G. Dataflow query execution in a parallel main-memory. In: Proceedings of the 1st International Conference on Parallel and Distributed Information Systems. 1991, 68-77
|
[8] |
Oates T, Cohen P R. Searching for structure in multiple streams of data. In: Proceedings of the 13th International Conference on Machine Learning. 1996, 346-354
|
[9] |
Srivastava U, Widom J. Memory-limited execution of windowed stream joins. In: Proceedings of the 30th International Conference on Very Large Data Bases. 2004, 324-335
|
[10] |
Gwadera R, Gionis A, Mannila H. Optimal segmentation using tree models. Knowledge and Information Systems, 2008, 15(3): 259-283
CrossRef
Google scholar
|
[11] |
Gwadera R, Atallah M J, Szpankowski W. Markov models for identification of significant episodes. In: Proceedings of 2005 SIAM International Data Mining Conference. 2005, 404-414
|
[12] |
Gwadera R, Atallah M J, Szpankowski W. Reliable detection of episodes in event sequences. Knowledge and Information Systems, 2005, 7(4): 415-437
CrossRef
Google scholar
|
[13] |
Gwadera R, Crestani F. Discovering significant patterns in multistream sequences. In: Proceedings of the 8th IEEE International Conference on Data Mining. 2008, 827-832
CrossRef
Google scholar
|
[14] |
Atallah M J, Gwadera R, Szpankowski W. Detection of significant sets of episodes in event sequences. In: Proceedings of the 4th IEEE International Conference on Data Mining. 2004, 3-10
CrossRef
Google scholar
|
[15] |
Hughes J P, Guttorp P, Charles S P. A non-homogeneous hidden Markov model for precipitation occurrence. Journal of the Royal Statistical Society: Series C (Applied Statistics), 1999, 48(1): 15-30
CrossRef
Google scholar
|
/
〈 |
|
〉 |