Adaptive watermark generation mechanism based on time series prediction for stream processing

Yang SONG, Yunchun LI, Hailong YANG, Jun XU, Zerong LUAN, Wei LI

Front. Comput. Sci., 2021, 15(6): 156213. DOI: 10.1007/s11704-020-0206-7
RESEARCH ARTICLE


Abstract

Data stream processing frameworks process streaming data based on event time to ensure that requests are responded to in real time. In practice, streaming data usually arrives out of order due to factors such as network delay. To handle such out-of-order arrival, stream processing frameworks commonly adopt the watermark mechanism. A watermark is a special record carrying a timestamp that is inserted into the data stream; it helps the framework decide whether received data is late and should therefore be discarded. Traditional watermark generation strategies are periodic and cannot dynamically adjust the watermark distribution to balance responsiveness and accuracy. This paper proposes an adaptive watermark generation mechanism based on a time series prediction model to address this limitation. The mechanism dynamically adjusts the frequency and timing of watermark distribution using the out-of-order data ratio and other lateness properties of the data stream, improving system responsiveness while ensuring acceptable result accuracy. We implement the proposed mechanism on top of Flink and evaluate it with real-world datasets. The experimental results show that our mechanism is superior to existing watermark distribution strategies in terms of both system responsiveness and result accuracy.
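To make the idea concrete, the following minimal sketch shows how an adaptive lateness bound could plug into Flink's WatermarkGenerator interface (Flink 1.11+). It is an illustrative assumption, not the authors' implementation: the class name AdaptiveWatermarkGenerator is hypothetical, and a simple running-mean delay estimate stands in for the paper's time series prediction model.

import org.apache.flink.api.common.eventtime.Watermark;
import org.apache.flink.api.common.eventtime.WatermarkGenerator;
import org.apache.flink.api.common.eventtime.WatermarkOutput;

// Illustrative sketch only: the running-mean delay estimate below stands in
// for the paper's time series prediction model (e.g., a regressor trained on
// recent lateness observations).
public class AdaptiveWatermarkGenerator<T> implements WatermarkGenerator<T> {

    private long maxEventTimestamp = Long.MIN_VALUE;
    private long predictedDelayMillis = 1_000;  // initial allowed-lateness guess
    private long observedDelaySum = 0;
    private long observedCount = 0;

    @Override
    public void onEvent(T event, long eventTimestamp, WatermarkOutput output) {
        maxEventTimestamp = Math.max(maxEventTimestamp, eventTimestamp);
        // Sample how far the event lags behind processing time; these samples
        // would feed the prediction model in a full implementation.
        observedDelaySum += Math.max(0, System.currentTimeMillis() - eventTimestamp);
        observedCount++;
    }

    @Override
    public void onPeriodicEmit(WatermarkOutput output) {
        if (observedCount > 0) {
            // Placeholder "prediction": the mean delay over the last interval.
            predictedDelayMillis = observedDelaySum / observedCount;
            observedDelaySum = 0;
            observedCount = 0;
        }
        if (maxEventTimestamp != Long.MIN_VALUE) {
            // Hold the watermark back by the predicted lateness bound.
            output.emitWatermark(new Watermark(maxEventTimestamp - predictedDelayMillis));
        }
    }
}

Such a generator would be attached to a source via WatermarkStrategy.forGenerator(ctx -> new AdaptiveWatermarkGenerator<>()); the periodic emit interval itself (ExecutionConfig#setAutoWatermarkInterval) could also be tuned alongside the lateness bound, roughly corresponding to the paper's adjustment of both watermark timing and frequency.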

Keywords

data stream processing / watermark / time series based prediction / dynamic adjustment

Cite this article

Yang SONG, Yunchun LI, Hailong YANG, Jun XU, Zerong LUAN, Wei LI. Adaptive watermark generation mechanism based on time series prediction for stream processing. Front. Comput. Sci., 2021, 15(6): 156213. DOI: 10.1007/s11704-020-0206-7



RIGHTS & PERMISSIONS

Higher Education Press
