RTS: learning robustly from time series data with noisy labels
Zhi ZHOU , Yi-Xuan JIN , Yu-Feng LI
Front. Comput. Sci. ›› 2024, Vol. 18 ›› Issue (6) : 186332
Significant progress has been made in machine learning with large amounts of clean labels and static data. However, in many real-world applications, the data often changes over time and massive clean annotations are difficult to obtain; that is, noisy labels and time-series data must be handled simultaneously. For example, in product-buyer evaluation, each sample records a user's daily behavior over time, but the long transaction period complicates analysis, and salespeople often annotate the user's purchase behavior incorrectly. Such a novel setting, to the best of our knowledge, has not been thoroughly studied, and effective machine learning methods for it are still lacking. In this paper, we present a systematic approach, RTS, studied both theoretically and empirically, consisting of two components: Noise-Tolerant Time Series Representation and Purified Oversampling Learning. Specifically, we propose reducing the destructive impact of label noise to obtain robust feature representations and potentially clean samples. Then, a novel learning method based on the purified data and time-series oversampling is adopted to train an unbiased model. Theoretical analysis proves that our proposal can improve the quality of the noisy dataset. Empirical experiments on diverse tasks, such as the house-buyer evaluation task from real-world applications and various benchmark tasks, clearly demonstrate that our new algorithm robustly outperforms many competitive methods.
weakly-supervised learning / time-series classification / class-imbalanced learning
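To make the two-stage idea in the abstract concrete, the sketch below illustrates one plausible reading: first select potentially clean samples via a small-loss criterion (a common noisy-label heuristic, not necessarily the exact purification rule RTS uses), then rebalance the purified set with interpolation-based time-series oversampling (a SMOTE-like stand-in for the paper's oversampling component). Function names and parameters here are illustrative assumptions, not the authors' API.

```python
import numpy as np

rng = np.random.default_rng(0)

def select_potential_clean(losses, keep_ratio=0.7):
    """Small-loss selection: keep the fraction of samples with the
    lowest per-sample loss, treating them as potentially clean."""
    k = int(len(losses) * keep_ratio)
    return np.argsort(losses)[:k]

def oversample_minority(X, y, minority_label=1):
    """Interpolation-based oversampling for time series: each synthetic
    minority sample is a convex combination of two random minority
    series, so the new series stays on the segment between real ones."""
    minority = X[y == minority_label]
    need = int((y != minority_label).sum()) - len(minority)
    synth = []
    for _ in range(max(need, 0)):
        i, j = rng.integers(0, len(minority), size=2)
        lam = rng.random()
        synth.append(lam * minority[i] + (1 - lam) * minority[j])
    if synth:
        X = np.vstack([X, synth])
        y = np.concatenate([y, np.full(len(synth), minority_label)])
    return X, y
```

In this reading, a model would be trained on the purified subset, per-sample losses recomputed, and the oversampled purified data used to fit the final unbiased classifier; the paper's actual components are more elaborate than this toy pipeline.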
Higher Education Press