Group-wise co-salient object detection via multi-view self-labeling novel class discovery

Yang WU, Gang DONG, Lingyan LIANG, Yaqian ZHAO, Kaihua ZHANG

Front. Comput. Sci. ›› 2024, Vol. 18 ›› Issue (2) : 182709.

PDF(1545 KB)
Front. Comput. Sci. All Journals
PDF(1545 KB)
Front. Comput. Sci. ›› 2024, Vol. 18 ›› Issue (2) : 182709. DOI: 10.1007/s11704-023-3284-5
Image and Graphics
LETTER

Group-wise co-salient object detection via multi-view self-labeling novel class discovery

Author information +
History +

Graphical abstract

Cite this article

Download citation ▾
Yang WU, Gang DONG, Lingyan LIANG, Yaqian ZHAO, Kaihua ZHANG. Group-wise co-salient object detection via multi-view self-labeling novel class discovery. Front. Comput. Sci., 2024, 18(2): 182709 https://doi.org/10.1007/s11704-023-3284-5

1 Introduction

Multivariate time series (MTS) data are prevalent across various sectors, including healthcare [1,2], finance [3], and cybersecurity [46]. To enhance the efficiency and accuracy of analyzing MTS data, machine learning techniques are often deployed in downstream tasks, ranging from predicting stock market trends [7] to assessing patient data [8,9].
During MTS data collection, practitioners across various sectors often encounter issues such as irregularities in datasets [10,11]. An example of this can be seen in the healthcare sector, where the monitoring of patient health metrics via wearable devices may not always be uniform. This lack of uniformity can result from several factors, such as device malfunctions or irregular usage patterns by individuals, leading to data being captured at irregular intervals. Irregularity can be formally defined as the presence of uneven time intervals between observations. As shown in Fig.1, regular multivariate time series are observed at consistent, evenly spaced time intervals with no missing observations. Conversely, irregular time series are observed at inconsistent or uneven time intervals, with not all variables recorded at each timestamp, resulting in missing values. These irregularities pose challenges to recent advanced machine learning models [12,13], where the models typically rely on data being complete and uniformly sampled. Given the deviation from regularly uniformed data, there is a need for adapting machine learning methodologies to ensure effective analysis of irregular multivariate time series (IMTS) data.
Fig.1 Comparison of regular multivariate time series and irregular multivariate time series. Different colors represent different variables in the MTS. Solid circles denote observed data points, while hollow circles indicate missing data points

Full size|PPT slide

Over the years, IMTS data analysis has gone through an evolution in techniques. In [14], early IMTS methods were categorized into repairing methods [15,16], spectral methods [17,18], and kernel-based methods [19,20]. Yet these works fail to harness the full potential of deep models, particularly recurrent ones. To fix this, natural language based models such as Recurrent Neural Networks (RNNs)-based methods [2,11,21,22], attention-based methods [10,23,24], and transformer-based methods [2527] were developed and applied to IMTS data. However, IMTS data differs fundamentally from natural language data as each dimension is typically recorded by sensors with inherent relationships between them. Recent works have addressed this gap by integrating inter-series relationships [23,26,28]. Among them, Graph Neural Network (GNN) has revolutionized the way we handle IMTS data, due to their ability in leveraging dependencies between sensors represented as nodes in a graph.
In GNN-based approaches, missing data sensors, depicted as nodes, can assimilate insights from their connected neighbours. This enables GNNs to make informed predictions even with missing values. The pioneering works [29,30] have applied a graph aggregation method after the first encoding stage in RNNs to enhance the imputation accuracy or representation learning. Alternatively, [31] aimed to address this issue by learning the embeddings and dynamics of sensors through a temporal-aware attention with [32,33] expanding upon its foundation; considering extra interval information and transfer learning strategy. Liu et al. [34] conceptualized the correlation of dynamic variables as a polynomial of a temporal matrix. These advancements indicate a growing interest in employing GNNs for analyzing IMTS. Unfortunately, these methods often presuppose static sensor relationships [29,35] or depend on attention mechanisms which require access to complete datasets [3133]. Acquiring this prior knowledge or early access to complete data is challenging in streaming data scenarios - where data points are collected over time, from multiple sources or sensors, where each source or sensor represents a distinct variable or dimension in the time series. To address the dynamic nature of streaming data in IMTS, ongoing research needs to develop GNN-based models capable of incrementally updating their understanding, without the prerequisite of future data.
In this paper, we introduce a novel instance-atten-tion GNN methodology, specifically tailored to enhance the processing of IMTS data. Starting with minimal initial data, our approach proactively assesses node relationships, enabling instantaneous graph learning and adaptation. Central to our method is a dual strategy: firstly, reducing the initial data dependency to build model robustness, thus minimizing reliance on predefined graph structures or extensive datasets; secondly, implementing a continuous learning mechanism via instance-based attention, which fosters the real-time evolution of the dependency graph within an RNN framework. This innovative architecture not only facilitates the prompt updating of sensor embeddings, but also offers the versatility to cater to the specific needs of downstream tasks, whether data imputation or embedding extraction. Furthermore, our model is equipped with a frequency-specific handling strategy, ensuring it adeptly meets the varied demands of IMTS data analysis. Our contributions are as follows:
● We introduce a novel instance-attention strategy that learns and dynamically updates the graph structure in response to newly incoming data at each timestamp.
● Our model is versatile, efficiently managing both imputation and non-imputation tasks, with customized strategies for different sampling rates in IMTS.
● Experimental results show that our model’s performance surpasses current state-of-the-art approaches.

2 Related work

2.1 Classical methods for IMTS

Classical methods for handling irregular multivariate time series (IMTS) data have traditionally been categorized into three types [14]: spectral-based, repair-based, and kernel-based methods. Recognizing that spectral-based and repair-based methods often lacked accuracy or required specific spectral structures, [14] introduced a kernel-based approach. This method distinguished itself by employing a unique generalization of the inner product using non-parametric kernel functions. Subsequent advancements in kernel-based methodologies were presented by [19,20], which developed a novel approach for classifying sparse and irregularly sampled time series. However, these kernel-based methods, while innovative, faced limitations due to their reliance on kernel functions and the assumption of linear relationships, potentially restricting their applicability across diverse time series datasets.

2.2 Neural network methods for IMTS

Building on classical methodologies, the analysis of IMTS data has significantly evolved towards deep learning. GRU-D [21] marked this transition by employing the Gated Recurrent Unit (GRU) architecture to innovatively integrate missing patterns through masking and time intervals. BRITS [36] further developed data imputation techniques using bidirectional RNNs, avoiding strict data dynamic assumptions. Meanwhile, the use of Generative Adversarial Networks (GANs) with a modified GRU in [37] sought to learn the distribution of multivariate time series for missing value imputation, yet challenges remained in fully capturing complex interrelations, especially the continuous nature of time-series data.
To fully capture the continuous nature of time-series data, ordinary differential equations (ODEs) have been utilized. ODE-RNNs [11,22] signified a pivotal development in modeling continuous-time dynamics, particularly suited for non-uniformly sampled series. By integrating ODEs, these models proficiently managed varying time intervals. Building on this, Continuous Recurrent Units (CRUs) [2] advanced the concept further, allowing the hidden state to evolve through linear stochastic differential equations within an encoder-decoder architecture, thus enhancing the model’s capacity to interpret and process the temporal dynamics inherent in IMTS.
Recognizing the intrinsic nature of IMTS data, typically sourced from sensor measurements, there has been a shift towards leveraging attention [10,23,24] and transformer [2527] techniques to address the nuanced inter-series relationships. Warpformer [26], an attention-based method, provided innovative solutions for intra-series irregularity and inter-series discrepancy by combining input representation with a warping module and attention mechanism, thereby significantly enhancing the management of sparse, irregularly sampled multivariate time series. In parallel, [24] developed an attention framework that embeds continuous time values, thus improving the handling of variable-length series and boosting performance in tasks like interpolation and classification. In contrast to these methods, GNNs [38] offer a more direct and interpretable approach by explicitly capturing the dependencies between sensors represented as nodes in a graph, providing a robust framework for addressing the challenges of irregularity in MTS data.

2.3 GNN methods for IMTS

GNNs stand out as a particularly promising framework in handling IMTS data, adept at capturing complex relationships due to their intrinsic design for graph-structured contexts, where nodes symbolize entities and edges depict interrelations. Central to GNN functionality is message passing, a process where nodes amalgamate received messages from neighbours to refine their representation, thus enriching it with local neighbourhood information.
Specifically, in the realm of MTS, frameworks such as METRO [39] exemplify the general GNN application, though challenges such as irregular patterns and missing values necessitate more tailored approaches for IMTS. Efforts in irregular time series modelling have seen advancements with methods such as Raindrop [31], employing temporal attention to discern pertinent temporal segments amidst irregular intervals, and subsequent enhancements by [32], integrating time intervals with decay functions for improved encoding and temporal attention.
Furthermore, adapting to evolving data structures in real-time scenarios, dynamic GNNs, as explored by [40,41], offer updated graph structures to accommodate shifting patterns, albeit necessitating further research on scalability and computational efficiency. Meanwhile, traditional methods such as those of Wei et al. [30] and GRIN [29] focus on predefined attention mechanisms and known graph topologies, respectively, revealing a gap in handling dynamic graph structures inherent in real-world streaming. The dual-phase methodology of Barros et al. [35], merging RNN and GNN capabilities, exemplifies the effort to marry temporal feature extraction with graph-based synthesis, pointing to the need for models that can adapt to both evolving graph structures and the real-time demands of streaming data.
Despite GNNs’ proven effectiveness in leveraging static and prior known relationships, their full potential in dynamic, real-time environments remains fertile ground for further investigation.

2.4 Self-supervised learning in GNN methods

Self-supervised learning in Graph Neural Networks (GNNs) has recently gained attention due to its ability to enhance feature learning without extensive labeled data. [42] categorize current research in this direction into three classes based on the self-supervised learning (SSL) techniques explored.
The first class involves contrasting representations to distinguish between positive pairs (node level, graph level or sample level) and negative pairs, enhancing representation methods [43]. The second class masks data, training the model to predict the masked parts or some properties, leveraging the temporal aspect of time-series data [44]. In the third class, data is masked, but instead of predicting the masked parts, methods generate subgraphs or node features that resemble the masked data [45]. Self-supervised learning in GNN has significant applications in recommendation systems, addressing common challenges including popularity bias and interaction noise. For example, AutoCF [46] employs a masked graph autoencoder to aggregate global information by reconstructing masked subgraph structures, thereby enhancing representation discrimination ability. CGI [47] adaptively learns to drop edges or nodes, creating optimized graph structures in an end-to-end manner, and reducing interaction noise.
Our proposed method shares some similarities with the masking methods in self-supervised GNNs, particularly the second class that predicts masked values. However, we employ a different approach by using recurrent values to enable the imputation of missing data. In our study, when there are no missing values, we use the ground truth values as the masked values to guide the model’s predictions. When there are missing values at a particular timestamp, we leverage the already learned models in the recurrent structure to extract representations and use these representations for imputation.

3 The DynIMTS methodology

In this section, we introduce our proposed algorithm DynIMTS (Dynamic IMTS). We begin by describing the preliminary formulation of the problem. Following this, we delve into the overall framework of DynIMTS, describing each component in detail. Finally, we present our tailored strategy for handling data with high sampling rates, as well as the implementation for downstream tasks.

3.1 Preliminaries

In a regular MTS, each data point consists of d individual series and the ith series is represented as a length-T vector, denoted by xi=[xi,1,xi,2,,xi,T]. This vector is collected by a sensor over the time period from 1 to T. The entire data point is represented as a matrix of these vectors, X=[x1;x2;;xd], where XRd×T, with each column corresponding to a vector xt collected at a specific time stamp t.
In IMTS, different vectors xi may record data at varying intervals, leading to the presence of missing values in the matrix X. To manage these missing values, a mask matrix M{0,1}d×T is used, which indicates the presence or absence of data at each position in X. Specifically, the (i,j)th value of M, denoted as mi,j, is set to 1 if the value xi,j in X is not missing, and is set to 0 otherwise. The elementwise product XM can be used to zero out the missing values, resulting in a matrix X^.
Typically, a training dataset D consists of N data points. In imputation tasks, the primary objective is to fill in the missing values in X^, with the dataset represented as D={X^j}j=1N. For classification tasks, imputation might not be necessary if a reliable representation of the data can be learned directly from incomplete data. In this case, the dataset is denoted as D={(X^j,yj)}j=1N, where yj represents the class label associated with each data point X^j.
In MTS, each time series is collected by a sensor, and these sensors may exhibit connections, such as geographical proximity. To model the pairwise relationships between sensors, a graph structure is employed, where each node represents a sensor. Specifically, a graph is defined as G=(V,A), with V=v1,,vd representing the set of nodes, and A being the adjacency matrix. Each node vi will be associated with a sensor. In this matrix, the element Ai,j quantifies the correlation between nodes vi and vj, with a larger value indicating a higher correlation. As these relationships can change over time, the graphs corresponding to different time stamps are denoted as G1,G2,,GT, representing the graphs dynamically learned at each time stamp from 1 to T, respectively.

3.2 Framework

The DynIMTS framework, depicted in Fig.2, is a recurrent architecture designed to leverage spatial-temporal properties for IMTS data. At its core, each recurrent unit is a spatial-temporal encoder composed of three components: graph learning, embedding learning and spatial-temporal learning.
In the graph learning component, the framework utilizes the sensor embeddings learned from the previous timestamp, t1. Each instance within the batch contributes its own unique sensor embeddings, reflecting distinct data characteristics. The primary objective of this component is to integrate the individual information into a cohesive, unified graph for time stamp t. Initially, the edge weights for each data sample are carefully estimated to tailor the graph structure to the unique attributes of each instance. Following this, an instance-attention mechanism is employed to integrate these diverse graphs. This integration results in a unified graph that captures and reflects the inter-instance relationships, significantly enhancing the model’s capability to generalize across varied data samples.
The embedding learning begins by taking the current timestamp inputs for each node. These inputs are processed through an MLP which operates across all instances in batch mode, using shared graph structure and adjacency matrix learned in the graph learning component, although the embeddings are instance-specific. The MLP outputs new embeddings, which are then passed through a Graph Convolution Network (GCN) for message passing. This step leverages the graph structure to update each node’s representation by integrating neighbourhood embeddings, enriching the missing values’ information.
In the spatial-temporal learning component, the instance embeddings learned in the embedding learning component go through a recurrent graph convolution network to learn the sensor embedding, complemented by normalization and dropout techniques to stabilize and regularize the learning process. This produces a refined instance-wise sensor embedding for the current timestamp, which is fed back into the recurrent unit in the next time stamp. This cycle repeats, continuously evolving the graph’s structure and sensor embeddings.
Fig.2 Framework of DynIMTS. The model is a recurrent structure based on a spatial-temporal encoder and consists of three main components: embedding learning, spatial-temporal learning, and graph learning. The Bt denotes samples in a Batch, and A and At is the corresponding graph structure. The final learned sensor embedding sT is used for classification

Full size|PPT slide

Ultimately, the final graph representation serves as the input for task-specific components. Depending on the application, this could involve: i) A spatial-temporal decoder for imputation tasks, aiming to fill missing values by leveraging neighbourhood embeddings and the learned graph structure; ii) Several classification layers for classification tasks, focusing on harnessing the spatial-temporal features encoded in the graph. The network is trained end-to-end, optimizing either an imputation loss for data recovery tasks or a classification loss for predictive tasks. This training approach ensures that the DynIMTS framework adapts to the specific characters of the provided spatial-temporal data, maximizing its performance on the designated tasks. Through its recurrent structure and integration of graph-based learning with instance attention mechanisms, the DynIMTS framework offers a robust solution for IMTS data on either imputation or classification tasks.
In our approach, missing values at timestamp t are addressed by updating sensor embeddings using a combination of methods. We employ the representations from the encoder model at timestamp t and then decode these representations to impute the missing values using the decoder model. Both the encoder and decoder models are trained with data from timestamps 1 to t1. More specifically, we propose a two-step embedding learning process: transformation and message-passing. In the message-passing step, the strategy integrates information from neighboring observed nodes, effectively representing missing values as embeddings with integrated information. Following this, the sensor embedding is further updated in the spatial-temporal learning process, ensuring accurate representation and handling of missing values. In summary, the model incorporates a recurrent structure that leverages past data and a graph neural network that leverages neighborhood data for imputation.

3.3 Graph learning

Graph Learning aims to learn the connections or dependencies that exist amongst the sensors. Initially, instance-wise sensor embeddings learned in the t1 timestamp for instance X’s ith dimension, denoted as st1i(X), are inputted. Based on such input, this process first computes the cosine similarity cos(,) between sensor embeddings, which is used as the initialization of the adjacency matrix, that is
A~i,j(X)=cos(st1i(X),st1j(X)),
where the learned A~i,j(X) will form a matrix denoted as A~(X).
Instance Attention integrates the adjacency matrix learned for different instances into one. To do so, we utilize an instance-based attention mechanism based on the query, key and value matrices to calculate attention weights. This approach enhances the overall performance by acknowledging and responding to the inherent variation presented across multiple instances. We use single-head, and denote WQRd×dK, and WKRd×dK the parameter matrices specific to the head and dK the embedding dimension. Then we have
Q(X)=A~(X)WQ,K(X)=A~(X)WK.
Then we have the integrated adjacency matrix as
α(X)=softmax(Q(X)K(X)dK),At=σ{Wa×norm(XBt1α(X)A~(X))},
in which At is the integrated graph at time point t, σ() is the sigmoid activation function, WaRd×d is linear transformation parameters, norm() gives the max-min normalization of the matrix and Bt1 is the batch used in the previous round.

3.4 Embedding learning

Embedding learning aims to capture a latent representation of the initial data through a two-step process: transformation and message-passing. The transformation step involves converting the value xi,t from the ith sensor at timestamp t into a latent space of dimension d1. To incorporate the missing information, the xi,t is concatenated with mi,t, the (i,t)th element of mask matrix M. This transformation is mathematically represented as:
ei,t=wiconcat(xi,t,mi,t),
where wiRd1×2 denotes the transformation weights for the ith sensor and concat(,) gives a column-wise concatenation.
To accommodate the irregularity often found in data, where some embeddings may be null, we employ a message-passing function within a Graph Convolution Network (GCN). This function facilitates the substitution of null embeddings with integrated information from neighbouring nodes. We define nodes i and j as neighbours if their connectivity in the adjacency matrix Ai,j exceeds a threshold, for instance, 0.1. Message passing is then conducted as follows:
MP(ei,t,At)=σ(WMPjN(i)ej,t),
where WMPRd1×d1 is the transformation parameters to be learned, σ represents the activation function (PReLU here), and At is the adjacency matrix learned in the graph learning component. We denote the final integrated embedding after the MP for the ith dimension at timestamp t to be e¯i,t. Note that the learned e¯i,t is specifically for each instance and we can also denote it as e¯i,t(X) if necessary.

3.5 Spatial-temporal learning

This component mainly learns the sensor emebddings specified to each instance. In the following description, we will omit X in the description if the context is clear. In learning such embeddings, we employ a recurrent graph convolution network based on GRU. This approach seeks to extract temporal patterns of IMTS, similar to traditional RNNs, yet relies on a learned graph structure for calculation. To do so, we initialize the sensor embedding as
s~i=concat(e¯i,t,mi,t),
where concat(,) gives a column-wise concatenation of two vectors. Overall, s~iRd1+1 gives an integration of sensor embedding and the missing pattern.
Then we follow the GRU procedure to define the reset gate, update gate and candidate hidden state respectively in
rti=sigmoid(MP(s~i,At),Wr),uti=sigmoid(MP(s~i,At),Wu),cti=tanh(MP(rtis~i,At),Wc),
where is the element-wise multiplication, Wr,Wv, and Wc are the weights to be learned. In the above equations, (MP(s~,At),Wk) calculates the product between the updated s~ after MP and Wk, k{r,u,c}. Thus the final sensor embedding can be written as
sti=utis~i+(1uti)cti,
where stiRd2 and d2 is the embedding size here.

3.6 Data with high sampling frequency

For IMTS data with a high sampling rate, the historical data may be more reliable in providing essential information for the current timestamp. In this way, we modify the previously proposed learning strategies. Specifically, we modify three places: calculating the integrated adjacency matrix in the Graph Learning component, the transformation step in the embedding learning component, and the initialization of sensor embedding in the Spatial-temporal Learning component. Below we introduce them one by one.
For the initialization of sensor embedding in the Spatial-temporal Learning component, instead of directly using the learned e¯i,t, we first use a simple linear model to impute the missing data by
x^i,t=Wline¯i,t+blin.
Then, we initialize the sensor embedding, we replace Eq. (5) by
s~i=concat(x^i,t,mi,t),
in which s~iR2.
For the integrated adjacency matrix in the Graph Learning component, we added a min-max normalization out of the sigmoid function. Specifically, we replace Eq. (2) with
At=norm{σ[Wa×norm(XBt1α(X)A~(X))]}.
For the transformation step in the embedding learning component, the historical information is added into Eq. (3), and we get the transformation by
ei,t=WiTconcat(xi,t,mi,t,st1i),
where WiT is now a matrix of size d1×(d2+2).

3.7 Imputation and classification

Below we introduce how we do imputation and classification based on the learned embeddings.
Imputation For the imputation task, we provide a decoding component to learn the missing x values from the sensor embeddings sti. The most straightforward way of decoding can be a linear transformation, for example,
x^i,t=Wimsti+bim.
In real implementations, the x^i,t may further undergo RGCN and MP processes [29], and we omit the details here. After having the estimated x^i,t, we can then optimize the MAE loss to learn the model.
Classification To do classification, we concatenate all dimensions’ embeddings and train a generic classifier such as MLP for the final prediction. Note that in our proposed framework, there are two kinds of embeddings, one is outputted by the Embedding Learning component, i.e., e¯i,T and another is the sensor embedding outputted by the Spatio-temporal Learning component, i.e., si,T. We use both of them in the classification task and add their losses together. Specifically, we use the following loss function in classification
Lossc=CrossEntropy(y,MLP1(e¯1,T,,e¯d,T))+CrossEntropy(y,MLP2(s1,T,,sd,T)),
i.e., we concatenate the embeddings from different nodes and use an MLP to do the prediction.

3.8 Time complexity

We analyze the time complexity component by component. For the partial temporal encoder, in one time stamp, the embedding learning has the complexity of O(m0Nd2), where m0 is the number of learnable parameters in the neural networks models, including both MLP and GCN. The quadratic increase in complexity with respect to the dimensionality d is essential in our proposed method because it involves graph integration among individual series. The spatio-temporal learning follows a similar time complexity of O(m1Nd2), where m1 is defined similarly to m0 in this component. The graph learning component will have a time complexity as well depending linearly on N and the number of parameters, and quadratically on d. By adding this component together, we have an overall complexity of O(mNTd2), where m is the total number of learnable parameters in the proposed model. The decoding phase will double the overall complexity, and we omit the constant 2 in the Big O notation.

4 Experiments and results

In this section, we conduct an empirical analysis of our proposed method, concentrating on imputation and classification tasks. We compare its performance with that of baseline methods. Additionally, we perform an ablation study. To illustrate the evolution of the learned graph, we provide visualizations. Our experiments were carried out using the PyTorch framework on NVIDIA RTX A5000 GPUs within a CUDA environment. The code for replicating the experiments is available online at the website of github.com/Evan-Kun/DynIMTS.

4.1 Imputation

Datasets We use the following datasets
Air quality index (AQI) [29]: This time-series dataset, as detailed in [48], comprises hourly PM2.5 level recordings from 437 monitoring stations across 43 cities in China, spanning from May 2014 to April 2015. The dataset is irregular in nature and has been extensively utilized to assess the effects of imputation, as reported in [29,36,49]. For consistent evaluation, we follow the methodology described in [29] to simulate irregular data patterns and construct a geometric distance-based graph structure.
AQI-36 [29]: This dataset is a reduced version of AQI by incorporating the measure of only 36 cities. It was also used extensively [29,36,49]. As AQI, we follow [29] to simulate the irregular data patterns and construct the geometric distance-based graph.
Exchange-rate [50]: Unlike the inherently irregular AQI and AQI-36 datasets, the Exchange-rate dataset is regular, containing daily currency exchange rates for one US dollar to 8 different currencies from 1990 to 2016. The graph structure for this dataset is developed by calculating the Pearson correlation between pairs of exchange-rate vectors. To introduce patterns of irregularity, data points are randomly omitted following a uniform distribution across the interval [0,1). Given the originally complete nature of this dataset, it serves as an ideal basis to evaluate how our proposed algorithm performs under varying patterns of irregularity. Specifically, we vary the missing data ratio from {20%,80%} to demonstrate the effectiveness of our approach under different conditions.
Baselines For the task of imputing missing values in IMTS, we compare our model against the following baselines:
BRITS [36]: This method utilizes a bidirectional recurrent neural network tailored for imputing missing values in multivariate time series. Since future information is unavailable at the time of imputation, the model is modified to function unidirectionally. Therefore, instead of comparing it with the original bidirectional RITS, we benchmark against a variant, which we refer to as RITS.
GRIN [29]: This is a recurrent graph convolution neural network devised to impute missing values within multivariate time series. Similarly, only one direction is used to process data. Due to that a pre-defined graph structure is requested by running the GRIN method, we use two versions of it: GRIN (all-ones) and GRIN (physical). GRIN (all-ones) uses a fully connected graph with an all-ones adjacency matrix, while GRIN (physical) uses the physical graph structure, i.e., the graph of geometric distances in AQI and AQI-36, and the Pearson Correlation graph in Exchange Rate. Note that GRIN (physical) employs the groundtruth graph information in AQI datasets, so it is expected to perform better than our proposed DynIMTS, which has no access to the ground truth graph.
Settings For training the model, the dataset is partitioned into three subsets: 70% for training, 20% for validation, and 10% for testing. Each experiment is run 5 times and we report the mean and standard deviation (std) of the results. For each neural network, we tune the learning rate from {103,104}, batch size from {32,64,128}, hidden layer size from {8,16,32,64} and sensor embedding dimension (for GRIN and DynIMTS) from {4,8,16,32,64}. The number of epochs is capped at 200, with an early stopping mechanism that terminates the process if no improvement is observed in the validation set across 40 successive epochs.
Metrics Three metrics are used to evaluate the imputation effects: MAE (Mean Average Error), MSE (Mean Squared Error) and MRE (Mean Relative Error). Assuming the groudtruth of time t, dimension i is xt,i and the imputated value is x^t,i, then MAE calculates the average of |x^t,ixt,i|, MSE calculates the average of (x^t,ixt,i)2 and MRE calculate the average of |x^t,ixt,i|/|xt,i|.
ResultsTab.1 demonstrates the performance of the proposed DynIMTS algorithm on the AQI-36 and AQI datasets. Among the methods without access to physical graph information, our DynIMTS consistently outperforms the others across all three metrics on both datasets. Even when compared to GRIN (physical)—which benefits from full access to the geometric distance graph—our DynIMTS demonstrates comparable or superior results in most cases. The larger graph size required by the AQI dataset, which presents more significant learning challenges, may account for our method’s superior performance on AQI-36 as opposed to AQI. Contrastively, the relative underperformance on the AQI may be attributed to the lack of essential graph information that GRIN (physical) utilizes.
Tab.1 Comparative performance of imputation models on AQI and AQI-36 datasets (mean ± std)
AQI-36AQI
MAEMSEMREMAEMSEMRE
GRIN (physical)15.8 ± 0.6792.0 ± 93.90.5 ± 0.017.4 ± 0.3935.3 ± 34.30.5 ± 0.0
RITS16.8 ± 1.7872.6 ± 184.50.7 ± 0.122.8 ± 0.71437.7 ± 99.30.6 ± 0.0
GRIN (all-ones)15.7 ± 0.5710.7 ± 45.70.6 ± 0.019.8 ± 0.31089.0 ± 11.00.6 ± 0.0
DynIMTS14.9 ± 1.1673.0 ± 149.70.5 ± 0.018.3 ± 0.3997.8 ± 70.40.5 ± 0.0
Fig.3 illustrates the performance comparison of our DynIMTS against baseline methods under missing rates of 20% and 80% on the Exchange Rate dataset. The results lead us to several conclusions: (i) an elevated missing rate negatively impacts all methods; (ii) our DynIMTS outperforms the baseline models in terms of MAE and MSE; (iii) however, it lags behind the baselines on MRE by approximately 0.02. The combination of a higher MAE and a lower MRE suggests that while our method excels at imputing larger values, it does so at the cost of precision with smaller values. It is worth noting that RITS performs less effectively than both the baselines and our DynIMTS, likely due to its disregard for graph structure. To clearly highlight the comparative advantages of DynIMTS over the GRIN methods, RITS is excluded from this visual illustration. We have also added Fig.4 to visualize the original time series and the predicted time series on the Exchange Rates Data. The plots across eight dimensions show that the predicted values closely align with the ground-truth values, capturing the overall trends and fluctuations accurately. While minor deviations are present in some dimensions, the general pattern of the data is well-represented.
Fig.3 Evaluation of imputation methods at 20% and 80% missing data ratios for exchange rates dataset. (a) MAE by missing ratio and method; (b) MSE by missing ratio and method; (c) MRE by missing ratio and method

Full size|PPT slide

Fig.4 Visualization of the imputation results produced by our model with a 20% missing data ratio on the Exchange-rate dataset across eight dimensions. The black lines represent the ground truth values, while the orange lines represent the predicted values. (a) Dimension 1; (b) Dimension 2; (c) Dimension 3; (d) Dimension 4; (e) Dimension 5; (f) Dimension 6; (g) Dimension 7; (h) Dimension 8

Full size|PPT slide

4.2 Classification

Datasets For the classification task, we use the following datasets.
P12 [51]: This dataset comprises binary classification data collected from 36 sensors monitoring 11,988 ICU patients over a 48-hour period. The data is inherently irregular, with the classification task focusing on determining whether patients’ ICU stays exceed three days. It is characterized as a low-frequency dataset due to the considerable time gaps between some measurements. There are some static features in this dataset that were excluded from the experiments.
P19 [52]: For this binary classification dataset, we preprocess it following the methodology outlined in [31]. The processed dataset is derived from data collected by 34 sensors monitoring 38,803 patients. The labels identify whether the patients develop sepsis within the subsequent 6 hours. Similar to P12, static features are excluded. Additionally, this dataset is characterized by its low-frequency data collection.
PAM [53]: This multi-class dataset comprises eight classes, each representing a different daily living activity. Following the preprocessing methods outlined in [31], the resultant dataset includes 5,333 samples. Each sample is collected from 17 sensors over 600 observations of daily activities, with a time gap of 0.01 seconds between each. The original dataset is regular; however, observations can be uniformly and randomly removed to simulate irregular patterns. This dataset is counted as high-frequency data.
AWR [54,55]: The dataset captures the movements of the tongue and lips during speech, specifically focusing on the articulation of 25 words. Each word serves as a label, and the data is sampled at a rate of 0.005 seconds. The dataset comprises 575 instances, each with a sequence length of 144 and across 9 dimensions. It is structured as a regular dataset but can be transformed into an irregular format through both uniform and random sampling techniques. This dataset is counted as high-frequency data.
For none of the dataset contains any geographical information, we simply use all-ones adjacency matrix for baseline methods requiring a graph input.
Baselines Our model’s performance is evaluated against the imputation methods RITS and an all-ones variant of GRIN. Subsequently, two fully connected layers are utilized for classification purposes. Additionally, we compare our results with the baseline method Raindrop [31], which is designed to learn representations for downstream tasks. Notably, the graph used in Raindrop is based on an all-ones adjacency matrix.
Settings The outputs of all methods are processed through two fully connected layers designed specifically for the classification task. All hyperparameters are tuned according to the procedures outlined in Section 4.1. So is the train/val/test split.
Metrics For classification tasks, we assess performance using AUROC [56], AUPRC [57], and Accuracy metrics. Among them, AUROC and AUPRC are particularly well-suited for evaluating the imbalanced classification tasks P12 and P19. Accuracy will be used to evaluate the multi-class balanced dataset PAM and AWR.
ResultsTab.2 presents an empirical comparison of various models on the P12 and P19 datasets. It is evident that on the P12 dataset, DynIMTS consistently outperformed all baseline models across both evaluated metrics. In contrast, on the P19 dataset, while DynIMTS generally surpassed other baselines, its performance was comparable to that of GRIN (all-ones). This variance in performance could be attributed to that the P19 dataset, collected at a lower frequency, results in shorter time-series data compared to the P12 dataset (60 vs. 215). This shorter time series results in less data being available to accurately learn and model the interconnections between sensors. Consequently, the proposed method, which leverages these interconnections in the form of a graph, has limited effectiveness, leading to only marginal improvement due to insufficient and potentially less reliable data from the shorter observation period.
Tab.2 Evaluation on P12 and P19 datasets using AUROC% and AUPRC% (mean ± std)
P12P19
AUROCAUPRCAUROCAUPRC
RITS71.2 ± 2.016.4 ± 2.786.7 ± 0.747.2 ± 3.1
Raindrop60.9 ± 5.314.7 ± 2.074.1 ± 0.833.6 ± 1.7
GRIN (all-ones)70.8 ± 2.012.5 ± 2.887.1 ± 1.148.4 ± 2.0
DynIMTS75.1 ± 1.625.2 ± 3.087.1 ± 1.148.4 ± 1.9
Tab.3 provides an evaluation of accuracy across two datasets at varying missing rates. It is apparent that the proposed DynIMTS model exhibits less sensitivity to increasing missing rates compared to baseline methods. In most cases, DynIMTS consistently outperforms the baseline models, demonstrating its robustness and superior performance even under conditions of high irregularity. The observation that classification performance improves when masking more data (such as from 50% of the data compared to 40%) seems counterintuitive. Upon examining the dataset, we found that as the missing ratio increases, the observed data points become sparser, resulting in smoother and flatter temporal patterns. As a result, the correlation of weakly related variables may be lost due to the higher missing ratio, leading the model to “ignore” these less significant patterns. We think the high missing ratio acts as a feature selector and regularizer, helping the model to generalize slightly better by ignoring some patterns and data points that might otherwise introduce noise or less relevant information. We will further explore this phenomenon in future work.
Tab.3 Accuracy% (mean ± std) on PAM and AWR datasets with various missing rates
Missing ratio/% Models PAM AWR
10 RITS 87.8 ± 0.6 83.8 ± 2.8
10 Raindrop 75.5 ± 1.7 52.5 ± 4.8
10 GRIN (all-ones) 97.5 ± 0.6 79.1 ± 4.0
10 DynIMTS 97.4 ± 0.5 84.4 ± 3.7
20 RITS 79.1 ± 0.7 73.2 ± 16.8
20 Raindrop 75.5 ± 1.7 47.5 ± 5.8
20 GRIN (all-ones) 97.6 ± 0.2 80.0 ± 3.6
20 DynIMTS 98.3 ± 0.2 83.5 ± 3.2
30 RITS 73.5± 0.8 69.2 ± 14.5
30 Raindrop 70.4 ± 0.5 47.7 ± 6.6
30 GRIN (all-ones) 95.9 ± 1.2 77.4 ± 3.2
30 DynIMTS 97.2 ± 0.7 77.2 ± 5.7
40 RITS 71.9 ± 0.9 56.0 ± 24.6
40 Raindrop 64.0 ± 1.4 50.6 ± 5.2
40 GRIN (all-ones) 93.5 ± 0.6 76.5 ± 5.9
40 DynIMTS 94.0 ± 0.5 78.6 ± 2.2
50 RITS 61.1 ± 1.9 59.0 ± 14.0
50 Raindrop 58.3 ± 2.0 34.6 ± 3.8
50 GRIN (all-ones) 93.5 ± 0.6 76.5 ± 4.3
50 DynIMTS 95.8 ± 0.5 78.8 ± 6.6

4.3 Ablation study

This section presents an ablation study of our proposed DynIMTS, with results on the P12 and P19 datasets evaluated using AUROC and AUPRC, as reported in Tab.4. The findings demonstrate that: (i) Omitting the graph component leads to a significant decline in performance across both datasets, particularly in the smaller P12 dataset, underscoring the critical role of a well-structured graph. (ii) The absence of instance attention results in a performance drop, though it is less pronounced than the impact of excluding the graph component, especially in the P12 dataset. This further emphasizes the importance of the graph structure in the model. (iii) Including graph normalization also deteriorates performance, likely due to the disproportionate influence of edge values on the imputation process. Overall, these results confirm the necessity of each component in the DynIMTS architecture for achieving optimal performance.
Tab.4 Ablation study for the impact of graph, instance attention and graph normalisation
P12P19
AUROCAUPRCAUROCAUPRC
w/o graph57.9 ± 1.38.6 ± 0.781.1 ± 1.639.1 ± 2.9
w graph normalisation59.3 ± 5.19.7 ± 1.683.4 ± 2.142.1 ± 6.4
w/o instance attention69.3 ± 8.318.7 ± 9.085.2 ± 2.446.0 ± 5.7
DynIMTS75.1 ± 1.625.2 ± 3.087.8 ± 0.547.5 ± 3.5

4.4 Graph visualization

This section provides the visualization of graphs learned by DynIMTS in Fig.5. Specifically, we present the (learned) graph’s adjacency matrix on epochs 0, 10, and 20. We can see at the beginning of the training, the edge values exhibit a high degree of similarity. However, as training progresses, a noticeable variation in colour intensity among the edges becomes apparent, indicating a shift in the dependencies across different nodes. Furthermore, the lighter colours tend to become even lighter, while the darker colours intensify, until the graph reaches convergence.
Fig.5 Visualization of the adjacency matrices of epochs 0, 10 and 20 on the P19 dataset. The darker the colour is, the larger the value on the corresponding point. (a) Graph at epoch 0; (b) graph at epoch 10; (c) graph at epoch 20

Full size|PPT slide

Analysis of the derived sensor dependency graph reveals discernible relationships among various sensors. Sensors 0 through 7 represent vital signs (0: Heart Rate, HR; 1: Pulse oximetry, O2Sat; 2: Temperature, Temp; 3: Systolic BP, SBP; 4: Mean arterial pressure, MAP; 5: Diastolic BP, DBP; 6: Respiration rate, Resp; 7: End tidal carbon dioxide, EtCO2) whereas the remaining sensors capture laboratory values. Initially, all sensors exhibit correlations with sensors 0 (HR) and 1 (O2Sat). Over time, the relationships evolve such that 0 (HR) and 2 (Temp) predominantly correlate with other vital sign sensors, suggesting that heart rate and body temperature maintain strong associations primarily with vital parameters. Conversely, sensor 7 (EtCO2), initially not correlated with laboratory values, begins to show significant relationships with all laboratory measurements as time progresses, indicating its potential utility in predicting certain laboratory outcomes such as sensor 11 (pH) and 27 (Troponin). Furthermore, sensor 10 (Fraction of Inspired Oxygen, FiO2) initially displays connections across the sensor array. However, as time advances, it becomes apparent that FiO2 maintains fewer associations, particularly excluding sensors like 14 (Aspartate transaminase, AST), 15 (Blood urea nitrogen, BUN), 18 (Chloride), and 19 (Creatinine), while still correlating with sensor 28 (Hematocrit, Hct) and other vital signs.

5 Conclusion

In this paper, we have proposed a novel dynamic GNN framework tailored to address the complexities of IMTS data, characterized by irregular sampling and missing values. Our approach leverages an instance-attention mechanism to dynamically learn and update graph structures based on real-time data input. Empirical evaluations on real-world datasets demonstrate that our method significantly outperforms existing state-of-the-art solutions. In the future, our aim is to provide a theoretical guarantee for the proposed method.
This is a preview of subscription content, contact us for subscripton.

References

[1]
Wu Y, Song H, Liu B, Zhang K, Liu D. Co-salient object detection with uncertainty-aware group exchange-masking. In: Proceedings of 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2023, 19639−19648
[2]
Fan D P, Li T, Lin Z, Ji G P, Zhang D, Cheng M M, Fu H, Shen J . Re-thinking co-salient object detection. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2022, 44( 8): 4339–4354
[3]
Liu Y, Li T, Wu Y, Song H, Zhang K . Self-supervised image co-saliency detection. Computers and Electrical Engineering, 2023, 105: 108533
[4]
Wu Y, Liang L, Zhao Y, Zhang K. Object-aware calibrated depth-guided transformer for RGB-D co-salient object detection. In: Proceedings of 2023 IEEE International Conference on Multimedia and Expo. 2023, 1121−1126
[5]
Wu Y, Zhang H, Liang L, Zhao Y, Zhang K. Group-wise co-salient object detection with Siamese transformers via Brownian distance covariance matching. In: Proceedings of 2023 IEEE International Conference on Acoustics, Speech and Signal Processing. 2023, 1−5
[6]
Fan Q, Fan D P, Fu H, Tang C K, Shao L, Tai Y W. Group collaborative learning for co-salient object detection. In: Proceedings of 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2021, 12283−12293
[7]
Fini E, Sangineto E, Lathuilière S, Zhong Z, Nabi M, Ricci E. A unified objective for novel class discovery. In: Proceedings of 2021 IEEE/CVF International Conference on Computer Vision. 2021, 9264−9272
[8]
Zhang N, Han J, Liu N, Shao L. Summarize and search: learning consensus-aware dynamic convolution for co-saliency detection. In: Proceedings of 2021 IEEE/CVF International Conference on Computer Vision. 2021, 4147−4156
[9]
Zhang K, Wu Y, Dong M, Liu B, Liu D, Liu Q . Deep object co-segmentation and co-saliency detection via high-order spatial-semantic network modulation. IEEE Transactions on Multimedia, 2023, 25: 5733–5746
[10]
Yu S, Xiao J, Zhang B, Lim E G. Democracy does matter: comprehensive feature mining for co-salient object detection. In: Proceedings of 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2022, 969−978

Acknowledgements

This work was supported in part by the National Natural Science Foundation of China (Grant No. 62276141).

Competing interests

The authors declare that they have no competing interests or financial conflicts to disclose.

RIGHTS & PERMISSIONS

2024 Higher Education Press
AI Summary AI Mindmap
PDF(1545 KB)

Supplementary files

Highlights (454 KB)

Accesses

Citations

Detail

Sections
Recommended

/