Shape-intensity knowledge distillation for robust medical image segmentation

Wenhui DONG, Bo DU, Yongchao XU

Front. Comput. Sci. ›› 2025, Vol. 19 ›› Issue (9): 199705. DOI: 10.1007/s11704-024-40462-2
Image and Graphics
RESEARCH ARTICLE


Abstract

Many medical image segmentation methods have achieved impressive results. Yet, most existing methods do not take into account the shape-intensity prior information. This may lead to implausible segmentation results, in particular for images of unseen datasets. In this paper, we propose a novel approach to incorporate joint shape-intensity prior information into the segmentation network. Specifically, we first train a segmentation network (regarded as the teacher network) on class-wise averaged training images to extract valuable shape-intensity information, which is then transferred to a student segmentation network with the same network architecture as the teacher via knowledge distillation. In this way, the student network regarded as the final segmentation model can effectively integrate the shape-intensity prior information, yielding more accurate segmentation results. Despite its simplicity, experiments on five medical image segmentation tasks of different modalities demonstrate that the proposed Shape-Intensity Knowledge Distillation (SIKD) consistently improves several baseline models (including recent MaxStyle and SAMed) under intra-dataset evaluation, and significantly improves the cross-dataset generalization ability. The source code will be publicly available after acceptance.


Keywords

medical image segmentation / knowledge distillation / shape-intensity prior / deep neural network

Cite this article

Wenhui DONG, Bo DU, Yongchao XU. Shape-intensity knowledge distillation for robust medical image segmentation. Front. Comput. Sci., 2025, 19(9): 199705 https://doi.org/10.1007/s11704-024-40462-2

1 Introduction

Medical image segmentation aims to predict the semantic interpretation of each pixel, and is one of the crucial tasks in clinical image analysis. Accurate and reliable automatic segmentation is required to quickly provide clinicians with assistance and advice to improve the efficiency of clinical workflows.
In the past decades, due to the emergence of deep learning [1–9], medical image segmentation [10–16] has witnessed substantial progress. The pioneering U-Net [10] introduces skip-connections that effectively fuse shallow texture information and deep semantic information, achieving very promising medical image segmentation results in most cases. Though many variants of U-Net [11–14] have since been proposed, U-Net still remains the popular de facto network for medical images, achieving relatively good results compared to many of its alternatives [17].
Inspired by the success of the vision transformer (ViT) in image recognition [18] and semantic segmentation [19,20], some transformer-based networks have been proposed for medical image segmentation. Specifically, Chen et al. [21] propose TransUNet, which replaces the encoder of U-Net with a hybrid of ResNet and ViT while keeping the original U-Net decoder for medical image segmentation. Based on the popular Swin transformer [22], Swin-Unet [23], a U-Net-like pure transformer architecture, has been proposed. Both TransUNet and Swin-Unet achieve very encouraging results in medical image segmentation.
Most existing methods frame medical image segmentation as a pixel-wise classification task in which each pixel is classified independently, ignoring that medical objects usually have specific shapes. This may lead to anatomically implausible segmentation results, in particular for images of unseen datasets. Recently, some methods incorporating shape information [24–31] have been proposed to make the segmentation results more anatomically correct. For instance, some methods [24–27] learn extra shape-related information and fuse shape and segmentation features. Some other methods [28–31] apply an encoder-decoder network to both the predicted and ground-truth segmentation to align their latent non-linear representations, and refine the predicted segmentation through an auto-encoder.
Incorporating shape information has been shown to improve the performance of medical image segmentation. Most existing methods [24–38] force the network to learn and make use of shape features by predicting extra shape-related information or by aligning the non-linear representations of the segmentation result and the ground truth. However, these methods often necessitate additional computations to fuse the learned shape features or a post-processing step to refine the segmentation. Furthermore, they fail to incorporate intensity information, which has been demonstrated to be valuable prior knowledge for medical image segmentation [39,40].
In this paper, different from existing methods, we propose to incorporate joint shape-intensity information to enhance the model's segmentation performance while simultaneously improving its generalization capabilities on unseen datasets. This is achieved by a novel use of knowledge distillation [41]. Specifically, we first train a segmentation network on the class-wise averaged training images without texture information. This segmentation network encodes the useful shape-intensity knowledge and is regarded as the teacher network. We then train a student segmentation network with the same network architecture as the teacher network on the original training images. In addition to the classical segmentation loss, we also apply a distillation loss on the penultimate layer between the teacher and student network. In this way, the student network effectively learns shape-intensity information, leading to more plausible intra-dataset and cross-dataset segmentation results (see Fig.1). The student network is considered as the final segmentation network, which does not require any extra computation cost during inference, making it efficient in practical usage.
Fig.1 Comparison of averaged performance between the proposed SIKD and corresponding baseline models (including U-Net, SAUNet, PraNet, SANet, TransUnet, MaxStyle, SAMed, LM-Net and 2D D-LKA) under intra-dataset and cross-dataset evaluation on five medical image segmentation tasks of different modalities. The result is the average value across all baseline methods and the corresponding SIKD


The main contributions of this paper are: 1) We propose to train a network (i.e., the teacher network) on the class-wise averaged training images. This simple design explicitly extracts shape-intensity prior information; 2) We then leverage knowledge distillation to transfer the shape-intensity knowledge learned by the teacher model to the student network (i.e., the final segmentation network), effectively incorporating shape-intensity prior information for medical image segmentation; 3) Extensive experiments on five medical image segmentation tasks of different modalities demonstrate that the proposed Shape-Intensity Knowledge Distillation (SIKD) consistently/significantly improves the baseline models and has a better generalization ability to images of unseen datasets.
The rest of this paper is organized as follows. In Section 2, we review some related works on medical image segmentation incorporating shape information and knowledge distillation. We then detail the proposed method in Section 3, followed by extensive experimental results in Section 4 and some discussions in Section 5. Finally, we conclude and give some perspectives in Section 6.

2 Related work

Since the pioneering work of U-Net [10], many U-Net-based methods for medical image segmentation have been proposed [11–14,17,42,43]. A detailed review of recent methods can be found in [44]. In this section, we mainly focus on briefly reviewing related works on deep-learning-based medical image segmentation incorporating shape information and on knowledge distillation. We refer the reader to [45–47] for more details on shape-aware medical image segmentation and knowledge distillation.

2.1 Shape-aware medical image segmentation

Many methods [24–38] have been proposed to incorporate shape prior information into the segmentation network and achieve more accurate segmentation results.
Some methods [24–27] rely on the adoption of additional losses to learn extra shape-related targets, which are typically based on the object boundary. For instance, SAUNet [24] adds a shape stream (supervised by the ground-truth boundary) to the texture stream of a U-Net whose decoder is replaced by spatial and channel-wise attention paths. A gated convolution layer is then used to fuse shape and texture information for segmentation. In [25], the authors propose a loss based on segment-level shape similarity that measures the curve similarity between each ground-truth boundary and the corresponding predicted boundary, requiring no extra runtime during inference. AtrialJSQnet [26] predicts an additional distance map with respect to the boundary to incorporate spatial and shape information, without introducing additional inference time. In addition to classical segmentation and edge losses, SMU-Net [27] also adopts a shape-aware loss characterizing the distance to the nearest object boundary and a position-aware loss reflecting the distance to the object center.
Some methods [28–33] resort to applying an auto-reconstruction network with an encoder-decoder architecture to the predicted segmentation and the ground-truth annotation. The encoder projects the segmentation result and the annotation to a latent non-linear representation, on which the prediction and ground-truth segmentation are aligned. The decoder is often used to refine the predicted segmentation [29–33], which heavily relies on a ground-truth mask degradation strategy to train the auto-reconstruction network. Specifically, [29] and ACNN-Seg [28] share a similar idea of leveraging such an auto-reconstruction pipeline. The latter does not rely on the decoder to refine segmentation results, and thus needs no additional runtime during inference. Based on a similar auto-reconstruction network, the authors of [32] propose an anatomically-constrained rejection sampling procedure to augment the latent representation, and warp anatomically invalid segmentations toward the closest anatomically plausible ones. Post-DAE [30] leverages denoising autoencoders to post-process the segmentation result into an anatomically plausible segmentation. Chen et al. [31] further propose hard example generation in the latent space of the segmentation network to generate diverse training images and corrupted segmentation results, reinforcing the performance of refined segmentation. LFB-Net [33] shares the same decoder between the segmentation network and the auto-reconstruction network, and fuses the latent features of both.
Adversarial learning is also used to integrate shape information [48,49], where the segmentation network is regarded as the generator. The core idea is to generate segmentation results that confuse the discriminator in distinguishing the ground-truth segmentation from the predicted one. Jointly training the discriminator and segmentation network helps to yield more plausible segmentation results, without introducing any extra runtime during inference.
Some other methods [34–37] focus on fusing/learning prior shape characteristics or segmenting objects of specific shapes (e.g., star and circular shapes). For instance, the method in [34] fuses prior shape information about the distribution of semantic classes over the image domain (statistically computed on the training set) into a segmentation network. Tilborghs et al. [35] directly learn the shape parameters of an underlying shape model statistically computed on the set of training images, avoiding anatomically implausible segmentation. Mirikharaji and Hamarneh [36] introduce a star-shape regularized loss term to segment star-shaped skin lesions, which does not require additional inference time. Guo et al. [37] develop a globally optimal label fusion (GOLF) algorithm that frames the predicted segmentation and "nesting"/circular shape priors into a normalized cut framework, optimized by a proposed max-flow algorithm.
The existing methods that incorporate shape information achieve improved segmentation results. Most of them learn and fuse shape features guided by shape-related supervision, or rely on time-consuming auto-reconstruction to refine segmentation results. Others leverage statistical shape models to take shape prior information into account. These methods often ignore the intensity prior information, which is also valuable for medical image segmentation [39,40]. Differently, the proposed SIKD incorporates joint shape-intensity information into deep-learning-based medical image segmentation. Besides, we also resort to knowledge distillation to transfer the shape-intensity knowledge to the segmentation network, which further boosts the segmentation performance and the generalization ability to unseen images.

2.2 Knowledge distillation

Knowledge distillation (KD) [41] generally refers to transferring knowledge from a pre-trained teacher model to a student model to improve the performance of the student network. Since the pioneering work [41], many methods [50–55] have been proposed for efficient image classification. Hinton et al. [41] first propose that an efficient compact model can be obtained by transferring the knowledge of a cumbersome model to the compact model. Ba and Caruana [50] also demonstrate through a series of experiments that lightweight networks can learn complex functions previously learned by deep networks. Tian et al. [51] suggest that previous KD methods ignore important structural knowledge of the teacher network, and propose to leverage contrastive learning to transfer structural knowledge. Wang et al. [52] discover that it is better to distill the knowledge lying in the penultimate layer of the teacher network than to mimic the teacher's soft logits. Besides, the authors of [52] also propose to adopt locality-sensitive hashing (LSH) to make the student focus more on mimicking the feature direction than the feature magnitude. REFILLED [53] extends general knowledge distillation by applying both intra-image and cross-image relational KD.
With increasing efficiency requirements and hardware limitations, the idea of model compression via KD has gradually been applied to other vision tasks, such as semantic segmentation [56–61]. For instance, in addition to the classical pixel-wise distillation loss, Liu et al. [56] propose pair-wise and holistic distillation losses (the latter via discriminating the segmentations of the teacher and student networks as real and fake, respectively) to make the student produce better structured segmentation. Wang et al. [57] propose to compute the class-wise feature prototype (class-wise averaged feature) and then leverage knowledge distillation to transfer the intra-class feature variation of the cumbersome teacher model to the compact student model. Qin et al. [58] propose to distill the regional affinity between class-wise averaged feature prototypes for efficient medical image segmentation. Shu et al. [59] improve KD-based segmentation by normalizing the activation map of each feature channel to a soft probability map before minimizing the Kullback–Leibler (KL) divergence between the teacher and student models. This makes the distillation process pay more attention to the most salient features, which is valuable for segmentation. The adaptive perspective distillation [60] distills the inter- and intra-distribution of cosine similarities between the adapted feature and the adapted feature prototype. CIRKD [61] distills cross-image relational knowledge to transfer global pixel relationships between images for segmentation.
KD is also widely used for cross-modal image analysis [62–65]. For example, Gupta et al. [62] propose feature mimicking between a teacher and a student model (with the same network architecture) for transferring learned representations from a large labeled modality to a new unlabeled modality, enabling rich representations to be learned from unlabeled modalities. In [63], the authors train the teacher model on concatenated multi-modal images, and leverage KD to transfer multi-modal knowledge to a mono-modal segmentation network. Dou et al. [64] minimize the KL-divergence of the confusion matrix of class-wise averaged soft logits between different modalities to achieve unpaired multi-modal segmentation via KD. Li et al. [65] propose an online mutual knowledge distillation for cross-modality medical image segmentation, where the segmentor on each modality explores the knowledge of the other modality via mutual KD.
Most KD methods transfer knowledge between different networks or modalities. There are also some self-distillation methods [66,67] that transfer knowledge of deep layers to shallow layers of the same network applied on the same modality. Specifically, SAD [66] performs top-down and layer-wise activation-based attention distillation within the network itself, achieving effective lightweight lane detection. Zhang et al. [67] apply several shallow classifiers at different shallow layers of the same network, and make shallow classifiers mimic the deep classifier.
Existing KD methods are mainly developed for efficient image classification, semantic segmentation, and cross-modal image analysis. Differently, we leverage KD to incorporate shape-intensity knowledge for medical image segmentation. Both the teacher and student models have the same network architecture, trained on class-wise averaged images without texture information and on the original training images, respectively. The most related works [64,65] adopt KD to transfer common shape knowledge between different medical image modalities. The proposed SIKD focuses on extracting joint shape-intensity knowledge from a single modality, and incorporating such prior information to achieve accurate and robust medical image segmentation of the underlying modality.

3 Proposed method

3.1 Motivation

Medical image segmentation has recently witnessed great progress thanks to the development of deep learning. Most methods frame the problem as pixel-wise classification, and adopt U-Net [10] or one of its various alternatives to learn feature representations for pixel-wise classification. Such a classical learning-based segmentation scheme usually ignores the fact that the object of interest in medical images generally has a specific shape, and that intensity prior information is also useful for medical image segmentation. Indeed, it is usually easier to segment objects of relatively smooth appearance than textured objects from an image (see Fig.2 for an example). The smoothed object mainly contains specific shape and homogeneous intensity information rather than texture information.
Fig.2 The pipeline of the proposed method. The teacher and student model have the same network architecture, trained on class-wise averaged training images and original images with segmentation loss, respectively. For the student model, we also apply the distillation loss on the penultimate layer between the teacher and student model to transfer the shape-intensity knowledge


To incorporate the shape-intensity information into the medical image segmentation, we propose to first extract the shape-intensity knowledge from class-wise averaged training images, then transfer such prior knowledge using knowledge distillation [41]. Specifically, for each training image f, we first compute the class-wise averaged image fm, which has only the shape-intensity information. We then feed fm to the teacher segmentation network to extract the shape-intensity information. Knowledge distillation is then adopted to transfer the prior knowledge to the student segmentation network, which has the same network architecture as the teacher network but with the original training images as inputs. In this way, the student model effectively acquires shape-intensity knowledge and is regarded as the final segmentation model. The proposed Shape-Intensity Knowledge Distillation (SIKD) pipeline is depicted in Fig.2.

3.2 Shape-intensity knowledge encoding

The object of interest in medical images is usually intensity-inhomogeneous and contains some texture information. On the other hand, objects in medical images often have a specific shape. Therefore, medical image segmentation can benefit from shape and intensity information. To encode such shape-intensity information, we propose to apply a segmentation network on the class-wise averaged training images. More precisely, for a given image $f$, we calculate the class-wise averaged image $f_m$ by taking the mean of the pixel values within each class. Formally, let $X_k$ denote the set of pixels belonging to the $k$th class. For each pixel $x \in X_k$, the class-wise averaged image $f_m(x)$ is given by:

$$ f_m(x) = \frac{1}{|X_k|} \sum_{x' \in X_k} f(x'), \qquad (1) $$

where $|\cdot|$ denotes the cardinality. It is noteworthy that when the input image is in RGB format, we calculate the class-wise averaged value separately for each channel. Such a class-wise averaged image $f_m$ has two distinct characteristics: 1) it does not contain any texture information (shown in Fig.2); 2) its pixel value distribution is consistent with that of the original image (shown in Fig.3). Thanks to these two characteristics, training a segmentation network on these class-wise averaged images enables the network to learn shape-intensity information. We regard such a trained segmentation network as the teacher network.
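As a concrete illustration of Eq. (1), below is a minimal NumPy sketch of the class-wise averaging step; the function name and array layout are our own and not taken from the released code.

```python
import numpy as np

def class_wise_average(image: np.ndarray, label: np.ndarray) -> np.ndarray:
    """Compute the class-wise averaged image f_m of Eq. (1).

    image: (H, W) grayscale or (H, W, C) RGB array.
    label: (H, W) integer map of ground-truth class indices.
    Every pixel of class k is replaced by the mean intensity of class k;
    for RGB inputs the mean is computed separately per channel.
    """
    averaged = np.zeros_like(image, dtype=np.float32)
    for k in np.unique(label):
        mask = label == k                        # pixels X_k of the k-th class
        if image.ndim == 2:                      # grayscale image
            averaged[mask] = image[mask].mean()
        else:                                    # RGB image: per-channel mean
            for c in range(image.shape[2]):
                averaged[..., c][mask] = image[..., c][mask].mean()
    return averaged
```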
Fig.3 Distribution of pixel intensity within each class (e.g., RV cavity, Myocardium, and LV cavity) of the original images (a) and class-wise averaged images (b) on the ACDC training dataset [68].


We train the teacher network on the class-wise averaged training images with a classical segmentation loss $\mathcal{L}_{seg}$. Specifically, for the U-Net based SIKD, we adopt the cross-entropy loss. For the other variants of SIKD, we adopt the same cross-entropy and Dice losses as the corresponding baseline models as the segmentation loss.
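For reference, a common form of the soft Dice loss used alongside cross-entropy is sketched below in PyTorch; this is a generic sketch, not the exact loss implementation of each baseline.

```python
import torch
import torch.nn.functional as F

def soft_dice_loss(logits: torch.Tensor, target: torch.Tensor,
                   eps: float = 1e-6) -> torch.Tensor:
    """Soft Dice loss averaged over classes.

    logits: (N, K, H, W) raw network outputs; target: (N, H, W) integer (long) labels.
    """
    num_classes = logits.shape[1]
    probs = torch.softmax(logits, dim=1)
    one_hot = F.one_hot(target, num_classes).permute(0, 3, 1, 2).float()
    dims = (0, 2, 3)                                    # sum over batch and space
    intersection = (probs * one_hot).sum(dims)
    cardinality = probs.sum(dims) + one_hot.sum(dims)
    dice = (2.0 * intersection + eps) / (cardinality + eps)
    return 1.0 - dice.mean()
```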

3.3 Shape-intensity knowledge distillation

As introduced in Section 2.2, KD is usually used to transfer knowledge from a cumbersome convolutional neural network (teacher) to a compact one (student) by aligning the representations of some layers of the two networks. In this way, the compact student network is equipped with the powerful feature representation of the teacher network, which helps to improve the performance of the student network. Unlike these classical knowledge distillation frameworks, we propose to leverage knowledge distillation to transfer the shape-intensity information extracted by the teacher network described in Section 3.2. Specifically, the student network, regarded as the final segmentation network, has the same network architecture as the teacher network. We train the student network on the original training images. In addition to the classical segmentation loss function, we also adopt a distillation loss to align the features of the penultimate layer of the teacher and student networks. Let $f_t$ and $f_s$ denote the features of the penultimate layer (before the 1×1 segmentation layer) of the teacher and student network, respectively. The adopted KD loss $\mathcal{L}_{kd}$ is given by:

$$ \mathcal{L}_{kd} = \mathrm{MSE}(f_t, f_s), \qquad (2) $$

where $\mathrm{MSE}(\cdot,\cdot)$ is the mean-squared error loss function. This distillation loss facilitates the transfer of shape-intensity knowledge from the teacher segmentation network to the student segmentation network.
For the student network trained on the original training images, the whole training objective $\mathcal{L}$ consists of the same segmentation loss as the teacher segmentation network and the KD loss defined in Eq. (2). Formally, $\mathcal{L}$ is given by:

$$ \mathcal{L} = \mathcal{L}_{seg} + \alpha \, \mathcal{L}_{kd}, \qquad (3) $$

where $\alpha$ is a hyperparameter (set to 2 in all experiments) that balances the contributions of the segmentation and distillation loss terms. It is noteworthy that the teacher segmentation network and the KD process are only involved in the training phase. Therefore, during inference, the segmentation network does not require any extra runtime or memory cost.
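To make the training objective of Eqs. (2) and (3) concrete, here is a hedged PyTorch sketch of one student update step. Following Fig.2, we assume the frozen teacher receives the class-wise averaged image while the student receives the original image, and that both networks return the penultimate-layer feature together with the segmentation logits; this forward signature is our assumption, not the released interface.

```python
import torch
import torch.nn as nn

mse = nn.MSELoss()
ce = nn.CrossEntropyLoss()
ALPHA = 2.0  # loss weight alpha of Eq. (3)

def student_step(student, teacher, image, averaged_image, target, optimizer):
    """One SIKD training step for the student network (sketch).

    student(x) / teacher(x) are assumed to return (penultimate feature, logits).
    The teacher was pre-trained on class-wise averaged images and is kept frozen.
    """
    teacher.eval()
    with torch.no_grad():
        f_t, _ = teacher(averaged_image)      # shape-intensity features
    f_s, logits = student(image)              # student sees the original image
    loss_seg = ce(logits, target)             # segmentation loss L_seg
    loss_kd = mse(f_s, f_t)                   # distillation loss L_kd, Eq. (2)
    loss = loss_seg + ALPHA * loss_kd         # total objective, Eq. (3)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```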

4 Experiments

We conduct intra- and cross-dataset experiments on five segmentation tasks of different medical imaging modalities to demonstrate the effectiveness of the proposed SIKD. The intra-dataset test set serves as a kind of validation set for model selection. We mainly focus on the cross-dataset segmentation performance of the selected model.

4.1 Datasets and evaluation protocols

Cardiac segmentation in MRI images The Automated Cardiac Diagnosis Challenge (ACDC) [68] releases 100 annotated MRI volumes obtained from 100 different patients. We randomly divide the dataset into 7:1:2 for training, validation, and testing, respectively. We also evaluate the corresponding models on the training set of the M&Ms dataset [69], which releases 150 annotated training images from two different MRI vendors, to assess the generalization ability of different methods.
Multi-organ segmentation in CT images The Synapse multi-organ segmentation dataset [70] contains 30 abdominal CT scans, of which 18 (resp. 12) cases are used for training (resp. testing). The goal is to segment 8 abdominal organs. We also conduct evaluations on the AbdomenCT-1K dataset [71], which contains more than 1,000 CT scans from 12 medical centers, to benchmark the generalization ability of different methods.
Polyp segmentation in colonoscopy images Following the experimental setup in [72], we conduct experiments on five public datasets for colorectal polyp segmentation: Kvasir [73], CVC-ClinicDB [74], CVC-ColonDB [75], ETIS [76], and Endoscene [77]. We randomly split the Kvasir and CVC-ClinicDB datasets into 4:1 for training and testing, respectively. CVC-ColonDB, ETIS, and Endoscene are used to assess the generalization ability of different methods. Note that the Endoscene dataset is a combination of CVC-ClinicDB and CVC-300; we only evaluate the model on the test set of CVC-300, denoted as CVC-T.
Optic nerve head segmentation in color fundus images The REFUGE challenge dataset [78] contains 400 color fundus images (CFI), randomly divided into 4:1 for training and testing, respectively. To further demonstrate the generalization ability of the proposed SIKD, we also evaluate different models (trained on REFUGE dataset) on the public Drishti-GS [79] dataset consisting of 101 images and RIM-ONE-r3 [80] dataset containing 159 images.
Breast tumor segmentation in ultrasound images The BUSI dataset [81] consists of a total of 780 images, including 487 benign images, 210 malignant images, and 133 normal images. This dataset is randomly split into 4:1 for training and testing. We also conduct the cross-dataset evaluation on the dataset used in [82] and the dataset B [83] to benchmark the generalization ability of different methods. The dataset in [82] is composed of 42 breast ultrasound (BUS) images. The dataset B [83] consists of 163 images corresponding to 110 benign and 53 malignant lesion images.
Evaluation protocols We adopt three metrics widely used in medical image segmentation to evaluate the proposed SIKD: Dice score (Dice), Intersection over Union (IoU), and Hausdorff Distance (HD).
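For completeness, the two overlap metrics can be computed for binary masks as in the NumPy sketch below; HD is typically obtained from boundary points with an off-the-shelf routine such as scipy.spatial.distance.directed_hausdorff.

```python
import numpy as np

def dice_score(pred: np.ndarray, gt: np.ndarray, eps: float = 1e-6) -> float:
    """Dice coefficient between two binary masks."""
    pred, gt = pred.astype(bool), gt.astype(bool)
    inter = np.logical_and(pred, gt).sum()
    return (2.0 * inter + eps) / (pred.sum() + gt.sum() + eps)

def iou_score(pred: np.ndarray, gt: np.ndarray, eps: float = 1e-6) -> float:
    """Intersection over Union between two binary masks."""
    pred, gt = pred.astype(bool), gt.astype(bool)
    inter = np.logical_and(pred, gt).sum()
    union = np.logical_or(pred, gt).sum()
    return (inter + eps) / (union + eps)
```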

4.2 Implementation details

The major goal of this paper is not to obtain state-of-the-art results on each segmentation task, but to show that the proposed SIKD is an effective anatomy-aware segmentation method that incorporates shape-intensity information and contributes to robust medical image segmentation. We therefore simply adopt several open-source segmentation networks on each segmentation task, including the most widely used U-Net [10] for medical image segmentation, PraNet [72], SANet [84], TransUNet [21], MaxStyle [85], LM-Net [86], 2D D-LKA Net [87], and the recent SAMed [88], which is based on the vision foundation model SAM [89]. We also implement the proposed SIKD on the baseline (BL) of SAUNet [24], one of the very few shape-aware medical image segmentation methods that release their source code. Note that the teacher and student networks have exactly the same network architecture, except for U-Net and TransUNet, where the skip-connections are discarded for the teacher segmentation network. This is because segmenting the class-wise averaged training images is a relatively simple task, and using the skip-connections may prevent the penultimate layer of the teacher model from learning shape-intensity information. All the experiments are conducted using the PyTorch framework on a workstation with an NVIDIA GeForce RTX 3090 GPU (24 GB).

4.3 Experimental results

Cardiac segmentation We first simply build the proposed SIKD using the de facto U-Net for medical image segmentation. Some qualitative results are illustrated in Fig.4. SIKD achieves anatomically correct segmentation results, and alleviates the problem of intra-class inconsistency for the baseline U-Net under both intra-dataset and cross-dataset settings. The quantitative comparison is given in Tab.1. SIKD outperforms the corresponding baseline model.
Fig.4 Some results of the proposed SIKD built upon the baseline U-Net on the cardiac segmentation (top two rows: intra- and cross-dataset) and some qualitative illustration of SIKD built on the baseline TransUNet for multi-organ segmentation (bottom two rows: intra- and cross-dataset)


Tab.1 Quantitative comparison between the baseline models and the proposed SIKD on the ACDC dataset [68] and M&Ms dataset [69] for cardiac segmentation in MRI images. ↑ (resp. ↓) indicates the higher (resp. lower) the better
Method | Intra-dataset ACDC: Average, LV, Myo, RV | Cross-dataset M&Ms: Average, LV, Myo, RV (each structure reported as DICE↑ / HD↓)
U-Net [10] 0.884 1.30 0.905 1.47 0.852 1.24 0.895 1.20 0.686 4.08 0.733 3.26 0.641 4.88 0.684 4.09
U-Net w/ SDM [90] 0.867 1.54 0.914 1.25 0.837 2.09 0.850 1.29 0.703 4.54 0.753 4.65 0.658 6.74 0.697 3.22
U-Net + SIKD 0.894 1.21 0.925 1.29 0.861 1.10 0.895 1.23 0.785 2.95 0.831 2.82 0.741 3.57 0.783 2.48
Baseline (BL) in [24] 0.870 2.49 0.913 2.33 0.837 3.62 0.860 1.52 0.729 7.77 0.796 9.73 0.662 7.53 0.728 6.05
SAUNet [24] 0.880 2.05 0.910 1.88 0.845 2.31 0.883 1.96 0.739 4.57 0.806 5.93 0.672 4.21 0.738 3.56
BL + SIKD 0.892 1.58 0.924 1.36 0.856 1.93 0.897 1.45 0.766 3.73 0.816 4.39 0.733 3.67 0.750 3.13
MaxStyle [85] 0.876 2.23 0.914 2.56 0.843 2.28 0.872 1.85 0.818 3.84 0.826 3.17 0.780 5.62 0.847 2.74
MaxStyle + SIKD 0.888 1.95 0.923 2.05 0.857 1.87 0.883 1.92 0.827 2.85 0.833 2.81 0.792 3.13 0.857 2.62
SAMed [88] 0.890 1.15 0.924 1.11 0.853 1.12 0.894 1.22 0.826 2.38 0.862 2.27 0.783 2.39 0.833 2.46
SAMed + SIKD 0.893 1.10 0.929 1.04 0.855 1.11 0.895 1.16 0.842 2.33 0.873 2.29 0.799 1.89 0.855 2.81
LM-Net [86] 0.881 1.53 0.922 1.21 0.852 1.44 0.867 1.93 0.742 3.17 0.828 3.11 0.713 3.97 0.684 2.43
LM-Net + SIKD 0.900 1.33 0.933 1.28 0.873 1.41 0.894 1.11 0.802 3.12 0.866 2.85 0.774 2.70 0.767 3.80
2D D-LKA Net [87] 0.898 2.06 0.929 1.69 0.875 1.09 0.889 3.39 0.824 2.77 0.881 2.03 0.778 1.84 0.813 4.43
2D D-LKA + SIKD 0.902 1.91 0.933 1.51 0.875 1.23 0.896 2.98 0.854 2.21 0.905 1.75 0.822 1.36 0.835 3.51
We then benchmark the generalization ability of different methods for automatic cardiac segmentation in MRI images. As depicted in Tab.1, when generalizing the models trained on the ACDC dataset to M&Ms dataset, SIKD built on U-Net boosts the performance of the baseline model by 9.9% Dice score and 1.13 mm HD. SIKD also shows improvements over SAUNet and U-Net w/ SDM [90] under this cross-dataset evaluation. Besides, building our SIKD upon recent MaxStyle [85], SAMed [88], LM-Net [86], and 2D D-LKA Net [87] with strong generalization capabilities further consistently enhances their cross-dataset segmentation performance. The quantitative benchmark on cardiac segmentation confirms that SIKD is effective in incorporating the shape-intensity information, thus enhancing the generalization ability of medical image segmentation.
Multi-organ segmentation For multi-organ segmentation in CT images, some illustrative results are shown in Fig.4. Qualitatively, SIKD built on TransUNet [21] accurately segments different organs and preserves their shapes well. The quantitative benchmark on the Synapse multi-organ segmentation dataset is depicted in Tab.2. SIKD performs better than the corresponding baselines in terms of both Dice score and HD, implying that SIKD achieves good surface prediction and preserves shapes better. Specifically, SIKD outperforms the baseline TransUNet by 2.69% Dice score and 8.04 mm HD. Compared with SwinUnet [23], SIKD based on TransUNet achieves an improvement of 1.01% Dice score and 1.89 mm HD. Implementing SIKD with MaxStyle [85] and SAMed [88] also consistently boosts the intra-dataset segmentation performance. The latest 2D D-LKA Net achieves state-of-the-art performance; built upon this approach, our method further improves the Dice score by more than 1%.
Tab.2 Comparison on the Synapse multi-organ segmentation dataset [70] and the AbdomenCT-1K dataset [71]
Method | Intra-dataset Synapse: DICE, HD | Cross-dataset AbdomenCT-1K: DICE, HD
U-Net [10] 0.760 43.64 0.468 94.32
U-Net w/ SDM [90] 0.392 100.35 0.257 152.11
U-Net + SIKD 0.780 30.05 0.487 94.23
Baseline (BL) in [24] 0.782 32.47 0.556 92.61
SAUNet [24] 0.790 31.38 0.598 86.34
BL + SIKD 0.792 24.61 0.670 74.13
SwinUnet [23] 0.791 21.55
TransUNet [21] 0.775 27.70 0.695 75.97
TransUNet + SIKD 0.801 19.66 0.715 71.48
MaxStyle [85] 0.757 26.33 0.664 52.37
MaxStyle + SIKD 0.771 21.76 0.691 43.15
SAMed [88] 0.816 21.78 0.803 44.66
SAMed + SIKD 0.816 20.19 0.819 25.99
LM-Net [86] 0.793 23.64 0.727 50.85
LM-Net + SIKD 0.808 20.76 0.755 38.59
2D D-LKA Net [87] 0.833 18.96 0.797 35.56
2D D-LKA + SIKD 0.843 17.29 0.822 22.11
We also benchmark the generalization ability of different methods by conducting cross-dataset evaluation on the AbdomenCT-1K dataset [71] for the models trained on the Synapse multi-organ segmentation dataset [70]. As depicted in Tab.2, SIKD outperforms all the corresponding baseline models under cross-dataset evaluation. In particular, SIKD built on the baseline of SAUNet [24] improves the baseline model by 11.44% Dice score and 18.48 mm HD. Compared with SAUNet, which only incorporates shape information, SIKD achieves 7.17% Dice score and 12.21 mm HD improvement. Building SIKD on SAMed [88] improves the baseline model by 1.6% Dice score and 18.67 mm HD. Based on the SOTA 2D D-LKA method, our approach achieves the best generalization results. This demonstrates that SIKD effectively incorporates the shape-intensity knowledge and generalizes well to images of unseen datasets.
Polyp segmentation Some qualitative polyp segmentation results are illustrated in Fig.5. The proposed SIKD based on U-Net can accurately segment the polyp. The quantitative comparison with the baseline models and some state-of-the-art methods is shown in Tab.3. SIKD outperforms all the other methods on Kvasir and CVC-ClinicDB dataset, on which the models are trained. In particular, compared with SAUNet which only incorporates the shape information, SIKD is more effective with an improvement ranging from 1.6% to 2.5%. SIKD built on SANet is comparable with recent LDNet [92] which relies on lesion-aware dynamic kernel and cross and self-attention modules. SIKD built on MaxStyle [85], SAMed [88], LM-Net [86], and 2D D-LKA Net [87] further boosts the segmentation performance of the corresponding baseline method. It is noteworthy that these improvements are achieved without any extra runtime and memory cost during inference compared with corresponding baseline methods.
Fig.5 Some results on the intra-dataset (left two columns) and cross-dataset (right two columns) of polyp segmentation, ONH segmentation, and breast tumor segmentation (from top to bottom). Green outline: segmentation by the baseline U-Net model; Blue outline: segmentation by SIKD built upon U-Net; Light blue area: ground-truth segmentation


Tab.3 Quantitative evaluation of polyp segmentation on Kvasir [73], CVC-ClinicDB [74], CVC-ColonDB [75], ETIS [76], and CVC-T [77] datasets
Method | Intra-dataset Kvasir, CVC-ClinicDB | Cross-dataset CVC-ColonDB, ETIS, CVC-T (each dataset reported as IoU / DICE)
SFA [91] 0.611 0.723 0.607 0.700 0.347 0.469 0.217 0.297 0.329 0.467
U-Net++ [12] 0.743 0.821 0.729 0.794 0.410 0.483 0.344 0.401 0.624 0.707
PraNet [72] 0.840 0.898 0.849 0.899 0.640 0.709 0.567 0.628 0.797 0.871
SANet [84] 0.847 0.904 0.859 0.916 0.670 0.753 0.654 0.750 0.815 0.888
LDNet [92] 0.853 0.907 0.895 0.943 0.706 0.784 0.665 0.744
U-Net [10] 0.761 0.841 0.838 0.885 0.552 0.598 0.323 0.383 0.697 0.769
U-Net w/ SDM [90] 0.766 0.849 0.843 0.895 0.567 0.641 0.349 0.405 0.630 0.706
U-Net + SIKD 0.769 0.851 0.851 0.903 0.576 0.650 0.445 0.513 0.712 0.788
Baseline (BL) in [24] 0.810 0.867 0.829 0.886 0.606 0.679 0.448 0.510 0.780 0.849
SAUNet [24] 0.812 0.870 0.825 0.880 0.587 0.658 0.536 0.607 0.761 0.830
BL + SIKD 0.836 0.889 0.850 0.896 0.619 0.687 0.532 0.600 0.791 0.866
PraNet [72] 0.834 0.892 0.859 0.909 0.649 0.721 0.579 0.653 0.818 0.891
PraNet + SIKD 0.852 0.904 0.875 0.927 0.657 0.733 0.607 0.679 0.830 0.897
SANet [84] 0.845 0.903 0.861 0.916 0.679 0.762 0.642 0.741 0.785 0.870
SANet + SIKD 0.856 0.909 0.880 0.929 0.712 0.790 0.678 0.760 0.823 0.892
MaxStyle [85] 0.724 0.808 0.724 0.790 0.493 0.593 0.305 0.364 0.657 0.769
MaxStyle + SIKD 0.755 0.835 0.747 0.813 0.510 0.603 0.361 0.427 0.671 0.780
SAMed [88] 0.836 0.896 0.795 0.865 0.665 0.742 0.565 0.644 0.763 0.841
SAMed + SIKD 0.838 0.897 0.802 0.872 0.688 0.761 0.626 0.709 0.786 0.859
LM-Net [86] 0.827 0.893 0.817 0.872 0.653 0.734 0.565 0.645 0.752 0.835
LM-Net + SIKD 0.835 0.895 0.844 0.896 0.687 0.765 0.598 0.689 0.790 0.864
2D D-LKA Net [87] 0.833 0.890 0.823 0.878 0.664 0.742 0.523 0.599 0.771 0.832
2D D-LKA + SIKD 0.846 0.902 0.831 0.885 0.702 0.779 0.606 0.678 0.805 0.876
We also evaluate the proposed SIKD on the other three unseen datasets to assess its generalizability for polyp segmentation. As depicted in Tab.3, using U-Net as the baseline model, SIKD achieves improvements of 2.4%, 12.2%, and 1.5% IoU (resp. 5.2%, 13.0%, and 1.9% Dice score) on the CVC-ColonDB, ETIS, and CVC-T datasets, respectively. SIKD based on the baseline of SAUNet, PraNet, SANet, MaxStyle, SAMed, LM-Net [86], and 2D D-LKA Net [87] also achieves consistent improvements on these three unseen datasets. This demonstrates that the proposed SIKD generalizes well to unseen datasets. In particular, SIKD with SANet performs better than the most recent LDNet [92] in generalizing to images of unseen datasets.
We observe that the performance improvement for segmenting polyps across domains is not as significant as that for the other segmentation tasks (see Fig.1). Therefore, we further perform a statistical analysis on polyp segmentation by training the baseline U-Net and the proposed SIKD built upon U-Net 10 times each. The statistical results are depicted in Fig.6. The histogram in Fig.6 shows that SIKD achieves higher average performance and more stable results than the baseline model.
Fig.6 Comparison on distribution of dice scores for polyp segmentation when training 10 times the baseline U-Net and the proposed SIKD on training images of Kvasir [73] and CVC-ClinicDB [74]


Optic nerve head segmentation We then conduct experiments on segmenting optic nerve head (ONH) in color fundus images. Some qualitative segmentation results are shown in Fig.5. Both the baseline model and the proposed SIKD accurately segment ONH on testing images of the REFUGE dataset, on whose training set the models are trained. Yet, the baseline model performs poorly on images with slightly different appearance. This demonstrates that the proposed SIKD can effectively incorporate the shape-intensity information, helping to yield accurate segmentation across domains.
The quantitative comparison between the baseline U-Net model and the proposed SIKD is depicted in Tab.4. On the REFUGE test set, SIKD slightly outperforms the baseline model. On the unseen RIM-ONE-r3 and Drishti-GS datasets, the proposed SIKD significantly improves the baseline model. Precisely, SIKD improves the baseline U-Net model by 23.5% (resp. 19.4%) IoU and 27.8% (resp. 18.7%) Dice score on the RIM-ONE-r3 (resp. Drishti-GS) dataset. Similar behavior is observed for SIKD built on the baseline of SAUNet [24]. It is noteworthy that though SAMed [88], LM-Net [86], and 2D D-LKA Net [87] already exhibit a powerful generalization ability, our SIKD can still boost their cross-dataset segmentation performance.
Tab.4 Quantitative result on optic nerve head segmentation in fundus images. The models are trained on training set of REFUGE [78]
Method | Intra-dataset REFUGE | Cross-dataset RIM-ONE-r3, Drishti-GS (each dataset reported as IoU / DICE)
U-Net [10] 0.924 0.960 0.266 0.317 0.433 0.505
U-Net w/ SDM [90] 0.925 0.960 0.346 0.419 0.497 0.624
U-Net + SIKD 0.928 0.962 0.501 0.595 0.627 0.692
Baseline (BL) in [24] 0.884 0.938 0.696 0.812 0.782 0.874
SAUNet [24] 0.906 0.950 0.662 0.766 0.840 0.911
BL + SIKD 0.920 0.958 0.708 0.821 0.861 0.924
MaxStyle [85] 0.927 0.961 0.736 0.846 0.889 0.939
MaxStyle + SIKD 0.935 0.966 0.748 0.852 0.897 0.945
SAMed [88] 0.920 0.958 0.772 0.870 0.906 0.950
SAMed + SIKD 0.922 0.960 0.782 0.879 0.916 0.958
LM-Net [86] 0.909 0.952 0.730 0.841 0.933 0.965
LM-Net + SIKD 0.926 0.961 0.777 0.873 0.939 0.967
2D D-LKA Net [87] 0.897 0.946 0.744 0.851 0.930 0.963
2D D-LKA + SIKD 0.918 0.955 0.768 0.869 0.938 0.966
Breast tumor segmentation We then conduct experiments on breast tumor segmentation in ultrasound images. Some qualitative illustrations are given in Fig.5. Compared with the baseline U-Net, SIKD accurately segments the breast tumor under both intra-dataset and cross-dataset settings. The quantitative comparison between the baseline U-Net and SIKD is depicted in Tab.5. SIKD achieves significant improvement over the baseline U-Net. Specifically, on the BUSI dataset, SIKD improves the baseline U-Net by 3.1% IoU and 3.3% Dice score. SIKD also outperforms U-Net w/ SDM [90] by 2.3% IoU and 2.8% Dice score. Implementing SIKD with the baseline of SAUNet [24] improves the baseline by 8.7% IoU and 7.7% Dice score. Besides, the proposed SIKD also outperforms SAUNet by 2.4% IoU and 2.1% Dice score.
Tab.5 Quantitative evaluation of different methods for breast tumor segmentation on BUSI [81], IDFAHSTU [82], and dataset B [83]
Method | Intra-dataset BUSI | Cross-dataset IDFAHSTU, Dataset B (each dataset reported as IoU / DICE)
U-Net [10] 0.580 0.676 0.597 0.694 0.397 0.461
U-Net w/ SDM [90] 0.588 0.681 0.626 0.732 0.411 0.483
U-Net + SIKD 0.611 0.709 0.646 0.755 0.449 0.521
BL in [24] 0.535 0.642 0.611 0.738 0.362 0.459
SAUNet [24] 0.598 0.698 0.696 0.805 0.513 0.615
BL + SIKD 0.622 0.719 0.717 0.824 0.540 0.643
MaxStyle [85] 0.603 0.701 0.755 0.851 0.529 0.631
MaxStyle + SIKD 0.612 0.707 0.767 0.861 0.561 0.661
SAMed [88] 0.672 0.762 0.780 0.871 0.684 0.778
SAMed + SIKD 0.686 0.772 0.793 0.879 0.697 0.790
LM-Net [86] 0.663 0.755 0.737 0.842 0.571 0.663
LM-Net + SIKD 0.684 0.773 0.764 0.862 0.633 0.719
2D D-LKA Net [87] 0.674 0.762 0.593 0.719 0.671 0.760
2D D-LKA + SIKD 0.691 0.775 0.655 0.780 0.694 0.781
We also evaluate the proposed SIKD on images from the unseen dataset used in [82] and dataset B [83] to assess the generalizability of SIKD. As depicted in Tab.5, applying SIKD to the baseline of SAUNet [24], MaxStyle [85], SAMed [88], LM-Net [86], and 2D D-LKA Net [87] boosts the segmentation performance of the corresponding baseline. This also demonstrates that the proposed SIKD generalizes well to images of unseen datasets. It is noteworthy that the performance on the cross-dataset of [82] is better than the intra-dataset performance for breast tumor segmentation. This is probably because the dataset in [82] contains only 42 ultrasound images, which are not as challenging as the ultrasound images in the BUSI dataset [81] and dataset B [83].

5 Discussion

Medical image segmentation plays a crucial role in clinical practice. With the advancement of deep learning, many approaches have achieved notable results. However, many deep-learning-based methods treat segmentation as a pixel-wise classification task, often overlooking the fact that targets in medical images typically possess specific shape-intensity prior information. Additionally, some studies [93,94] have demonstrated that deep neural networks tend to prioritize learning texture information over shape information. This bias can lead to anatomically implausible segmentation results, especially in cross-dataset segmentation, where domain shifts relative to the training images are present. Therefore, incorporating shape-intensity prior information is beneficial for both intra-dataset and cross-dataset segmentation.
Recently, some methods improve segmentation performance by incorporating shape information, making the segmentation results more reasonable. However, these methods either add edge constraints or further optimize the segmentation results, requiring additional computational resources. Different from existing methods, in this paper, we propose to incorporate joint shape-intensity prior information into the segmentation network. Specifically, we train a teacher network on the class-wise averaged training images to extract the shape-intensity information. We then employ KD to transfer the extracted shape-intensity information from the teacher network to the student network (i.e., the segmentation network).
The proposed SIKD relies on transferring the shape-intensity knowledge extracted by the teacher model. Therefore, we analyze the shape-intensity knowledge by visualizing the penultimate layer feature. As shown in Fig.7, compared with the baseline model, the feature extracted by the teacher model is smoother and has a more complete shape structure. The feature of the student model is very similar to the teacher model. This implies that the proposed SIKD effectively incorporates the shape-intensity knowledge learned by the teacher model into segmentation.
Fig.7 Visualization of the penultimate layer feature of baseline U-Net and the proposed SIKD built upon U-Net


As previously mentioned, our method does not require extra runtime/memory cost during inference. We also analyze the training time requirements. In comparison to the baseline model, our proposed SIKD involves training an additional teacher model on class-wise averaged images to extract shape-intensity prior information, which is subsequently transferred to the student model via KD. During the training of the student network, it is necessary to forward the teacher network to compute the KD loss. As shown in Tab.6, during training, our method's FLOPs are nearly twice those of U-Net (as mentioned in Section 4.2, the skip-connections of the U-Net used as the teacher network are removed, thereby reducing the complexity of the teacher model), and the GPU memory usage is also slightly higher than that of U-Net. However, during testing, our method and U-Net have the same FLOPs and GPU memory usage, incurring no additional overhead.
Tab.6 Comparison of computational requirements based on U-Net [10] architecture. Calculations for training are performed with a batch size of 16 and image dimensions of 256 × 256 × 1, while testing is conducted with a batch size of 1 and the same image size
Method | Training: GFLOPs, GPU memory (GB) | Testing: GFLOPs, GPU memory (GB)
U-Net [10] 65.46 9.61 65.46 0.25
U-Net + SIKD 121.21 10.47 65.46 0.25
We then conduct two types of ablation studies on the task of cardiac segmentation and optic nerve head segmentation.
Ablation study on the effect of transferring the shape-intensity knowledge A straightforward alternative is to train the teacher model using the original training images. Therefore, we train both the teacher and student U-Net model using the same setting. The only difference with the baseline model is that we also adopt the distillation loss when training the student model. As depicted in Tab.7, such a trivial alternative may occasionally bring some intra-dataset improvement, but not as significant as the proposed SIKD. This demonstrates that the performance improvement of SIKD is mainly brought by the proposed shape-intensity knowledge transferring via knowledge distillation, not the trivial knowledge distillation. We also conduct experiments by training the teacher network on the annotated label maps. As depicted in Tab.7, the variant of SIKD by transferring only the shape knowledge directly extracted from the label map is also beneficial for improving the segmentation performance and generalization ability. Yet, since the intensity of the label map is quite different from the original image in distribution, this variant of SIKD is not as effective as SIKD that distills the shape-intensity knowledge from the teacher model trained on class-wise averaged training images.
Tab.7 Evaluation of SIKD on cardiac and optic nerve head segmentation with different inputs for the teacher network
Input | Cardiac segmentation in MRI images: ACDC (Dice, HD), M&Ms (Dice, HD) | Optic nerve head segmentation in color fundus images: REFUGE (IoU, DICE), RIM-ONE-r3 (IoU, DICE), Drishti-GS (IoU, DICE)
Baseline U-Net [10] 0.884 1.30 0.686 4.08 0.924 0.960 0.266 0.317 0.433 0.505
Original image 0.882 1.26 0.693 4.09 0.927 0.961 0.371 0.446 0.571 0.631
Annotated label map 0.893 1.16 0.723 3.69 0.922 0.961 0.383 0.452 0.552 0.607
Class-wise averaged image 0.894 1.21 0.785 2.95 0.928 0.962 0.501 0.595 0.627 0.692
Ablation study on the loss weight α The only hyper-parameter of the proposed SIKD is the loss weight α involved in Eq. (3). We conduct an ablation study on this hyper-parameter by setting α to 0.5, 1.0, 2.0, 3.0, 4.0, and 5.0, respectively. The corresponding results are depicted in Fig.8. Using different α slightly affects the intra-dataset performance, whereas the generalization ability changes more significantly. SIKD with different settings of α generally performs much better than the baseline model (equivalent to setting α = 0), further proving the effectiveness of SIKD. Setting α = 2 gives the best Dice score on cardiac segmentation and performs relatively well on optic nerve head segmentation. Therefore, we set α to 2 for the proposed SIKD in all experiments.
Fig.8 Evaluation of SIKD with different settings for α in Eq. (3). Setting α = 0 is equivalent to the baseline model. (a) Effect of using different α for cardiac segmentation; (b) effect of using different α for optic nerve head segmentation


6 Conclusion

In this paper, we propose a novel joint shape-intensity KD method for deep-learning-based medical image segmentation. We leverage KD to transfer the shape-intensity information extracted by a teacher network trained on class-wise averaged images. In this way, the student network, which has the same network architecture as the teacher model, effectively learns shape-intensity information for medical image segmentation. Extensive experiments on five medical image segmentation tasks of different modalities demonstrate that the proposed SIKD achieves consistent/significant improvements over the baseline methods and some state-of-the-art methods. The proposed SIKD can be applied to most popular segmentation networks and brings performance improvements without any additional computational effort during inference. In the future, we would like to explore other feature layers for distillation and other distillation loss functions, and to apply the proposed SIKD to 3D medical image segmentation tasks.

Wenhui Dong received the MS degree from the School of Software Engineering, Wuhan University, China in 2020. He is currently pursuing the PhD degree in the School of Computer Science, Wuhan University, China. His main research interests include image segmentation, medical image analysis, and video object segmentation

Bo Du received the PhD degree in photogrammetry and remote sensing from the State Key Lab of Information Engineering in Surveying, Mapping and Remote Sensing, Wuhan University, China in 2010. He is currently a professor and the dean of the School of Computer Science. He is also the director of the National Engineering Research Center for Multimedia Software, Wuhan University, China. His major research interests include machine learning, computer vision, and image processing. He has more than 80 journal papers published in IEEE TPAMI/TIP/TCYB/TGRS, and IJCV. He serves as associate editor of Neural Networks, Pattern Recognition, and Neurocomputing. He was named a Highly Cited Researcher (2019/2020/2021/2022) by the Web of Science Group. He also won the IEEE Geoscience and Remote Sensing Society 2020 Transactions Prize Paper Award, and the IJCAI Distinguished Paper Prize. He regularly serves as senior PC member of IJCAI and AAAI

Yongchao Xu received the master degree in electronics and signal processing at Université Paris Sud, France in 2010 and the PhD degree in image processing at Université Paris Est, France in 2013. He is currently a professor with the School of Computer Science, Wuhan University, China. His research interests include image segmentation, medical image analysis, and cross-domain generalization for deep learning. He has published more than 40 scientific papers, such as IEEE TPAMI, IJCV, IEEE TIP, CVPR, ICCV, ECCV, and MICCAI. He serves as associate editor of Pattern Recognition, Image and Vision Computing, and young associate editor of Frontiers of Computer Science

References

[1]
Wang H, Dong L, Sun M . Local feature aggregation algorithm based on graph convolutional network. Frontiers of Computer Science, 2022, 16( 3): 163309
[2]
Wang T, Li J, Wu H N, Li C, Snoussi H, Wu Y . ResLNet: deep residual LSTM network with longer input for action recognition. Frontiers of Computer Science, 2022, 16( 6): 166334
[3]
Xu H, Chen Z, Zhang Y, Geng X, Mi S, Yang Z . Weakly supervised temporal action localization with proxy metric modeling. Frontiers of Computer Science, 2023, 17( 2): 172309
[4]
Zhang Y, Wang Z, Zhou J, Mi S . Person video alignment with human pose registration. Frontiers of Computer Science, 2023, 17( 4): 174324
[5]
Tan S, Zhang L, Shu X, Wang Z . A feature-wise attention module based on the difference with surrounding features for convolutional neural networks. Frontiers of Computer Science, 2023, 17( 6): 176338
[6]
Guo M, Sheng H, Zhang Z, Huang Y, Chen X, Wang C, Zhang J . CW-YOLO: joint learning for mask wearing detection in low-light conditions. Frontiers of Computer Science, 2023, 17( 6): 176710
[7]
Wu Z, Gan Y, Xu T, Wang F . Graph-Segmenter: graph transformer with boundary-aware attention for semantic segmentation. Frontiers of Computer Science, 2024, 18( 5): 185327
[8]
Ruan H, Song H, Liu B, Cheng Y, Liu Q . Intellectual property protection for deep semantic segmentation models. Frontiers of Computer Science, 2023, 17( 1): 171306
[9]
Ji Z, Ni J, Liu X, Pang Y . Teachers cooperation: team-knowledge distillation for multiple cross-domain few-shot learning. Frontiers of Computer Science, 2023, 17( 2): 172312
[10]
Ronneberger O, Fischer P, Brox T. U-Net: convolutional networks for biomedical image segmentation. In: Proceedings of the 18th International Conference on Medical Image Computing and Computer Assisted Intervention. 2015, 234−241
[11]
Çiçek O, Abdulkadir A, Lienkamp S S, Brox T, Ronneberger O. 3D U-Net: learning dense volumetric segmentation from sparse annotation. In: Proceedings of the 19th International Conference on Medical Image Computing and Computer Assisted Intervention. 2016, 424−432
[12]
Zhou Z, Siddiquee M M R, Tajbakhsh N, Liang J . UNet++: redesigning skip connections to exploit multiscale features in image segmentation. IEEE Transactions on Medical Imaging, 2020, 39( 6): 1856–1867
[13]
Oktay O, Schlemper J, Le Folgoc L, Lee M, Heinrich M, Misawa K, Mori K, McDonagh S, Hammerla N Y, Kainz B, Glocker B, Rueckert D. Attention U-Net: learning where to look for the pancreas. In: Proceedings of the 1st Conference on Medical Imaging with Deep Learning. 2018
[14]
Isensee F, Jaeger P F, Kohl S A A, Petersen J, Maier-Hein K H . nnU-Net: a self-configuring method for deep learning-based biomedical image segmentation. Nature Methods, 2021, 18( 2): 203–211
[15]
Shi T, Boutry N, Xu Y, Géraud T . Local intensity order transformation for robust curvilinear object segmentation. IEEE Transactions on Image Processing, 2022, 31: 2557–2569
[16]
Billot B, Greve D N, Puonti O, Thielscher A, Van Leemput K, Fischl B, Dalca A V, Iglesias J E, ADNI . SynthSeg: segmentation of brain MRI scans of any contrast and resolution without retraining. Medical Image Analysis, 2023, 86: 102789
[17]
Gut D, Tabor Z, Szymkowski M, Rozynek M, Kucybała I, Wojciechowski W . Benchmarking of deep architectures for segmentation of medical images. IEEE Transactions on Medical Imaging, 2022, 41( 11): 3231−3241
[18]
Dosovitskiy A, Beyer L, Kolesnikov A, Weissenborn D, Zhai X, Unterthiner T, Dehghani M, Minderer M, Heigold G, Gelly S, Uszkoreit J, Houlsby N. An image is worth 16x16 words: transformers for image recognition at scale. In: Proceedings of the 9th International Conference on Learning Representations. 2021
[19]
Zheng S, Lu J, Zhao H, Zhu X, Luo Z, Wang Y, Fu Y, Feng J, Xiang T, Torr P H S, Zhang L. Rethinking semantic segmentation from a sequence-to-sequence perspective with transformers. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2021, 6877−6886
[20]
Xie E, Wang W, Yu Z, Anandkumar A, Alvarez J M, Luo P. SegFormer: simple and efficient design for semantic segmentation with transformers. In: Proceedings of the 35th International Conference on Neural Information Processing Systems. 2021, 924
[21]
Chen J, Lu Y, Yu Q, Luo X, Adeli E, Wang Y, Lu L, Yuille A L, Zhou Y. TransUNet: transformers make strong encoders for medical image segmentation. 2021, arXiv preprint arXiv: 2102.04306
[22]
Liu Z, Lin Y, Cao Y, Hu H, Wei Y, Zhang Z, Lin S, Guo B. Swin transformer: hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision. 2021, 9992−10002
[23]
Cao H, Wang Y, Chen J, Jiang D, Zhang X, Tian Q, Wang M. Swin-Unet: Unet-like pure transformer for medical image segmentation. In: Proceedings of the European Conference on Computer Vision. 2023, 205−218
[24]
Sun J, Darbehani F, Zaidi M, Wang B. SAUNet: shape attentive U-Net for interpretable medical image segmentation. In: Proceedings of the 23rd International Conference on Medical Image Computing and Computer Assisted Intervention. 2020, 797−806
[25]
Yan Z, Yang X, Cheng K T . Enabling a single deep learning model for accurate gland instance segmentation: a shape-aware adversarial learning framework. IEEE Transactions on Medical Imaging, 2020, 39( 6): 2176–2189
[26]
Li L, Zimmer V A, Schnabel J A, Zhuang X . AtrialJSQnet: a new framework for joint segmentation and quantification of left atrium and scars incorporating spatial and shape information. Medical Image Analysis, 2022, 76: 102303
[27]
Ning Z, Zhong S, Feng Q, Chen W, Zhang Y . SMU-Net: saliency-guided morphology-aware U-Net for breast lesion segmentation in ultrasound image. IEEE Transactions on Medical Imaging, 2022, 41( 2): 476–490
[28]
Oktay O, Ferrante E, Kamnitsas K, Heinrich M, Bai W, Caballero J, Cook S A, de Marvao A, Dawes T, O'Regan D P, Kainz B, Glocker B, Rueckert D . Anatomically constrained neural networks (ACNNs): application to cardiac image enhancement and segmentation. IEEE Transactions on Medical Imaging, 2018, 37( 2): 384–395
[29]
Ravishankar H, Venkataramani R, Thiruvenkadam S, Sudhakar P, Vaidya V. Learning and incorporating shape models for semantic segmentation. In: Proceedings of the 20th International Conference on Medical Image Computing and Computer Assisted Intervention. 2017, 203−211
[30]
Larrazabal A J, Martínez C, Glocker B, Ferrante E . Post-DAE: anatomically plausible segmentation via post-processing with denoising autoencoders. IEEE Transactions on Medical Imaging, 2020, 39( 12): 3813–3820
[31]
Chen C, Hammernik K, Ouyang C, Qin C, Bai W, Rueckert D. Cooperative training and latent space data augmentation for robust medical image segmentation. In: Proceedings of the 24th International Conference on Medical Image Computing and Computer Assisted Intervention. 2021, 149−159
[32]
Painchaud N, Skandarani Y, Judge T, Bernard O, Lalande A, Jodoin P M . Cardiac segmentation with strong anatomical guarantees. IEEE Transactions on Medical Imaging, 2020, 39( 11): 3703–3713
[33]
Girum K B, Crehange G, Lalande A . Learning with context feedback loop for robust medical image segmentation. IEEE Transactions on Medical Imaging, 2021, 40( 6): 1542–1554
[34]
Zotti C, Luo Z, Lalande A, Jodoin P M . Convolutional neural network with shape prior applied to cardiac MRI segmentation. IEEE Journal of Biomedical and Health Informatics, 2019, 23( 3): 1119–1128
[35]
Tilborghs S, Bogaert J, Maes F . Shape constrained CNN for segmentation guided prediction of myocardial shape and pose parameters in cardiac MRI. Medical Image Analysis, 2022, 81: 102533
[36]
Mirikharaji Z, Hamarneh G. Star shape prior in fully convolutional networks for skin lesion segmentation. In: Proceedings of the 21st International Conference on Medical Image Computing and Computer Assisted Intervention. 2018, 737−745
[37]
Guo F, Ng M, Kuling G, Wright G . Cardiac MRI segmentation with sparse annotations: ensembling deep learning uncertainty and shape priors. Medical Image Analysis, 2022, 81: 102532
[38]
Wei H, Ma J, Zhou Y, Xue W, Ni D . Co-learning of appearance and shape for precise ejection fraction estimation from echocardiographic sequences. Medical Image Analysis, 2023, 84: 102686
[39]
Yang J, Duncan J S . 3D image segmentation of deformable objects with joint shape-intensity prior models using level sets. Medical Image Analysis, 2004, 8( 3): 285–294
[40]
Wang J, Cheng Y, Guo C, Wang Y, Tamura S . Shape-intensity prior level set combining probabilistic atlas and probability map constrains for automatic liver segmentation from abdominal CT images. International Journal of Computer Assisted Radiology and Surgery, 2016, 11( 5): 817–826
[41]
Hinton G, Vinyals O, Dean J. Distilling the knowledge in a neural network. In: Proceedings of the NeurIPS 2014 Deep Learning Workshop. 2014
[42]
Xiang T, Zhang C, Liu D, Song Y, Huang H, Cai W. BiO-Net: learning recurrent Bi-directional connections for encoder-decoder architecture. In: Proceedings of the 23rd International Conference on Medical Image Computing and Computer Assisted Intervention. 2020, 74−84
[43]
Feng S, Zhao H, Shi F, Cheng X, Wang M, Ma Y, Xiang D, Zhu W, Chen X . CPFNet: context pyramid fusion network for medical image segmentation. IEEE Transactions on Medical Imaging, 2020, 39( 10): 3008–3018
[44]
Tajbakhsh N, Jeyaseelan L, Li Q, Chiang J N, Wu Z, Ding X . Embracing imperfect datasets: a review of deep learning solutions for medical image segmentation. Medical Image Analysis, 2020, 63: 101693
[45]
Xie X, Niu J, Liu X, Chen Z, Tang S, Yu S . A survey on incorporating domain knowledge into deep learning for medical image analysis. Medical Image Analysis, 2021, 69: 101985
[46]
Wang L, Yoon K J . Knowledge distillation and student-teacher learning for visual intelligence: a review and new outlooks. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2022, 44( 6): 3048–3068
[47]
Gou J, Yu B, Maybank S J, Tao D . Knowledge distillation: a survey. International Journal of Computer Vision, 2021, 129( 6): 1789–1819
[48]
Yi X, Walia E, Babyn P . Generative adversarial network in medical imaging: a review. Medical Image Analysis, 2019, 58: 101552
[49]
Jafari M, Francis S, Garibaldi J M, Chen X . LMISA: a lightweight multi-modality image segmentation network via domain adaptation using gradient magnitude and shape constraint. Medical Image Analysis, 2022, 81: 102536
[50]
Ba J, Caruana R. Do deep nets really need to be deep? In: Proceedings of the 27th International Conference on Neural Information Processing Systems. 2014, 2654−2662
[51]
Tian Y, Krishnan D, Isola P. Contrastive representation distillation. In: Proceedings of the 8th International Conference on Learning Representations. 2020
[52]
Wang G H, Ge Y, Wu J . Distilling knowledge by mimicking features. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2022, 44( 11): 8183–8195
[53]
Ye H J, Lu S, Zhan D C . Generalized knowledge distillation via relationship matching. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2023, 45( 2): 1817–1834
[54]
Zagoruyko S, Komodakis N. Paying more attention to attention: improving the performance of convolutional neural networks via attention transfer. In: Proceedings of the 5th International Conference on Learning Representations. 2017
[55]
Ge S, Liu B, Wang P, Li Y, Zeng D . Learning privacy-preserving student networks via discriminative-generative distillation. IEEE Transactions on Image Processing, 2023, 32: 116–127
[56]
Liu Y, Shu C, Wang J, Shen C . Structured knowledge distillation for dense prediction. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2023, 45( 6): 7035–7049
[57]
Wang Y, Zhou W, Jiang T, Bai X, Xu Y. Intra-class feature variation distillation for semantic segmentation. In: Proceedings of the 16th European Conference on Computer Vision. 2020, 346−362
[58]
Qin D, Bu J J, Liu Z, Shen X, Zhou S, Gu J J, Wang Z H, Wu L, Dai H F . Efficient medical image segmentation based on knowledge distillation. IEEE Transactions on Medical Imaging, 2021, 40( 12): 3820–3831
[59]
Shu C, Liu Y, Gao J, Yan Z, Shen C. Channel-wise knowledge distillation for dense prediction. In: Proceedings of the IEEE/CVF International Conference on Computer Vision. 2021, 5291−5300
[60]
Tian Z, Chen P, Lai X, Jiang L, Liu S, Zhao H, Yu B, Yang M C, Jia J . Adaptive perspective distillation for semantic segmentation. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2023, 45( 2): 1372–1387
[61]
Yang C, Zhou H, An Z, Jiang X, Xu Y, Zhang Q. Cross-image relational knowledge distillation for semantic segmentation. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2022, 12309−12318
[62]
Gupta S, Hoffman J, Malik J. Cross modal distillation for supervision transfer. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2016, 2827−2836
[63]
Hu M, Maillard M, Zhang Y, Ciceri T, La Barbera G, Bloch I, Gori P. Knowledge distillation from multi-modal to mono-modal segmentation networks. In: Proceedings of the 23rd International Conference on Medical Image Computing and Computer Assisted Intervention. 2020, 772−781
[64]
Dou Q, Liu Q, Heng P A, Glocker B . Unpaired multi-modal segmentation via knowledge distillation. IEEE Transactions on Medical Imaging, 2020, 39( 7): 2415–2425
[65]
Li K, Yu L, Wang S, Heng P A. Towards cross-modality medical image segmentation with online mutual knowledge distillation. In: Proceedings of the 34th AAAI Conference on Artificial Intelligence. 2020, 775−783
[66]
Hou Y, Ma Z, Liu C, Loy C C. Learning lightweight lane detection CNNs by self attention distillation. In: Proceedings of the IEEE/CVF International Conference on Computer Vision. 2019, 1013−1021
[67]
Zhang L, Bao C, Ma K . Self-distillation: towards efficient and compact neural networks. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2022, 44( 8): 4388–4403
[68]
Bernard O, Lalande A, Zotti C, Cervenansky F, Yang X, Heng P A, Cetin I, Lekadir K, Camara O, Gonzalez Ballester M A, Sanroma G, Napel S, Petersen S, Tziritas G, Grinias E, Khened M, Kollerathu V A, Krishnamurthi G, Rohe M M, Pennec X, Sermesant M, Isensee F, Jäger P, Maier-Hein K H, Full P M, Wolf I, Engelhardt S, Baumgartner C F, Koch L M, Wolterink J M, Išgum I, Jang Y, Hong Y, Patravali J, Jain S, Humbert O, Jodoin P M. Deep learning techniques for automatic MRI cardiac multi-structures segmentation and diagnosis: is the problem solved? IEEE Transactions on Medical Imaging, 2018, 37(11): 2514−2525
[69]
Campello V M, Gkontra P, Izquierdo C, Martín-Isla C, Sojoudi A, Full P M, Maier-Hein K, Zhang Y, He Z, Ma J, Parreno M, Albiol A, Kong F, Shadden S C, Acero J C, Sundaresan V, Saber M, Elattar M, Li H, Menze B, Khader F, Haarburger C, Scannell C M, Veta M, Carscadden A, Punithakumar K, Liu X, Tsaftaris S A, Huang X, Yang X, Li L, Zhuang X, Vilades D, Descalzo M L, Guala A, Mura L L, Friedrich M G, Garg R, Lebel J, Henriques F, Karakas M, Çavuş E, Petersen S E, Escalera S, Seguí S, Rodríguez-Palomares J F, Lekadir K . Multi-centre, multi-vendor and multi-disease cardiac segmentation: the M&Ms challenge. IEEE Transactions on Medical Imaging, 2021, 40( 12): 3543–3554
[70]
Landman B, Xu Z, Iglesias J, Styner M, Langerak T, Klein A. MICCAI multi-atlas labeling beyond the cranial vault-workshop and challenge. In: Proceedings of the MICCAI Multi-Atlas Labeling Beyond Cranial Vault-Workshop Challenge. 2015
[71]
Ma J, Zhang Y, Gu S, Zhu C, Ge C, Zhang Y, An X, Wang C, Wang Q, Liu X, Cao S, Zhang Q, Liu S, Wang Y, Li Y, He J, Yang X. AbdomenCT-1K: is abdominal organ segmentation a solved problem? IEEE Transactions on Pattern Analysis and Machine Intelligence, 2022, 44(10): 6695−6714
[72]
Fan D P, Ji G P, Zhou T, Chen G, Fu H, Shen J, Shao L. PraNet: parallel reverse attention network for polyp segmentation. In: Proceedings of the 23rd International Conference on Medical Image Computing and Computer Assisted Intervention. 2020, 263−273
[73]
Jha D, Smedsrud P H, Riegler M A, Halvorsen P, de Lange T, Johansen D, Johansen H D. Kvasir-SEG: a segmented polyp dataset. In: Proceedings of the 26th International Conference on Multimedia Modeling. 2020, 451−462
[74]
Bernal J, Sánchez F J, Fernández-Esparrach G, Gil D, Rodríguez C, Vilariño F . WM-DOVA maps for accurate polyp highlighting in colonoscopy: validation vs. saliency maps from physicians. Computerized Medical Imaging and Graphics, 2015, 43: 99–111
[75]
Tajbakhsh N, Gurudu S R, Liang J . Automated polyp detection in colonoscopy videos using shape and context information. IEEE Transactions on Medical Imaging, 2016, 35( 2): 630–644
[76]
Silva J, Histace A, Romain O, Dray X, Granado B . Toward embedded detection of polyps in WCE images for early diagnosis of colorectal cancer. International Journal of Computer Assisted Radiology and Surgery, 2014, 9( 2): 283–293
[77]
Vázquez D, Bernal J, Sánchez F J, Fernández-Esparrach G, López A M, Romero A, Drozdzal M, Courville A . A benchmark for endoluminal scene segmentation of colonoscopy images. Journal of Healthcare Engineering, 2017, 2017: 4037190
[78]
Orlando J I, Fu H, Barbosa Breda J, van Keer K, Bathula D R, Diaz-Pinto A, Fang R, Heng P A, Kim J, Lee J, Lee J, Li X, Liu P, Lu S, Murugesan B, Naranjo V, Phaye S S R, Shankaranarayana S M, Sikka A, Son J, van den Hengel A, Wang S, Wu J, Wu Z, Xu G, Xu Y, Yin P, Li F, Zhang X, Xu Y, Bogunović H . REFUGE challenge: a unified framework for evaluating automated methods for glaucoma assessment from fundus photographs. Medical Image Analysis, 2020, 59: 101570
[79]
Sivaswamy J, Krishnadas S R, Chakravarty A, Joshi G D, Ujjwal, Syed T A . A comprehensive retinal image dataset for the assessment of glaucoma from the optic nerve head analysis. JSM Biomedical Imaging Data Papers, 2015, 2( 1): 1004
[80]
Fumero F, Alayon S, Sanchez J L, Sigut J, Gonzalez-Hernandez M. RIM-ONE: an open retinal image database for optic nerve evaluation. In: Proceedings of the 24th International Symposium on Computer-Based Medical Systems. 2011, 1−6
[81]
Al-Dhabyani W, Gomaa M, Khaled H, Fahmy A . Dataset of breast ultrasound images. Data in Brief, 2020, 28: 104863
[82]
Zhuang Z, Li N, Joseph Raj A N, Mahesh V G V, Qiu S . An RDAU-NET model for lesion segmentation in breast ultrasound images. PLoS One, 2019, 14( 8): e0221535
[83]
Yap M H, Pons G, Martí J, Ganau S, Sentís M, Zwiggelaar R, Davison A K, Martí R . Automated breast ultrasound lesions detection using convolutional neural networks. IEEE Journal of Biomedical and Health Informatics, 2018, 22( 4): 1218–1226
[84]
Wei J, Hu Y, Zhang R, Li Z, Zhou S K, Cui S. Shallow attention network for polyp segmentation. In: Proceedings of the 24th International Conference on Medical Image Computing and Computer Assisted Intervention. 2021, 699−708
[85]
Chen C, Li Z, Ouyang C, Sinclair M, Bai W, Rueckert D. MaxStyle: adversarial style composition for robust medical image segmentation. In: Proceedings of the 25th International Conference on Medical Image Computing and Computer Assisted Intervention. 2022, 151−161
[86]
Lu Z, She C, Wang W, Huang Q . LM-Net: a light-weight and multi-scale network for medical image segmentation. Computers in Biology and Medicine, 2024, 168: 107717
[87]
Azad R, Niggemeier L, Hüttemann M, Kazerouni A, Aghdam E K, Velichko Y, Bagci U, Merhof D. Beyond self-attention: deformable large kernel attention for medical image segmentation. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision. 2024, 1276−1286
[88]
Zhang K, Liu D. Customized segment anything model for medical image segmentation. 2023, arXiv preprint arXiv: 2304.13785
[89]
Kirillov A, Mintun E, Ravi N, Mao H, Rolland C, Gustafson L, Xiao T, Whitehead S, Berg A C, Lo W Y, Dollár P, Girshick R. Segment anything. In: Proceedings of the IEEE/CVF International Conference on Computer Vision. 2023, 3992−4003
[90]
Xue Y, Tang H, Qiao Z, Gong G, Yin Y, Qian Z, Huang C, Fan W, Huang X. Shape-aware organ segmentation by predicting signed distance maps. In: Proceedings of the 34th AAAI Conference on Artificial Intelligence. 2020, 12565−12572
[91]
Fang Y, Chen C, Yuan Y, Tong K Y. Selective feature aggregation network with area-boundary constraints for polyp segmentation. In: Proceedings of the 22nd International Conference on Medical Image Computing and Computer Assisted Intervention. 2019, 302−310
[92]
Zhang R, Lai P, Wan X, Fan D J, Gao F, Wu X J, Li G. Lesion-aware dynamic kernel for polyp segmentation. In: Proceedings of the 25th International Conference on Medical Image Computing and Computer Assisted Intervention. 2022, 99−109
[93]
Geirhos R, Rubisch P, Michaelis C, Bethge M, Wichmann F A, Brendel W. ImageNet-trained CNNs are biased towards texture; increasing shape bias improves accuracy and robustness. In: Proceedings of the 7th International Conference on Learning Representations. 2019
[94]
Li Y, Yu Q, Tan M, Mei J, Tang P, Shen W, Yuille A L, Xie C. Shape-texture debiased neural network training. In: Proceedings of the 9th International Conference on Learning Representations. 2021

Acknowledgements

This work was supported in part by the National Key Research and Development Program of China (Grant No. 2023YFC2705700), the National Natural Science Foundation of China (Grant Nos. 62222112, 62225113, and 62176186), the Innovative Research Group Project of Hubei Province (Grant No. 2024AFA017), and the CAAI Huawei MindSpore Open Fund.

Competing interests

The authors declare that they have no competing interests or financial conflicts to disclose.

RIGHTS & PERMISSIONS

© 2025 Higher Education Press