1. Institute of Image Processing and Pattern Recognition, Shanghai Jiao Tong University, Shanghai 200240, China
2. Shanghai Institute of Hematology, State Key Laboratory of Medical Genomics, National Research Center for Translational Medicine at Shanghai, Ruijin Hospital Affiliated to Shanghai Jiao Tong University School of Medicine, Shanghai 200025, China
chenbing_rjh@163.com
sjchen@stn.sh.cn
jieyang@sjtu.edu.cn
History: Received 2024-03-02 · Revised 2024-09-18 · Accepted 2024-06-18 · Published 2024-12-15
Abstract
Chromosome karyotyping is a critical means of diagnosing various hematological malignancies and genetic diseases, and chromosome detection in raw metaphase cell images is its most critical and challenging step. In this work, focusing on the joint optimization of chromosome localization and classification, we propose ChromTR to accurately detect and classify 24 classes of chromosomes in raw metaphase cell images. ChromTR incorporates semantic feature learning and class distribution learning into a unified DETR-based detection framework. Specifically, we first propose a Semantic Feature Learning Network (SFLN) for semantic feature extraction and chromosome foreground region segmentation with object-wise supervision. Next, we construct a Semantic-Aware Transformer (SAT) with two parallel encoders and a semantic-aware decoder to integrate global visual and semantic features. To provide a prediction with a precise chromosome number and category distribution, a Category Distribution Reasoning Module (CDRM) is built for foreground–background object and chromosome class distribution reasoning. We evaluate ChromTR on 1404 newly collected R-band metaphase images and the public G-band dataset AutoKary2022. Our proposed ChromTR outperforms all previous chromosome detection methods with an average precision of 92.56% in R-band chromosome detection, surpassing the baseline method by 3.02%. In a clinical test, ChromTR is also confident in tackling normal and numerically abnormal karyotypes. When extended to the chromosome enumeration task, ChromTR also demonstrates state-of-the-art performance on both R-band and G-band metaphase image datasets. Given its superior performance over other methods, our proposed method has been applied to assist clinical karyotype diagnosis.
Chromosome karyotyping is a critical cytogenetic method in the clinical diagnosis of several genetic diseases, such as Edwards, Turner, and Down syndromes [1,2]. In clinical practice, chromosomes in metaphase cells are first stained to present distinct light and dark bands [3,4]. Subsequently, cytogeneticists precisely locate and recognize each chromosome in captured metaphase cell images. A normal human cell, however, has 46 chromosomes, consisting of twenty-two pairs of autosomes and one pair of allosomes (XY or XX) [5]. Moreover, cytogeneticists must analyze at least 20 images per patient, which is tedious and time-consuming. Although some modern microscopes are equipped with automatic chromosome detection systems (e.g., Ikaros [6–9], ASI [10,11], and CytoVision [12,13]), their limited accuracy still necessitates extensive manual annotation.
With the rapid development of computer vision and machine learning technology, Convolutional Neural Networks (CNNs) [14], Recurrent Neural Networks (RNNs) [15], and Transformers [16] have been widely applied in omics data analysis [17,18], disease diagnosis [19,20], and medical image processing [21]. For chromosome image analysis, numerous computer-aided methods have recently been developed for chromosome detection [22–41] and segmentation [31–38,42–53] to reduce the burden of manual identification. In contrast to chromosome segmentation, chromosome detection involves the precise localization and recognition of each chromosome in metaphase cell images, which is a challenging task. In metaphase cell images, each chromosome can exhibit an arbitrary orientation with some curvature due to its non-rigid nature; moreover, chromosomes are densely and randomly distributed, often with overlaps. Furthermore, nuclei and minuscule impurities interfere with chromosome detection. Two examples of metaphase cell images are shown in Fig.1A (R-band) and Fig.1B (G-band).
Existing studies focusing on chromosome detection can be mainly divided into three types. The first type is binary-class detection [22–26,39,40], which only detects chromosomes against the background regardless of chromosome type. This type of work can also be viewed as chromosome enumeration. Al-Kharraz et al. [22] proposed a three-stage chromosome karyotyping system and used YOLOv2 [54] to detect chromosomes without chromosome type identification in the first stage. Xiao et al. [23] proposed chromosome enumeration methods based on Faster R-CNN [55] with improved hard negative anchor sampling and loss functions. Wang et al. [24] proposed DeepCHM with rotated bounding boxes and morphological priors for chromosome detection. However, all these methods can only distinguish chromosomes from non-chromosomes in metaphase cell images without considering chromosome type during detection. Even though some works classified the detected chromosomes in a following stage, their classification results cannot be used for localization optimization. The second type is multi-class chromosome detection [27–29], which locates and identifies the type of chromosomes simultaneously. This type is most similar to our work. Luo et al. [27] proposed DeepACC, which enhanced the classification branch of Faster R-CNN [55] with a Siamese network and additive angular margin loss. However, DeepACC [27] only considered the classification challenge in the chromosome detection task, and its improvements in detection mean average precision (mAP) are limited. The last type is chromosome instance segmentation methods [30–33], which also involve a detection step. The majority of these works directly applied popular instance segmentation methods (e.g., Mask R-CNN [56]) to metaphase cell image datasets without considering the challenges in chromosome detection, and costly pixel-level annotations are usually required for training.
Furthermore, some other works [34,35,38] focused on chromosome detection and segmentation in local parts of metaphase cell images.
The main challenges in chromosome detection in raw metaphase cell images include the following. (1) Chromosomes are always densely distributed within a local part of the raw metaphase cell image, and numerous cell nuclei and other impurities are present alongside them. However, many existing works detect and separate chromosomes only in cropped local regions of metaphase cell images, showing poor robustness to such interference. (2) Chromosome detection needs to identify the localization and specific class of each chromosome. By analyzing all chromosomes in the karyotype together, the class distribution feature can help improve localization. Nevertheless, most existing studies conduct chromosome detection and recognition separately, neglecting the relationship between the semantic information in images and the class distribution among chromosomes. (3) Some chromosome detection and segmentation works rely on pixel-level annotations. However, pixel-level annotation of all chromosomes in raw metaphase cell images is costly.
To tackle the above challenges, we focus on the joint optimization of chromosome localization and classification and propose ChromTR to detect and classify 24 classes of chromosomes in raw metaphase cell images. As shown in Fig.2, ChromTR incorporates semantic feature learning and chromosome class distribution learning into a unified framework for improved localization and classification. After feature extraction with a CNN backbone (e.g., ResNet50), we propose a Semantic Feature Learning Network (SFLN) with a Feature Pyramid Network (FPN) [57] and a segmentation head for semantic feature extraction and chromosome foreground region segmentation. Subsequently, a Semantic-Aware Transformer (SAT) with two parallel encoders and a semantic-aware decoder is used for visual and semantic feature encoding to locate and separate each chromosome. Specifically, we use an attention mask in the semantic encoder to restrict the attention computation to the predicted foreground region. Lastly, considering that the total number and classes of chromosomes in a metaphase cell follow a relatively fixed pattern, a Category Distribution Reasoning Module (CDRM) is built in the detection head for chromosome object number and class distribution reasoning. Our contributions can be summarized as follows:
● We introduce a novel framework ChromTR for chromosome detection in raw metaphase cell images, which incorporates semantic feature learning and class distribution learning into a unified DETR-based detection framework. This is also the first Transformer-based chromosome detection network.
● ChromTR presents the first semantic segmentation-aware detection model. The proposed SFLN and SAT can extract semantic features for improved localization without additional annotations.
● We introduce CDRM for chromosome object number and class distribution reasoning to provide a prediction combination with higher precision and recall.
● We demonstrate the merit of ChromTR on clinically collected R-band metaphase cell images. The proposed method achieves state-of-the-art performance in the chromosome detection task and is confident in tackling normal and numerically abnormal karyotypes.
● We extend ChromTR to chromosome enumeration and evaluate it on both R-band and G-band metaphase cell image datasets. ChromTR also achieves good performance in the chromosome enumeration task.
The remaining parts of this paper are organized as follows: In the “Method” section, we describe the proposed ChromTR in detail. The experimental setting and results are given in the “Experiments and results” section. The “Discussion” section provides a discussion of the application of ChromTR in chromosome enumeration and clinical situations and is followed by the “Conclusions” section, which provides our conclusions.
2 Method
In this section, we introduce ChromTR in detail. First, we provide an overview of the architecture of ChromTR in the section “ChromTR architecture.” Second, we introduce the details of SFLN in the section “SFLN.” Subsequently, we elaborate upon the details of our designed SAT and CDRM in the sections “SAT” and “CDRM,” respectively. Finally, loss functions are presented in the section “Loss functions.”
2.1 ChromTR architecture
Fig.2 presents the framework of ChromTR, which mainly consists of three components: (1) SFLN, (2) SAT, and (3) CDRM. Given a raw input metaphase cell image, a feature extraction backbone (e.g., ResNet50) extracts multiscale features from the input image. Subsequently, FPN and the auxiliary semantic segmentation head are attached to the backbone for semantic feature extraction and foreground region segmentation. Next, SAT with two parallel encoders and a semantic-aware decoder is built for semantic feature merging and chromosome detection. Finally, CDRM is built in the detection head for chromosome object number and class distribution reasoning. The whole model can be trained in an end-to-end way.
2.2 SFLN
For a given raw metaphase cell image $I \in \mathbb{R}^{H \times W \times 3}$, a feature extraction backbone (e.g., ResNet50) is first used to extract multiscale feature maps $C_3$, $C_4$, and $C_5$ with sizes of $\frac{H}{8} \times \frac{W}{8}$, $\frac{H}{16} \times \frac{W}{16}$, and $\frac{H}{32} \times \frac{W}{32}$, respectively. Following the multiscale feature construction manner of Deformable DETR [58], the extracted $C_3$, $C_4$, and $C_5$ feature maps are transferred to visual features $V_1$, $V_2$, $V_3$, and $V_4$ with sizes of $\frac{H}{8} \times \frac{W}{8}$, $\frac{H}{16} \times \frac{W}{16}$, $\frac{H}{32} \times \frac{W}{32}$, and $\frac{H}{64} \times \frac{W}{64}$ by one convolution layer each. For semantic feature extraction and foreground region segmentation, we first use an FPN to upsample and synthesize the visual features $V_1$–$V_4$ to obtain semantic features $S_1$, $S_2$, $S_3$, and $S_4$ with the same sizes. Next, a segmentation head is added on top of $S_1$ for chromosome foreground region segmentation. In this way, the features $S_1$–$S_4$ are encoded with informative semantic information. Therefore, we refer to $S_1$–$S_4$ as the semantic features and $V_1$–$V_4$ as the visual features. The detailed network structure of the semantic feature learning and segmentation head is shown in Fig.3.
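As a worked example, assuming the standard Deformable DETR feature strides of 8, 16, 32, and 64 (an assumption, since the exact sizes are not restated here), the spatial sizes of the four feature levels for the 960 × 896 images used later can be computed as:

```python
def feature_sizes(h, w, strides=(8, 16, 32, 64)):
    """Spatial size of each multiscale feature level for an h x w input,
    assuming the Deformable DETR-style strides given above."""
    return [(h // s, w // s) for s in strides]

# For the 960 x 896 R-band images used in this work:
print(feature_sizes(960, 896))  # → [(120, 112), (60, 56), (30, 28), (15, 14)]
```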
Given the extracted features, we use object-wise labels to supervise the segmentation of the chromosome foreground region, so no extra annotations are required. Specifically, all pixels within all types of chromosome bounding boxes are assigned foreground labels, and the others are assigned background labels. The segmentation head is composed of one upsampling layer and one convolution layer; it takes the largest-scale semantic feature as input to predict the segmentation mask $M$. Beyond predicting the segmentation mask $M$, the segmentation head also refines the multiscale semantic features with rich semantic information, which will be used to support the detection task.
2.3 SAT
After feature extraction, a Transformer with the encoder–decoder architecture models the discriminative contextual information among all pixel levels and predicts the final detection in an end-to-end manner. The core component of Deformable DETR [58] is the multiscale deformable attention module, which computes local attention over a small set of key sampling points around the reference point [58]. The multiscale deformable attention can be calculated by

$$\mathrm{MSDeformAttn}\big(z_q, \hat{p}_q, \{x^l\}_{l=1}^{L}\big) = \sum_{m=1}^{M} W_m \bigg[ \sum_{l=1}^{L} \sum_{k=1}^{K} A_{mlqk} \cdot W'_m\, x^l\big(\phi_l(\hat{p}_q) + \Delta p_{mlqk}\big) \bigg],$$

where $x^l$ is the multiscale feature map of the $l$-th level; $\hat{p}_q$ is the normalized coordinates of the reference point; and $\Delta p_{mlqk}$ and $A_{mlqk}$ are the sampling offset and attention weight of the $k$-th sampling point in the $l$-th feature level and the $m$-th attention head, respectively. $z_q$ is the query feature, $W_m$ and $W'_m$ are learnable projection matrices, and $\phi_l(\hat{p}_q)$ rescales the normalized coordinates to the input feature map of the $l$-th feature level. Considering that chromosomes may be distributed in a limited region of raw metaphase images, nearby impurities and cell nuclei in the background area can interfere with chromosome detection. To utilize the chromosome foreground region segmentation results to avoid these interferences and improve chromosome detection, we construct SAT with a visual encoder, a semantic encoder, and a semantic-aware decoder to integrate the global visual and semantic features.
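To make the sampling mechanism concrete, the following is a minimal single-query, single-head NumPy sketch of multiscale deformable attention. It omits the learnable projection matrices and batch dimensions, and all names are illustrative rather than taken from any implementation:

```python
import numpy as np

def bilinear_sample(feat, x, y):
    """Bilinearly sample feat (H, W, C) at fractional pixel coords (x, y)."""
    h, w = feat.shape[:2]
    x = np.clip(x, 0, w - 1)
    y = np.clip(y, 0, h - 1)
    x0, y0 = int(np.floor(x)), int(np.floor(y))
    x1, y1 = min(x0 + 1, w - 1), min(y0 + 1, h - 1)
    dx, dy = x - x0, y - y0
    return ((1 - dx) * (1 - dy) * feat[y0, x0] + dx * (1 - dy) * feat[y0, x1]
            + (1 - dx) * dy * feat[y1, x0] + dx * dy * feat[y1, x1])

def ms_deform_attn(feats, ref_point, offsets, weights):
    """feats: list of L feature maps (H_l, W_l, C); ref_point: normalized (px, py);
    offsets: (L, K, 2) pixel offsets; weights: (L, K) unnormalized attention logits.
    Returns the attended C-dim feature for one query (one attention head)."""
    w_soft = np.exp(weights - weights.max())
    w_soft /= w_soft.sum()                      # softmax over all L*K sampling points
    out = np.zeros(feats[0].shape[-1])
    for l, feat in enumerate(feats):
        h, w = feat.shape[:2]
        # rescale the normalized reference point to this feature level
        cx, cy = ref_point[0] * (w - 1), ref_point[1] * (h - 1)
        for k in range(offsets.shape[1]):
            x, y = cx + offsets[l, k, 0], cy + offsets[l, k, 1]
            out += w_soft[l, k] * bilinear_sample(feat, x, y)
    return out
```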
As shown in Fig.4, given the multiscale visual features $V$ and semantic features $S$, we build a visual encoder $\mathcal{E}_v$ and a semantic encoder $\mathcal{E}_s$, respectively, to compute multiscale deformable attention. $\mathcal{E}_v$ and $\mathcal{E}_s$ have the same number of blocks, and each block consists of a multi-head self-attention layer and a feed-forward network (FFN) that compute multiscale deformable attention to capture the long-range dependencies of the input features. The outputs of the two encoders are multiscale features with the same resolutions as the inputs, denoted as $E_v$ and $E_s$. The decoupling of $\mathcal{E}_v$ and $\mathcal{E}_s$ allows them to learn feature representations from their own perspectives in the encoder stage.
We first derive the attention mask from the chromosome foreground region predicted by the segmentation head to concentrate the semantic features on the foreground region of the input image and avoid interference from nearby cell nuclei and impurity noise. The attention mask is used to mask out elements in the semantic encoder so that irrelevant features in the background region are ignored. Specifically, we take the predicted chromosome foreground mask $M$ as the input and conduct "max-pooling" with the scale of 2 as

$$A_{l+1}(x, y) = \max_{0 \le i,\, j < 2} A_l(2x + i,\, 2y + j),$$

where $A_1$ is obtained by down-sampling $M$ to the resolution of the largest feature level.
In this way, we obtain the attention masks $A_1$, $A_2$, $A_3$, and $A_4$ with the scales of $\frac{H}{8} \times \frac{W}{8}$, $\frac{H}{16} \times \frac{W}{16}$, $\frac{H}{32} \times \frac{W}{32}$, and $\frac{H}{64} \times \frac{W}{64}$ for the multiscale deformable attention computation in the semantic encoder $\mathcal{E}_s$.
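The mask down-sampling can be sketched as repeated 2 × 2 max-pooling of the binary foreground mask, so a coarse location stays "foreground" if any of its pooled pixels is foreground. A minimal NumPy version (even spatial sizes are assumed, with cropping as a guard):

```python
import numpy as np

def mask_pyramid(mask, levels=4):
    """Build multiscale attention masks by repeated 2x2 max-pooling of a
    binary foreground mask: a pooled cell is foreground (1) if any of its
    2x2 source pixels is foreground."""
    masks = [mask]
    for _ in range(levels - 1):
        m = masks[-1]
        h, w = m.shape[0] // 2 * 2, m.shape[1] // 2 * 2   # crop to even size
        m = m[:h, :w].reshape(h // 2, 2, w // 2, 2).max(axis=(1, 3))
        masks.append(m)
    return masks
```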
Based on the global visual and semantic embeddings $E_v$ and $E_s$, we build a semantic-aware decoder for chromosome object detection. Fig.4 shows that each decoder layer sequentially consists of a self-attention layer, two parallel cross-attention layers, and an FFN. Specifically, the learnable queries first conduct self-attention and then perform cross-attention with the visual and semantic embeddings $E_v$ and $E_s$ in the two parallel cross-attention layers. The outputs of the two parallel cross-attention layers are added and followed by an FFN. The output queries thus contain both visual and semantic clues.
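The decoder-layer wiring described above (self-attention, then two parallel cross-attentions whose outputs are added, then an FFN) can be sketched as follows. For simplicity, this toy version uses plain scaled dot-product attention in place of deformable attention and a tanh stand-in for the FFN, so it illustrates only the data flow, not the actual layers:

```python
import numpy as np

def attention(q, kv):
    """Plain scaled dot-product cross-attention: q (N, d), kv (M, d)."""
    scores = q @ kv.T / np.sqrt(q.shape[-1])
    w = np.exp(scores - scores.max(axis=-1, keepdims=True))
    w /= w.sum(axis=-1, keepdims=True)
    return w @ kv

def semantic_aware_decoder_layer(queries, vis_emb, sem_emb):
    """One decoder layer: self-attention on the queries, then two parallel
    cross-attentions against the visual and semantic embeddings whose
    outputs are summed, followed by a (stand-in) FFN step."""
    q = queries + attention(queries, queries)               # self-attention
    q = q + attention(q, vis_emb) + attention(q, sem_emb)   # parallel cross-attn, added
    return q + np.tanh(q)                                   # stand-in for the FFN
```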
2.4 CDRM
For humans, a normal metaphase cell contains twenty-three pairs of chromosomes: twenty-two pairs of autosomes and one pair of allosomes (XY for males and XX for females). Numerically abnormal karyotypes usually exhibit only an increase or decrease in the number of certain types of chromosomes, while the numbers of the remaining chromosome types remain the same as in a normal karyotype. Hence, different from general object detection tasks, the total number and class types of chromosome instances in a metaphase cell image follow a relatively fixed pattern. In recent years, some non-differentiable [59] and differentiable methods [60,61] have been developed to utilize chromosome distribution features to improve chromosome classification accuracy.
We adopt the idea of DAM [61] to build CDRM to utilize the above-mentioned prior knowledge. Specifically, we focus on the predicted category distribution of the $N$ output queries in the decoder. Different from the chromosome classification task, in a Transformer-based chromosome detection framework, a large proportion of the $N$ output queries are matched to the background, which increases the difficulty of category distribution learning. The object distribution in the detection task contains two aspects: background–foreground object distribution and foreground class distribution. To address the above problem, we first select the top $K$ queries with the highest scores from all $N$ queries as foreground candidates in both the training and inference processes. Next, we take the $C$ foreground class scores of the $K$ selected queries as the input array of CDRM. The core of CDRM is four sequential bidirectional recurrent neural network (Bi-RNN) blocks, each consisting of two Bi-RNN layers. The Bi-RNN blocks mimic the Hungarian algorithm [62] in a learnable and differentiable manner, redistributing the initial class probability array in an alternating row and column manner to provide a desirable distribution. The unselected $N - K$ queries and the background class scores of the selected queries are kept unchanged in the output. The detailed structure of CDRM is shown in Fig.5.
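The query-selection step feeding CDRM can be sketched as follows, assuming each query carries C foreground class scores plus one background score in its last column; the array layout and names are our own illustration, not the paper's code:

```python
import numpy as np

def select_topk_for_cdrm(class_scores, k):
    """class_scores: (N, C+1) per-query scores, last column = background.
    Select the top-k queries by foreground confidence and return the
    (k, C) foreground-class array that CDRM reasons over, plus the chosen
    indices so the refined scores can be written back. The other N-k
    queries and the background column are left untouched."""
    fg_conf = class_scores[:, :-1].max(axis=1)   # best foreground score per query
    topk_idx = np.argsort(-fg_conf)[:k]
    cdrm_input = class_scores[topk_idx, :-1]     # (k, C) foreground scores
    return cdrm_input, topk_idx
```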
2.5 Loss function
The overall loss function for training the proposed ChromTR contains the segmentation loss $\mathcal{L}_{seg}$ and the detection loss $\mathcal{L}_{det}$:

$$\mathcal{L} = \alpha \mathcal{L}_{seg} + \beta \mathcal{L}_{det},$$

where $\alpha$ and $\beta$ are two weights controlling the balance between the two loss terms. The segmentation loss is the Dice loss and can be formulated as

$$\mathcal{L}_{seg} = 1 - \frac{2\,|G \cap M|}{|G| + |M|},$$

where $G$ is the ground-truth chromosome region and $M$ is the predicted chromosome region. The detection loss is a combination of classification loss and localization loss, defined as

$$\mathcal{L}_{det} = \lambda_{cls} \mathcal{L}_{cls} + \lambda_{L1} \mathcal{L}_{L1} + \lambda_{giou} \mathcal{L}_{giou},$$

where $\mathcal{L}_{cls}$ denotes the focal loss [63], $\mathcal{L}_{L1}$ is the L1 loss, and $\mathcal{L}_{giou}$ is the generalized IoU loss [64]. $\lambda_{cls}$, $\lambda_{L1}$, and $\lambda_{giou}$ are the coefficients of each component.
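As a concrete reference, a minimal NumPy version of the soft Dice segmentation loss described above (the function name and epsilon guard are our own):

```python
import numpy as np

def dice_loss(pred, gt, eps=1e-6):
    """Soft Dice loss between a predicted foreground probability map and a
    binary ground-truth mask: 1 - 2|P ∩ G| / (|P| + |G|), with an epsilon
    guard against empty masks."""
    inter = (pred * gt).sum()
    return 1.0 - 2.0 * inter / (pred.sum() + gt.sum() + eps)
```

A perfect prediction drives the loss to 0, while a prediction with no overlap yields 1.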
3 Experiments and results
3.1 Datasets
We evaluate the proposed ChromTR on two datasets of R- and G-band metaphase cell images: our collected dataset of R-band metaphase cell images and the public dataset of G-band metaphase cell images AutoKary2022 [30].
We collected and labeled 1404 raw R-band bone marrow metaphase images (including 837 male karyotypes and 567 female karyotypes, with a total of 64 536 chromosome instances) from the Shanghai Institute of Hematology, National Research Center for Translational Medicine, Ruijin Hospital, China. The metaphase images contain normal and numerically abnormal karyotypes. They were captured at 630× by using a CoolCube 1 camera (MetaSystems, Germany) attached to an AXIO IMAGER Z2 (Carl Zeiss, Germany) and exported by the Ikaros system (MetaSystems, Germany). The images were annotated manually by two experienced cytologists using the LabelMe annotation tool [65]. Each chromosome was labeled with a horizontal bounding box with a precise location and class name. All images were normalized to a resolution of 960 × 896 to facilitate model training. The collection of all images was approved by the review board of the Shanghai Institute of Hematology, State Key Laboratory of Medical Genomics. We randomly split the normal karyotypes into five subsets to perform 5-fold cross-validation. In each experiment, four subsets are used for training and validation, and the remaining subset is used for testing. The presented results are the average of the 5-fold cross-validation.
AutoKary2022 [30] is a G-band clinical chromosome dataset. We select the metaphase images with corresponding annotation files and construct the training and testing sets according to the original division. The training set contains 529 metaphase images, and the testing set contains 66 images. All images are normalized to a resolution of 1024 × 704, and the annotation of each chromosome is converted to a horizontal bounding box. All chromosomes in the metaphase cell images are labeled with the same class to increase the labeling reliability. The detailed statistical information of the two datasets is listed in Tab.1.
3.2 Implementation details
3.2.1 Foreground mask generation
We generate an object-wise chromosome foreground region mask according to the annotated bounding boxes to supervise foreground chromosome region segmentation and semantic feature learning. Specifically, we set all pixels within all types of chromosome bounding boxes as foreground and the other pixels as background. In this way, we obtain an object-wise chromosome mask without additional human annotations. The mask image is resized to the resolution of the predicted mask to supervise the mask prediction in ChromTR. An example of a labeled metaphase cell image and its corresponding mask is shown in Fig.6.
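The object-wise mask construction can be sketched in a few lines, assuming boxes are given as (x1, y1, x2, y2) pixel coordinates (the function name is illustrative):

```python
import numpy as np

def boxes_to_mask(boxes, h, w):
    """Rasterize annotated bounding boxes (x1, y1, x2, y2) into an
    object-wise foreground mask: pixels inside any box become foreground
    (1), everything else stays background (0)."""
    mask = np.zeros((h, w), dtype=np.uint8)
    for x1, y1, x2, y2 in boxes:
        mask[y1:y2, x1:x2] = 1
    return mask
```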
3.2.2 Network training
We conduct random horizontal and vertical flipping for data augmentation to reduce overfitting. The proposed ChromTR is implemented with the PyTorch [66] deep learning framework under Ubuntu OS with NVIDIA Tesla A100 GPUs. The training details of ChromTR are as follows: ChromTR is trained in an end-to-end manner for 50 epochs with the Adam optimizer and a batch size of 2. The learning rate is set as 0.02 and decays to 0.002 after 40 epochs. Unless specifically stated, we use the ResNet50 backbone for feature extraction in all experiments. We set the hidden dimension of all Bi-RNN layers in CDRM to 256. The number of object queries $N$ is set as 300, and $K$ is set as 100.
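The stated step schedule (0.02 for the first 40 epochs, then 0.002) can be expressed as a simple function; the decay factor of 0.1 is inferred from the two values given:

```python
def learning_rate(epoch, base_lr=0.02, decay_epoch=40, gamma=0.1):
    """Step learning-rate schedule: base_lr for the first decay_epoch
    epochs, then multiplied by gamma (0.02 -> 0.002 after epoch 40)."""
    return base_lr * (gamma if epoch >= decay_epoch else 1.0)
```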
3.3 Evaluation metrics
We utilize six metrics to evaluate the performance of ChromTR: the mean average precision under the IoU threshold of 0.5 ($\mathrm{AP}_{50}$), the mean average precision over different IoU thresholds (mAP), precision, recall, the F1-score (F1), and the average error rate (AER). We also calculate P values through a paired t-test by using the results of the 5-fold cross-validation.
For a detection task, mAP is one of the most commonly used metrics: it is the mean of the average precision (AP) over all classes, where AP is the area under the precision–recall curve for a given category. Precision, recall, and F1 are calculated as

$$\mathrm{Precision} = \frac{TP}{TP + FP}, \qquad \mathrm{Recall} = \frac{TP}{TP + FN}, \qquad F1 = \frac{2 \times \mathrm{Precision} \times \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}},$$
where TP, FP, and FN are true positives, false positives, and false negatives, respectively. For a certain type of chromosome, a TP means that the IoU of the predicted bounding box and the ground-truth bounding box is larger than a predefined threshold (such as 0.5). An FP means that the predicted bounding box dissatisfies the IoU requirement or that the predicted class is inconsistent with the ground-truth class. An FN means the ground-truth bounding box has no matched predicted bounding box. We adopt the IoU threshold of 0.5 to calculate the $\mathrm{AP}_{50}$ metric. Furthermore, we calculate the mean average precision over IoU thresholds from 0.5 to 0.95 with a step of 0.05 and denote it as mAP.
In addition to the above metrics, we measure the model performance under a given condition (for example, filtering the output with a detection confidence threshold of 0.5) and compute precision, recall, and F1 as above, which is meaningful in clinical applications. Besides, we calculate AER to reflect the error instance ratio, i.e., the proportion of instances that require manual correction. AER is defined as the sum of false positives and false negatives divided by the number of ground truths:

$$\mathrm{AER} = \frac{FP + FN}{N_{gt}},$$

where $N_{gt}$ is the number of ground-truth chromosome instances.
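The four count-based metrics above reduce to simple ratios of TP, FP, and FN; a minimal sketch (using the fact that the number of ground truths equals TP + FN):

```python
def detection_metrics(tp, fp, fn):
    """Precision, recall, F1, and AER from matched-detection counts.
    Ground-truth count = tp + fn, so AER = (fp + fn) / (tp + fn)."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    aer = (fp + fn) / (tp + fn)
    return precision, recall, f1, aer
```

For example, 90 correct detections with 10 false positives and 10 misses give precision, recall, and F1 of 0.9 each, and an AER of 0.2.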
3.4 Evaluation results and ablation studies
In this section, we provide the full evaluation results and ablation studies on R-band metaphase images to evaluate the effectiveness of the proposed ChromTR. Since our method is implemented based on Deformable DETR [58], the latter is set as the baseline model in the ablation experiments. To investigate the effect of the key components of ChromTR, we first add SFLN alone for semantic feature extraction, then build SAT on top of the extracted semantic features; we also evaluate the baseline model with CDRM alone.
As shown in Tab.2, our method achieves favorable results not only on the object detection metric mAP but also on the clinically applied metrics. These results show that our method is effective in chromosome detection. ChromTR with the proposed SAT and CDRM improves $\mathrm{AP}_{50}$ from 89.54% to 92.56% and mAP from 74.24% to 76.99%. These results indicate that ChromTR improves detection performance even over the strong Deformable DETR baseline model. When considering the filtered output for clinical application, ChromTR achieves improvements of 3.34%, 6.18%, and 4.92% in precision, recall, and F1, respectively. Meanwhile, it reduces AER from 7.22% to 4.39%. These improvements suggest that chromosome localization and classification in raw metaphase cell images are more accurate and that the workload of manual modification is reduced.
The results in Tab.2 also show the effectiveness of each component of our proposed approach. The baseline model combined with SFLN alone cannot improve detection performance. However, when SAT is added, mAP improves from 74.24% to 76.08%. This improvement shows the effectiveness of the extracted semantic features, which need to be processed and synthesized by the semantic encoder and semantic-aware decoder in SAT. The baseline model combined with CDRM alone improves mAP, precision, and recall by 1.08%, 2.02%, and 3.51%, respectively. This is because CDRM infers the distribution between foreground and background, as well as the distribution of classes within the foreground, and thus provides more reasonable predictions. As a result, false positive and false negative rates are reduced by distribution learning. In addition, the combined use of SAT and CDRM further boosts performance.
3.5 Comparison with other methods
In this section, we compare our ChromTR with the one-stage object detection method RetinaNet [63], two-stage object detection method Faster RCNN [55], DETR-based object detection methods DETR [67] and Deformable DETR [58], and chromosome detection method DeepACC [27]. For fair comparison, we choose the same backbone (ResNet50) for all methods. We only fine-tune DETR [67] with the pre-trained model on our collected dataset for 150 epochs to accelerate model convergence. Faster RCNN [55], RetinaNet [63], and DeepACC [27] are trained for 12 epochs. The proposed ChromTR and Deformable DETR [58] are trained for 50 epochs.
Tab.3 shows that our proposed ChromTR outperforms the other methods by a large margin, while our baseline model Deformable DETR [58] shows the second highest performance. These results indicate the strength of Deformable DETR in detecting small and dense chromosomes in metaphase cell images, while the proposed ChromTR brings further improvement on this strong baseline. Compared with DETR [67], Deformable DETR [58] is more suitable for the chromosome detection task. In comparison with the classical object detection method Faster RCNN [55], ChromTR improves $\mathrm{AP}_{50}$ from 84.19% to 92.56%, increases F1 from 82.02% to 91.04%, and reduces AER from 10.93% to 4.39%. ChromTR also exceeds the previous Faster RCNN-based [55] chromosome detection method DeepACC [27] by 10.65%, 6.51%, 9.48%, and 8.18% in terms of mAP, precision, recall, and F1, respectively. Moreover, it reduces AER by 5.74%. These improvements suggest the superiority of the Deformable DETR detection framework and of our proposed semantic feature learning and category distribution reasoning strategies over other methods.
For intuitive presentation, we visualize three examples of metaphase cell images processed by different methods in Fig.7. In the absence of semantic feature learning and chromosome foreground region segmentation, some of the chromosomes in clustered regions are likely to be missed. In comparison, ChromTR avoids many missed detections thanks to the proposed SFLN and SAT. Therefore, more touching and crossing chromosomes can be correctly located, and recall and F1 are improved. Due to the limited resolution of individual chromosomes, chromosomes are likely to be misclassified into similar classes. Besides, some background stains may be mistaken for chromosomes. For example, in the last sample, a chromosome-like stain is misjudged as chromosome X, resulting in two X chromosomes and one Y chromosome in this karyotype. Such situations can be significantly avoided by learning the category distribution with the proposed CDRM, and classification errors are largely reduced as a result.
4 Discussion
4.1 Performance analysis
ChromTR is the first Transformer-based chromosome detection method that incorporates semantic feature learning and category distribution reasoning into a unified detection framework. The visual and semantic features extracted by SFLN are refined individually by the visual and semantic encoders and merged by the semantic-aware decoder to provide a syncretic prediction. The predicted object number and category distribution of chromosomes in the metaphase cell images are further refined by the proposed CDRM. Evaluation results on the dataset of R-band metaphase cell images demonstrate the effectiveness of ChromTR.
To get deeper insights into the results, we further analyze and compute the AP of each category achieved by the baseline Deformable-DETR [58] model and our ChromTR. As shown in Fig.8, chromosomes No. 2 and No. 3 perform best, with APs of 96.78% and 97.42%, respectively. Chromosomes Nos. 1–7, 9, 11–13, 16, 17, and 19 obtain above-average AP scores, mainly because they are relatively larger and easier to detect than the other chromosomes. By contrast, chromosomes Nos. 14, 15, and Y obtain the lowest APs, which is mainly due to the similar sizes and banding patterns of chromosomes Nos. 14 and 15 and the smallest size of chromosome Y. Compared with the baseline method, the proposed ChromTR exhibits better performance on all types of chromosomes. This result provides evidence that the use of SFLN and CDRM is beneficial for improving the precision of localization and classification. In addition, the performance gaps between different types of chromosomes are reduced. For example, the AP score for chromosome No. 8 has improved from 86.72% to 92.10%, that for chromosome No. 10 from 84.66% to 90.84%, and that for chromosome No. 14 from 79.94% to 87.74%.
To qualitatively investigate the effectiveness of the proposed SFLN and SAT, we visualize three examples of the predicted chromosome foreground mask and the corresponding detection results on the test set in Fig.9. As discussed above, object-wise chromosome foreground segmentation is an easy task, and the segmentation mask is accurate. As shown in Fig.9, the chromosome foreground region can be correctly segmented. Beyond the segmented mask, the extracted semantic features help improve chromosome detection and reduce false positives and false negatives. For example, in the first sample (as shown by the yellow arrow), two chromosomes are missed by the baseline Deformable-DETR model. However, their foreground regions are correctly segmented, and they are correctly detected by the proposed ChromTR model. The background noise in raw metaphase cell images can also be reduced in this manner (as shown by the yellow arrow in the last sample).
4.2 Generalization for chromosome enumeration
Chromosome enumeration can be viewed as single-class chromosome detection, which ignores the specific chromosome type when detecting chromosomes in metaphase cell images. We evaluate the potential of ChromTR for the chromosome enumeration task on both R-band and G-band metaphase cell image datasets. Specifically, we merge the labels of all chromosomes into a single class, and train and evaluate on the two datasets separately. We perform 5-fold cross-validation on the R-band metaphase cell image dataset and use the original training/testing split on the G-band dataset (AutoKary2022). The chromosome enumeration results are compared with those of existing chromosome enumeration methods, DeepACEv2 [23] and DeepCHM [24]. For DeepCHM, we only predict horizontal anchors and horizontal bounding boxes for chromosome objects.
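The label-merging step described above is a simple remapping over the annotations. A minimal sketch, assuming a COCO-style annotation layout (the field names and the merged class id are our illustrative assumptions, not the paper's actual data format):

```python
def to_single_class(annotations, merged_id=1):
    """Collapse all 24 chromosome categories into one generic
    'chromosome' class so a detector can be trained for enumeration.
    Bounding boxes are left untouched."""
    merged = []
    for ann in annotations:
        ann = dict(ann)                 # copy: avoid mutating caller data
        ann["category_id"] = merged_id  # e.g., class 3 and 21 both become 1
        merged.append(ann)
    return merged

# example: two boxes of different chromosome types become one class
anns = [{"bbox": [10, 10, 40, 90], "category_id": 3},
        {"bbox": [60, 15, 35, 80], "category_id": 21}]
print({a["category_id"] for a in to_single_class(anns)})  # {1}
```

After this remapping, the detector's predicted box count per image directly serves as the chromosome count.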
The evaluation results for R-band metaphase cell images are shown in Tab.4. ChromTR with SAT and CDRM achieves the highest performance on all metrics except recall, for which ChromTR with SAT alone scores slightly higher than the full model. Compared with our baseline Deformable DETR [58], ChromTR improves mAP from 77.62% to 79.31% and reduces AER from 1.76% to 1.35%, again demonstrating its superiority in the chromosome enumeration task. Moreover, ChromTR outperforms the previously reported chromosome enumeration methods DeepACEv2 [23] and DeepCHM [24] with higher AP and lower AER, mainly because ChromTR extracts better semantic features and infers the foreground–background object distribution in metaphase cell images. By contrast, the object detection method DETR [67] obtains inferior results owing to the limited object size.
Tab.5 shows the chromosome enumeration results on the AutoKary2022 dataset. ChromTR obtains the best performance in terms of mAP, recall, F1-score, and AER, outperforming the baseline Deformable DETR [58] by 1.80%, 0.87%, 0.89%, and 3.24%, respectively. Although DeepACEv2 [23] exhibits a high precision score, its recall is relatively low, and thus its F1-score is also lower than that of the proposed ChromTR. ChromTR also surpasses DeepCHM [24] in all aspects. These results further demonstrate that semantic feature learning and distribution reasoning benefit G-band chromosome enumeration.
4.3 Clinical impact
To quantitatively evaluate the clinical value of ChromTR, we randomly select 100 raw R-band metaphase images (50 normal karyotypes and 50 karyotypes with numerical abnormalities) and compare the chromosome detection results of ChromTR with those of the commercial software Ikaros [6–9] currently used in Ruijin Hospital Affiliated to Shanghai Jiao Tong University School of Medicine. We feed the captured raw metaphase images into Ikaros and ChromTR for chromosome detection without any manual operation. The normal karyotypes comprise 46,XX and 46,XY, and the numerically abnormal karyotypes comprise the hematological-malignancy-related +8, −7, +11, and −Y. We filter the predictions of ChromTR with a detection confidence threshold of 0.5 and compute precision, recall, F1-score, and AER; AER reflects the proportion of manual correction required in practical use. We also calculate the abnormal detection rate (ADR), the number of detected abnormal chromosomes divided by the total number of abnormal chromosomes, to reflect the detection rate of abnormal karyotypes.
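The two error metrics used here can be sketched as follows. ADR matches the definition in the text exactly; for AER, the paper does not spell out the formula, so we assume one common formulation from the chromosome enumeration literature (false positives plus false negatives over the total ground-truth count), interpreted as the share of objects needing manual correction:

```python
def average_error_ratio(per_image_counts):
    """AER over a test set (assumed formulation).

    per_image_counts: list of (false_positives, false_negatives, num_gt)
    tuples, one per metaphase image.
    """
    total_err = sum(fp + fn for fp, fn, _ in per_image_counts)
    total_gt = sum(gt for _, _, gt in per_image_counts)
    return total_err / total_gt

def abnormal_detection_rate(detected_abnormal, total_abnormal):
    """ADR: detected abnormal chromosomes / total abnormal chromosomes,
    as defined in the text."""
    return detected_abnormal / total_abnormal
```

For instance, a normal metaphase with 46 ground-truth chromosomes, one spurious detection, and two misses contributes (1, 2, 46) to the AER computation.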
The statistical results are shown in Tab.6. ChromTR surpasses Ikaros on all metrics by a large margin and improves ADR from 52% to 86%, showing that ChromTR is sensitive to numerically abnormal chromosome detection and suitable for clinical diagnosis. Furthermore, ChromTR reduces AER from 21.49% to 12.24%, indicating that it can eliminate approximately half of the manual correction required in practical use. These improvements greatly alleviate cytogeneticists' workload in chromosome karyotyping, making automatic chromosome classification truly usable in practice.
5 Conclusions
Chromosome detection is a challenging and critical step in karyotyping. In this work, we propose ChromTR, a Transformer-based framework for chromosome detection in raw metaphase cell images. ChromTR incorporates semantic feature learning and class distribution learning for precise localization and accurate classification, significantly outperforming existing chromosome detection methods and the baseline object detector. Experimental results on both R-band and G-band metaphase cell image datasets demonstrate the potential of ChromTR for chromosome enumeration. The innate merits of ChromTR also make it robust in detecting numerically abnormal chromosomes, showing high value in clinical karyotyping workflows. In the future, we will extend ChromTR to chromosome instance segmentation on raw metaphase cell images with pixel-level annotations, and jointly perform disease diagnosis using multiple metaphase cell images from the same patient.
References
[1] Gersen SL. The Principles of Clinical Cytogenetics. Springer, 2013
[2] Patterson D. Molecular genetic analysis of Down syndrome. Hum Genet 2009; 126(1): 195–214
[3] Craig JM, Bickmore WA. Genes and genomes: chromosome bands–flavours to savour. BioEssays 1993; 15(5): 349–354
[4] Rack K, Van den Berg E, Haferlach C, Beverloo HB, Costa D, Espinet B, Foot N, Jeffries S, Martin K, O’Connor S, Schoumans J, Talley P, Telford N, Stioui S, Zemanova Z, Hastings RJ. European recommendations and quality assurance for cytogenomic analysis of haematological neoplasms: response to the comments from the Francophone Group of Hematological Cytogenetics (GFCH). Leukemia 2020; 34(8): 2262–2264
[5] McGowan-Jordan J, Simons A, Schmid M. ISCN 2016: An International System for Human Cytogenomic Nomenclature (2016). 2016
[6] Ganguly BB, Shouvik M, Kadam NN, Banerjee D, Agarwal M. Experience of conventional cytogenetics in elderly cytopenic Indian patients suspected with myelodysplastic syndromes. Blood 2016; 128(22): 5488–5488
[7] MetaSystems. Ikaros. 2024. Available at the website of metasystems-international.com/en/products/ikaros/
[8] Marková J, Michková P, Burčková K, Březinová J, Michalová K, Dohnalová A, Maaloufová JS, Soukup P, Vítek A, Cetkovský P, Schwarz J. Prognostic impact of DNMT3A mutations in patients with intermediate cytogenetic risk profile acute myeloid leukemia. Eur J Haematol 2012; 88(2): 128–135
[9] Havelka M, Bytyutskyy D, Symonová R, Ráb P, Flajšhans M. The second highest chromosome count among vertebrates is observed in cultured sturgeon and is associated with genome plasticity. Genet Sel Evol 2016; 48(1): 12
[10] Priya PK, Mishra VV, Roy P, Patel H. A study on balanced chromosomal translocations in couples with recurrent pregnancy loss. J Hum Reprod Sci 2018; 11(4): 337
[11] Krishna Chandran R, Geetha N, Sakthivel KM, Suresh Kumar R, Jagathnath Krishna KMN, Sreedharan H. Impact of additional chromosomal aberrations on the disease progression of chronic myelogenous leukemia. Front Oncol 2019; 9: 88
[12] Paulis M, Susani L, Castelli A, Suzuki T, Hara T, Straniero L, Duga S, Strina D, Mantero S, Caldana E, Sergi LS, Villa A, Vezzoni P. Chromosome transplantation: a possible approach to treat human X-linked disorders. Mol Ther Methods Clin Dev 2020; 17: 369–377
[13] Kurtovic-Kozaric A, Mehinovic L, Malesevic R, Mesanovic S, Jaros T, Stomornjak-Vukadin M, Mackic-Djurovic M, Ibrulj S, Kurtovic-Basic I, Kozaric M. Ten-year trends in prevalence of Down syndrome in a developing country: impact of the maternal age and prenatal screening. Eur J Obstet Gynecol Reprod Biol 2016; 206: 79–83
[14] LeCun Y, Bottou L, Bengio Y, Haffner P. Gradient-based learning applied to document recognition. Proc IEEE 1998; 86(11): 2278–2324
[15] Graves A. Long short-term memory. In: Supervised Sequence Labelling with Recurrent Neural Networks. Springer, 2012. 37–45
[16] Vaswani A, Shazeer N, Parmar N, Uszkoreit J, Jones L, Gomez AN, Kaiser L, Polosukhin I. Attention is all you need. In: Advances in Neural Information Processing Systems. Vol 30 (NIPS 2017). 2017
[17] Gomes B, Ashley EA. Artificial intelligence in molecular medicine. N Engl J Med 2023; 388(26): 2456–2465
[18] Mao W, Zhu M, Sun Z, Shen S, Wu LY, Chen H, Shen C. De novo protein design using geometric vector field networks. 2023. arXiv: 2310.11802
[19] Liu X, Faes L, Kale AU, Wagner SK, Fu DJ, Bruynseels A, Mahendiran T, Moraes G, Shamdas M, Kern C, Ledsam JR, Schmid MK, Balaskas K, Topol EJ, Bachmann LM, Keane PA, Denniston AK. A comparison of deep learning performance against health-care professionals in detecting diseases from medical imaging: a systematic review and meta-analysis. Lancet Digit Health 2019; 1(6): e271–e297
[20] Lin Z, Zhang D, Shi D, Xu R, Tao Q, Wu L, He M, Ge Z. Contrastive pre-training and linear interaction attention-based transformer for universal medical reports generation. J Biomed Inform 2023; 138: 104281
[21] Rajpurkar P, Lungren MP. The current and future state of AI interpretation of medical images. N Engl J Med 2023; 388(21): 1981–1990
[22] Al-Kharraz MS, Elrefaei LA, Fadel MA. Automated system for chromosome karyotyping to recognize the most common numerical abnormalities using deep learning. IEEE Access 2020; 8: 157727–157747
[23] Xiao L, Luo C, Yu T, Luo Y, Wang M, Yu F, Li Y, Tian C, Qiao J. DeepACEv2: automated chromosome enumeration in metaphase cell images using deep convolutional neural networks. IEEE Trans Med Imaging 2020; 39(12): 3920–3932
[24] Wang J, Zhou C, Chen S, Hu J, Wu M, Jiang X, Xu C, Qian D. Chromosome detection in metaphase cell images using morphological priors. IEEE J Biomed Health Inform 2023; 27(9): 4579–4590
[25] Xiao L, Luo C, Luo Y, Yu T, Tian C, Qiao J, Zhao Y. DeepACE: automated chromosome enumeration in metaphase cell images using deep convolutional neural networks. In: Medical Image Computing and Computer Assisted Intervention – MICCAI 2019. Cham: Springer International Publishing, 2019. 595–603
[26] Bai H, Zhang T, Lu C, Chen W, Xu F, Han ZB. Chromosome extraction based on U-Net and YOLOv3. IEEE Access 2020; 8: 178563–178569
[27] Luo C, Yu T, Luo Y, Wang M, Yu F, Li Y, Tian C, Qiao J, Xiao L. DeepACC: automate chromosome classification based on metaphase images using deep learning framework fused with prior knowledge. 2020. arXiv: 2006.15528
[28] Tseng JJ, Lu CH, Li JZ, Lai HY, Chen MH, Cheng FY, Kuo CE. An open dataset of annotated metaphase cell images for chromosome identification. Sci Data 2023; 10(1): 104
[29] Ding W, Chang L, Gu C, Wu K. Classification of chromosome karyotype based on Faster-RCNN with the segmentation and enhancement preprocessing model. In: 2019 12th International Congress on Image and Signal Processing, BioMedical Engineering and Informatics (CISP-BMEI). IEEE, 2019. 1–5
[30] You D, Xia P, Chen Q, Wu M, Xiang S, Wang J. AutoKary2022: a large-scale densely annotated dataset for chromosome instance segmentation. In: 2023 IEEE International Conference on Multimedia and Expo (ICME), Brisbane, Australia, 2023. 1577–1582
[31] Zhou R, Yu L, Chen D, Zhang H, Szczerbicki E. KEMR-Net: a knowledge-enhanced mask refinement network for chromosome instance segmentation. Cybern Syst 2023; 55(3): 708–718
[32] Wang P, Hu W, Zhang J, Wen Y, Xu C, Qian D. Enhanced rotated Mask R-CNN for chromosome segmentation. In: 2021 43rd Annual International Conference of the IEEE Engineering in Medicine & Biology Society (EMBC). IEEE, 2021. 2769–2772
[33] Xie N, Li X, Li K, Yang Y, Shen HT. Statistical karyotype analysis using CNN and geometric optimization. IEEE Access 2019; 7: 179445–179453
[34] Pijackova K, Gotthans T, Gotthans J. Deep learning pipeline for chromosome segmentation. In: 2022 32nd International Conference Radioelektronika (RADIOELEKTRONIKA). IEEE, 2022. 1–5
[35] Huang R, Lin C, Yin A, Chen H, Guo L, Zhao G, Fan X, Li S, Yang J. A clinical dataset and various baselines for chromosome instance segmentation. IEEE/ACM Trans Comput Biol Bioinformatics 2022; 19(1): 31–39
[36] Liu H, Wang G, Song S, Huang D, Zhang L. RC-Net: regression correction for end-to-end chromosome instance segmentation. Front Genet 2022; 13: 895099
[37] Marc CN, Czibula G. KarySOM: an unsupervised learning based approach for human karyotyping using self-organizing maps. In: 2018 IEEE 14th International Conference on Intelligent Computer Communication and Processing (ICCP). IEEE, 2018. 167–174
[38] Lin C, Zhao G, Yin A, Ding B, Guo L, Chen H. A multistage chromosome segmentation and mixed classification method for chromosome automatic karyotyping. In: International Conference on Web Information Systems and Applications. Springer, 2020. 365–376
[39] Wu Y, Yue Y, Tan X, Wang W, Lu T. End-to-end chromosome karyotyping with data augmentation using GAN. In: 2018 25th IEEE International Conference on Image Processing (ICIP). IEEE, 2018. 2456–2460
[40] Wu Y, Tan X, Lu T. A new multiple-distribution GAN model to solve complexity in end-to-end chromosome karyotyping. Complexity 2020; 2020: 8923838
[41] Andrade MF, Dias LV, Macario V, Lima FF, Hwang SF, Silva JC, Cordeiro FR. A study of deep learning approaches for classification and detection chromosomes in metaphase images. Mach Vis Appl 2020; 31: 65
[42] Dougherty AW, You J. A kernel-based adaptive fuzzy c-means algorithm for M-FISH image segmentation. In: 2017 International Joint Conference on Neural Networks (IJCNN). IEEE, 2017. 198–205
[43] Song S, Bai T, Zhao Y, Zhang W, Yang C, Meng J, Ma F, Su J. A new convolutional neural network architecture for automatic segmentation of overlapping human chromosomes. Neural Process Lett 2022; 54: 285–301
[44] Huang K, Lin C, Huang R, Zhao G, Yin A, Chen H, Guo L, Shan C, Nie R, Li S. A novel chromosome instance segmentation method based on geometry and deep learning. In: 2021 International Joint Conference on Neural Networks (IJCNN). IEEE, 2021. 1–8
[45] Mei L, Yu Y, Shen H, Weng Y, Liu Y, Wang D, Liu S, Zhou F, Lei C. Adversarial multiscale feature learning framework for overlapping chromosome segmentation. Entropy (Basel) 2022; 24(4): 522
[46] Liu X, Wang S, Lin JCW, Liu S. An algorithm for overlapping chromosome segmentation based on region selection. Neural Comput Appl 2024; 36: 133–142
[47] Wang G, Liu H, Yi X, Zhou J, Zhang L. ARMS Net: overlapping chromosome segmentation based on adaptive receptive field multiscale network. Biomed Signal Process Control 2021; 68: 102811
[48] Chen P, Cai J, Yang L. Chromosome segmentation via data simulation and shape learning. In: 2020 42nd Annual International Conference of the IEEE Engineering in Medicine & Biology Society (EMBC). IEEE, 2020. 1637–1640
[49] Chen X, Cai Q, Ma N, Li H. ChroSegNet: an attention-based model for chromosome segmentation with enhanced processing. Appl Sci (Basel) 2023; 13(4): 2308
[50] Cao X, Lan F, Liu CM, Lam TW, Luo R. ChromSeg: two-stage framework for overlapping chromosome segmentation and reconstruction. In: 2020 IEEE International Conference on Bioinformatics and Biomedicine (BIBM). IEEE, 2020. 2335–2342
[51] Altinsoy E, Yang J, Yilmaz C. Fully-automatic raw G-band chromosome image segmentation. IET Image Process 2020; 14(9): 1920–1928
[52] Hu RL, Karnowski J, Fadely R, Pommier JP. Image segmentation to distinguish between overlapping human chromosomes. 2017. arXiv: 1712.07639
[53] Yilmaz IC, Yang J, Altinsoy E, Zhou L. An improved segmentation for raw G-band chromosome images. In: 2018 5th International Conference on Systems and Informatics (ICSAI). IEEE, 2018. 944–950
[54] Redmon J, Farhadi A. YOLO9000: better, faster, stronger. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2017. 7263–7271
[55] Ren S, He K, Girshick R, Sun J. Faster R-CNN: towards real-time object detection with region proposal networks. In: Advances in Neural Information Processing Systems. Vol 28 (NIPS 2015). 2015
[56] He K, Gkioxari G, Dollar P, Girshick R. Mask R-CNN. In: Proceedings of the IEEE International Conference on Computer Vision. 2017. 2961–2969
[57] Lin TY, Dollar P, Girshick R, He K, Hariharan B, Belongie S. Feature pyramid networks for object detection. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2017. 2117–2125
[58] Zhu X, Su W, Lu L, Li B, Wang X, Dai J. Deformable DETR: deformable transformers for end-to-end object detection. In: International Conference on Learning Representations (ICLR). 2021
[59] Qin Y, Wen J, Zheng H, Huang X, Yang J, Song N, Zhu YM, Wu L, Yang GZ. Varifocal-Net: a chromosome classification approach using deep convolutional networks. IEEE Trans Med Imaging 2019; 38(11): 2569–2581
[60] Xia C, Wang J, Qin Y, Gu Y, Chen B, Yang J. An end-to-end combinatorial optimization method for R-band chromosome recognition with grouping guided attention. In: International Conference on Medical Image Computing and Computer-Assisted Intervention. Springer, 2022. 3–13
[61] Xia C, Wang J, Qin Y, Wen J, Liu Z, Song N, Wu L, Chen B, Gu Y, Yang J. KaryoNet: chromosome recognition with end-to-end combinatorial optimization network. IEEE Trans Med Imaging 2023; 42(10): 2899–2911
[62] Kuhn HW. The Hungarian method for the assignment problem. Nav Res Logist Q 1955; 2(1–2): 83–97
[63] Lin TY, Goyal P, Girshick R, He K, Dollar P. Focal loss for dense object detection. In: Proceedings of the IEEE International Conference on Computer Vision. 2017. 2980–2988
[64] Rezatofighi H, Tsoi N, Gwak J, Sadeghian A, Reid I, Savarese S. Generalized intersection over union: a metric and a loss for bounding box regression. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2019. 658–666
[65] Russell BC, Torralba A, Murphy KP, Freeman WT. LabelMe: a database and web-based tool for image annotation. Int J Comput Vis 2008; 77(1–3): 157–173
[66] Paszke A, Gross S, Massa F, Lerer A, Bradbury J, Chanan G, Killeen T, Lin Z, Gimelshein N, Antiga L. PyTorch: an imperative style, high-performance deep learning library. In: Advances in Neural Information Processing Systems. Vol 32 (NeurIPS 2019). 2019
[67] Carion N, Massa F, Synnaeve G, Usunier N, Kirillov A, Zagoruyko S. End-to-end object detection with transformers. In: European Conference on Computer Vision. Springer, 2020. 213–229
Higher Education Press