U-Net has achieved remarkable results in medical image segmentation. In recent years, many researchers have studied the network and extended its structure, for example by improving the encoder and decoder or refining the skip connections. Focusing on optimizations of the U-Net structure and the associated medical image segmentation techniques, this paper proceeds as follows: first, it describes the application of U-Net in the field of medical image segmentation; then, it summarizes seven improvement mechanisms for U-Net: the dense connection mechanism, residual connection mechanism, multi-scale mechanism, ensemble mechanism, dilated convolution mechanism, attention mechanism, and Transformer mechanism; finally, it presents ideas and methods for improving the U-Net structure, providing a reference for future research and helping to advance U-Net.
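To make one of the surveyed mechanisms concrete, the following is a minimal PyTorch sketch of a residual connection block as it might be dropped into a U-Net encoder stage; the class name and layer sizes are illustrative assumptions, not taken from any of the surveyed papers.

```python
import torch.nn as nn

class ResidualConvBlock(nn.Module):
    """Minimal sketch of the residual connection mechanism: an identity
    shortcut around a standard U-Net double-convolution block, which
    eases optimization as the network is made deeper."""
    def __init__(self, ch):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(ch, ch, 3, padding=1), nn.BatchNorm2d(ch),
            nn.ReLU(inplace=True),
            nn.Conv2d(ch, ch, 3, padding=1), nn.BatchNorm2d(ch))
        self.act = nn.ReLU(inplace=True)

    def forward(self, x):
        # Residual connection: add the input back onto the block output.
        return self.act(x + self.body(x))
```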
Accurate segmentation of breast ultrasound images is an important precondition for lesion assessment. Existing segmentation approaches suffer from large parameter counts, slow inference, and high memory consumption. To tackle these problems, we propose T2KD Attention U-Net (dual-Teacher Knowledge Distillation Attention U-Net), a lightweight semantic segmentation method that combines dual-path joint distillation for breast ultrasound images. First, we designed two teacher models to learn fine-grained features from each class of images, according to the different feature representations and semantic information of benign and malignant breast lesions. Then we leveraged joint distillation to train a lightweight student model. Finally, we constructed a novel weight-balance loss that focuses on the semantic features of small objects, addressing the imbalance between tumor and background. Extensive experiments on Dataset BUSI and Dataset B demonstrated that T2KD Attention U-Net outperformed various knowledge distillation counterparts. Concretely, the accuracy, recall, precision, Dice, and mIoU of the proposed method were 95.26%, 86.23%, 85.09%, 83.59%, and 77.78% on Dataset BUSI, respectively, and 97.95%, 92.80%, 88.33%, 88.40%, and 82.42% on Dataset B, respectively. Compared with other models, the performance of this model was significantly improved. Meanwhile, compared with the teacher model, the parameter count, size, and complexity of the student model were substantially reduced (2.2×10⁶ vs. 106.1×10⁶ parameters, 8.4 MB vs. 414 MB, 16.59 GFLOPs vs. 205.98 GFLOPs, respectively). Indeed, the proposed model preserves performance while greatly decreasing the amount of computation, providing a new method for deployment in clinical scenarios.
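The following sketch illustrates how a dual-teacher distillation objective of this kind could be written in PyTorch: the student matches the averaged soft predictions of the two teachers, while a class-weighted segmentation loss plays the role of the weight-balance term for the small tumor class. The function name, temperature, and weighting scheme are assumptions for illustration, not the paper's exact loss.

```python
import torch
import torch.nn.functional as F

def dual_teacher_distillation_loss(student_logits, teacher_b_logits,
                                   teacher_m_logits, target,
                                   T=4.0, alpha=0.5, fg_weight=5.0):
    """Hypothetical dual-teacher distillation objective.

    student_logits / teacher_*_logits: (N, C, H, W) segmentation logits
    (teacher_b / teacher_m: teachers for benign / malignant images).
    target: (N, H, W) integer class map (0 = background, 1 = tumor).
    T: distillation temperature; alpha: soft/hard loss trade-off;
    fg_weight: extra weight on the small tumor class (weight-balance idea).
    """
    # Joint distillation: average the two teachers' softened predictions.
    soft_student = F.log_softmax(student_logits / T, dim=1)
    soft_teachers = 0.5 * (F.softmax(teacher_b_logits / T, dim=1)
                           + F.softmax(teacher_m_logits / T, dim=1))
    kd = F.kl_div(soft_student, soft_teachers, reduction="batchmean") * T * T

    # Class-weighted cross-entropy to counter tumor/background imbalance.
    weights = torch.tensor([1.0, fg_weight], device=student_logits.device)
    ce = F.cross_entropy(student_logits, target, weight=weights)

    return alpha * kd + (1.0 - alpha) * ce
```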
Objective: To propose a pulmonary artery segmentation method that integrates shape and position prior knowledge, aiming to solve the inaccurate segmentation caused by the high similarity and small size differences between pulmonary arteries and surrounding tissues in CT images. Methods: Based on the three-dimensional U-Net architecture and the image data of the PARSE 2022 database, shape and position prior knowledge was introduced to design feature extraction and fusion strategies that enhance pulmonary artery segmentation. The patient data were divided into three groups: a training set, a validation set, and a test set. The performance metrics for evaluating the model included the Dice similarity coefficient (DSC), sensitivity, accuracy, and the 95th-percentile Hausdorff distance (HD95). Results: The study included pulmonary artery imaging data from 203 patients: 100 in the training set, 30 in the validation set, and 73 in the test set. The backbone network performed a coarse segmentation of the pulmonary arteries to obtain the complete vascular structure; a branch network integrating shape and position information then extracted features of small pulmonary arteries, reducing interference from the pulmonary trunk and the left and right pulmonary arteries. Experimental results showed that the segmentation model based on shape and position prior knowledge achieved a higher DSC (82.81%±3.20% vs. 80.47%±3.17% vs. 80.36%±3.43%), sensitivity (85.30%±8.04% vs. 80.95%±6.89% vs. 82.82%±7.29%), and accuracy (81.63%±7.53% vs. 81.19%±8.35% vs. 79.36%±8.98%) than the traditional three-dimensional U-Net and V-Net methods. Its HD95 reached (9.52±4.29) mm, 6.05 mm shorter than the traditional methods, showing excellent performance at segmentation boundaries. Conclusion: The pulmonary artery segmentation method based on shape and position prior knowledge can achieve precise segmentation of pulmonary artery vessels and has potential application value in tasks such as bronchoscopy and percutaneous puncture surgical navigation.
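One plausible way to realize the branch network described above is to concatenate shape/position prior maps with backbone features before a refinement head, as in the PyTorch sketch below. All module names, channel counts, and the choice of prior maps are assumptions for illustration, not the paper's exact design.

```python
import torch
import torch.nn as nn

class PriorGuidedBranch(nn.Module):
    """Illustrative branch that fuses shape/position priors (e.g., a
    vessel-shape distance map and a normalized position map) with
    backbone features to refine small pulmonary arteries."""
    def __init__(self, feat_ch=32, prior_ch=2):
        super().__init__()
        self.fuse = nn.Sequential(
            nn.Conv3d(feat_ch + prior_ch, feat_ch, kernel_size=3, padding=1),
            nn.InstanceNorm3d(feat_ch),
            nn.ReLU(inplace=True),
        )
        self.head = nn.Conv3d(feat_ch, 2, kernel_size=1)  # fg/bg logits

    def forward(self, backbone_feat, priors):
        # backbone_feat: (N, feat_ch, D, H, W) coarse-segmentation features
        # priors: (N, prior_ch, D, H, W) shape/position prior volumes
        x = torch.cat([backbone_feat, priors], dim=1)
        return self.head(self.fuse(x))
```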
The skin is the largest organ of the human body, and many visceral diseases are reflected directly on the skin, so accurate segmentation of skin lesion images is of great clinical significance. To address the complex colors, blurred boundaries, and uneven scale information of such images, a skin lesion image segmentation method based on dense atrous spatial pyramid pooling (DenseASPP) and an attention mechanism is proposed. The method builds on the U-shaped network (U-Net). First, the encoder is redesigned: ordinary convolutional stacking is replaced with a large number of residual connections, which effectively retain key features even as network depth increases. Second, channel attention is fused with spatial attention, and residual connections are added so that the network can adaptively learn the channel and spatial features of images. Finally, the DenseASPP module is introduced and redesigned to enlarge the receptive field and obtain multi-scale feature information. The proposed algorithm achieves satisfactory results on the official public dataset of the International Skin Imaging Collaboration (ISIC 2016): the mean Intersection over Union (mIoU), sensitivity (SE), precision (PC), accuracy (ACC), and Dice coefficient (Dice) are 0.9018, 0.9459, 0.9487, 0.9681, and 0.9473, respectively. The experimental results demonstrate that this method improves the segmentation of skin lesion images and is expected to provide auxiliary diagnosis for professional dermatologists.
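For readers unfamiliar with DenseASPP, the sketch below shows its core idea in PyTorch: atrous convolutions with growing dilation rates, where each layer receives the concatenation of all earlier outputs, enlarging the receptive field while reusing multi-scale features. Channel sizes and dilation rates here are illustrative assumptions rather than the paper's configuration.

```python
import torch
import torch.nn as nn

class DenseASPP(nn.Module):
    """Minimal DenseASPP sketch: densely connected atrous convolutions
    with increasing dilation rates for multi-scale context."""
    def __init__(self, in_ch=256, growth=64, rates=(3, 6, 12, 18)):
        super().__init__()
        self.layers = nn.ModuleList()
        ch = in_ch
        for r in rates:
            self.layers.append(nn.Sequential(
                nn.Conv2d(ch, growth, 3, padding=r, dilation=r),
                nn.BatchNorm2d(growth), nn.ReLU(inplace=True)))
            ch += growth  # dense connectivity: input channels accumulate

    def forward(self, x):
        feats = [x]
        for layer in self.layers:
            # Each atrous layer sees all previous feature maps.
            feats.append(layer(torch.cat(feats, dim=1)))
        return torch.cat(feats, dim=1)
```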
PET/CT, which combines positron emission tomography (PET) and computed tomography (CT), is among the most advanced imaging examination methods currently available and is mainly used for tumor screening, differential diagnosis of benign and malignant tumors, and staging and grading. This paper proposes a breast cancer lesion segmentation method based on PET/CT bimodal images and designs a dual-path U-Net framework comprising three modules: an encoder module, a feature fusion module, and a decoder module. The encoder module uses traditional convolution to extract features from each single-modality image; the feature fusion module adopts collaborative-learning feature fusion and uses a Transformer to extract global features of the fused image; and the decoder module mainly uses a multi-layer perceptron to perform lesion segmentation. The experiments evaluate the effectiveness of the algorithm on actual clinical PET/CT data. The results show that the accuracy, recall, and precision of breast cancer lesion segmentation are 95.67%, 97.58%, and 96.16%, respectively, outperforming the baseline algorithm. These results confirm the soundness of the single- and bimodal feature extraction design that combines convolution and Transformer, and provide a reference for feature extraction in tasks such as multimodal medical image segmentation and classification.
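A minimal sketch of such a dual-path design, assuming single-channel PET and CT inputs, is given below: separate convolutional encoders per modality, Transformer-based fusion over the flattened feature tokens, and a per-pixel MLP head. The class name, depths, and channel sizes are hypothetical, not the paper's architecture.

```python
import torch
import torch.nn as nn

class DualPathFusionSeg(nn.Module):
    """Illustrative dual-path model: convolutional encoders for PET and CT,
    a Transformer fusing the concatenated features globally, and an
    MLP-style per-pixel decoder."""
    def __init__(self, ch=64, depth=2, heads=4):
        super().__init__()
        def encoder():  # two stride-2 conv stages per modality
            return nn.Sequential(
                nn.Conv2d(1, ch, 3, stride=2, padding=1), nn.ReLU(inplace=True),
                nn.Conv2d(ch, ch, 3, stride=2, padding=1), nn.ReLU(inplace=True))
        self.enc_pet, self.enc_ct = encoder(), encoder()
        layer = nn.TransformerEncoderLayer(d_model=2 * ch, nhead=heads,
                                           batch_first=True)
        self.fusion = nn.TransformerEncoder(layer, num_layers=depth)
        self.decoder = nn.Sequential(  # per-pixel MLP head
            nn.Linear(2 * ch, ch), nn.ReLU(inplace=True), nn.Linear(ch, 2))

    def forward(self, pet, ct):
        f = torch.cat([self.enc_pet(pet), self.enc_ct(ct)], dim=1)  # (N,2C,h,w)
        n, c, h, w = f.shape
        tokens = f.flatten(2).transpose(1, 2)   # (N, h*w, 2C)
        tokens = self.fusion(tokens)            # global cross-modal fusion
        logits = self.decoder(tokens)           # (N, h*w, 2)
        logits = logits.transpose(1, 2).reshape(n, 2, h, w)
        # Undo the two stride-2 downsamplings to restore input resolution.
        return nn.functional.interpolate(logits, scale_factor=4,
                                         mode="bilinear", align_corners=False)
```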
To address single-scale information loss and large model parameter counts during the sampling process in U-Net and its variants for medical image segmentation, this paper proposes a multi-scale medical image segmentation method based on pixel encoding and spatial attention. First, by redesigning the input strategy of the Transformer structure, a pixel encoding module is introduced so that the model can extract global semantic information from multi-scale image features and obtain richer feature information. In addition, deformable convolutions are incorporated into the Transformer module to accelerate convergence and improve performance. Second, a spatial attention module with residual connections is introduced so that the model focuses on the foreground information of the fused feature maps. Finally, guided by ablation experiments, the network is made lightweight to improve segmentation accuracy and speed up convergence. The proposed algorithm achieves satisfactory results on the Synapse dataset, an official public multi-organ segmentation dataset provided by the International Conference on Medical Image Computing and Computer-Assisted Intervention (MICCAI), with a Dice similarity coefficient (DSC) of 77.65 and a 95% Hausdorff distance (HD95) of 18.34. The experimental results demonstrate that the proposed algorithm can improve multi-organ segmentation performance, helping to fill the gap in multi-scale medical image segmentation algorithms and assisting professional physicians in diagnosis.
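One way such a pixel encoding front end could look is sketched below, assuming torchvision's DeformConv2d for the deformable convolution: features are adaptively aggregated, then flattened into tokens for the Transformer. The class name and layer sizes are illustrative assumptions, not the paper's exact module.

```python
import torch
import torch.nn as nn
from torchvision.ops import DeformConv2d

class PixelEncoding(nn.Module):
    """Sketch of a pixel-encoding front end: a deformable convolution
    provides adaptive local aggregation, after which the feature map is
    flattened into per-pixel tokens for a Transformer."""
    def __init__(self, in_ch=64, dim=128):
        super().__init__()
        # 2 offsets (x, y) per position of the 3x3 kernel = 18 channels.
        self.offset = nn.Conv2d(in_ch, 18, 3, padding=1)
        self.deform = DeformConv2d(in_ch, dim, 3, padding=1)
        self.norm = nn.LayerNorm(dim)

    def forward(self, x):
        x = self.deform(x, self.offset(x))     # (N, dim, H, W)
        tokens = x.flatten(2).transpose(1, 2)  # (N, H*W, dim) pixel tokens
        return self.norm(tokens)
```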