Semi-supervised object detection based on single-stage detector for thighbone fracture localization

Jinman Wei (a), Jinkun Yao (b), Guoshan Zhang (a), Bin Guan (a), Yueming Zhang (a), Shaoquan Wang (a)
(a) School of Electrical and Information Engineering, Tianjin University, Tianjin, 300072, China.
(b) Department of Radiology, Linyi People's Hospital, 276000 Linyi, China.
Abstract
The thighbone is the largest bone supporting the lower body. If a thighbone fracture is not
treated in time, it can lead to a lifelong inability to walk. Correct diagnosis of thighbone
disease is very important in orthopedic medicine. Deep learning is promoting the development of
fracture detection technology. However, existing computer-aided diagnosis (CAD) methods based
on deep learning rely on a large amount of manually labeled data, and labeling these data costs
a lot of time and energy. Therefore, we develop an object detection method that requires only a
limited number of labeled images and apply it to thighbone fracture localization. In this work,
we build a semi-supervised object detection (SSOD) framework based on a single-stage detector,
which includes three modules: the adaptive difficult sample oriented (ADSO) module, Fusion Box
and the deformable expand encoder (Dex encoder). The ADSO module takes the classification score
as the label reliability evaluation criterion by weighting, Fusion Box is designed to merge
similar pseudo boxes into a reliable box for box regression, and the Dex encoder is proposed to
enhance the adaptability to image augmentation. The experiments are conducted on a thighbone
fracture dataset, which includes 3484 training images and 358 testing images. The experimental
results show that the proposed method achieves state-of-the-art AP in thighbone fracture
detection at different labeled data rates, i.e. 1%, 5% and 10%. In addition, when the full data
is used for knowledge distillation, our method achieves 86.2% AP50 and 52.6% AP75.
Keywords: Semi-supervised Learning; Object Detection; Single-stage; Thighbone Fracture
Detection
1. Introduction
The thighbone is located below the pelvis. The thighbone and the acetabulum constitute the hip
joint and play a role in supporting the whole body. Various activities of the human body depend
on the thighbone, so it is one of the most vulnerable parts. The diagnosis of ordinary fracture
and comminuted fracture is a significant part of surgical diagnosis[1]. However, compared with
the huge number of patients, there is a shortage of skilled surgeons. Therefore, surgeons
urgently need an assistant to relieve their work pressure. To solve this problem, many
computer-aided detection and diagnosis methods[2] have been proposed. In recent years,
substantial progress has been made in developing deep learning-based CAD systems for fracture
diagnosis. Guan et al. proposed a convolutional neural network for thighbone fracture detection
that can balance the information of each feature map in ResNeXt's feature pyramid[3]. Firat et
al. designed an integrated object detection model for fracture detection in wrist X-ray
images[4]. At present, state-of-the-art fracture detection methods are usually developed on
large-scale expert annotations, such as 5134 labeled CT images for spinal fracture detection[5],
7356 wrist radiographic images[6], and 9040 labeled hand, wrist, knee, ankle and foot
radiographs for multiple fracture detection[7].
Corresponding author
Email addresses: 2021234147@tju.edu.cn (Jinman Wei), yjk1213@163.com (Jinkun Yao),
zhanggs@tju.edu.cn (Guoshan Zhang), guanbin@tju.edu.cn (Bin Guan), seife@tju.edu.cn (Yueming
Zhang), sqwang@tju.edu.cn (Shaoquan Wang)
Preprint submitted to Applied Soft Computing October 21, 2022
arXiv:2210.10998v1 [eess.IV] 20 Oct 2022
Compared with the above-mentioned methods, semi-supervised learning (SSL) uses both
labeled data and unlabeled data when training the model, and uses the unlabeled data to assist
in optimizing the model, so as to save training cost. The state-of-the-art semi-supervised
methods are the pseudo-label based approaches[8]. Specifically, the model is trained on labeled
data, and then the trained model is used to predict pseudo labels on unlabeled images. The
teacher-student model[9] is a common method to generate pseudo labels in semi-supervised
learning, whose key idea is to train two independent models, namely a teacher model and a
student model. The teacher model is trained on the labeled images and then labels the unlabeled
images. These pseudo-labeled images are mixed with the labeled images to train the student model.
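The pseudo-label procedure described above can be illustrated with a minimal sketch. It is not the authors' implementation: the `Detection` record, the `teacher` callable, the function names and the choice to skip images without any confident prediction are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


@dataclass
class Detection:
    box: Box
    score: float  # classification confidence in [0, 1]
    label: int


def select_pseudo_labels(preds: Sequence[Detection], threshold: float) -> List[Detection]:
    """Keep only the teacher predictions whose score exceeds the threshold."""
    return [d for d in preds if d.score > threshold]


def build_student_set(
    labeled: Sequence[Tuple[str, List[Detection]]],
    unlabeled: Sequence[str],
    teacher: Callable[[str], List[Detection]],  # hypothetical teacher inference function
    threshold: float,
) -> List[Tuple[str, List[Detection]]]:
    """Mix ground-truth samples with pseudo-labeled samples for the student."""
    mixed = list(labeled)
    for image in unlabeled:
        pseudo = select_pseudo_labels(teacher(image), threshold)
        if pseudo:  # assumption: images with no confident prediction are skipped
            mixed.append((image, pseudo))
    return mixed
```

In a real detector the images would be tensors and the teacher a trained network; strings and a plain callable are used here only to keep the sketch self-contained.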
Most research on SSOD has focused on two-stage detectors[10–13]. However, building on
single-stage detectors (such as FCOS[14], YOLOF[15] and RetinaNet[16]) is more attractive and
practical, because they can be easily deployed on devices with limited resources and eliminate
cumbersome preprocessing and post-processing except for NMS[17]. The main difference between
single-stage and two-stage detectors is that the region proposal network (RPN)[18] of a
two-stage detector can filter out most of the background samples, and in the next stage the
detailed categories of the remaining candidate boxes are further predicted. Single-stage
detectors make dense predictions for all areas of the image at once, and only a few bounding
boxes are predicted as positive samples. Because the generation and judgment of proposals are
integrated in single-stage detectors, detection is faster, but the classification scores of
single-stage detectors are lower than those of two-stage detectors. Directly sending pseudo
labels with low classification scores to the student model introduces a lot of noise and
affects the training accuracy of the model. Therefore, how to deal with a large number of
low-quality pseudo labels in dense prediction is still an important problem.
The regression branch is another component of the object detection task, and the regression
quality of pseudo boxes is another important factor that determines the performance of a
semi-supervised object detection model. Xu et al.[20] find that the accuracy of the regression
is related to the uncertainty calculated by the BoxJitter module, but the BoxJitter module
relies on Regions with CNN features (RCNN) to process the proposals, so it is not applicable to
single-stage detectors. To address this issue, we propose the Fusion Box module in the
regression branch for SSOD based on a single-stage detector.
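To make the idea of merging similar pseudo boxes concrete, the sketch below groups boxes greedily and fuses each group into one box. It is only an illustration under stated assumptions, not the Fusion Box module itself: this section does not define the similarity measure or the merging rule, so IoU is assumed as the similarity, a score-weighted average as the merge, and the names `iou`, `fuse_boxes` and `xi` are hypothetical.

```python
from typing import List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def fuse_boxes(boxes: Sequence[Box], scores: Sequence[float], xi: float) -> List[Box]:
    """Greedily group boxes whose IoU with a group's top-scoring box is >= xi,
    then replace each group by its score-weighted average box.
    Assumes all scores are positive (e.g. after confidence filtering)."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    groups: List[List[int]] = []  # the first index of each group is its seed
    for i in order:
        for group in groups:
            if iou(boxes[group[0]], boxes[i]) >= xi:
                group.append(i)
                break
        else:
            groups.append([i])  # no similar group found: start a new one
    fused: List[Box] = []
    for group in groups:
        total = sum(scores[i] for i in group)
        fused.append(tuple(sum(scores[i] * boxes[i][k] for i in group) / total for k in range(4)))
    return fused
```

With this rule, two overlapping pseudo boxes around the same location collapse into a single regression target, while a distant box is left unchanged.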
In summary, we develop a semi-supervised framework based on a single-stage detector for
thighbone fracture detection. In this framework, the adaptive difficult sample oriented (ADSO)
module and the Fusion Box module are developed to reduce the impact of inaccurate pseudo label
prediction. In addition, a Single-in-Single-out (SISO) encoder called the Dex encoder is
proposed to improve the adaptability to augmented input images. The main contributions of this
paper can be summarized as follows:
1. We develop a semi-supervised object detection framework based on a single-stage
detector for thighbone fracture detection with limited annotations. Compared with previous
work, it has fewer parameters and faster detection speed.
2. The adaptive difficult sample oriented (ADSO) module is proposed to take the
classification score of the teacher model as the criterion of pseudo label reliability.
3. The Fusion Box module is proposed to reduce the impact on model performance of regressing
multiple pseudo boxes at the same position.
4. We design a Single-in-Single-out encoder named the Deformable expand encoder (Dex
encoder) to enhance the ability to learn deformed features.
5. The experimental results show that our method outperforms other supervised and
semi-supervised methods in thighbone fracture detection.
2. Related work
2.1. Deep learning for medical detection
CAD has been extensively studied in the past decade[21, 22], and CAD systems based on deep
learning have been developed to diagnose a wide range of pathologies, such as detection of
COVID-19[23, 24], mass and calcification features in mammography[25] and brain tumor
diagnosis[26]. Among fracture detection methods based on deep learning[27], FAMO[7] constructed
the Feature Ambiguity Mitigate Operator model to mitigate feature ambiguity in bone fracture
detection on radiographs of various body parts. Because medical professional knowledge is
required, large-scale annotation is expensive, which hinders the development of CAD solutions
based on deep learning. Computer-aided detection using SSL methods is an emerging task in recent
years. For example, Yirui Wang et al. proposed the adaptive asymmetric label sharpening (AALS)
algorithm using the teacher-student model paradigm, which solves the label imbalance problem
unique to the medical field[28].
2.2. Object detection
Object detection is one of the core tasks in computer vision. At present, object detectors
based on CNNs can be divided into single-stage and two-stage detectors. Faster R-CNN[18] is
the representative two-stage detector, which uses an RPN for proposal extraction and an
RCNN head for regional prediction and extraction of objects. A single-stage detector only
uses the features extracted by the feature extraction network for regression and classification.
For example, SSD[29] uses the feature pyramid method to complete target regression and
classification on features of different scales at the same time. Chen et al. developed YOLOF,
which only uses the C5 feature for detection, as shown in Figure 1. In YOLOF, the complex
Multiple-in-Multiple-out encoder is replaced by a simple Single-in-Single-out encoder, and the
detector contains two key components: the dilated encoder and the uniform decoder.
Figure 1: The structure of YOLOF.
2.3. Semi-supervised learning in object detection
SSL methods play a leading role in image classification[31–35]. Because object detectors
have complex architecture designs and multi-task learning (classification and regression),
transferring SSL methods to the object detection task is not simple. Current SSOD methods
mainly follow two directions: Consistency Regularization[36] and Pseudo Label[8]. The former
uses two deep convolutional neural networks to learn the consistency between different data
augmentations[37] (horizontal flip, different contrast, brightness, etc.) of the same unlabeled
image, so that the predictions for an image stay the same under small perturbations. The latter
uses a model pre-trained on labeled data to infer labels for the unlabeled data. In recent
years, semi-supervised object detection has attracted increasing attention[38–40]. STAC[19] is
the first to apply the pseudo label method to SSOD: it applies weak data augmentation to
unlabeled data and uses the trained teacher model to generate pseudo labels for the unlabeled
images. Unbiased teacher[41] uses focal loss[16] to solve the imbalance between positive and
negative samples. Instant teaching[42] trains two models at the same time to check and correct
pseudo labels for each other, so as to effectively suppress the accumulation of false
predictions. Almost all of the above work is based on two-stage detectors, such as Faster
R-CNN, which are not convenient to use in the medical field with limited resources. Inspired by
the above works, we designed a fast semi-supervised detection model based on a single-stage
detector.
Figure 2: The pipeline of the established semi-supervised object detection framework: the labeled
images and unlabeled images are sent into the training pipeline in batches. The teacher labels
the unlabeled images with pseudo labels, which serve as the student's ground truth, and the
teacher is not updated by back-propagation. The student model adopts EMA to transfer parameters
and update the teacher model. The ADSO of the classification (Cls) branch adjusts the confidence
of the pseudo labels to evaluate their reliability. The regression (Reg) branch judges whether
to merge the pseudo boxes according to the similarity ξ. The loss functions of the
classification branch and the regression branch are focal loss and CIOU loss, respectively.
3. Methodology
Our method adopts the teacher-student mutual learning mode, in which the student model
learns from the detection loss of labeled and unlabeled images. The unlabeled images have two
groups of pseudo boxes, which are used for training the classification branch and the
regression branch, respectively. The teacher model is updated from the student model with an
exponential moving average (EMA). The pseudo boxes predicted by the teacher model are first
filtered by confidence, so that only the pseudo labels with classification scores higher than
the confidence threshold σ are retained. The remaining pseudo boxes are sent to the
classification branch and the regression branch. In this SSOD framework, there are two critical
designs: ADSO and Fusion Box. Figure 2 shows an overview of our SSOD framework.
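The EMA update of the teacher can be sketched as follows. This is a minimal illustration, not the authors' code: parameters are represented as a plain dictionary of floats, and the function name `ema_update` and the default decay of 0.999 are assumptions, since the decay value is not stated in this section.

```python
from typing import Dict


def ema_update(teacher: Dict[str, float], student: Dict[str, float], decay: float = 0.999) -> Dict[str, float]:
    """Update the teacher in place: teacher <- decay * teacher + (1 - decay) * student.
    The decay default is a hypothetical value chosen for illustration."""
    for name, student_value in student.items():
        teacher[name] = decay * teacher[name] + (1.0 - decay) * student_value
    return teacher


# Usage: the teacher drifts slowly toward the student after each training step.
teacher_params = {"w": 1.0, "b": 0.0}
student_params = {"w": 0.0, "b": 1.0}
ema_update(teacher_params, student_params, decay=0.9)
```

A decay close to 1 makes the teacher change slowly, so its pseudo labels are less sensitive to noisy individual student updates; the teacher itself receives no gradient.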
3.1. Semi-supervised learning framework
In each training iteration, unlabeled images and labeled images are sampled according to a
certain data sampling ratio. The data are preprocessed by two different preprocessing methods
to obtain strongly augmented labeled images and weakly augmented and strongly augmented
unlabeled images. The student network is trained with the pseudo boxes generated by teacher model and