trained network[9–15] or the models obtained through self-supervised tasks[16–18] as feature extractors. For the pre-trained network, several studies[10,15] have found that it is important to select appropriate feature hierarchy levels, because low-level features lack global awareness, while extremely high-level features may be biased toward the pre-training task itself. Also, the pre-trained network
can be used as a teacher network to detect anomalies by
knowledge distillation[9]. For the self-supervision-based methods, the key is to design suitable auxiliary tasks. Li et al.[16] propose to use CutPaste augmentation to train a one-class classifier. Other auxiliary tasks include position prediction[17], geometric transformation prediction[18], etc.
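For illustration, a CutPaste-style augmentation can be sketched as follows; the patch-size range and the uniform placement strategy here are our own simplifying assumptions, not the exact settings of [16]:

```python
import numpy as np

def cutpaste(image: np.ndarray, rng: np.random.Generator) -> np.ndarray:
    """Cut a random rectangular patch and paste it at another location.

    The augmented image serves as a synthetic 'anomalous' sample for
    training a one-class or binary classifier.
    """
    h, w = image.shape[:2]
    # Patch side lengths: roughly 1/8 to 1/4 of each dimension
    # (an illustrative choice, not the setting used in [16]).
    ph = rng.integers(h // 8, h // 4)
    pw = rng.integers(w // 8, w // 4)
    # Random source and destination top-left corners.
    sy, sx = rng.integers(0, h - ph), rng.integers(0, w - pw)
    dy, dx = rng.integers(0, h - ph), rng.integers(0, w - pw)
    out = image.copy()
    out[dy:dy + ph, dx:dx + pw] = image[sy:sy + ph, sx:sx + pw]
    return out

rng = np.random.default_rng(0)
img = rng.integers(0, 256, size=(64, 64, 3), dtype=np.uint8)
aug = cutpaste(img, rng)
```

A classifier trained to separate `img` from `aug` learns features that transfer to real defect detection.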
Overall, benefiting from the powerful representation
capabilities of deep features, feature-based methods
can achieve better performance compared to existing
reconstruction-based methods. In particular, [10] achieves
state-of-the-art performance on the MVTec AD. However,
these methods are hard to adapt to a specific scenario, since the deep features are too abstract to incorporate prior knowledge.
2.2. Reconstruction based methods
Reconstruction-based methods commonly leverage gen-
erative models such as autoencoders[4,19,20], VAEs[21],
GANs[5,22], etc., to detect anomalies in the image space.
Generally, these methods consist of two steps: (1) reconstruct the image; (2) compare the original and reconstructed images to obtain anomaly maps.
Reconstruct the image. Early works mainly lever-
age denoising autoencoders[20,23,24] to help the network
better capture the normal distribution and avoid learning
an identity mapping. In the training phase, these methods corrupt the original image with certain noise and train the network to remove it. In addition to low-level noise such as Gaussian noise, cutout, and stains, the image can also be corrupted by semantic transformations, such as geometric transformations[25–27], color transformation[1], inpainting masks[4,28,29], etc., which are summarized into
an attribute removal-and-restoration framework by Ye et
al. [1]. They argue that the network can learn more robust
features during the process of restoring the previously removed attributes. Following this paradigm, we propose a specific attribute removal-and-restoration task in which the low-frequency and color attributes are the main targets of restoration.
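For concreteness, the low-level corruptions mentioned above (e.g., Gaussian noise and cutout) can be sketched as follows; the noise level and cutout sizes are illustrative assumptions, not the settings of any cited work:

```python
import numpy as np

def corrupt(image: np.ndarray, rng: np.random.Generator,
            sigma: float = 0.1, n_cutouts: int = 4) -> np.ndarray:
    """Apply additive Gaussian noise plus a few random cutout squares.

    `image` is a float array in [0, 1]; a denoising autoencoder is then
    trained to map the corrupted result back to the clean input.
    """
    out = image + rng.normal(0.0, sigma, size=image.shape)
    h, w = image.shape[:2]
    for _ in range(n_cutouts):
        s = rng.integers(h // 16, h // 8)      # cutout side length
        y, x = rng.integers(0, h - s), rng.integers(0, w - s)
        out[y:y + s, x:x + s] = 0.0            # blank the patch
    return np.clip(out, 0.0, 1.0)

rng = np.random.default_rng(0)
clean = rng.random((64, 64, 3)).astype(np.float32)
noisy = corrupt(clean, rng)
```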
Compare the images. After the reconstruction, the
anomalies can be detected by comparing the original and
reconstructed images. Early comparison functions include the l2 distance, structural similarity (SSIM) [30], etc. Furthermore, Zavrtanik et al. [4] introduce a multi-scale gradient magnitude similarity (MSGMS) anomaly evaluation function, which significantly boosts performance. However, MSGMS performs poorly on low-frequency color anomalies.
Later, Zavrtanik et al. [31] further propose to use a sep-
arate discriminative network (DRAEM) which takes the
concatenation of the original and reconstructed images as
input and detects the anomalies via image segmentation.
While DRAEM achieves remarkable performance on the
MVTec AD, the additional discriminative network introduces extra latent features and therefore makes the segmentation results less interpretable. Similarly, the current state-of-the-art reconstruction-based method OCR-GAN [22] also leverages latent-space features and combines them with the l1 distance to detect anomalies. In contrast, in this paper, we focus on hand-crafted anomaly score functions, which are more interpretable and adjustable. Concretely, we propose a new color comparison function and combine it with the existing MSGMS function. The proposed function can effectively detect various anomalies.
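As an illustration of such hand-crafted comparison functions, a single-scale gradient magnitude similarity anomaly map, the building block that MSGMS [4] aggregates over multiple scales, can be sketched as follows; the gradient operator and the constant `c` here are illustrative choices, not the exact definition from [4]:

```python
import numpy as np

def gradient_magnitude(gray: np.ndarray) -> np.ndarray:
    """Per-pixel gradient magnitude of a grayscale image
    (central differences via np.gradient; one illustrative choice)."""
    gy, gx = np.gradient(gray)
    return np.sqrt(gx ** 2 + gy ** 2)

def gms_anomaly_map(orig: np.ndarray, recon: np.ndarray,
                    c: float = 0.0026) -> np.ndarray:
    """1 - gradient magnitude similarity: values near 0 where the two
    images share local structure, near 1 where structure differs."""
    g1, g2 = gradient_magnitude(orig), gradient_magnitude(recon)
    gms = (2.0 * g1 * g2 + c) / (g1 ** 2 + g2 ** 2 + c)
    return 1.0 - gms

rng = np.random.default_rng(0)
a = rng.random((32, 32))
amap = gms_anomaly_map(a, a)   # identical inputs -> all-zero map
```

Because the map is built from plain image gradients, it stays interpretable, but, as noted above, a purely gradient-based score is blind to smooth color shifts.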
3. Methods
Our reconstruction framework is based on a UNet-type encoder-decoder network that takes the corrupted grayscale edge as input. Specifically, we first corrupt the original image with certain noise; then we convert the corrupted image into a grayscale edge; after that, we train a network to reconstruct the original image from its corrupted edge; finally, we discuss how to design the anomaly evaluation function.
3.1. Get the corrupted edges
Our basic idea is to formulate an attribute removal-and-
restoration task that can be suitable for various industrial
anomaly detection scenarios. Specifically, we construct a
‘grayscale edge to RGB image’ task where we remove the
low frequency and color attributes in the original image
and train a network to restore them. This design is based
on two considerations. First, low-frequency and color contents are general attributes in various images. We note that other tasks also exist, such as restoring a geometrically transformed image [18,25–27]. However, compared with our design, these methods are less general; e.g., the above geometric transformation framework cannot be applied to spatially invariant textures, while our design can be applied to both texture and object images.
Second, preserving edge information enables the network
to better reconstruct the details in normal patterns, which
can effectively reduce the false positive rate in complex
normal areas. On the other hand, preserving the edges
may also lead to the model producing identity mappings
of the original high-frequency components. To avoid this,
we first corrupt the original image with certain noise.
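A minimal sketch of this "corrupt, then reduce to a grayscale edge" input preparation is given below. The luma weights, the gradient-based edge operator, and the generic Gaussian corruption are illustrative assumptions; the actual corruption we use is the simulated-anomaly strategy described next:

```python
import numpy as np

def to_grayscale_edge(image: np.ndarray) -> np.ndarray:
    """Reduce an RGB image in [0, 1] to a grayscale edge-strength map,
    discarding color and low-frequency content while keeping edges."""
    gray = image @ np.array([0.299, 0.587, 0.114])   # standard luma weights
    gy, gx = np.gradient(gray)
    return np.sqrt(gx ** 2 + gy ** 2)

def corrupted_edge_input(image: np.ndarray,
                         rng: np.random.Generator) -> np.ndarray:
    """Corrupt first (so the model cannot learn an identity mapping of
    high frequencies), then extract the edge map used as network input.
    Gaussian noise stands in for the actual corruption here."""
    noisy = np.clip(image + rng.normal(0.0, 0.05, image.shape), 0.0, 1.0)
    return to_grayscale_edge(noisy)

rng = np.random.default_rng(0)
img = rng.random((64, 64, 3))
edge = corrupted_edge_input(img, rng)
```

The reconstruction network is then trained to map `edge` back to `img`, restoring the removed low-frequency and color attributes.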
We adopt the strategy proposed in [31] to generate simulated anomalies, whose textures come from the external texture dataset DTD [32] and whose shapes are given by randomly generated Perlin noise. However, we observe that if only these out-of-distribution textures are used as pseudo-anomalies, the model cannot distinguish well between foreground and background areas. This makes it difficult to detect structural defects caused by missing components. Therefore,