work in which the model is trained incrementally as data arrive, without forgetting the previously learnt (past) information. Such a framework is well suited to visual surveillance, where video data keep streaming into the monitoring systems. However, all these approaches suffer from a few limitations: (1) in continual learning, a separate mechanism must be designed to avoid catastrophic forgetting [8]; (2) GANs and AEs are highly vulnerable to unstable training, i.e., a subtle change in the data imposes large changes in the labels, thus affecting the learnt normal distribution; (3) most state-of-the-art VAD methods depend heavily on labeled normal/abnormal data; and (4) VAD approaches utilize either appearance-based features or deep features.
To address these limitations, we adopt an iterative learning [44] mechanism in which the models are repeatedly tuned with progressively refined data during each pass. Moreover, we aim to combine the technical advantages of continual learning and AEs. Our proposed framework combines the power of DNNs with well-justified handcrafted motion features. These spatio-temporal features, together with low-level motion features, help to detect a wide range of anomalies. The framework can also be retrained in an end-to-end fashion as new input data arrive. An overview of the proposed framework is depicted in Fig. 1. It is divided into three stages: i) pseudo-label assignment, ii) regressor training, and iii) refinement of labels using the optimized regressors. To enable the regressors to capture subtle anomalies, we compute a motion feature, namely a dynamicity score, from optical flow.
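The dynamicity score can be obtained in several ways; the following is only a minimal sketch, assuming dense Farneback optical flow from OpenCV and the mean flow magnitude per frame as the score. The function name and parameter values are illustrative, not the exact implementation used in this work.

```python
import cv2
import numpy as np

def dynamicity_scores(video_path):
    """Per-frame dynamicity score as the mean optical-flow magnitude
    (illustrative definition; the exact aggregation may differ)."""
    cap = cv2.VideoCapture(video_path)
    ok, prev = cap.read()
    if not ok:
        return np.empty(0)
    prev_gray = cv2.cvtColor(prev, cv2.COLOR_BGR2GRAY)
    scores = []
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        # Dense Farneback optical flow between consecutive frames.
        flow = cv2.calcOpticalFlowFarneback(
            prev_gray, gray, None, pyr_scale=0.5, levels=3, winsize=15,
            iterations=3, poly_n=5, poly_sigma=1.2, flags=0)
        mag = np.linalg.norm(flow, axis=2)   # per-pixel motion magnitude
        scores.append(float(mag.mean()))     # frame-level dynamicity
        prev_gray = gray
    cap.release()
    return np.asarray(scores)
```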
In the first stage, the actual labels are unknown; hence, we obtain intermediate low-confidence anomaly labels using OneClassSVM and iForest [19], and we obtain the dynamicity labels from the dynamicity scores.
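As an illustration of this pseudo-labeling step, a minimal sketch using scikit-learn's OneClassSVM and IsolationForest on per-segment feature vectors is given below; the feature matrix, the agreement rule, the dynamicity threshold, and the hyper-parameter values are assumptions for the sketch rather than the settings used in this work.

```python
import numpy as np
from sklearn.ensemble import IsolationForest
from sklearn.svm import OneClassSVM

def pseudo_labels(features, dynamicity, dyn_threshold=None):
    """Stage 1: intermediate, low-confidence labels.

    features   : (n_segments, d) feature vectors (assumed given).
    dynamicity : (n_segments,) dynamicity scores from optical flow.
    Returns anomaly labels (1 = anomalous) and dynamicity labels.
    """
    # Unsupervised outlier detectors; hyper-parameters are illustrative.
    ocsvm = OneClassSVM(kernel="rbf", nu=0.1, gamma="scale").fit(features)
    iforest = IsolationForest(contamination=0.1, random_state=0).fit(features)

    # Both detectors return +1 for inliers and -1 for outliers.
    svm_out = ocsvm.predict(features) == -1
    if_out = iforest.predict(features) == -1
    anomaly_labels = (svm_out & if_out).astype(int)  # agreement -> anomalous

    # Dynamicity labels: segments with unusually high motion (assumed rule).
    if dyn_threshold is None:
        dyn_threshold = dynamicity.mean() + 2 * dynamicity.std()
    dynamicity_labels = (dynamicity > dyn_threshold).astype(int)
    return anomaly_labels, dynamicity_labels
```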
In the second stage, we train two regressor networks using the labels generated in the first stage. This is an iterative process that improves the confidence scores: both regressors are trained over progressively refined labels and thereby learn discriminative features. The iterative learning approach also ensures that both regressors learn new distinguishing patterns without losing past information. We have experimentally found that, for the first few iterations, both regressors gradually learn the internal patterns and then stabilize. The two regressors are trained independently, in parallel. Precisely, in iterative learning, the model is retrained with refined labels in each iteration.
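To make the refinement loop concrete, the following is a minimal, framework-agnostic sketch under the assumptions above; MLPRegressor stands in for the actual regressor networks, and the thresholding-based refinement rule and fixed iteration count are hypothetical (in practice the loop may instead terminate once the regressors stabilize, as observed experimentally).

```python
from sklearn.neural_network import MLPRegressor

def refine_labels(scores, threshold=0.5):
    # Placeholder refinement: threshold the regressor's scores to obtain
    # sharper labels for the next iteration (assumed rule).
    return (scores > threshold).astype(float)

def iterative_training(features, anomaly_labels, dynamicity_labels,
                       num_iterations=5):
    """Stages 2-3 sketch: train two regressors, then refine their labels."""
    anomaly_reg = dynamicity_reg = None
    for _ in range(num_iterations):
        # Train each regressor independently (in parallel in practice)
        # on the current, progressively refined labels.
        anomaly_reg = MLPRegressor(hidden_layer_sizes=(128, 32),
                                   max_iter=500).fit(features, anomaly_labels)
        dynamicity_reg = MLPRegressor(hidden_layer_sizes=(128, 32),
                                      max_iter=500).fit(features,
                                                        dynamicity_labels)

        # Re-score all segments with the optimized regressors and use the
        # refined predictions as labels for the next iteration.
        anomaly_labels = refine_labels(anomaly_reg.predict(features))
        dynamicity_labels = refine_labels(dynamicity_reg.predict(features))
    return anomaly_reg, dynamicity_reg
```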
In this way, the proposed approach does not need any level of supervision. However, some form of supervision is mandatory for continual learning [8] or weakly-supervised methods [27, 38, 48]. These methods consider a video anomalous even if only a small segment contains an anomaly. In contrast, we identify anomalous segments using dynamicity and anomaly scores estimated in an unsupervised manner, thus eliminating the requirement for supervision.
To achieve this, we have made the following contributions:
• design an unsupervised end-to-end video anomaly detection framework that uses iterative learning to tune the model with refined labels in each iteration;
• propose a novel technique to assign intermediate labels in unsupervised scenarios by combining deep features with well-justified motion features; and
• conduct extensive experiments to assess the effectiveness of the proposed framework with respect to other state-of-the-art methods.
The rest of the paper is organized as follows. In the next
section, we present the related work. In Sec. 3, we present the
proposed framework. Experiments and results are presented
in Sec. 4. Conclusions and future work are presented in Sec. 5.
2. Related Work
Existing work in the Video Anomaly Detection (VAD) domain largely draws motivation from activity recognition and scene understanding [38]. These methods differ in the types of video features they use, in their training procedures, or in both. In this section, we briefly discuss the main categories followed in very recent VAD approaches.
2.1. Reconstruction-based Approaches
Several VAD approaches [1, 10, 22, 27, 29, 30, 39, 46] employ Autoencoders (AEs), Generative Adversarial Networks (GANs) and their variants under the assumption that models explicitly trained on normal data will fail to reconstruct abnormal events, as such samples are usually absent from the training set. Park et al. [29] have used an AE to generate cuboids within normal frames using spatial and temporal transformations. Zaheer et al. [46] have generated good-quality reconstructions using the current generator and used the previous state of the generator to obtain bad-quality examples; in this way, the new discriminator learns to detect even small distortions in abnormal inputs. Gong et al. [10] have introduced a memory module into the AE and constructed MemAE, an improved version of the conventional AE. Szymanowicz et al. [39] have trained an AE to obtain saliency maps using five consecutive frames and the per-pixel prediction error. Ravanbakhsh et al. [36] have employed classic adversarial training with GANs to detect anomalous activity. However, the effectiveness of these approaches is highly dependent on the reconstruction capability of the model; when the reconstruction is poor, detection performance degrades significantly.
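As a generic illustration of the reconstruction-error principle shared by these methods (not the implementation of any specific work cited above), a small convolutional AE can be trained on normal frames and its per-frame reconstruction error used as an anomaly score; the architecture and the way the score would be thresholded are assumptions for the sketch.

```python
import torch
import torch.nn as nn

class ConvAE(nn.Module):
    """Tiny convolutional autoencoder for grayscale frames (illustrative)."""
    def __init__(self):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(1, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU())
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(32, 16, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(16, 1, 4, stride=2, padding=1), nn.Sigmoid())

    def forward(self, x):
        return self.decoder(self.encoder(x))

def reconstruction_scores(model, frames):
    """Per-frame reconstruction error (MSE); higher means more anomalous.

    frames: tensor of shape (N, 1, H, W), values in [0, 1], H and W
    divisible by 4; a threshold chosen on normal data flags anomalies.
    """
    model.eval()
    with torch.no_grad():
        recon = model(frames)
        return ((recon - frames) ** 2).mean(dim=(1, 2, 3))
```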
2.2. Features-based Approaches
Primarily, features-based VAD approaches can be categorized according to whether anomaly detection is performed using handcrafted or