DyAnNet: A Scene Dynamicity Guided Self-Trained Video Anomaly Detection
Network
Kamalakar Vijay Thakare1, Yash Raghuwanshi1, Debi Prosad Dogra1, Heeseung Choi2,3, and Ig-Jae Kim2,3
1Indian Institute of Technology, Bhubaneswar, Odisha, 752050, India
2Artificial Intelligence and Robotics Institute, Korea Institute of Science and Technology, Seoul 02792, Republic of Korea
3Yonsei-KIST Convergence Research Institute, Yonsei University, Seoul 03722, Republic of Korea
{tkv15, yr15, dpdogra}@iitbbs.ac.in, {hschoi, drjay}@kist.re.kr
Abstract
Unsupervised approaches for video anomaly detection may not perform as well as supervised approaches. However, learning unknown types of anomalies with an unsupervised approach is more practical than with a supervised approach, as annotation is an extra burden. In this paper, we use isolation-tree-based unsupervised clustering to partition the deep feature space of video segments. The RGB stream generates a pseudo anomaly score and the flow stream generates a pseudo dynamicity score for each video segment. These scores are then fused using a majority voting scheme to generate preliminary bags of positive and negative segments. However, these bags may not be accurate, as the scores are generated using only the current segment, which does not represent the global behavior of a typical anomalous event. We then use a refinement strategy based on a cross-branch feed-forward network designed using the popular I3D network to refine both scores. The bags are then refined through a segment re-mapping strategy. The intuition behind combining the dynamicity score of a segment with its anomaly score is to enhance the quality of the evidence. The method has been evaluated on three popular video anomaly datasets, i.e., UCF-Crime, CCTV-Fights, and UBI-Fights. Experimental results reveal that the proposed framework achieves competitive accuracy compared to state-of-the-art video anomaly detection methods.
1. Introduction
Video Anomaly Detection (VAD) addresses a critical requirement in visual surveillance. Generally, the video anomaly detection task covers a large spectrum, including road traffic monitoring [33, 37], violence detection [21, 24, 31], human behaviour analysis [14, 23, 25], crowd monitoring [3, 43], etc.
Figure 1. Overview. In the first stage, we obtain low-confidence pseudo labels. In the second stage, we incorporate iterative learning to train regressor networks using these labels. After successful training, we replace the older labels with more confident labels in the third stage and retrain the regressors. After a few passes, the optimized regressors are used to predict the anomaly score.
Visual surveillance is primarily conducted by public and private agencies on a large scale; hence, researchers face a humongous data-analysis task when analyzing and annotating such large volumes of video. Moreover, recent video anomaly detection methods [9, 23, 25, 33, 38, 48, 49] heavily depend on full or weak supervision. However, generating annotations for such huge datasets is labor-intensive and time-consuming.
In recent years, unsupervised approaches for video anomaly detection have been outnumbered by supervised or semi-supervised methods. Ravanbakhsh et al. [36] have trained Generative Adversarial Nets (GANs) for video anomaly detection. Nguyen et al. [27] have concatenated appearance and motion encoders and decoders to accomplish the task. Gong et al. [10] have proposed Memory-augmented Autoencoders (MemAEs) to detect video anomalies. The main advantage of using GANs or AEs is their capability to capture high-level video features. Recently, Doshi et al. [8] have proposed a continual learning framework in which the model is trained incrementally as data arrive, without forgetting the previously learnt information. This type of framework can be feasible in visual surveillance, as video data keep streaming into monitoring systems. However, all these approaches suffer from a few limitations: (1) in continual learning, a separate mechanism needs to be designed to avoid catastrophic forgetting [8]; (2) GANs and AEs are highly vulnerable to unstable training, i.e., a subtle change in the data imposes large changes in the labels, thus affecting the learnt normal distribution; (3) most state-of-the-art VAD methods heavily depend on labeled normal/abnormal data; and (4) VAD approaches typically utilize either appearance-based features or deep features, but not both.
To address these limitations, we adopt an iterative learning [44] mechanism in which the models are repeatedly tuned with more refined data during each pass. Moreover, we aim to combine the technical advantages of continual learning and AEs. Our proposed framework combines the power of DNNs with well-justified handcrafted motion features; these spatio-temporal features, equipped with low-level motion cues, help to detect a wide range of anomalies. The framework can also be retrained in an end-to-end fashion as input data arrive. The overview of the proposed framework is depicted in Fig. 1. It is divided into three stages: i) pseudo label assignment, ii) regressor training, and iii) refinement of labels using the optimized regressors. To enable the regressors to capture subtle anomalies, we obtain a motion feature, namely the dynamicity score, using optical flow.
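To make this concrete, the following is a minimal sketch of how a per-segment dynamicity score could be derived from dense optical flow. Using the mean flow magnitude over the segment is our illustrative assumption; the exact formulation used in the framework is not defined at this point in the paper.

```python
import cv2
import numpy as np

def dynamicity_score(frames):
    """Mean optical-flow magnitude over a segment (list of grayscale frames)."""
    magnitudes = []
    for prev, curr in zip(frames[:-1], frames[1:]):
        # Dense optical flow between consecutive frames (Farneback method).
        flow = cv2.calcOpticalFlowFarneback(prev, curr, None,
                                            0.5, 3, 15, 3, 5, 1.2, 0)
        mag, _ = cv2.cartToPolar(flow[..., 0], flow[..., 1])
        magnitudes.append(mag.mean())
    # A highly dynamic segment has a large average motion magnitude.
    return float(np.mean(magnitudes))
```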
In the first stage, the actual labels are unknown; hence we obtain intermediate low-confidence anomaly labels using OneClassSVM and iForest [19]. We also obtain dynamicity labels from the dynamicity scores.
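As a minimal sketch of this label-assignment step, assuming pre-extracted deep features (e.g., I3D embeddings) for each segment: OneClassSVM and iForest are named in the paper, while the score normalization, averaging, and 0.5 threshold below are our simplifying assumptions.

```python
import numpy as np
from sklearn.ensemble import IsolationForest
from sklearn.svm import OneClassSVM

def pseudo_anomaly_labels(features, threshold=0.5):
    """features: (num_segments, feature_dim) deep features of video segments."""
    iforest = IsolationForest(n_estimators=100, random_state=0).fit(features)
    ocsvm = OneClassSVM(kernel="rbf", nu=0.1).fit(features)
    # sklearn's score_samples is higher for inliers, so negate it to make
    # higher values mean "more anomalous".
    s_if = -iforest.score_samples(features)
    s_svm = -ocsvm.score_samples(features)
    # Normalize each score to [0, 1] before combining (our assumption).
    norm = lambda s: (s - s.min()) / (s.max() - s.min() + 1e-8)
    pseudo_score = 0.5 * (norm(s_if) + norm(s_svm))
    return pseudo_score, (pseudo_score > threshold).astype(int)
```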
In the second stage, we train two regressor networks using the labels generated in the first stage. This is an iterative process designed to improve the confidence scores: in each pass, both regressors are trained on refined labels and learn increasingly discriminative features. Iterative learning also ensures that both regressors learn newly distinguishing patterns without losing past information. We have experimentally found that both regressors gradually learn the internal patterns during the first few iterations and stabilize thereafter. Both regressors are trained independently, in parallel. Because the model is retrained on refined data in each iteration, the proposed approach does not need any level of supervision, whereas some form of supervision is mandatory for continual learning [8] and weakly-supervised methods [27, 38, 48]. Those methods consider a video anomalous even if only a small segment contains an anomaly. In contrast, we identify anomalous segments using dynamicity and anomaly scores estimated in an unsupervised way, thus eliminating the requirement of supervision. A conceptual sketch of this refinement loop is given below.
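In the sketch, simple MLP regressors stand in for the paper's I3D-based regressor branches, and directly adopting the regressors' clipped predictions as the next pass's labels is our simplified version of the label re-mapping rule.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

def iterative_training(features, anomaly_labels, dyn_labels, num_passes=5):
    """Retrain both regressors on labels refined in the previous pass."""
    for _ in range(num_passes):
        # Both regressors are trained independently, in parallel branches.
        a_reg = MLPRegressor(hidden_layer_sizes=(128,), max_iter=300)
        d_reg = MLPRegressor(hidden_layer_sizes=(128,), max_iter=300)
        a_reg.fit(features, anomaly_labels)
        d_reg.fit(features, dyn_labels)
        # Pass > 1: replace the older labels with the regressors' more
        # confident predictions before retraining (label refinement).
        anomaly_labels = np.clip(a_reg.predict(features), 0.0, 1.0)
        dyn_labels = np.clip(d_reg.predict(features), 0.0, 1.0)
    return a_reg, d_reg
```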
To achieve this, we have made the following contributions:
• design an unsupervised end-to-end video anomaly detection framework that uses iterative learning to tune the model using refined labels in each iteration;
• propose a novel technique to assign intermediate labels in unsupervised scenarios by combining deep features with well-justified motion features; and
• conduct extensive experiments to understand the effectiveness of the proposed framework with respect to other state-of-the-art methods.
The rest of the paper is organized as follows. In the next section, we present the related work. In Sec. 3, we present the proposed framework. Experiments and results are presented in Sec. 4. The conclusions and future work are presented in Sec. 5.
2. Related Work
Existing work in the Video Anomaly Detection (VAD) domain largely draws motivation from activity recognition and scene understanding [38]. These methods utilize various types of video features, training procedures, or both. In this section, we briefly discuss the main categories that are extensively followed in very recent VAD approaches.
2.1. Reconstruction-based Approaches
Several VAD approaches [1, 10, 22, 27, 29, 30, 39, 46] employ Autoencoders (AEs), Generative Adversarial Nets (GANs), and their variants under the assumption that models explicitly trained on normal data will fail to reconstruct abnormal events, as such samples are usually absent from the training set. Park et al. [29] have used an AE to generate cuboids within normal frames using spatial and temporal transformations. Zaheer et al. [46] have generated good-quality reconstructions using the current generator and used the previous-state generator to obtain bad-quality examples; this way, the new discriminator learns to detect even small distortions in abnormal inputs. Gong et al. [10] have introduced a memory module into the AE and constructed MemAE, an improved version of the conventional AE. Szymanowicz et al. [39] have trained an AE to obtain saliency maps using five consecutive frames and the per-pixel prediction error. Ravanbakhsh et al. [36] have imposed classic adversarial training using GANs to detect anomalous activity. However, the effectiveness of these approaches is highly dependent on the reconstruction capability of the model; where reconstruction fails, performance may degrade significantly.
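To illustrate the reconstruction-error principle underlying this family of methods, below is a generic sketch under our own assumptions, not an implementation of any cited approach: an autoencoder trained only on normal data tends to reconstruct abnormal inputs poorly, so the per-sample reconstruction error can serve as an anomaly score.

```python
import torch
import torch.nn as nn

class FeatureAE(nn.Module):
    """Toy autoencoder over per-frame feature vectors."""
    def __init__(self, dim=1024, bottleneck=64):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(dim, 256), nn.ReLU(),
                                     nn.Linear(256, bottleneck))
        self.decoder = nn.Sequential(nn.Linear(bottleneck, 256), nn.ReLU(),
                                     nn.Linear(256, dim))

    def forward(self, x):
        return self.decoder(self.encoder(x))

def anomaly_score(model, x):
    # Trained only on normal data, the AE reconstructs abnormal inputs
    # poorly; the per-sample squared error acts as the anomaly score.
    with torch.no_grad():
        return ((model(x) - x) ** 2).mean(dim=1)
```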
2.2. Features-based Approaches
Primarily, features-based VAD approaches can be categorized by whether anomaly detection uses handcrafted or deep features.