The Devil is in the Conflict: Disentangled
Information Graph Neural Networks For Fraud
Detection
Zhixun Li1,†, Dingshuo Chen2,3,†, Qiang Liu2,3, Shu Wu2,3,*
1School of Computer Science and Technology, Beijing Institute of Technology
2Center for Research on Intelligent Perception and Computing, National Laboratory of Pattern Recognition,
Institute of Automation, Chinese Academy of Sciences
3School of Artificial Intelligence, University of Chinese Academy of Sciences
lizhixun@bit.edu.cn, dingshuo.chen@cripac.ia.ac.cn, {qiang.liu, shu.wu}@nlpr.ia.ac.cn
Abstract—Graph-based fraud detection has received considerable attention. Owing to the great success of Graph Neural Networks (GNNs), approaches that adopt GNNs for fraud detection have been gaining momentum. However, most existing methods rely on the strong inductive bias of homophily, which assumes that neighboring nodes tend to have the same labels or similar features. In real scenarios, fraudsters often engage in camouflage behaviors to evade detection systems. The homophily assumption therefore no longer holds, which is known as the inconsistency problem. In this paper, we argue that the performance degradation is mainly attributed to the inconsistency between topology and attribute. To address this problem, we propose to disentangle the fraud network into two views, corresponding to topology and attribute respectively. We then propose a simple and effective method that uses the attention mechanism to adaptively fuse the two views and thereby capture data-specific preferences. In addition, we further improve it by introducing mutual information constraints for topology and attribute. To this end, we propose a Disentangled Information Graph Neural Network (DIGNN) model, which utilizes variational bounds to find an approximate solution to our proposed optimization objective. Extensive experiments demonstrate that our model significantly outperforms state-of-the-art baselines on real-world fraud detection datasets.
Index Terms—Graph Neural Networks, Fraud Detection, Information Theory
I. INTRODUCTION
Graph-based fraud detection is a crucial task with tremendous impact in various applications, such as opinion fraud detection [1], fake news detection [2], [3], review spam detection [4] and financial fraud detection [5], [6]. In these scenarios, because graphs can effectively model the correlations among entities, interactive activities on a platform can be characterized as a graph, where users or objects are treated as nodes and the transactions or relations between them are treated as edges.
†The first two authors contributed equally to this work.
*Corresponding author.
Numerous techniques have been proposed to detect fraudsters. Recently, driven by the powerful representation capability of graph structure and advances in Graph Neural Networks (GNNs) [7]–[9], many approaches try to harness GNNs for fraud detection on either homogeneous or heterogeneous graphs. The main idea is to leverage GNNs to learn expressive node representations so that abnormal nodes can be distinguished from normal ones in the latent embedding space. Message-Passing GNNs (MP-GNNs), which aggregate neighbor node features and achieve local smoothing by stacking layers, have become mainstream in recent years. Although MP-GNNs obtain satisfactory performance in most cases, the strong inductive bias of homophily limits their representational ability on heterophilic graphs. Some works [10] point out that many GNNs can be seen as low-pass filters, so their generalization ability on high-frequency graph signals is poor. In fraud detection, fraudsters often imitate normal users to camouflage themselves, and hence interact with normal users more frequently. For instance, normal users account for 81% of the fraudsters' neighbor nodes in the YelpChi dataset (Figure 1). In other words, fraudsters' features are inconsistent with their behaviors (interactions, i.e., topological structure). Since MP-GNNs do not work well on heterophilic graphs, they fail to handle the inconsistency phenomenon in graph-based fraud detection, and fraudsters can fool the detection system.
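To make the neighbor statistic quoted above concrete, the following is a minimal sketch of how a per-class neighbor label distribution could be computed. The function name, the toy graph and its labels are illustrative assumptions and are not taken from the paper or from YelpChi.

```python
from collections import defaultdict


def neighbor_label_distribution(edges, labels):
    """Return, for each class, the fraction of its neighbors in each class.

    edges:  iterable of undirected (u, v) pairs
    labels: dict mapping node -> 0 (benign) or 1 (fraudster)
    """
    counts = defaultdict(lambda: defaultdict(int))
    for u, v in edges:
        # The graph is undirected, so each edge is counted from both endpoints.
        counts[labels[u]][labels[v]] += 1
        counts[labels[v]][labels[u]] += 1

    dist = {}
    for cls, nbr_counts in counts.items():
        total = sum(nbr_counts.values())
        dist[cls] = {nbr_cls: n / total for nbr_cls, n in nbr_counts.items()}
    return dist


# Hypothetical toy graph: nodes 0 and 5 are fraudsters, nodes 1-4 are benign.
toy_labels = {0: 1, 1: 0, 2: 0, 3: 0, 4: 0, 5: 1}
toy_edges = [(0, 1), (0, 2), (0, 3), (0, 4), (0, 5), (1, 2)]
toy_dist = neighbor_label_distribution(toy_edges, toy_labels)

# Fraction of fraudsters' neighbors that are benign in this toy graph.
print(round(toy_dist[1][0], 3))
```

In this toy graph most of the fraudsters' neighbors are benign, which is the heterophilic pattern the paragraph describes.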
Recently, a few works have noticed this problem. They employ aggregation weights to reduce the adverse impact of dissimilar neighbors, or set similarity-aware thresholds to select and re-link similar nodes. For instance, GraphConsis [11] computes a consistency score between connected node pairs and uses it as the sampling probability. PC-GNN [6] combines label information and latent embeddings into a distance function to measure similarity. Although such methods can alleviate the inconsistency problem to some extent, they discard a lot of information when filtering out dissimilar neighbors, and thus may lead to sub-optimal performance.
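The information loss mentioned above can be illustrated with a generic similarity-threshold filter. This sketch is not the procedure of GraphConsis or PC-GNN; the cosine measure, the threshold value and the toy features are assumptions chosen only for illustration.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm > 0 else 0.0


def filter_neighbors(center, neighbors, features, threshold=0.5):
    """Keep only neighbors whose similarity to the center meets the threshold."""
    return [
        n for n in neighbors
        if cosine(features[center], features[n]) >= threshold
    ]


# Hypothetical node features: "a" resembles "u", while "b" and "c" do not.
toy_features = {
    "u": [1.0, 0.0],
    "a": [0.9, 0.1],
    "b": [0.0, 1.0],
    "c": [-1.0, 0.0],
}
kept = filter_neighbors("u", ["a", "b", "c"], toy_features)
print(kept)  # the messages from "b" and "c" are dropped entirely
```

Two of the three neighbors are removed before aggregation, so whatever signal their connections carried is no longer available to the model.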
arXiv:2210.12384v1 [cs.LG] 22 Oct 2022
In this paper, we analyze the inconsistency problem in graph-based fraud detection, which has been obstructing a full understanding of this field. First, we clarify that the inconsistency problem is the bottleneck of graph fraud detection. According to [12], the underlying optimization process of GNNs is equivalent to minimizing topology and attribute constraints, and Yang et al. [13] indicate that the degradation of performance is attributed to the compromise between topology and attribute. Because the camouflage behaviors (topology) of fraudsters are inconsistent with their essence (attribute), this conflict in fraud networks may impair the discriminative ability of GNNs. Second, the key source of discriminative information differs across datasets, and most existing methods are not satisfactory in fusing topological structures and node attributes [14]. For example, fraudsters may possess distinguishable attributes on some platforms, but their deceptive behaviors can confuse the detection model. Therefore, we are motivated to explore a novel method that is able to minimize the conflict between topology and attribute while effectively extracting the most task-relevant information from datasets.
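As a sketch of what "minimizing topology and attribute constraints" can look like, one form that is widely used when GNN propagation is interpreted as graph signal denoising is given below. Whether [12] uses exactly this form is not shown in this excerpt, and the symbols $Z$ (node representations), $X$ (node attributes), $\tilde{L}$ (normalized graph Laplacian) and $\lambda$ (trade-off weight) are assumptions introduced here.

```latex
\min_{Z} \;
\underbrace{\lVert Z - X \rVert_F^2}_{\text{attribute constraint}}
\; + \; \lambda \,
\underbrace{\operatorname{tr}\!\left( Z^{\top} \tilde{L} Z \right)}_{\text{topology constraint}}
```

The first term keeps representations close to the node attributes, while the second makes connected nodes similar. When a fraudster is connected mostly to benign users, the two terms pull its representation in different directions, which is the compromise the paragraph refers to.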
We borrow the concept of multi-view learning for graph-based fraud detection and propose a simple and effective model, Disentangled Information Graph Neural Networks (DIGNN). Technically, we first disentangle fraud networks into topology and attribute views. Next, we employ an attention mechanism to fuse the two view embeddings adaptively and extract task-relevant information. Surprisingly, we observe that this simple method surpasses all state-of-the-art baselines. This provides empirical evidence that the conflict between topology and attribute causes the inconsistency problem. Besides, to further decrease the entanglement between topology and attribute and improve the performance, we design a new optimization objective based on information theory, which resorts to variational bounds to minimize the mutual information between the two views and maximize the mutual information between the view embeddings and the original inputs.
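The adaptive fusion step described above can be sketched as a softmax attention over the two view embeddings of a single node. This is only an illustration under stated assumptions: the scoring function (a dot product between a tanh-transformed embedding and a query vector), the name `attention_fuse` and the fixed `query` are hypothetical, and the excerpt does not specify the attention form DIGNN actually uses.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_fuse(z_topology, z_attribute, query):
    """Fuse two view embeddings of one node with attention weights.

    query stands in for a learnable vector; it is fixed here for illustration.
    """
    views = [z_topology, z_attribute]
    # Score each view by the dot product between tanh(view) and the query.
    scores = [
        sum(math.tanh(v_i) * q_i for v_i, q_i in zip(view, query))
        for view in views
    ]
    weights = softmax(scores)
    # The fused embedding is the attention-weighted sum of the two views.
    fused = [
        sum(w * view[i] for w, view in zip(weights, views))
        for i in range(len(query))
    ]
    return fused, weights


# With a zero query both views score equally, so they are averaged.
fused, weights = attention_fuse([1.0, 0.0], [0.0, 1.0], query=[0.0, 0.0])
print(weights, fused)
```

In a trained model the query (or an equivalent scoring network) would be learned, so that datasets where attributes are more informative put more weight on the attribute view, and vice versa.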
We conduct extensive experiments to compare our proposed model with existing graph-based fraud detection models, and the results demonstrate the effectiveness of our model. The contributions of this paper are summarized as follows:
• We analyze the cause of the inconsistency problem and point out that it is mainly attributed to the conflict between topology and attribute. In light of this, we propose a simple yet effective model, DIGNN, which first disentangles the fraud network into two views and then fuses them with an attention mechanism.
• We propose a novel optimization objective based on mutual information theory and theoretically derive its upper bound for tractable calculation.
• We verify the effectiveness of our model on real-world fraud detection datasets. Our model significantly improves the performance in terms of all commonly adopted metrics.
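The mutual-information objective named in the second contribution can be written schematically as follows. This is a sketch based only on the verbal description in the introduction; the symbols $Z_T$ and $Z_A$ (topology and attribute view embeddings), $A$ and $X$ (adjacency and attribute inputs), $\mathcal{L}_{\mathrm{cls}}$ (classification loss) and the weights $\beta$, $\gamma$ are assumptions, and the exact objective and its bounds are not shown in this excerpt.

```latex
\min_{\theta} \;
\mathcal{L}_{\mathrm{cls}}
\; + \; \beta \, I(Z_T ; Z_A)
\; - \; \gamma \left[ I(Z_T ; A) + I(Z_A ; X) \right]
```

Mutual information terms of this kind are generally intractable to compute exactly, which is why the paper resorts to variational bounds: a term to be minimized is replaced by an upper bound, and a term to be maximized by a lower bound.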
II. RELATED WORK
A. Graph-based Fraud Detection
The core idea of graph-based fraud detection is to take advantage of GNNs to obtain discriminative node embeddings and to identify the malicious ones in the latent space. Examples include [11], [15], [16] for review fraud detection, [2], [3] for fake news detection and [5], [6], [17]–[19] for financial fraud detection. Ma et al. [20] provide a comprehensive investigation of graph-based fraud detection.
Fig. 1. (a) Illustration of graph-based fraud detection (legend: benign, fraudster, heterophilic relation, homophilic relation). (b) Neighbor distribution of fraudsters and benign users in the YelpChi dataset.
Most existing GNN methods hold the homophily assumption that neighboring nodes share the same labels or similar features. However, fraudsters try to conceal themselves, so their features are inconsistent with their camouflage behaviors. Some graph-based fraud detection works have noticed this problem. GraphConsis [11] is the first to formulate and tackle the inconsistency problem, introducing three kinds of inconsistency that exist in fraud networks. CARE-GNN [15] devises a label-aware similarity measure to find informative neighboring nodes and utilizes reinforcement learning to select similar neighbors. FRAUDRE [21] aggregates the differences between adjacent node pairs. PC-GNN [6] devises a choose operation to select beneficial neighbors based on feature similarity. IHGAT [22] encodes both sequence-like intentions and the relationships among transactions to leverage cross-interaction information.
Our model differs from all of the above works. Instead of measuring similarity between adjacent node pairs, we disentangle topology and attribute and treat graph learning as a multi-view learning problem.
B. Multi-view on GNNs
Topology and attribute are two essential components of graphs. However, existing state-of-the-art GNN models are unable to effectively fuse topological structure and node attributes. AM-GCN [14] uses k-nearest neighbors to construct a feature graph and combines it with the topological structure view and common embeddings. SCRL [23] designs a self-supervised approach to maximize the agreement of the embeddings in the topology graph and the feature graph. A recent work [13] claims that the interference between topology and attribute is mainly ascribed to compromises between them. LINKX [24] processes node attributes and topological structure in an orthogonal manner. In this paper, we also follow