LaundroGraph: Self-Supervised Graph Representation Learning
for Anti-Money Laundering
Mário Cardoso
mario.cardoso@feedzai.com
Feedzai
Pedro Saleiro
pedro.saleiro@feedzai.com
Feedzai
Pedro Bizarro
pedro.bizarro@feedzai.com
Feedzai
ABSTRACT
Anti-money laundering (AML) regulations mandate financial institutions to
deploy AML systems based on a set of rules that, when triggered, form the
basis of a suspicious alert to be assessed by human analysts. Reviewing these
cases is a cumbersome and complex task that requires analysts to navigate a
large network of financial interactions to validate suspicious movements.
Furthermore, these systems have very high false positive rates (estimated to
be over 95%). The scarcity of labels hinders the use of alternative systems
based on supervised learning, reducing their applicability in real-world
applications. In this work we present LaundroGraph, a novel self-supervised
graph representation learning approach to encode banking customers and
financial transactions into meaningful representations. These representations
are used to provide insights to assist the AML reviewing process, such as
identifying anomalous movements for a given customer. LaundroGraph represents
the underlying network of financial interactions as a customer-transaction
bipartite graph and trains a graph neural network on a fully self-supervised
link prediction task. We empirically demonstrate that our approach outperforms
other strong baselines on self-supervised link prediction using a real-world
dataset, improving the best non-graph baseline by 12 p.p. of AUC. The goal is
to increase the efficiency of the reviewing process by supplying these
AI-powered insights to the analysts upon review. To the best of our knowledge,
this is the first fully self-supervised system within the context of AML
detection.
CCS CONCEPTS
Computing methodologies → Anomaly detection; Neural networks; Learning latent
representations.
KEYWORDS
anti-money laundering, self-supervision, graph neural networks
ACM Reference Format:
Mário Cardoso, Pedro Saleiro, and Pedro Bizarro. 2022. LaundroGraph:
Self-Supervised Graph Representation Learning for Anti-Money Laundering. In
3rd ACM International Conference on AI in Finance (ICAIF ’22), November 2–4,
2022, New York, NY, USA. ACM, New York, NY, USA, 9 pages.
https://doi.org/10.1145/3533271.3561727
Permission to make digital or hard copies of all or part of this work for
personal or classroom use is granted without fee provided that copies are not
made or distributed for profit or commercial advantage and that copies bear
this notice and the full citation on the first page. Copyrights for components
of this work owned by others than the author(s) must be honored. Abstracting
with credit is permitted. To copy otherwise, or republish, to post on servers
or to redistribute to lists, requires prior specific permission and/or a fee.
Request permissions from permissions@acm.org.
ICAIF ’22, November 2–4, 2022, New York, NY, USA
©2022 Copyright held by the owner/author(s). Publication rights licensed to ACM.
ACM ISBN 978-1-4503-9376-8/22/10... $15.00
https://doi.org/10.1145/3533271.3561727
1 INTRODUCTION
Money laundering is a criminal activity concerned with concealing the origin
of funds obtained through illegal means such as terrorism financing, drug
trafficking or corruption, so that the funds appear legitimate until a
thorough analysis is performed. An estimated 1.7 to 4 trillion (2% to 5% of
global GDP) is laundered annually [13]. To adhere to AML regulations,
financial institutions employ compliance experts who investigate suspicious
activities, usually alerted through a rule-based system. These triggered rules
are the starting point of a process that can take several days to complete,
culminating in a decision on whether or not to flag the activity as
suspicious. When suspicious activity is identified, a suspicious activity
report must be filed and delivered to a regulatory institution that proceeds
with due action. Non-compliance in reporting money laundering can lead
financial institutions and their employees to face civil and criminal
penalties, such as heavy fines or prison time.
In Anti-Money Laundering (AML) reviewing, analysts investigate alerts centered
on an entity (e.g., bank accounts or customers), comprised of a bulk of
transactions that triggered one or more rules, in order to understand whether
any suspicious activity was involved. Navigating the network of interactions
sprawling from a complex alert and keeping track of the flows of money, often
through entities not directly connected to the one being investigated, is a
challenging and cumbersome task. To facilitate this procedure, analysts resort
to understanding the data through aggregations of meaningful categories, such
as grouping by entities interacted with (known as counterparts) or by amounts,
as well as relying on their past experience and prior knowledge of the
customer under review. Throughout the review process, there is a continuous
effort to filter the large bulk of transactions into a smaller set of abnormal
interactions that can be used to justify suspicious activity. There are some
challenges with the current reviewing process, namely: 1) New analysts lack
the context more experienced analysts might have, requiring an additional
effort to familiarize themselves with recurring customers. Similarly,
additional effort is required to contextualize new customers entering the
system; 2) It is challenging to navigate the bulk of transactions and decide
which movements are particularly suspicious, and resorting to a macro-view of
the interactions can lead to missing the fine-grained details of each
transaction.
To mitigate the aforementioned challenges, in this work we present
LaundroGraph, a novel fully self-supervised approach leveraging Graph Neural
Networks (GNNs) to encode representations of customers and transactions within
the context of AML reviewing. We propose to represent the network of financial
interactions as a directed bipartite customer-transaction graph¹, with the GNN
trained through a link prediction task between pairs of customer and
transaction nodes, essentially corresponding to an anomaly prediction task. As
a result, anomalous movements within the context of each customer can be
automatically identified and shown to the analyst upon review, providing a
starting point of potentially suspicious movements and alleviating the effort
required to filter the bulk of transactions. Furthermore, the derived
representations can be used as building blocks for additional insights to
support the reviewing process, such as clustering the per-customer
transactions, and comparing how the behavior of a customer evolves over time.
The former can be a useful approach to group the information shown to the
analyst beyond simple aggregations, and the latter can quickly provide context
surrounding a customer under review. Unlike most existing works in the graph
self-supervised literature landscape, in this work self-supervision is both
the starting point and the end goal, as there are no anomaly labels or
supervised downstream tasks. The objective is for this system to be integrated
within a broader system for AML reviewing that handles the necessary workload
of assessment creation. Within this system, these insights will be digested
and provided in an easy-to-understand manner through tailor-made
visualizations for AML as soon as the investigation starts. These
visualizations are beyond the scope of this work and they will not be
described.

¹ Other networks were considered but this was simultaneously the best
performing and most flexible approach.

arXiv:2210.14360v1 [cs.LG] 25 Oct 2022

Figure 1: Proposed system training overview. Outgoing transactions are
represented with filled arrows, and incoming transactions with dashed arrows.
First, the bipartite graph is built from a dataset comprised of raw
transactions. Then, positive pairs (green) and negative pairs (red) together
with their 𝐾-hop subgraphs (𝐾 = 2 in the figure) are extracted and their
embeddings obtained through the encoder. Finally, the decoder uses the
aforementioned embeddings to generate the prediction for each sampled edge.
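To make the customer-transaction bipartite graph and the link prediction pairs more concrete, the following is a minimal Python sketch, not the authors' implementation. It assumes hypothetical transaction tuples of the form `(transaction_id, sender, receiver, amount)`. The helper names `build_bipartite`, `sample_pairs` and `k_hop_nodes`, the uniform choice of negative transactions, and the undirected neighborhood expansion are illustrative assumptions that are not specified in this section.

```python
import random
from collections import defaultdict

# Hypothetical raw transactions: (transaction_id, sender, receiver, amount).
RAW = [
    ("t1", "alice", "bob", 120.0),
    ("t2", "alice", "carol", 40.0),
    ("t3", "bob", "carol", 990.0),
    ("t4", "dave", "alice", 15.0),
]


def build_bipartite(transactions):
    """Build a directed bipartite graph in which every transaction is a node.

    Returns the outgoing edges (sender -> transaction), the incoming edges
    (transaction -> receiver), and the set of transactions linked to each
    customer in either direction.
    """
    out_edges, in_edges = [], []
    linked = defaultdict(set)
    for tx_id, sender, receiver, _amount in transactions:
        out_edges.append((sender, tx_id))
        in_edges.append((tx_id, receiver))
        linked[sender].add(tx_id)
        linked[receiver].add(tx_id)
    return out_edges, in_edges, dict(linked)


def sample_pairs(linked, all_tx_ids, rng):
    """Emit (customer, transaction, label) triples for link prediction.

    Each existing link is a positive pair (label 1). For each positive, one
    negative pair (label 0) is drawn uniformly from the transactions that are
    not linked to that customer (an assumed sampling scheme).
    """
    pairs = []
    for customer in sorted(linked):
        candidates = sorted(set(all_tx_ids) - linked[customer])
        for tx_id in sorted(linked[customer]):
            pairs.append((customer, tx_id, 1))
            if candidates:
                pairs.append((customer, rng.choice(candidates), 0))
    return pairs


def k_hop_nodes(out_edges, in_edges, start, k):
    """Collect all nodes within k hops of `start`, ignoring edge direction."""
    adjacency = defaultdict(set)
    for a, b in out_edges + in_edges:
        adjacency[a].add(b)
        adjacency[b].add(a)
    seen, frontier = {start}, {start}
    for _ in range(k):
        frontier = {n for node in frontier for n in adjacency[node]} - seen
        seen |= frontier
    return seen


if __name__ == "__main__":
    out_e, in_e, links = build_bipartite(RAW)
    pairs = sample_pairs(links, [t[0] for t in RAW], random.Random(0))
    print(len(pairs), "pairs sampled")
    print(sorted(k_hop_nodes(out_e, in_e, "alice", 2)))
```

In a full system, the sampled pairs and their subgraphs would be passed to a GNN encoder and a decoder that scores each candidate edge; that part is omitted here.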
In summary, this work’s main contributions are:
• A novel fully self-supervised approach to derive representations of
  customers and financial transactions useful for a variety of insights to
  support the AML reviewing process.
• A new way to represent the network of financial interactions as a
  customer-transaction bipartite graph.
• Validation of our method on a real-world banking dataset in the
  self-supervised task of link prediction, achieving an improvement of 12 p.p.
  of AUC compared to using only the raw features.
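Because the improvement is stated in percentage points (p.p.) of AUC, a small worked example may help. The sketch below computes AUC as the fraction of positive/negative pairs that are ranked correctly and reports the absolute difference between two scorers in percentage points. The labels and scores are toy values invented for illustration and are not results from the paper; the function names `auc` and `gain_in_pp` are hypothetical.

```python
def auc(labels, scores):
    """Probability that a random positive is scored above a random negative.

    Ties between a positive and a negative count as one half.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


def gain_in_pp(auc_new, auc_old):
    """Absolute AUC difference expressed in percentage points."""
    return (auc_new - auc_old) * 100.0


# Toy link prediction labels (1 = existing edge, 0 = sampled negative edge).
labels = [1, 1, 0, 0]
raw_feature_scores = [0.9, 0.4, 0.5, 0.1]  # hypothetical non-graph scorer
graph_scores = [0.9, 0.6, 0.5, 0.1]        # hypothetical graph-based scorer

if __name__ == "__main__":
    baseline = auc(labels, raw_feature_scores)
    graph = auc(labels, graph_scores)
    print(f"gain: {gain_in_pp(graph, baseline):.1f} p.p. of AUC")
```

A gain measured in percentage points is an absolute difference: moving from a hypothetical AUC of 0.80 to 0.92 is a gain of 12 p.p., which is not the same as a 12% relative improvement.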
2 RELATED WORK
Most of the approaches to detect money laundering used by financial
institutions are based on a set of rules aligned with regulations. Machine
learning methods for AML are becoming more popular, and can broadly be
separated into supervised and unsupervised approaches, with the latter being
more common due to the lack of available labels. When labels are available,
several works have compared the performance of different classifiers and
training strategies in predicting money laundering. Examples include
benchmarking several popular classifiers and sampling schemes [28], comparing
the performance of an XGBoost classifier when trained exclusively with alerted
events or with all events [8], and comparing the performance of an SVM
classifier under different hyperparameter configurations [9].
Unsupervised approaches typically apply an anomaly detection algorithm by
comparing events with the expected behavior through deviation metrics.
Definitions of expected behavior include clusters of transactions by the same
customer [14], the nearest large cluster [3], or the k-nearest neighbors [16].
To handle the lack of real-world data, several approaches have proposed to
generate synthetic data, either generating entire datasets [16, 17, 25], or
just patterns of suspicious behavior [3, 24].
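The deviation-metric idea described above can be illustrated with a k-nearest-neighbors score. The sketch below is a generic example under stated assumptions and does not reproduce any of the cited methods: each event is a hypothetical `(amount, hour_of_day)` feature vector, the expected behavior is the customer's own transaction history, and the score is the mean Euclidean distance to the `k` closest historical events. The function name `knn_deviation` is hypothetical.

```python
import math


def knn_deviation(point, history, k):
    """Mean Euclidean distance from `point` to its k nearest historical events.

    A larger value means the event deviates more from the expected behavior.
    """
    if not 1 <= k <= len(history):
        raise ValueError("k must be between 1 and the number of historical events")
    distances = sorted(math.dist(point, past) for past in history)
    return sum(distances[:k]) / k


# Hypothetical history of one customer: (amount, hour_of_day).
HISTORY = [(100.0, 1.0), (110.0, 1.0), (95.0, 2.0), (105.0, 1.0)]

if __name__ == "__main__":
    typical = (102.0, 1.0)
    unusual = (5000.0, 1.0)
    print("typical:", knn_deviation(typical, HISTORY, k=2))
    print("unusual:", knn_deviation(unusual, HISTORY, k=2))
```

In practice the features would need to be scaled to comparable ranges before distances are computed, and a threshold on the score would have to be chosen; both choices are left out of this sketch.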
The majority of works using machine learning for AML rely entirely on feature
sets that characterize individual events or entities. This naturally
disregards the underlying contextual information