LaundroGraph: Self-Supervised Graph Representation Learning for Anti-Money Laundering

Mário Cardoso

mario.cardoso@feedzai.com

Feedzai

Pedro Saleiro

pedro.saleiro@feedzai.com

Feedzai

Pedro Bizarro

pedro.bizarro@feedzai.com

Feedzai

ABSTRACT

Anti-money laundering (AML) regulations mandate financial institutions to deploy AML systems based on a set of rules that, when triggered, form the basis of a suspicious alert to be assessed by human analysts. Reviewing these cases is a cumbersome and complex task that requires analysts to navigate a large network of financial interactions to validate suspicious movements. Furthermore, these systems have very high false positive rates (estimated to be over 95%). The scarcity of labels hinders the use of alternative systems based on supervised learning, reducing their applicability in real-world applications. In this work we present LaundroGraph, a novel self-supervised graph representation learning approach to encode banking customers and financial transactions into meaningful representations. These representations are used to provide insights to assist the AML reviewing process, such as identifying anomalous movements for a given customer. LaundroGraph represents the underlying network of financial interactions as a customer-transaction bipartite graph and trains a graph neural network on a fully self-supervised link prediction task. We empirically demonstrate that our approach outperforms other strong baselines on self-supervised link prediction using a real-world dataset, improving the best non-graph baseline by […] p.p. of AUC. The goal is to increase the efficiency of the reviewing process by supplying these AI-powered insights to the analysts upon review. To the best of our knowledge, this is the first fully self-supervised system within the context of AML detection.

CCS CONCEPTS

• Computing methodologies → Anomaly detection; Neural networks; Learning latent representations.

KEYWORDS

anti-money laundering, self-supervision, graph neural networks

ACM Reference Format:

Mário Cardoso, Pedro Saleiro, and Pedro Bizarro. 2022. LaundroGraph: Self-Supervised Graph Representation Learning for Anti-Money Laundering. In 3rd ACM International Conference on AI in Finance (ICAIF ’22), November 2–4, 2022, New York, NY, USA. ACM, New York, NY, USA, 9 pages. https://doi.org/10.1145/3533271.3561727

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

ICAIF ’22, November 2–4, 2022, New York, NY, USA

ACM ISBN 978-1-4503-9376-8/22/10... $15.00

https://doi.org/10.1145/3533271.3561727

1 INTRODUCTION

Money laundering is a criminal activity concerned with concealing the origin of funds obtained through illegal means such as terrorism financing, drug trafficking or corruption, so that they appear legitimate until a thorough analysis is performed. An estimated €1.7 to €4 trillion (2% to 5% of global GDP) is laundered annually [13]. To adhere to AML regulations, financial institutions employ compliance experts who investigate suspicious activities, usually alerted through a rule-based system. These triggered rules are the starting point of a process that can take several days to complete, culminating in a decision on whether or not to flag the activity as suspicious. When suspicious activity is identified, a suspicious activity report must be filed and delivered to a regulatory institution that proceeds with due action. Non-compliance in reporting money laundering can lead financial institutions and their employees to face civil and criminal penalties, such as heavy fines or prison time.

In Anti-Money Laundering (AML) reviewing, analysts investigate alerts centered on an entity (e.g., a bank account or a customer), each comprising a bulk of transactions that triggered one or more rules, in order to understand whether any suspicious activity was involved. Navigating the network of interactions sprawling from a complex alert and keeping track of the flows of money, often through entities not directly connected to the one being investigated, is a challenging and cumbersome task. To facilitate this procedure, analysts resort to understanding the data through aggregations over meaningful categories, such as grouping by the entities interacted with (known as counterparts) or by amounts, as well as relying on their past experience and prior knowledge of the customer under review. Throughout the review process, there is a continuous effort to filter the large bulk of transactions into a smaller set of abnormal interactions that can be used to justify suspicious activity. There are some challenges with the current reviewing process, namely: 1) New analysts lack the context more experienced analysts might have, requiring additional effort to familiarize themselves with recurring customers. Similarly, additional effort is required to contextualize new customers entering the system; 2) It is challenging to navigate the bulk of transactions and decide which movements are particularly suspicious, and resorting to a macro-view of the interactions can lead to missing the fine-grained details of each transaction.

arXiv:2210.14360v1 [cs.LG] 25 Oct 2022

Figure 1: Proposed system training overview. Outgoing transactions are represented with filled arrows, and incoming transactions with dashed arrows. First, the bipartite graph is built from a dataset comprised of raw transactions. Then, positive pairs (green) and negative pairs (red) together with their 𝐾-hop subgraphs (𝐾 = 2 in the figure) are extracted and their embeddings obtained through the encoder. Finally, the decoder uses the aforementioned embeddings to generate the prediction for each sampled edge.

To mitigate the aforementioned challenges, in this work we present LaundroGraph, a novel fully self-supervised approach leveraging Graph Neural Networks (GNNs) to encode representations of customers and transactions within the context of AML reviewing. We propose to represent the network of financial interactions as a directed bipartite customer-transaction graph¹, with the GNN trained through a link prediction task between pairs of customer and transaction nodes, essentially corresponding to an anomaly prediction task. As a result, anomalous movements within the context of each customer can be automatically identified and shown to the analyst upon review, providing a starting point of potentially suspicious movements and alleviating the effort required to filter the bulk of transactions. Furthermore, the derived representations can be used as building blocks for additional insights to support the reviewing process, such as clustering the per-customer transactions, and comparing how the behavior of a customer evolves over time. The former can be a useful approach to group the information shown to the analyst beyond simple aggregations, and the latter can quickly provide context surrounding a customer under review. Unlike most existing works in the graph self-supervised literature landscape, in this work self-supervision is both the starting point and the end goal, as there are no anomaly labels or supervised downstream tasks. The objective is for this system to be integrated within a broader system for AML reviewing that handles the necessary workload of assessment creation. Within this system, these insights will be digested and provided in an easy-to-understand manner through tailor-made visualizations for AML as soon as the investigation starts. These visualizations are beyond the scope of this work and they will not be described.

¹ Other networks were considered, but this was simultaneously the best performing and most flexible approach.
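The graph construction, pair sampling and 𝐾-hop extraction steps named in the Figure 1 caption can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the `("c", id)` / `("t", id)` node tags, uniform negative sampling, and the choice to ignore edge direction when collecting neighborhoods are all assumptions made for illustration.

```python
import random
from collections import deque


def build_bipartite_graph(transactions):
    """Map each node to its successors: sender -> transaction -> receiver.

    `transactions` is an iterable of (tx_id, sender, receiver) records.
    Nodes are tagged tuples, ("c", customer_id) or ("t", tx_id) (hypothetical).
    """
    successors = {}
    for tx_id, sender, receiver in transactions:
        c_out, tx, c_in = ("c", sender), ("t", tx_id), ("c", receiver)
        successors.setdefault(c_out, set()).add(tx)  # outgoing transaction
        successors.setdefault(tx, set()).add(c_in)   # incoming transaction
        successors.setdefault(c_in, set())           # keep receiver-only customers
    return successors


def undirected_neighbors(successors):
    """Symmetrize the edges (assumption: direction is ignored for neighborhoods)."""
    neighbors = {node: set() for node in successors}
    for src, dsts in successors.items():
        for dst in dsts:
            neighbors[src].add(dst)
            neighbors[dst].add(src)
    return neighbors


def sample_pairs(neighbors, rng):
    """Return (customer, transaction, label) triples.

    Label 1 marks an existing edge; label 0 marks a transaction drawn uniformly
    from those the customer takes no part in (assumed sampling scheme).
    """
    txs = sorted(n for n in neighbors if n[0] == "t")
    pairs = []
    for customer in sorted(n for n in neighbors if n[0] == "c"):
        linked = neighbors[customer]
        unlinked = [t for t in txs if t not in linked]
        for tx in sorted(linked):
            pairs.append((customer, tx, 1))
            if unlinked:
                pairs.append((customer, rng.choice(unlinked), 0))
    return pairs


def k_hop_nodes(neighbors, start, k):
    """Breadth-first search returning every node within k hops of `start`."""
    seen = {start}
    frontier = deque([(start, 0)])
    while frontier:
        node, depth = frontier.popleft()
        if depth == k:
            continue
        for nxt in neighbors[node]:
            if nxt not in seen:
                seen.add(nxt)
                frontier.append((nxt, depth + 1))
    return seen


# Toy data, made up for illustration only.
toy_transactions = [("t1", "alice", "bob"), ("t2", "bob", "carol"), ("t3", "alice", "carol")]
toy_graph = undirected_neighbors(build_bipartite_graph(toy_transactions))
toy_pairs = sample_pairs(toy_graph, random.Random(0))
print(len(toy_pairs), sorted(k_hop_nodes(toy_graph, ("c", "alice"), 2)))
```

In a full system, the nodes returned by `k_hop_nodes` for both ends of a sampled pair would be passed to the GNN encoder, and a decoder would score the pair; both are omitted here.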

In summary, this work’s main contributions are:

• A novel fully self-supervised approach to derive representations of customers and financial transactions useful for a variety of insights to support the AML reviewing process.

• A new way to represent the network of financial interactions as a customer-transaction bipartite graph.

• Validation of our method on a real-world banking dataset in the self-supervised task of link prediction, achieving an improvement of […] p.p. of AUC compared to using only the raw features.
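For readers unfamiliar with the metric used in the last contribution, the following sketch shows how AUC can be computed for a link prediction task and how a difference between two models is expressed in percentage points (p.p.). The function names and the toy labels and scores are hypothetical and unrelated to the paper's results.

```python
def roc_auc(labels, scores):
    """Probability that a random positive pair outscores a random negative pair.

    Ties count as half a win (the rank-based, Mann-Whitney formulation of AUC).
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative pair")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def gain_in_pp(auc_model, auc_baseline):
    """Difference between two AUC values, expressed in percentage points."""
    return 100.0 * (auc_model - auc_baseline)


# Made-up labels (1 = real edge, 0 = sampled non-edge) and decoder scores.
toy_labels = [1, 1, 0, 0]
toy_scores = [0.9, 0.4, 0.6, 0.1]
print(roc_auc(toy_labels, toy_scores))
```

The quadratic loop is fine for a small illustration; on large evaluation sets a sort-based implementation, or a library routine, would be the more practical choice.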

2 RELATED WORK

Most of the approaches used by financial institutions to detect money laundering are based on a set of rules aligned with regulations. Machine learning methods for AML are becoming more popular, and can broadly be separated into supervised and unsupervised approaches, with the latter being more common due to the lack of available labels. When labels are available, several works have compared the performance of different classifiers and training strategies in predicting money laundering. Examples include benchmarking several popular classifiers and sampling schemes [28], comparing the performance of an XGBoost classifier when trained exclusively with alerted events or with all events [8], and comparing the performance of an SVM classifier under different hyperparameter configurations [9].

Unsupervised approaches typically apply an anomaly detection algorithm that compares events with the expected behavior through deviation metrics. Definitions of expected behavior include clusters of transactions by the same customer [14], the nearest large cluster [3], or the k-nearest neighbors [16]. To handle the lack of real-world data, several approaches have proposed to generate synthetic data, either generating entire datasets [16, 17, 25], or just patterns of suspicious behavior [3, 24].
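As a rough illustration of the deviation-based idea, the sketch below scores each event by its mean Euclidean distance to its k nearest neighbors, so that events far from their expected neighborhood receive a higher score. This is a generic k-nearest-neighbor anomaly score written for illustration, not the method of any cited work; the function name and the toy feature vectors are assumptions.

```python
import math


def knn_anomaly_scores(points, k):
    """Mean Euclidean distance from each point to its k nearest other points.

    A larger score means the point deviates more from its local neighborhood.
    """
    if not 1 <= k < len(points):
        raise ValueError("k must satisfy 1 <= k < number of points")
    scores = []
    for i, p in enumerate(points):
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        scores.append(sum(dists[:k]) / k)
    return scores


# Made-up two-dimensional feature vectors; the last one is far from the rest.
toy_events = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (10.0, 10.0)]
toy_scores = knn_anomaly_scores(toy_events, k=2)
most_anomalous = max(range(len(toy_events)), key=toy_scores.__getitem__)
print(most_anomalous, toy_scores)
```

In practice the features would need scaling before distances are meaningful, and a threshold on the score would have to be chosen without labels, which is part of what makes unsupervised AML detection difficult.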

The majority of works using machine learning for AML rely entirely on feature sets that characterize individual events or entities. This naturally disregards the underlying contextual information
