CLASSIFICATION BASED CREDIT RISK ANALYSIS: THE CASE OF LENDING CLUB
AADI GUPTA*, PRIYA GULATI, SIDDHARTHA P. CHAKRABARTY
Abstract
In this paper, we perform a credit risk analysis on the data of past loan applicants of a company named
Lending Club. The calculation required the use of exploratory data analysis and machine learning classification
algorithms, namely, Logistic Regression and Random Forest. We further used the calculated
probability of default to design a credit derivative, based on the idea of a Credit Default Swap, to hedge against
an event of default. The results on the test set are presented using various performance measures.
Keywords: Credit risk; Classification algorithm; Exploratory data analysis
1 INTRODUCTION
Lending Club, headquartered in San Francisco, was the first peer-to-peer lending institution to offer its securities
through the Securities and Exchange Commission (SEC) and enter the secondary market [1]. This article is
essentially a case study on how financial engineering problems can be addressed using Machine Learning (ML)
and Exploratory Data Analysis (EDA) approaches. Lending Club specializes in extending different types of loans
to urban customers, which are decided on the basis of the applicant’s profile. Accordingly, the data considered in
this work contains information about past loan applications and whether or not the loan was defaulted on. One of
the main objectives of this work is the calculation of the Probability of Default (PD) and (in case of a default)
the determination of Loss Given Default (LGD), Exposure at Default (EAD), and finally, the Expected Loss (EL),
making use of the historical data. Also, using the “Recovery Rate” (estimated while calculating “LGD” from
“EAD”) and the PD, a simple credit derivative is implemented, based on the concept of Credit Default Swaps
(CDS), which can be used to hedge against such defaults. It may be noted that the total value a lender is exposed
to when a loan defaults is the EAD, and the consequent unrecoverable amount for the lender is the LGD [2].
Accordingly, EL is defined as,
EL = EAD × LGD × PD.
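As a small worked illustration of the formula above, the following Python sketch evaluates EL for a single hypothetical loan. It assumes that LGD is expressed as a fraction of EAD (taken here as one minus the recovery rate) and that PD is a probability; the function name and all numbers are illustrative and are not taken from the Lending Club data.

```python
def expected_loss(ead: float, lgd: float, pd: float) -> float:
    """Return EL = EAD x LGD x PD.

    Assumption: lgd is a fraction of EAD in [0, 1] and pd is a probability in [0, 1].
    """
    if not 0.0 <= lgd <= 1.0:
        raise ValueError("lgd must lie in [0, 1]")
    if not 0.0 <= pd <= 1.0:
        raise ValueError("pd must lie in [0, 1]")
    return ead * lgd * pd


# Hypothetical loan: 10,000 outstanding, 40% recovery rate, 5% default probability.
recovery_rate = 0.40
lgd = 1.0 - recovery_rate  # assumed relation between LGD and the recovery rate
el = expected_loss(ead=10_000.0, lgd=lgd, pd=0.05)
print(f"Expected loss: {el:.2f}")
```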
The driver of this work is the notion of “Classification Algorithm”, which weighs the input data (of the applicant,
in this case) to classify the input features into positive and negative classes [3] (default on the loan or not, in
this case). Accordingly, we consider two “Classification Algorithms”, namely, the Logistic Regression and the
Random Forest.
*Department of Mathematics, Indian Institute of Technology Guwahati, Guwahati-781039, India, e-mail: aadi18@alumni.iitg.ac.in
Department of Mathematics, Indian Institute of Technology Guwahati, Guwahati-781039, India, e-mail: gulati18@alumni.iitg.ac.in
Department of Mathematics, Indian Institute of Technology Guwahati, Guwahati-781039, India, e-mail: pratim@iitg.ac.in
arXiv:2210.05136v1 [q-fin.RM] 11 Oct 2022
The main idea behind Logistic Regression is to determine the probability of a particular data point belonging to
the positive class (in the case of binary classification) [4]. The model does so by establishing a “linear relationship”
between the independent and the dependent variables. The weights of the linear relation are determined through
the minimization of a Loss Function, which is achieved by using the “Gradient Descent Optimization Algorithm”.
Since Logistic Regression first predicts the probability of belonging to the positive class, it creates a
linear decision boundary (based on a threshold set by the user), separating the two classes from one another. This
decision boundary can now be represented as a conditional probability [5]. Implementation of the Random Forest
algorithm involves the construction of several decision trees during the training stage [6], and the predictions
emanating from these trees are averaged to arrive at a final prediction. Since the algorithm uses an average of
results to make the final prediction, the Random Forest algorithm is referred to as an ensemble technique. Decision
Trees are designed to optimally split the considered dataset into smaller and smaller subsets, in order to predict
the value being targeted [7, 8]. Some of the criteria used to calculate the purity or impurity of a node include
Entropy and Gini Impurity. In summary, a decision tree considers splits of a node on all the attributes present in
the data, and then chooses the split with the most Information Gain. The Decision Tree model of Classification and
Regression Trees (CART) is used in this paper.
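The split-selection step described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: a single numeric feature, binary labels (1 for default, 0 otherwise), and candidate thresholds at midpoints between distinct feature values. The function names and the toy FICO scores are hypothetical and do not come from the paper's implementation.

```python
import math
from collections import Counter


def gini(labels):
    """Gini impurity: 1 - sum of squared class proportions."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def entropy(labels):
    """Shannon entropy (base 2) of the class proportions."""
    n = len(labels)
    if n == 0:
        return 0.0
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(parent, left, right, impurity=entropy):
    """Impurity of the parent minus the size-weighted impurity of the children."""
    n = len(parent)
    weighted = (len(left) / n) * impurity(left) + (len(right) / n) * impurity(right)
    return impurity(parent) - weighted


def best_threshold(values, labels, impurity=entropy):
    """Scan midpoints between distinct feature values; return (threshold, gain)."""
    pairs = list(zip(values, labels))
    distinct = sorted(set(values))
    best = (None, 0.0)
    for lo, hi in zip(distinct, distinct[1:]):
        t = (lo + hi) / 2
        left = [y for x, y in pairs if x <= t]
        right = [y for x, y in pairs if x > t]
        gain = information_gain(labels, left, right, impurity)
        if gain > best[1]:
            best = (t, gain)
    return best


# Toy data (hypothetical): FICO scores and default flags (1 = default).
fico = [620, 640, 660, 700, 720, 760]
default = [1, 1, 1, 0, 0, 0]
threshold, gain = best_threshold(fico, default)
print(threshold, gain)
```

Passing `impurity=gini` to `best_threshold` ranks the same candidate splits by the decrease in Gini Impurity instead of Entropy.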
The firm-based model uses the value of a firm to represent the event of default, with the default event being
represented by the boundary conditions of the process and the dynamics of the firm value [9]. In particular, we
refer to two well-established models. Firstly, we mention the Merton model, based on the seminal paper of
Black and Scholes [10], which is used to calculate the default probability of a reference entity. In this context,
the joint density function for the first hitting times is determined [11]. Secondly, we have the Black-Cox model,
which addresses some of the disadvantages of the Merton model [10]. In order to hedge against credit risk, the
usage of credit derivatives is a customary approach [12, 13, 14], with Credit Default Swaps (CDS) being the most
common choice of credit derivative. This type of contract entails the buyer of the CDS transferring the credit risk
of a reference entity (“Loans” in this case constitute the reference entity) to the seller of the protection, until the
credit has been settled. In return, the protection buyer pays premiums (predetermined payments) to the protection
seller, which continue until the maturity of the CDS or a default, whichever is earlier [10]. The interested reader
may refer to [10] for the formula of CDS spread per annum, to be used later in this paper. Another widely used
credit derivative, albeit more sophisticated than the CDS, is the Collateralized Debt Obligation (CDO), which is
a structured product based on tranches [14]. CDOs can be further classified into cash, synthetic and hybrid. The
interested reader may refer to [14] for a detailed presentation on the pricing of synthetic CDOs.
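To make the premium and protection payments of a CDS concrete, the following Python sketch computes a fair annual spread in a deliberately simplified discrete-time setting. It is not the formula from [10]; it assumes a constant annual PD, annual premiums paid in arrears only if no default has occurred, the loss (one minus the recovery rate) settled at the end of the year of default, and a flat discount rate. The function name and inputs are hypothetical.

```python
def cds_fair_spread(pd_annual, recovery_rate, maturity_years, risk_free_rate):
    """Spread per unit notional that equates the two legs of the swap.

    Assumptions (illustrative only): constant annual default probability,
    yearly premiums paid in arrears while the reference loan survives,
    default loss paid at year end, flat discount rate, no accrued premium.
    """
    if not 0.0 <= pd_annual < 1.0:
        raise ValueError("pd_annual must lie in [0, 1)")
    if not 0.0 <= recovery_rate <= 1.0:
        raise ValueError("recovery_rate must lie in [0, 1]")
    if maturity_years < 1:
        raise ValueError("maturity_years must be at least 1")

    premium_leg = 0.0      # present value of paying 1 unit of spread each year
    protection_leg = 0.0   # present value of the expected default payment
    survival_prev = 1.0
    for t in range(1, maturity_years + 1):
        discount = (1.0 + risk_free_rate) ** (-t)
        survival = survival_prev * (1.0 - pd_annual)
        default_in_year = survival_prev - survival
        premium_leg += discount * survival
        protection_leg += discount * (1.0 - recovery_rate) * default_in_year
        survival_prev = survival
    return protection_leg / premium_leg


# Hypothetical inputs: 2% annual PD, 40% recovery, 5-year contract, 3% rate.
spread = cds_fair_spread(0.02, 0.40, 5, 0.03)
print(f"Fair spread: {spread * 1e4:.1f} basis points per annum")
```

Under these assumptions the spread reduces to (1 − recovery rate) × PD / (1 − PD) for any maturity and discount rate, which gives a quick sanity check on the loop.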
2 METHODOLOGY
The goal of this exercise is to fit a classification model on the data considered (from the
peer-to-peer lending company Lending Club), in order to predict whether an applicant (whose details are
contained in the considered database) is likely to default on the loan. To this end, it is necessary
to identify and understand the essential variables, and take into account the summary statistics, in conjunction
with data visualization. The dataset used for the estimation of PD, EAD, LGD and EL was obtained from Kaggle
[15]. The data contains details of all the applicants who had applied for a loan at Lending Club. Separate files
were obtained for the accepted and the rejected loans. Only the file “accepted loans” was used, since the
observations were made on the applicants who ultimately paid the loan, and those who defaulted on the loan. The
data involved the details of the applicants for the loan, such as FICO score, loan amount, interest rate, purpose of
loan, etc. Machine Learning (ML) algorithms were applied on this data to predict the PD after data exploration,
data pre-processing and feature cleaning.
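The PD estimation step outlined above can be illustrated with a minimal, self-contained Python sketch of Logistic Regression trained by batch gradient descent on the log-loss. It runs on synthetic data rather than the Kaggle file, and the two standardized features (a FICO-like score and an interest-rate-like variable), the coefficients used to simulate defaults, and all function names are assumptions made purely for illustration.

```python
import math
import random


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def fit_logistic(X, y, lr=0.5, epochs=500):
    """Batch gradient descent on the average log-loss; returns (weights, bias)."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict_pd(w, b, x):
    """Predicted probability of default for one applicant."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


# Synthetic applicants (hypothetical): [standardized FICO, standardized interest rate].
random.seed(0)
X, y = [], []
for _ in range(300):
    fico_z, rate_z = random.gauss(0, 1), random.gauss(0, 1)
    true_logit = -1.0 - 1.5 * fico_z + 1.0 * rate_z  # assumed data-generating rule
    X.append([fico_z, rate_z])
    y.append(1 if random.random() < sigmoid(true_logit) else 0)

w, b = fit_logistic(X, y)
pd_risky = predict_pd(w, b, [-1.0, 1.0])  # low score, high rate
pd_safe = predict_pd(w, b, [1.0, -1.0])   # high score, low rate
print(w, b, pd_risky, pd_safe)
```

Applying a user-chosen threshold to the predicted PD (for example, 0.5) turns the probability into a default or non-default label.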
The feature selection was performed on the considered data, bearing in mind the goals of predictive modelling.