1. Project Introduction
A) Defining the problem statement
Retail lending is generally considered risk-free because the bank is expected to have done the necessary due diligence, including collateral requirements and credit-score checks. However, borrowers in this segment have recently started to default, which in turn hurts the bank's revenue and profitability.
Hence, there is a need to build a model that predicts loan default, which will help the bank take the required actions, including:
i. Avoiding the exposure
ii. Intensifying the collection efforts
iii. Initiating the collateral sale
iv. Avoiding certain customer or product segments
B) Need of the study/project
Retail lending is an important division of any large bank. It helps the bank grow at a rapid pace by earning significant fee and interest income while also bringing in savings and current accounts.
In some banks, it contributes more than 70% of the bank's assets and revenues. In some cases, higher delinquency has even led to a bank run and the closure of a bank. A well-known recent example is YES Bank, where lending to risky customers led to unprecedented intervention by the RBI and damaged the bank's brand image.
C) Understanding business/social opportunity
If the bank is able to effectively predict the chance of loan default before the disbursement, then the future delinquency can be reduced significantly. This will help the bank to maintain good profitability and avoid any capital erosion.
This model will also enable the bank to identify the customer and product segments that have lower delinquency and higher profitability. It will help the bank expand into new territories and segments where it has not ventured before (e.g., tier III and tier IV towns).
2. Data Report
We attempt to predict the risk of a loan defaulting based on past loan data. Hence, we first take an overview of the given data:
A) Understanding how data was collected in terms of time, frequency and methodology
The data contains details of loans issued between June 2007 and December 2015.
The latest last-payment date for the loans is January 2016, so we can assume the data was collected after January 2016. The loan issue dates indicate a monthly frequency of data collection.
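The date checks described above can be sketched in plain Python. This is only an illustration, not the code from Appendix A: the sample values, the variable names and the "Mon-YYYY" text format are assumptions, since the report does not state how the dates are stored.

```python
from datetime import datetime

# Hypothetical sample values; the "Mon-YYYY" format is an assumption.
issue_dates = ["Jun-2007", "Jul-2007", "Mar-2011", "Dec-2015"]
last_payment_dates = ["Aug-2012", "Dec-2015", "Jan-2016"]


def parse_month(text):
    """Parse a month-level date such as 'Jun-2007'."""
    return datetime.strptime(text, "%b-%Y")


issued = [parse_month(d) for d in issue_dates]
paid = [parse_month(d) for d in last_payment_dates]

first_issue, last_issue = min(issued), max(issued)
latest_payment = max(paid)

# Calendar months covered by the issue dates, counting both end points.
months_spanned = (
    (last_issue.year - first_issue.year) * 12
    + (last_issue.month - first_issue.month)
    + 1
)

print(first_issue.strftime("%B %Y"), "to", last_issue.strftime("%B %Y"))
print("Latest payment:", latest_payment.strftime("%B %Y"))
print("Months spanned:", months_spanned)
```

Because the dates carry no day component, month-level parsing is enough to confirm both the issue window and the monthly granularity.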
B) Visual inspection of data (rows, columns, descriptive details)
There are 226,786 rows and 41 columns, of which 25 are numeric, 11 are character and 5 are date columns.
The last variable ‘loan_status’ is the dependent variable.
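A minimal sketch of how the row, column and type counts could be obtained is shown below. It uses two hypothetical records in plain Python rather than the actual data; the column names other than loan_status, the status labels and the "Mon-YYYY" date format are assumptions made for illustration.

```python
from collections import Counter
from datetime import datetime

# Two hypothetical records; values are kept as text, as read from a CSV file.
rows = [
    {"loan_amnt": "5000", "grade": "B", "issue_d": "Jun-2007", "loan_status": "Fully Paid"},
    {"loan_amnt": "12000.5", "grade": "E", "issue_d": "Dec-2015", "loan_status": "Default"},
]


def is_number(value):
    try:
        float(value)
        return True
    except ValueError:
        return False


def is_date(value):
    try:
        datetime.strptime(value, "%b-%Y")  # assumed date format
        return True
    except ValueError:
        return False


def classify(values):
    """Label a column as numeric, date or character from its values."""
    if all(is_number(v) for v in values):
        return "numeric"
    if all(is_date(v) for v in values):
        return "date"
    return "character"


column_types = {name: classify([row[name] for row in rows]) for name in rows[0]}
type_counts = Counter(column_types.values())
shape = (len(rows), len(rows[0]))  # (number of rows, number of columns)

print(shape)
print(dict(type_counts))
```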
C) Understanding of attributes (variable info, renaming if required)
We have renamed the column ‘earliest_cr_line’ to ‘earliest_cr_line_mnth’ as it shows the month a borrower's earliest reported credit line was opened.
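As an illustration only, the rename can be sketched on a hypothetical record held as a Python dictionary. The column names come from the report; the record values and the helper names rows, renames and renamed_rows are assumptions.

```python
# Hypothetical single record; only the column names come from the report.
rows = [{"earliest_cr_line": "Jan-1999", "loan_status": "Fully Paid"}]

# Mapping of old column name to new column name.
renames = {"earliest_cr_line": "earliest_cr_line_mnth"}

# Rebuild each record with the new key, keeping the original column order.
renamed_rows = [
    {renames.get(name, name): value for name, value in row.items()}
    for row in rows
]

print(list(renamed_rows[0]))
```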
3. Exploratory Data Analysis
The steps followed to analyze the case study are explained below.
3.1 Univariate Analysis
We analyze all 41 variables from the given data set, which we have stored in the data frame ‘loanData’. The ‘loan_status’ variable is the dependent variable.
The univariate analysis shows the following:
• Nearly 80% of customers have no 30+ days past-due incidences of delinquency in their credit file for the past 2 years, which confirms that these customers have a good track record.
• Most of the loans disbursed are in the range of 5,000 to 15,000.
• Around 80% of customers who have availed a loan have an annual income of less than 100,000.
• A large share of customers have been lent to at a DTI within 30%, which gives comfort about the loan portfolio.
• A few variables, such as revol_bal, out_prncp, out_prncp_inv and total_rec_late_fee, are concentrated in a particular range of values. Hence their mean and median differ.
• The summary and box plots show outliers in most of the continuous variables. On further analysis, we found these to be acceptable values.
Please refer to Appendix A for the source code.
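The kind of checks behind these findings can be sketched as follows. This is not the Appendix A code: the values are hypothetical, delinq_2yrs is an assumed column name for the delinquency count, and the 1.5 × IQR box-plot rule is an assumption about how outliers were flagged.

```python
import statistics

# Hypothetical values for two columns; delinq_2yrs is an assumed column name.
delinq_2yrs = [0, 0, 0, 0, 1, 0, 2, 0, 0, 0]
revol_bal = [1200, 1500, 1800, 2000, 2100, 2300, 2500, 2600, 3000, 60000]

# Share of customers with no delinquency incidences.
share_zero = sum(1 for v in delinq_2yrs if v == 0) / len(delinq_2yrs)

# A large gap between mean and median signals a skewed, concentrated variable.
mean_bal = statistics.mean(revol_bal)
median_bal = statistics.median(revol_bal)

# Box-plot style outlier fences: 1.5 times the interquartile range.
q1, _, q3 = statistics.quantiles(revol_bal, n=4)
iqr = q3 - q1
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
outliers = [v for v in revol_bal if v < lower_fence or v > upper_fence]

print("Share with zero delinquencies:", share_zero)
print("Mean:", mean_bal, "Median:", median_bal)
print("Outliers:", outliers)
```

In this toy sample a single large balance pulls the mean well above the median, which is the pattern described for revol_bal and similar variables.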
3.2 Bi-Variate Analysis
We analyze loan_status against the independent variables from the data set ‘loanData’. Most of the variables do not seem to have much effect on whether a loan will default. Defaults are concentrated in grades E, F and G relative to fully paid loans in the same grades. Loans of grade B have been fully paid most often.
Customers have borrowed higher loan amounts mostly for credit card, debt consolidation, home improvement, house and small business purposes.
Please refer to Appendix A for the source code.
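A comparison of loan_status by grade can be sketched as a simple cross-tabulation. The records below are hypothetical and do not reproduce the report's figures; the status labels "Default" and "Fully Paid" are assumed spellings.

```python
from collections import Counter

# Hypothetical (grade, loan_status) pairs; the status labels are assumptions.
loans = [
    ("B", "Fully Paid"), ("B", "Fully Paid"), ("B", "Default"), ("B", "Fully Paid"),
    ("E", "Default"), ("E", "Fully Paid"),
    ("F", "Default"),
    ("G", "Default"),
]

# Cross-tabulation: number of loans per (grade, status) pair.
counts = Counter(loans)

# Number of loans per grade, used as the denominator.
totals = Counter(grade for grade, _ in loans)

# Share of defaulted loans within each grade.
default_rate = {
    grade: counts[(grade, "Default")] / totals[grade]
    for grade in sorted(totals)
}

for grade, rate in default_rate.items():
    print(grade, f"{rate:.0%}")
```

Comparing rates within each grade, rather than raw counts, keeps large grades such as B from dominating the comparison.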