Decision trees are widely used in the banking industry due to their high accuracy and ability to formulate a statistical model in plain language. Since government organizations in many countries carefully monitor lending practices, executives must be able to explain why one applicant was rejected for a loan while others were approved. This information is also useful for customers hoping to determine why their credit rating is unsatisfactory.
Automated credit scoring models are also likely to be used for instantly approving credit applications over the telephone and on the web. In this section, we will develop a simple credit approval model using C5.0 decision trees. We will also see how the results of the model can be tuned to minimize errors that result in a financial loss for the institution.
Step 1 – collecting data
The idea behind our credit model is to identify factors that put an applicant at higher risk of default. Therefore, we need data on a large number of past bank loans, including whether each loan went into default and information about the applicant.
Data with these characteristics are available in a dataset donated to the UCI Machine Learning Data Repository (http://archive.ics.uci.edu/ml) by Hans Hofmann of the University of Hamburg. They represent loans obtained from a credit agency in Germany.
The data presented in this chapter have been modified slightly from the original in order to eliminate some preprocessing steps. To follow along with the examples, download the credit.csv file from Packt Publishing's website and save it to your R working directory.
The credit dataset includes 1,000 examples of loans, plus a combination of numeric and nominal features indicating characteristics of the loan and the loan applicant. A class variable indicates whether the loan went into default. Let's see if we can determine any patterns that predict this outcome.
Step 2 – exploring and preparing the data
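As we have done previously, we will import the data using the read.csv() function. Because the majority of features in the data are nominal, we want character columns to be imported as factors. In versions of R before 4.0.0 this was the default behavior of the stringsAsFactors option; from R 4.0.0 onward the default is FALSE, so we set stringsAsFactors = TRUE explicitly. We'll also look at the structure of the credit data frame we created:
> credit <- read.csv("credit.csv", stringsAsFactors = TRUE)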
> str(credit)
The first several lines of output from the str() function are as follows:
'data.frame': 1000 obs. of 17 variables:
$ checking_balance : Factor w/ 4 levels "< 0 DM","> 200 DM",..
$ months_loan_duration: int 6 48 12 ...
$ credit_history : Factor w/ 5 levels "critical","good",..
$ purpose : Factor w/ 6 levels "business","car",..
$ amount : int 1169 5951 2096 ...
We see the expected 1,000 observations and 17 variables (the features plus the class variable), which are a combination of factor and integer data types.
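To see why the factor conversion matters for the str() output above, the following minimal sketch reads a tiny inline CSV twice, once with each setting of stringsAsFactors. The three rows and the object names (csv_text, as_factor, as_char) are made up for illustration and are not taken from credit.csv.

```r
# Hypothetical three-row CSV supplied inline; NOT real records from credit.csv
csv_text <- "checking_balance,amount
< 0 DM,1169
unknown,5951
< 0 DM,2096"

# With stringsAsFactors = TRUE, text columns are converted to factors
as_factor <- read.csv(text = csv_text, stringsAsFactors = TRUE)

# With stringsAsFactors = FALSE, text columns stay as character vectors
as_char <- read.csv(text = csv_text, stringsAsFactors = FALSE)

# Compare how the same column is stored in each case
print(class(as_factor$checking_balance))
print(class(as_char$checking_balance))

# A factor records its distinct categories as levels
print(levels(as_factor$checking_balance))
```

Numeric columns such as amount are unaffected by the option; only text columns change type.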
Let's take a look at some of the table() output for a couple of features of loans that seem likely to predict a default. The checking_balance and savings_balance features indicate the applicant's checking and savings account balance, and are recorded as categorical variables:
> table(credit$checking_balance)
    < 0 DM   > 200 DM 1 - 200 DM    unknown
       274         63        269        394
> table(credit$savings_balance)
     < 100 DM     > 1000 DM  100 - 500 DM 500 - 1000 DM       unknown
          603            48           103            63           183
Since the loan data was obtained from Germany, the currency is recorded in Deutsche Marks (DM). It seems like a safe assumption that larger checking and savings account balances should be related to a reduced chance of loan default.
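Raw counts can be easier to interpret as percentages. As a minimal sketch, the counts from the two table() outputs above are re-entered by hand here so that the example runs without credit.csv; the vector names (checking, savings, checking_pct, savings_pct) are our own.

```r
# Counts copied from the table() output for checking_balance
checking <- c("< 0 DM" = 274, "> 200 DM" = 63,
              "1 - 200 DM" = 269, "unknown" = 394)

# Counts copied from the table() output for savings_balance
savings <- c("< 100 DM" = 603, "> 1000 DM" = 48, "100 - 500 DM" = 103,
             "500 - 1000 DM" = 63, "unknown" = 183)

# prop.table() divides each count by the total of all counts;
# multiplying by 100 and rounding gives percentages to one decimal place
checking_pct <- round(prop.table(checking) * 100, 1)
savings_pct <- round(prop.table(savings) * 100, 1)

print(checking_pct)
print(savings_pct)
```

With the data frame loaded, the same result should be obtainable directly with prop.table(table(credit$checking_balance)). Both sets of counts sum to 1,000, so every loan falls into exactly one category of each feature, including the "unknown" category.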