Task 1 Predictive Analytics Case Study (40 Marks)
The goal of the Predictive Analytics Case Study is to predict whether a customer of ACME Bank is likely to become delinquent and default on a loan (see Table 1, the data dictionary for the loan-delinq.csv data set, below). In completing Task 1 you will apply the business understanding, data understanding, data preparation, modelling and evaluation phases of the CRISP-DM data mining process. You need to understand this data set to complete Task 1 and its four sub-tasks.
Table 1 Data dictionary for loan-delinq.csv
Variable Name | Description | Data Type
Record_ID | Unique record id for customer | Integer
SeriousDlqin2yrs | Person experienced 90 days past due delinquency or worse | 1 = Yes, 0 = No
RevolvingUtilizationOfUnsecuredLines | Total balance on credit cards and personal lines of credit except real estate and no instalment debt like car loans divided by sum of credit limits | Percentage
Age | Age of borrower in years | Integer
NumberOfTime30-59DaysPastDueNotWorse | Number of times borrower 30-59 days past due but no worse in last 2 years. | Integer
DebtRatio | Monthly debt payments, alimony, living costs divided by monthly gross income. | Percentage
MonthlyIncome | Monthly Income | Real
NumberOfOpenCreditLinesAndLoans | Number of Open loans (installment like car loan or mortgage) and Lines of credit (e.g. credit cards). | Integer
NumberOfTimes90DaysLate | Number of times borrower has been 90 days or more past due. | Integer
NumberRealEstateLoansOrLines | Number of mortgage and real estate loans including home equity lines of credit. | Integer
NumberOfTime60-89DaysPastDueNotWorse | Number of times borrower has been 60-89 days past due but no worse in last 2 years. | Integer
NumberOfDependents | Number of dependents in family excluding themselves (spouse, children etc.) | Integer
1.1 Exploratory data analysis and data preparation: Conduct an exploratory data analysis and data preparation of the loan-delinq.csv data set using RapidMiner to understand the characteristics of each variable and its relationship to the other variables. Summarise your findings in a table named Table 1.1 Results of Exploratory Data Analysis and Data Preparation. For each variable in the loan-delinq.csv data set, the table should describe its key characteristics (maximum and minimum values, average, standard deviation, most frequent value (mode), missing values, invalid values etc.), its relationships with other variables, any transformation applied to it, and any new variables created from it.
Hint: the Statistics tab and Chart tab in RapidMiner provide much of the descriptive statistical information and the charts (bar charts, scatterplots etc.) required for Task 1.1. You might also run correlations and/or chi-square tests, depending on whether a variable is numeric or categorical. Indicate in Table 1.1 which variables contribute most to predicting whether a customer is likely to become delinquent and default on a loan. You could also consider transforming some variables, creating new variables and converting the target/label variable into a binominal variable to facilitate analysis in Tasks 1.2, 1.3 and 1.4.
Briefly discuss the key findings of your exploratory data analysis and data preparation, and justify your choice of the variables most likely to predict whether a customer will become delinquent and default on a loan (10 marks 500 words).
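The descriptive statistics requested for Task 1.1 come from the Statistics tab in RapidMiner, but it can help to see what each figure means. The following is a minimal Python sketch, for illustration only, of the per-variable summary (minimum, maximum, average, standard deviation, mode and missing-value count). The function name summarise and the MonthlyIncome values are assumptions made up for this example; they are not taken from loan-delinq.csv.

```python
import statistics

def summarise(values):
    """Summarise one numeric column; None marks a missing value."""
    present = [v for v in values if v is not None]
    return {
        "min": min(present),
        "max": max(present),
        "mean": statistics.mean(present),
        "stdev": statistics.stdev(present),   # sample standard deviation
        "mode": statistics.mode(present),     # most frequent value
        "missing": len(values) - len(present),
    }

# Made-up MonthlyIncome values, for illustration only.
monthly_income = [3000, 4500, None, 3000, 5200, None, 6100]
summary = summarise(monthly_income)
print(summary)
```

Missing values are excluded before the statistics are computed and are reported separately, which mirrors how they would be listed in their own column of Table 1.1.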
1.2 Decision Tree Model: Build a Decision Tree model for predicting whether a customer is likely to become delinquent and default on a loan, using RapidMiner on the loan-delinq.csv data set with a set of data mining operators determined in part by your exploratory data analysis in Task 1.1. Provide these outputs from RapidMiner: (1) Final Decision Tree Model process, (2) Final Decision Tree diagram and (3) Decision Tree rules. Briefly explain your Final Decision Tree Model process, and discuss the results of the Final Decision Tree Model, drawing on the key outputs (Decision Tree diagram, Decision Tree rules), the key contributing variables and relevant supporting literature on the interpretation of decision trees (10 marks 150 words).
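When discussing key contributing variables, it helps to remember that a decision tree places a variable near the root because splitting on it reduces class impurity the most. The Python sketch below, for illustration only, computes that reduction using the Gini index. This is one of several possible split criteria, and the one your RapidMiner process uses depends on how the Decision Tree operator is configured. The function names and the labels are assumptions made up for this example.

```python
from collections import Counter

def gini(labels):
    """Gini impurity of a list of class labels (0.0 means a pure node)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())

def gini_gain(parent, left, right):
    """Impurity reduction obtained by splitting parent into left and right."""
    n = len(parent)
    weighted = (len(left) / n) * gini(left) + (len(right) / n) * gini(right)
    return gini(parent) - weighted

# Made-up labels (1 = delinquent, 0 = not delinquent), for illustration only.
parent = [1, 1, 1, 0, 0, 0, 0, 0]
left = [1, 1, 1, 0]    # records where a hypothetical split test is true
right = [0, 0, 0, 0]   # records where it is false
gain = gini_gain(parent, left, right)
print(round(gain, 5))
```

A larger gain means the split separates delinquent from non-delinquent customers more cleanly, which is why such variables appear in the first levels of the tree diagram and in the first conditions of the tree rules.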
1.3 Logistic Regression Model: Build a Logistic Regression model for predicting whether a customer is likely to become delinquent and default on a loan, using RapidMiner on the loan-delinq.csv data set with an appropriate set of data mining operators determined in part by your exploratory data analysis in Task 1.1. Provide these outputs from RapidMiner: (1) Final Logistic Regression Model process and (2) key outputs from the Logistic Regression Model. Hint: for Task 1.3 you may need to change the data types of some variables. Briefly explain your Final Logistic Regression Model process, and discuss the results of the Final Logistic Regression Model, drawing on the key outputs (coefficients, standardised coefficients, odds ratios, p-values etc.), the key contributing variables and relevant supporting literature on the interpretation of logistic regression models (10 marks 150 words).
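To interpret the logistic regression outputs, recall that the odds ratio of a variable is the exponential of its coefficient, and that the predicted probability is the logistic function applied to the intercept plus the weighted sum of the inputs. The Python sketch below illustrates both relationships. The function names and the coefficient values are assumptions made up for this example and are not results from the loan-delinq.csv data set.

```python
import math

def odds_ratio(coefficient):
    """Multiplicative change in the odds for a one-unit increase in the variable."""
    return math.exp(coefficient)

def predicted_probability(intercept, coefficients, values):
    """Probability of the positive class (delinquent) from a fitted model."""
    z = intercept + sum(b * x for b, x in zip(coefficients, values))
    return 1.0 / (1.0 + math.exp(-z))

# Made-up coefficients, for illustration only.
print(odds_ratio(0.5))     # above 1: the odds of delinquency rise with the variable
print(odds_ratio(-0.5))    # below 1: the odds of delinquency fall with the variable
print(predicted_probability(-2.0, [0.5], [1.0]))
```

An odds ratio of exactly 1 (a coefficient of 0) means the variable has no effect on the odds, so variables whose odds ratios lie far from 1 and whose p-values are small are the natural candidates for discussion as key contributors.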
1.4 Model Validation and Performance: Validate your Final Decision Tree Model and Final Logistic Regression Model using the Cross-Validation operator, Apply Model operator and Performance operator in your data mining processes. Discuss and compare the performance of the two models for predicting whether a customer is likely to become delinquent and default on a loan, based on the key results of the confusion matrix presented in Table 1.4 Model Performance Metrics (Decision Tree vs Logistic Regression). Table 1.4 will compare the Final Decision Tree Model with the Final Logistic Regression Model using the following model performance metrics: (1) accuracy, (2) sensitivity, (3) specificity and (4) F1 score (10 marks 200 words).
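All four metrics in Table 1.4 can be derived from the four cells of a confusion matrix. The Python sketch below shows the standard formulas, treating "delinquent" as the positive class. The function name performance_metrics and the counts are assumptions made up for this example; use the confusion matrices produced by your own RapidMiner processes.

```python
def performance_metrics(tp, fp, tn, fn):
    """Compute Table 1.4 style metrics from confusion-matrix counts.

    tp/fn: delinquent customers predicted correctly/incorrectly.
    tn/fp: non-delinquent customers predicted correctly/incorrectly.
    Assumes every denominator is non-zero.
    """
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    sensitivity = tp / (tp + fn)     # also called recall or true positive rate
    specificity = tn / (tn + fp)     # true negative rate
    precision = tp / (tp + fp)
    f1 = 2 * precision * sensitivity / (precision + sensitivity)
    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "f1": f1,
    }

# Made-up counts, for illustration only.
metrics = performance_metrics(tp=40, fp=10, tn=140, fn=10)
print(metrics)
```

Because delinquent customers are usually a small minority, accuracy alone can look high even when few delinquencies are detected, which is why sensitivity, specificity and F1 score are compared alongside it.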
Note: the important outputs from the data mining analyses conducted in RapidMiner for Task 1 must be included in your Assessment 3 Report to support the conclusions you reach for each of 1.1, 1.2, 1.3 and 1.4. You can export important outputs from RapidMiner as jpg image files and include these screenshots in the relevant Task 1 parts of your Assessment 3 Report.
Note: you will find the North textbook and the RapidMiner Tutorials useful references for the data mining process activities in Task 1: exploratory data analysis and data preparation, decision tree analysis, logistic regression analysis, and evaluation of the performance of the Final Decision Tree Model and the Final Logistic Regression Model.
These concepts are covered in the Module RapidMiner Practicals, in Chapters 3, 4, 9, 10 and 13 of the North textbook, and in the RapidMiner Tutorials contained within RapidMiner.
Research and critically review the study materials and other relevant literature to provide a suitable written response to each of Tasks 2, 3 and 4 below, supported with an appropriate level of in-text referencing:
Task 2 Social media analytics (15 marks 500 words)
2.1 Explain why social media analytics is such an important activity for business (7 Marks 250 words)
2.2 Choose and describe a widely used application of social media analytics and explain how the impact of social media can be measured in this application area using social media analytics (8 marks 250 words)
Task 3 Big Data Technologies (15 marks 500 words)
3.1 Explain why streaming analytics is such an important concept in big data management. Illustrate your answer with a real world application of streaming analytics (8 marks 250 words).
3.2 Discuss the key technology building blocks of IoT in the context of a real world application of IoT (7 marks 250 words).
Task 4 Artificial Intelligence: ChatGPT transforming work and ethical considerations of using ChatGPT (20 marks 1000 words)
4.1 First, discuss how configurations of humans and artificial intelligence will evolve in workplaces as organisations increasingly drive automation and augmentation through the adoption of AI applications such as ChatGPT (10 marks 500 words).
4.2 Second, identify and discuss the ethical implications for organisations of using ChatGPT to drive automation and augmentation of work, in relation to (1) privacy, (2) transparency, (3) bias and discrimination and (4) governance and accountability (10 marks 500 words).