logo Hurry, Grab up to 30% discount on the entire course
Order Now logo
625 Times Downloaded

Ask This Question To Be Solved By Our ExpertsGet A+ Grade Solution Guaranteed

expert
Isaac TorressCriminology
(5/5)

837 Answers

Hire Me
expert
Zayed KhanEconomics
(5/5)

611 Answers

Hire Me
expert
K AndersonAccounting
(5/5)

562 Answers

Hire Me
expert
Alan DuderMarketing
(5/5)

631 Answers

Hire Me
R Programming
(5/5)

This task will require you to develop and deploy a classification model on a product purchase data set.

INSTRUCTIONS TO CANDIDATES
ANSWER ALL QUESTIONS

Classification Modelling 

This task will require you to develop and deploy a classification model on a product purchase data set. 

You are now a Data Scientist working for an international consulting firm. An automotive manufacturer has approached you to help them target existing customers for a re-purchase campaign. The aim of this campaign is to send a communication to customers who are highly likely to purchase a new vehicle. All customers have already purchased at least one vehicle. 

The automotive company has supplied a data set of customer demographics, previous car type bought, the age of the vehicle, and servicing details. Note that the servicing details are only for mechanics at official dealerships. You have looked at this data set and already transformed many of the variables to make the modelling easier. There was a lot of noise in the numeric variables so you decided to transform all the numeric variables into deciles (integers 1 to 10, each has a similar number of customers). The deciles can be treated as numeric or factors in R. The data dictionary for this data set is given below: 

Field Data Type Description 

ID Unique ID Unique ID of the customer 

Target Integer Model target. 1 if the customer has purchased more than 1 vehicle, 0 if they have only purchased 1. 

age_band Categorical Age banded into categories 

gender Categorical Male, Female or Missing 

car_model Categorical The model of vehicle, 18 models in total 

car_segment Categorical The type of vehicle 

age_of_vehicle_years Integer Age of their last vehicle, in deciles 

sched_serv_warr Integer Number of scheduled services (e.g. regular check-ups) used under warranty, in deciles 

non_sched_serv_warr Integer Number of non-scheduled services (e.g. something broke out of the service cycle) used under warranty, in deciles 

sched_serv_paid Integer Amount paid for scheduled services, in deciles 

non_sched_serv_paid Integer Amount paid for non scheduled services, in deciles 

total_paid_services Integer Amount paid in total for services, in deciles 

total_services Integer Total number of services, in deciles 

mth_since_last_serv Integer The number of months since the last service, in deciles 

annualised_mileage Integer Annualised vehicle mileage, in deciles 

num_dealers_visited Integer Number of different dealers visited for servicing, in deciles 

num_serv_dealer_purchased Integer Number of services had at the same dealer where the vehicle was purchased, in deciles 

 

Tasks: 

For this task you have been given two csv files, ‘repurchase_training.csv’ and ‘repurchase_validation.csv’. The tasks below are to be undertaken on ‘repurchase_training.csv’ since you have the target variable included. 

1. Undertake EDA on this dataset. Justify all decisions you make relating to data processing. 

2. Build a linear classification model to predict which customers are most likely to repurchase. You can use any technique we used in class with a classification target. 

a. Create the confusion matrix & calculate precision, recall, F1 & AUC. 

b. Given known best practice for classification tasks and the particular situation at hand, which metric will you use to decide your final model? 

3. Build a tree based classification model to predict which customers are most likely to repurchase. 

a. How does your tree-based model perform relative to the linear model? (Hint: If you did not pick a good metric in 2.b you will not be able to compare the models!) 

b. Discuss the variable importance measures from both types of models (i.e what does ‘variable importance’ mean for these different model types). 

c. In this particular task, do they propose different levels of importance for the features? Why do you think this is? Provide intuitive interpretation of these importances, given the context at hand (i.e What do these importances mean in a ‘business sense’) 

d. For your tree-based model, construct partial dependency plots for the top 5 most important features. Do you notice anything interesting about any of these plots? 

4. With the best model created above, use the variables in ‘repurchase_validation.csv’ to output both probabilities and class predictions. You will see this file contains a validation data set, but the target variable has been excluded. Your model will be marked against this data set. It is important to follow the following submission guidelines: 

i. a. name the file repurchase_validation_STUDENTNUMBER.csv So if your number is 1234 the file should be repurchase_validation_1234.csv 

ii. Include only three columns, named as follows, in this order. Note that target_class must be 1 for purchase and 0 for not. The probability here is for the positive (1) class. No need for two columns with two probabilities 

1. ID

2. target_probability

3. target_class

5. Reporting (500 – 1000 words)

a. Using all the notes and answers you have above, wrap up all your work into a report for the manager of the automotive firm that follows the CRISP-DM methodology. Remember to consider your audience! 

 

 

(5/5)
Attachments:

Expert's Answer

625 Times Downloaded

Related Questions

. The fundamental operations of create, read, update, and delete (CRUD) in either Python or Java

CS 340 Milestone One Guidelines and Rubric  Overview: For this assignment, you will implement the fundamental operations of create, read, update,

. Develop a program to emulate a purchase transaction at a retail store. This  program will have two classes, a LineItem class and a Transaction class

Retail Transaction Programming Project  Project Requirements:  Develop a program to emulate a purchase transaction at a retail store. This

. The following program contains five errors. Identify the errors and fix them

7COM1028   Secure Systems Programming   Referral Coursework: Secure

. Accepts the following from a user: Item Name Item Quantity Item Price Allows the user to create a file to store the sales receipt contents

Create a GUI program that:Accepts the following from a user:Item NameItem QuantityItem PriceAllows the user to create a file to store the sales receip

. The final project will encompass developing a web service using a software stack and implementing an industry-standard interface. Regardless of whether you choose to pursue application development goals as a pure developer or as a software engineer

CS 340 Final Project Guidelines and Rubric  Overview The final project will encompass developing a web service using a software stack and impleme