(5/5)

Buy Now $35 USD

886 Times Downloaded

Ask This Question To Be Solved By Our ExpertsGet A+ Grade Solution Guaranteed

Drop Files Here Or Click to Upload

Or Get Complete Course Help

Jesse RossesManagement

(5/5)

612 Answers

Hire Me

Robert NjueSocial sciences

(/5)

519 Answers

Hire Me

Varun JoharStatistics

(5/5)

710 Answers

Hire Me

ShreyeshComputer science

(/5)

527 Answers

Hire Me

R Programming

(5/5)

This task will require you to develop and deploy a classification model on a product purchase data set.

INSTRUCTIONS TO CANDIDATES

ANSWER ALL QUESTIONS

Classification Modelling

You are now a Data Scientist working for an international consulting firm. An automotive manufacturer has approached you to help them target existing customers for a re-purchase campaign. The aim of this campaign is to send a communication to customers who are highly likely to purchase a new vehicle. All customers have already purchased at least one vehicle.

The automotive company has supplied a data set of customer demographics, previous car type bought, the age of the vehicle, and servicing details. Note that the servicing details are only for mechanics at official dealerships. You have looked at this data set and already transformed many of the variables to make the modelling easier. There was a lot of noise in the numeric variables so you decided to transform all the numeric variables into deciles (integers 1 to 10, each has a similar number of customers). The deciles can be treated as numeric or factors in R. The data dictionary for this data set is given below:

Field Data Type Description

ID Unique ID Unique ID of the customer

Target Integer Model target. 1 if the customer has purchased more than 1 vehicle, 0 if they have only purchased 1.

age_band Categorical Age banded into categories

gender Categorical Male, Female or Missing

car_model Categorical The model of vehicle, 18 models in total

car_segment Categorical The type of vehicle

age_of_vehicle_years Integer Age of their last vehicle, in deciles

sched_serv_warr Integer Number of scheduled services (e.g. regular check-ups) used under warranty, in deciles

non_sched_serv_warr Integer Number of non-scheduled services (e.g. something broke out of the service cycle) used under warranty, in deciles

sched_serv_paid Integer Amount paid for scheduled services, in deciles

non_sched_serv_paid Integer Amount paid for non scheduled services, in deciles

total_paid_services Integer Amount paid in total for services, in deciles

total_services Integer Total number of services, in deciles

mth_since_last_serv Integer The number of months since the last service, in deciles

annualised_mileage Integer Annualised vehicle mileage, in deciles

num_dealers_visited Integer Number of different dealers visited for servicing, in deciles

num_serv_dealer_purchased Integer Number of services had at the same dealer where the vehicle was purchased, in deciles

Tasks:

For this task you have been given two csv files, ‘repurchase_training.csv’ and ‘repurchase_validation.csv’. The tasks below are to be undertaken on ‘repurchase_training.csv’ since you have the target variable included.

1. Undertake EDA on this dataset. Justify all decisions you make relating to data processing.

2. Build a linear classification model to predict which customers are most likely to repurchase. You can use any technique we used in class with a classification target.

a. Create the confusion matrix & calculate precision, recall, F1 & AUC.

b. Given known best practice for classification tasks and the particular situation at hand, which metric will you use to decide your final model?

3. Build a tree based classification model to predict which customers are most likely to repurchase.

a. How does your tree-based model perform relative to the linear model? (Hint: If you did not pick a good metric in 2.b you will not be able to compare the models!)

b. Discuss the variable importance measures from both types of models (i.e what does ‘variable importance’ mean for these different model types).

c. In this particular task, do they propose different levels of importance for the features? Why do you think this is? Provide intuitive interpretation of these importances, given the context at hand (i.e What do these importances mean in a ‘business sense’)

d. For your tree-based model, construct partial dependency plots for the top 5 most important features. Do you notice anything interesting about any of these plots?

4. With the best model created above, use the variables in ‘repurchase_validation.csv’ to output both probabilities and class predictions. You will see this file contains a validation data set, but the target variable has been excluded. Your model will be marked against this data set. It is important to follow the following submission guidelines:

i. a. name the file repurchase_validation_STUDENTNUMBER.csv So if your number is 1234 the file should be repurchase_validation_1234.csv

ii. Include only three columns, named as follows, in this order. Note that target_class must be 1 for purchase and 0 for not. The probability here is for the positive (1) class. No need for two columns with two probabilities

1. ID

2. target_probability

3. target_class

5. Reporting (500 – 1000 words)

a. Using all the notes and answers you have above, wrap up all your work into a report for the manager of the automotive firm that follows the CRISP-DM methodology. Remember to consider your audience!

(5/5)

Attachments:

Instructions Files

Expert's Answer

Buy Now $35 USD

886 Times Downloaded

Hurry, Grab up to 30% discount on the entire course

Ask This Question To Be Solved By Our ExpertsGet A+ Grade Solution Guaranteed

Our Experts

Jesse RossesManagement

Robert NjueSocial sciences

Varun JoharStatistics

ShreyeshComputer science

R Programming

This task will require you to develop and deploy a classification model on a product purchase data set.

ANSWER ALL QUESTIONS

Attachments:

Instructions Files

Expert's Answer

Related Questions

. The fundamental operations of create, read, update, and delete (CRUD) in either Python or Java

. Develop a program to emulate a purchase transaction at a retail store. This program will have two classes, a LineItem class and a Transaction class

. The following program contains five errors. Identify the errors and fix them

. Accepts the following from a user: Item Name Item Quantity Item Price Allows the user to create a file to store the sales receipt contents

. The final project will encompass developing a web service using a software stack and implementing an industry-standard interface. Regardless of whether you choose to pursue application development goals as a pure developer or as a software engineer

Other Services

Ask This Question To Be Solved By Our ExpertsGet A+ Grade Solution Guaranteed

Our Experts

Jesse RossesManagement

Robert NjueSocial sciences

Varun JoharStatistics

ShreyeshComputer science

R Programming

This task will require you to develop and deploy a classification model on a product purchase data set.

ANSWER ALL QUESTIONS

Attachments:

Instructions Files

Expert's Answer

Related Questions

. The fundamental operations of create, read, update, and delete (CRUD) in either Python or Java

. Develop a program to emulate a purchase transaction at a retail store. This program will have two classes, a LineItem class and a Transaction class

. The following program contains five errors. Identify the errors and fix them

. Accepts the following from a user: Item Name Item Quantity Item Price Allows the user to create a file to store the sales receipt contents

. The final project will encompass developing a web service using a software stack and implementing an industry-standard interface. Regardless of whether you choose to pursue application development goals as a pure developer or as a software engineer