You will build a simple decision tree with depth 1 using this dataset for predicting the annual income target feature using the Gini Index split criterion

INSTRUCTIONS TO CANDIDATES

ANSWER ALL QUESTIONS

Question 1

(50 points)

This question is inspired by Exercise 3 in Chapter 4 in the textbook. On Canvas, you will see a CSV file named "A2_Q1.csv". You will use this dataset for this question. In our version of the dataset, the levels of the annual income target variable have been renamed as low, mid, and high.

For this question, you will build a simple decision tree with depth 1 using this dataset for predicting the annual income target feature using the Gini Index split criterion. You will present your results as Pandas data frames.

You are allowed to use any Python code available on our website here. In fact, you are encouraged to use some of this code.

Part A (5 points)

Compute the impurity of the target feature.
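
As a minimal sketch of the impurity calculation, the Gini index of a categorical column is 1 minus the sum of the squared level probabilities. The helper name gini_index and the toy target values below are assumptions made for illustration; they are not taken from A2_Q1.csv.

```python
import pandas as pd


def gini_index(labels: pd.Series) -> float:
    """Gini index of a categorical series: 1 - sum(p_i ** 2)."""
    probs = labels.value_counts(normalize=True)  # relative frequency of each level
    return 1.0 - float((probs ** 2).sum())


# Hypothetical target column, used only to show the call.
toy_target = pd.Series(["low", "low", "mid", "high"])
impurity = gini_index(toy_target)
print(impurity)
```

The same helper can be reused on any subset of rows, which is what the split calculations in Part B need.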

Part B (30 points)

In this part, you will determine the root node for your decision tree. Please refer to the Chapter 4 slides on Canvas and Part (c) of this exercise question in the textbook for handling the continuous Age feature.

Your answer to this part needs to be a Pandas data frame and it needs to be called "df_splits". Also, it needs to have the following columns:

Split
Remainder
Information_Gain
Is_Optimal (True or False - only the optimal split's Is_Optimal flag needs to be True and the others need to be False)

You can populate this data frame line by line by referring to Cell 6 in our Pandas tutorial.

You will populate and display your df_splits data frame. As an example for your "df_splits" data frame, consider the spam prediction example in Table 4.2 in the textbook on page 121, which was also covered in lectures. The df_splits data frame would look something like the table below.

Split	Remainder	Information_Gain	Is_Optimal
suspicious words	?	?	True
unknown sender	?	?	False
contains images	?	?	False
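
One possible way to fill such a table is sketched below. It assumes that Remainder is the size-weighted average Gini index of the partitions a feature induces, and that Information_Gain is the impurity of the target minus that remainder. The toy rows, the column names suspicious_words, contains_images and target, and the helper names are hypothetical; the values are not those of Table 4.2. The rows are collected in a list of dictionaries rather than appended one at a time, which is a choice made here and not necessarily what the tutorial cell does.

```python
import pandas as pd


def gini_index(labels: pd.Series) -> float:
    """Gini index of a categorical series: 1 - sum(p_i ** 2)."""
    probs = labels.value_counts(normalize=True)
    return 1.0 - float((probs ** 2).sum())


def remainder(df: pd.DataFrame, feature: str, target: str) -> float:
    """Size-weighted average impurity of the partitions induced by `feature`."""
    total = len(df)
    return sum(
        len(part) / total * gini_index(part[target])
        for _, part in df.groupby(feature)
    )


# Hypothetical toy data, for illustration only.
toy = pd.DataFrame({
    "suspicious_words": [True, True, True, False, False, False],
    "contains_images": [True, False, True, False, True, False],
    "target": ["spam", "spam", "spam", "ham", "ham", "ham"],
})

base_impurity = gini_index(toy["target"])

rows = []
for feature in ["suspicious_words", "contains_images"]:
    rem = remainder(toy, feature, "target")
    rows.append({
        "Split": feature,
        "Remainder": rem,
        "Information_Gain": base_impurity - rem,
    })

df_splits = pd.DataFrame(rows)
# Flag exactly one row: the first one with the largest information gain.
df_splits["Is_Optimal"] = df_splits.index == df_splits["Information_Gain"].idxmax()
print(df_splits)
```

Using idxmax marks only the first row when two splits tie on information gain; whether ties need different handling is a judgment call for the assignment.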

PART B CLARIFICATION: In your "df_splits" data frame, please do not bundle all age splits together and call it "Age". Rather, please have a separate row for each age threshold value that qualifies as a split candidate. Please name these splits as "Age_{YY}" where {YY} represents a numerical age threshold (in years). For example, if there are only two threshold values that qualify as a split candidate, say 20 and 30, you would add two rows to "df_splits" with Split values of "Age_20" and "Age_30". Naturally, you will need to populate the rest of the columns with correct values for these two rows.
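
The sketch below shows one way the per-threshold rows might be generated. It assumes the common textbook convention for continuous features: sort the instances by the feature, and treat the midpoint between two adjacent instances with different target levels as a candidate threshold. That convention, the toy ages, the column name income, and the helper names are all assumptions for illustration; check the Chapter 4 slides for the exact rule expected here. Adjacent instances that share the same age are skipped, which is a simplification.

```python
import pandas as pd


def gini_index(labels: pd.Series) -> float:
    """Gini index of a categorical series: 1 - sum(p_i ** 2)."""
    probs = labels.value_counts(normalize=True)
    return 1.0 - float((probs ** 2).sum())


def candidate_thresholds(df: pd.DataFrame, feature: str, target: str) -> list:
    """Midpoints between adjacent (sorted) instances whose target levels differ."""
    ordered = df.sort_values(feature)
    values = ordered[feature].tolist()
    labels = ordered[target].tolist()
    thresholds = set()
    for i in range(len(values) - 1):
        if labels[i] != labels[i + 1] and values[i] != values[i + 1]:
            thresholds.add((values[i] + values[i + 1]) / 2)
    return sorted(thresholds)


# Hypothetical toy data, not the A2_Q1.csv values.
toy = pd.DataFrame({
    "Age": [20, 24, 30, 38, 50],
    "income": ["low", "low", "mid", "mid", "high"],
})

base_impurity = gini_index(toy["income"])

rows = []
for t in candidate_thresholds(toy, "Age", "income"):
    at_or_above = toy["Age"] >= t
    parts = (toy[at_or_above], toy[~at_or_above])
    rem = sum(len(p) / len(toy) * gini_index(p["income"]) for p in parts)
    rows.append({
        "Split": f"Age_{t:g}",  # e.g. "Age_27" for a threshold of 27.0
        "Remainder": rem,
        "Information_Gain": base_impurity - rem,
    })

age_rows = pd.DataFrame(rows)
print(age_rows)
```

These rows can then be combined with the categorical-feature rows before the Is_Optimal flag is computed, so that the flag compares every candidate split.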

Part C (15 points)

In this part, you will assume the Education descriptive feature is at the root node (NOTE: This feature may or may not be the optimal root node, but you will just assume it is). Under this assumption, you will make predictions for the annual income target variable.

Your answer to this part needs to be a Pandas data frame and it needs to be called "df_prediction". Also, it needs to have the following columns:

Leaf_Condition
Low_Income_Prob (Probability)
Mid_Income_Prob
High_Income_Prob
Leaf_Prediction

Assuming that Education is the root node, you will populate and display your df_prediction data frame. As an example, continuing the spam prediction problem, assume the suspicious words descriptive feature is at the root node. The df_prediction data frame would look something like the table below.

Leaf_Condition	Spam_Prob	Ham_Prob	Leaf_Prediction
suspicious words == true	?	?	?
suspicious words == false	?	?	?

HINT: Your df_prediction data frame should have only 3 rows.
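
Continuing the spam illustration, a leaf table of this shape could be sketched as follows. It assumes that each leaf's probabilities are the relative frequencies of the target levels within that leaf and that the leaf predicts its most probable level. The toy rows and the column names suspicious_words and target are hypothetical.

```python
import pandas as pd

# Hypothetical toy data, for illustration only.
toy = pd.DataFrame({
    "suspicious_words": [True, True, True, False, False, False],
    "target": ["spam", "spam", "ham", "ham", "ham", "ham"],
})

# Row-normalised counts: one row per leaf, one column per target level.
probs = pd.crosstab(toy["suspicious_words"], toy["target"], normalize="index")

rows = []
for level, row in probs.iterrows():
    rows.append({
        "Leaf_Condition": f"suspicious words == {str(level).lower()}",
        "Spam_Prob": row["spam"],
        "Ham_Prob": row["ham"],
        "Leaf_Prediction": row.idxmax(),  # most probable target level in this leaf
    })

df_prediction = pd.DataFrame(rows)
print(df_prediction)
```

For the income problem the same pattern would apply with one probability column per income level and one row per level of the root feature.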

Question 1 Wrap-up

For marking purposes, please display your "df_splits" and "df_prediction" data frames (in separate cells!) as the last thing in your notebook for this question. Thank you.

Question 2

(50 points)

This question is inspired by Exercise 6 in Chapter 8 in the textbook. On Canvas, you will see a CSV file named "A2_Q2.csv". You will use this dataset for this question.

Part A (10 points)

Assume true is the positive target level. Using a score threshold of 0.5, work out the confusion matrix (using pd.crosstab() or as a Jupyter notebook table) with appropriate row and column labels.
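
A minimal sketch of the thresholding step with pd.crosstab() follows. The column names target and score, the toy values, and the choice to predict true when the score is greater than or equal to the threshold are assumptions; the actual layout of A2_Q2.csv may differ.

```python
import pandas as pd

# Hypothetical toy data: actual target level and model score.
toy = pd.DataFrame({
    "target": [True, True, True, False, False, False],
    "score": [0.9, 0.6, 0.3, 0.7, 0.4, 0.1],
})

threshold = 0.5
toy["prediction"] = toy["score"] >= threshold  # assumed convention: >= counts as true

confusion = pd.crosstab(
    toy["target"],
    toy["prediction"],
    rownames=["Actual"],
    colnames=["Predicted"],
)
print(confusion)
```

With this layout the rows are the actual levels and the columns are the predicted levels, both ordered false then true.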

Part B (20 points, 4 points each)

Compute the following 5 metrics:

Error Rate
Precision
TPR (True Positive Rate) (also known as Recall)
F1-Score
FPR (False Positive Rate)

You will need to display your answers as a Pandas data frame called "df_metrics" (with 5 rows, one row for each metric) with the following 2 columns:

Metric
Value

Marking Note for Part B: If your confusion matrix is incorrect, you will not get full credit for a correct follow-through.
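
The five metrics follow directly from the four confusion-matrix counts. The sketch below uses hypothetical counts (they are not derived from A2_Q2.csv) and the standard definitions, with true as the positive level.

```python
import pandas as pd

# Hypothetical confusion-matrix counts, for illustration only.
tp, fn, fp, tn = 6, 2, 3, 9

error_rate = (fp + fn) / (tp + tn + fp + fn)  # share of misclassified instances
precision = tp / (tp + fp)                    # how many predicted positives are correct
tpr = tp / (tp + fn)                          # recall: how many actual positives are found
f1_score = 2 * precision * tpr / (precision + tpr)
fpr = fp / (fp + tn)                          # how many actual negatives are flagged

df_metrics = pd.DataFrame({
    "Metric": ["Error Rate", "Precision", "TPR", "F1-Score", "FPR"],
    "Value": [error_rate, precision, tpr, f1_score, fpr],
})
print(df_metrics)
```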

Part C (15 points)

By varying the score threshold from 0.1 to 0.9 (both inclusive) in 0.1 increments, compute TPR and FPR values. You will need to display your answers as a Pandas data frame called "df_roc" with the following columns:

Threshold
TPR
FPR

HINT: Your df_roc data frame should have 9 rows. For this part and the next, you might find Cells #20 and #21 useful in our SK4 Tutorial here.
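
The threshold sweep could be sketched as below. The column names target and score and the toy values are hypothetical, and a score greater than or equal to the threshold is assumed to count as a positive prediction. The thresholds are built from integers to avoid floating-point drift in the 0.1 steps.

```python
import pandas as pd

# Hypothetical toy data, for illustration only.
toy = pd.DataFrame({
    "target": [True, True, True, False, False, False],
    "score": [0.95, 0.65, 0.35, 0.75, 0.45, 0.15],
})

actual_pos = toy["target"]
actual_neg = ~toy["target"]

rows = []
for threshold in [i / 10 for i in range(1, 10)]:  # 0.1, 0.2, ..., 0.9
    predicted_pos = toy["score"] >= threshold
    tp = int((predicted_pos & actual_pos).sum())
    fp = int((predicted_pos & actual_neg).sum())
    rows.append({
        "Threshold": threshold,
        "TPR": tp / int(actual_pos.sum()),
        "FPR": fp / int(actual_neg.sum()),
    })

df_roc = pd.DataFrame(rows)
print(df_roc)
```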

Part D (5 points)

Using your answer in the above part, display an ROC curve with appropriate axes labels and a title.
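
A possible plotting sketch with Matplotlib is shown below. The df_roc values here are hypothetical placeholders, and the dashed diagonal is an optional reference line for a random classifier. The Agg backend line is only there so the sketch also runs outside a notebook; it can be dropped in Jupyter.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend; remove this line inside a notebook
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical ROC points, for illustration only.
df_roc = pd.DataFrame({
    "Threshold": [0.1, 0.5, 0.9],
    "TPR": [1.0, 2 / 3, 1 / 3],
    "FPR": [1.0, 1 / 3, 0.0],
})

fig, ax = plt.subplots()
ax.plot(df_roc["FPR"], df_roc["TPR"], marker="o", label="Classifier")
ax.plot([0, 1], [0, 1], linestyle="--", label="Random baseline")
ax.set_xlabel("False Positive Rate (FPR)")
ax.set_ylabel("True Positive Rate (TPR)")
ax.set_title("ROC Curve")
ax.legend()
```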

Question 2 Wrap-up

For marking purposes, please display your "df_metrics" and "df_roc" data frames (in separate cells!) as the last thing in your notebook right before your plot for Part D. Thank you.
