logo Hurry, Grab up to 30% discount on the entire course
Order Now logo
1385 Times Downloaded

Ask This Question To Be Solved By Our ExpertsGet A+ Grade Solution Guaranteed

expert
Willard BoiceeManagement
(5/5)

813 Answers

Hire Me
expert
Manning DaleResume writing
(5/5)

830 Answers

Hire Me
expert
Shruti TalpadeScience
(5/5)

875 Answers

Hire Me
expert
Paul BurlingComputer science
(5/5)

743 Answers

Hire Me
Others
(5/5)

you will build a simple decision tree with depth 1 using this dataset for predicting the annual income target feature using the Gini Index split criterion

INSTRUCTIONS TO CANDIDATES
ANSWER ALL QUESTIONS

Question 1

(50 points)

This question is inspired by Exercise 3 in Chapter 4 in the textbook. On Canvas, you will see a CSV file named "A2_Q1.csv". You will use this dataset for this question. In our version of the dataset, the annual income target variable has been renamed as low, mid, and high.

For this question, you will build a simple decision tree with depth 1 using this dataset for predicting the annual income target feature using the Gini Index split criterion. You will present your results as Pandas data frames.

You are allowed to use any Python code available on our website here. In fact, you are recommended to use some of this code.

Part A (5 points)

Compute the impurity of the target feature.

Part B (30 points)

In this part, you will determine the root node for your decision tree. Please refer Chapter 4 slides on Canvas and Part (c) of this exercise question in the textbook for handling the Age continuous feature.

Your answer to this part needs to be a Pandas data frame and it needs to be called "df_splits". Also, it needs to have the following columns:

  • Split
  • Remainder
  • Information_Gain
  • Is_Optimal (True or False - only the optimal split's Is_Optimal flag needs to be True and the others need to be False)

You can populate this data frame line by line by referring to Cell 6 in our Pandas tutorial.

You will populate and display your df_splits data frame. As an example for your "df_splits" data frame, consider the spam prediction example in Table 4.2 in the textbook on page 121, which was also covered in lectures. The df_splits data frame would look something like the table below.

Split Remainder Information_Gain Is_Optimal
suspicious words ? ? True
unknown sender ? ? False
contains images ? ? False

PART B CLARIFICATION: In your "df_splits" data frame, please do not bundle all age splits together and call it "Age". Rather, please have a separate row for each age threshold value that qualifies as a split candidate. Please name these splits as "Age_{YY}" where {YY} represents a numerical age threshold (in years). For example, if there are only two threshold values that qualify as a split candidate, say 20 and 30, you would add two rows to "df_splits" with Split values of "Age_20" and "Age_30". Apparently, you will need to populate the rest of the columns with correct values for these two rows.

Part C (15 points)

In this part, you will assume the Education descriptive feature is at the root node (NOTE: This feature may or may not be the optimal root node, but you will just assume it is). Under this assumption, you will make predictions for the annual income target variable.

Your answer to this part needs to be a Pandas data frame and it needs to be called "df_prediction". Also, it needs to have the following columns:

  • Leaf_Condition
  • Low_Income_Prob (Probability)
  • Mid_Income_Prob
  • High_Income_Prob
  • Leaf_Prediction

Assuming that Education is the root node, you will populate and display your df_prediction data frame. As an example, continuing the spam prediction problem, assume the suspicious words descriptive feature is at the root node. The df_prediction data frame would look something like the table below.

Leaf_Condition Spam_Prob Ham_Prob Leaf_Prediction
suspicious words == true ? ? ?
suspicious words == false ? ? ?

HINT: Your df_prediction data frame should have only 3 rows.

Question 1 Wrap-up

For marking purposes, please display your "df_splits" and "df_prediction" data frames (in separate cells!) as the last thing in your notebook for this question. Thank you.

 

Question 2

(50 points)

This question is inspired from Exercise 6 in Chapter 8 in the textbook. On Canvas, you will see a CSV file named "A2_Q2.csv". You will use this dataset for this question.

Part A (10 points)

Assume true is the positive target level. Using a score threshold of 0.5, work out the confusion matrix (using pd.crosstab() or as a Jupyter notebook table) with appropriate row and column labels.

Part B (20 points, 4 points each)

Compute the following 5 metrics:

  1. Error Rate
  2. Precision
  3. TPR (True Positive Rate) (also known as Recall)
  4. F1-Score
  5. FPR (False Positive Rate)

You will need to display your answers as a Pandas data frame called "df_metrics" (with 5 rows, one row for each metric) with the following 2 columns:

  • Metric
  • Value

Marking Note for Part B: If your confusion matrix is incorrect, you will not get full credit for a correct follow-through.

Part C (15 points)

By varying the score threshold from 0.1 to 0.9 (both inclusive) in 0.1 increments, compute TPR and FPR values. You will need to display your answers as a Pandas data frame called "df_roc" with the following columns:

  • Threshold
  • TPR
  • FPR

HINT: Your df_roc data frame should have 9 rows. For this part and the next, you might find Cells #20 and #21 useful in our SK4 Tutorial here.

Part D (5 points)

Using your answer in the above part, display an ROC curve with appropriate axes labels and a title.

Question 2 Wrap-up

For marking purposes, please display your "df_metrics" and "df_roc" data frames (in separate cells!) as the last thing in your notebook right before your plot for Part D. Thank you.

(5/5)
Attachments:

Expert's Answer

1385 Times Downloaded

Related Questions

. The fundamental operations of create, read, update, and delete (CRUD) in either Python or Java

CS 340 Milestone One Guidelines and Rubric  Overview: For this assignment, you will implement the fundamental operations of create, read, update,

. Develop a program to emulate a purchase transaction at a retail store. This  program will have two classes, a LineItem class and a Transaction class

Retail Transaction Programming Project  Project Requirements:  Develop a program to emulate a purchase transaction at a retail store. This

. The following program contains five errors. Identify the errors and fix them

7COM1028   Secure Systems Programming   Referral Coursework: Secure

. Accepts the following from a user: Item Name Item Quantity Item Price Allows the user to create a file to store the sales receipt contents

Create a GUI program that:Accepts the following from a user:Item NameItem QuantityItem PriceAllows the user to create a file to store the sales receip

. The final project will encompass developing a web service using a software stack and implementing an industry-standard interface. Regardless of whether you choose to pursue application development goals as a pure developer or as a software engineer

CS 340 Final Project Guidelines and Rubric  Overview The final project will encompass developing a web service using a software stack and impleme