logo Hurry, Grab up to 30% discount on the entire course
Order Now logo

Ask This Question To Be Solved By Our ExpertsGet A+ Grade Solution Guaranteed

expert
StatAnalytica ExpertManagement
(5/5)

970 Answers

Hire Me
expert
Elyza Marice GamisManagement
(5/5)

783 Answers

Hire Me
expert
Dushyant ChertriLaw
(5/5)

744 Answers

Hire Me
expert
Thaissa LannesLaw
(5/5)

753 Answers

Hire Me
Others
(5/5)

Suppose we build a logistic regression model to classify patients into HEALTHY or NOT-HEALTHY based on their age in years above sixty

INSTRUCTIONS TO CANDIDATES
ANSWER ALL QUESTIONS

Instructions: There are 9 questions for a total of 37 points. This test must be done individually, with no collaboration allowed. Everyone taking this test is expected to act with the highest integrity and not cheat on this test in any way, shape, or form. Suspected violations will be reported to the Associate Dean.

 Please show all your work. Correct answers with no work shown will receive a grade of zero.

 Allowed aids: course textbook; course notes and other course material posted on Learn; external material linked to in the lecture slides.

 Please submit your solutions, in pdf format, to Learn, under Submit -> Dropbox -> Test 2.

 Question 1: Linear models [4 points] 

Suppose we have the following (x,y) datapoints: (0,-1), (1,0), (1,2), (2,3), (3,4), (4,8). Suppose we are given a linear model of the form y=mx+b that uses x to predict y. Suppose that running the model on the above data points gives the residual plot shown below (with x on the horizontal axis and residuals on the vertical axis).

Without making any other assumptions, what is the linear model?

Question 2: Logistic regression [3 points]

Suppose we build a logistic regression model to classify patients into HEALTHY or NOT-HEALTHY based on their age in years above sixty (call it p; for example, for a 62-year-old person, years above sixty = 2), fitness (call it q; zero if the patient exercises regularly, one otherwise), and cholesterol level (call it r). Suppose the model is

Will this model make the same or different prediction for the following patients? 

Patient 1: 60-year-old person who exercises regularly, with cholesterol level 3.0

Patient 2: 60-year-old person who does not exercise regularly, with cholesterol level 3.5

Question 3: Naïve Bayes Classifier [4 points] 

Suppose we have the following data set in which A and B are features and C is the class label. 

A

B

C

No

30

No

No

35

No

No

60

Yes

Yes

52

Yes

No

50

Yes

Yes

40

Yes

Yes

48

Yes

No

45

No

Yes

32

No

 

For a new datapoint with A = Yes and B = 45, would Naïve Bayes predict C = Yes or C = No?

Question 4: Decision Trees [4 points] 

Consider the following dataset in which F, G and H are features and C is the class label. Explain which root node would be chosen by the entropy-based decision tree classifier described in the lecture slides. 

F

G

H

C

Maybe

Yes

Yes

Yes

Yes

Yes

No

No

No

Yes

No

No

Yes

Yes

Yes

Yes

Maybe

Yes

No

No

Yes

Yes

Yes

Yes

Maybe

Yes

Yes

No

No

Yes

No

No

Maybe

Yes

Yes

Yes

 

Question 5: Cross-validation [4 points] 

Suppose you are the manager of a data analytics company. One of your junior analysts has applied a decision tree classification algorithm to a labeled data set D and calculated two things: 

1)  the accuracy of the decision tree on D using 10-fold cross validation, and

2)  the accuracy of the decision tree on D using an 80%-train / 20%-test split of the data 

She says to you: “With 10-fold cross validation, we get a 12% improvement in accuracy compared to the 80/20 split. I think that for classifying new data, we should use 10-fold cross validation”. You realize that the analyst found something interesting, but she is also lacking a fundamental understanding of data mining. What is incorrect in the analyst’s statement that “I think that for classifying new data, we should use 10-fold cross validation”?

Question 6: Leave-one-out cross-validation [4 points] 

Consider a Boolean classification problem with a yes/no class label (and some features). Consider a classifier called VERY_SIMPLE that works as follows. Let C be the class label that occurs more frequently in the training dataset (if both yes and no are equally frequent, let C = “yes” to break the tie). When given a new datapoint to classify, VERY_SIMPLE will always output C, regardless of the values of the features of the new datapoint.

 Suppose we have a dataset with exactly one half of the records with class "yes" and exactly one half with class "no". Compute the accuracy of VERY_SIMPLE on this dataset using leave-one-out cross validation.

Question 7: The PRISM Rule-Based Classifier [6 points]

Consider the following dataset in which F, G and H are features and C is the class label. Give the output of the PRISM algorithm on this data set.

 

F

G

H

C

No

Yes

Yes

Yes

No

No

Yes

No

Maybe

Yes

Yes

Yes

No

Yes

Yes

Yes

Yes

Yes

No

No

Maybe

Yes

No

No

Maybe

Yes

Yes

Yes

Yes

Yes

No

No

No

Yes

No

No

 

 

(5/5)
Attachments:

Related Questions

. The fundamental operations of create, read, update, and delete (CRUD) in either Python or Java

CS 340 Milestone One Guidelines and Rubric  Overview: For this assignment, you will implement the fundamental operations of create, read, update,

. Develop a program to emulate a purchase transaction at a retail store. This  program will have two classes, a LineItem class and a Transaction class

Retail Transaction Programming Project  Project Requirements:  Develop a program to emulate a purchase transaction at a retail store. This

. The following program contains five errors. Identify the errors and fix them

7COM1028   Secure Systems Programming   Referral Coursework: Secure

. Accepts the following from a user: Item Name Item Quantity Item Price Allows the user to create a file to store the sales receipt contents

Create a GUI program that:Accepts the following from a user:Item NameItem QuantityItem PriceAllows the user to create a file to store the sales receip

. The final project will encompass developing a web service using a software stack and implementing an industry-standard interface. Regardless of whether you choose to pursue application development goals as a pure developer or as a software engineer

CS 340 Final Project Guidelines and Rubric  Overview The final project will encompass developing a web service using a software stack and impleme