Applied Statistics

In this exercise, we will investigate the distributions of hypothesis tests for logistic regression

INSTRUCTIONS TO CANDIDATES

ANSWER ALL QUESTIONS

Directions

Students are encouraged to work together on homework. However, sharing, copying, or providing any part of a homework solution or code is an infraction of the University’s rules on Academic Integrity. Any violation will be punished as severely as possible.

• Be sure to remove this section if you use this .Rmd file as a template.

• You may leave the questions in your final document.

Exercise 1 (Simulating Wald and Likelihood Ratio Tests)

In this exercise, we will investigate the distributions of hypothesis tests for logistic regression. For this exercise, we will use the following predictors.

Recall that

p(x) = P[Y = 1 | X = x]

Consider the true model

log( p(x) / (1 − p(x)) ) = β0 + β1x1

where

• β0 = 0.4

• β1 = −0.35

(a) To investigate the distributions, simulate from this model 2500 times. To do so, calculate P[Y = 1 | X = x] for an observation, and then make a random draw from a Bernoulli distribution with that success probability. (Note that a Bernoulli distribution is a Binomial distribution with parameter n = 1. There is no direct function in R for a Bernoulli distribution.)

Each time, fit the model:

log( p(x) / (1 − p(x)) ) = β0 + β1x1 + β2x2 + β3x3

Store the test statistics for two tests:

• The Wald test for H0 : β2 = 0, which we say follows a standard normal distribution for “large” samples

• The likelihood ratio test for H0 : β2 = β3 = 0, which we say follows a χ² distribution (with some degrees of freedom) for “large” samples
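The simulation in (a) could be sketched roughly as follows. The predictor definitions do not appear in the text above, so the sample size of 150 and the normally distributed x1, x2, x3 below are assumptions for illustration only, as is the helper name simulate_tests. The Wald statistic is read from the coefficient table of the full model, and the likelihood ratio statistic is the drop in deviance from the null model (x1 only) to the full model.

```r
# Assumed setup (not given in the text): sample size and predictor values
set.seed(42)
sample_size = 150                 # hypothetical "large" sample size
x1 = rnorm(n = sample_size)       # hypothetical predictors
x2 = rnorm(n = sample_size)
x3 = rnorm(n = sample_size)

# True model parameters from the exercise
beta_0 = 0.4
beta_1 = -0.35

# Simulate n_sims datasets; return the Wald and LRT statistic for each one
simulate_tests = function(n_sims, x1, x2, x3) {
  wald = rep(0, n_sims)
  lrt  = rep(0, n_sims)
  # P[Y = 1 | X = x] under the true model (inverse logit of the linear predictor)
  p = plogis(beta_0 + beta_1 * x1)
  for (i in seq_len(n_sims)) {
    # A Bernoulli draw is a Binomial draw with size = 1
    y = rbinom(n = length(p), size = 1, prob = p)
    fit_full = glm(y ~ x1 + x2 + x3, family = binomial)
    fit_null = glm(y ~ x1, family = binomial)
    # Wald z statistic for H0: beta_2 = 0
    wald[i] = coef(summary(fit_full))["x2", "z value"]
    # LRT statistic for H0: beta_2 = beta_3 = 0 (difference in deviances)
    lrt[i] = deviance(fit_null) - deviance(fit_full)
  }
  data.frame(wald = wald, lrt = lrt)
}

results = simulate_tests(n_sims = 2500, x1 = x1, x2 = x2, x3 = x3)
head(results)
```

Keeping the predictors fixed across simulations and redrawing only the response matches the description "simulate from this model"; redrawing the predictors each time would be a different design.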

(b) Plot a histogram of the empirical values for the Wald test statistic. Overlay the density of the true distribution assuming a large sample.

(c) Use the empirical results for the Wald test statistic to estimate the probability of observing a test statistic larger than 1. Also report this probability using the true distribution of the test statistic assuming a large sample.

(d) Plot a histogram of the empirical values for the likelihood ratio test statistic. Overlay the density of the true distribution assuming a large sample.

(e) Use the empirical results for the likelihood ratio test statistic to estimate the probability of observing a test statistic larger than 5. Also report this probability using the true distribution of the test statistic assuming a large sample.
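Parts (b) through (e) might look something like the sketch below. The vectors wald_stats and lrt_stats are placeholders drawn directly from the reference distributions so that the block runs on its own; in practice they would be the 2500 stored statistics from (a). The χ² overlay uses 2 degrees of freedom because H0 : β2 = β3 = 0 removes two parameters from the model.

```r
set.seed(1)
# Placeholder vectors standing in for the stored statistics from part (a);
# replace these with the simulated values.
wald_stats = rnorm(2500)
lrt_stats  = rchisq(2500, df = 2)

# (b) Histogram on the density scale with the N(0, 1) density overlaid
hist(wald_stats, probability = TRUE, breaks = 30,
     main = "Empirical Wald statistics", xlab = "z")
curve(dnorm(x), add = TRUE, lwd = 2)

# (c) Empirical vs. large-sample probability of a statistic larger than 1
wald_empirical = mean(wald_stats > 1)
wald_theory    = pnorm(1, lower.tail = FALSE)

# (d) Histogram with the chi-square density (df = 2) overlaid
hist(lrt_stats, probability = TRUE, breaks = 30,
     main = "Empirical LRT statistics", xlab = "D")
curve(dchisq(x, df = 2), add = TRUE, lwd = 2)

# (e) Empirical vs. large-sample probability of a statistic larger than 5
lrt_empirical = mean(lrt_stats > 5)
lrt_theory    = pchisq(5, df = 2, lower.tail = FALSE)

c(wald_empirical = wald_empirical, wald_theory = wald_theory,
  lrt_empirical = lrt_empirical, lrt_theory = lrt_theory)
```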

(f) Repeat (a)-(e) but with simulation using a smaller sample size of 10. Based on these results, is this sample size large enough to use the standard normal and χ² distributions in this situation? Explain.

Exercise 2 (Surviving the Titanic)

For this exercise use the ptitanic data from the rpart.plot package. (The rpart.plot package depends on the rpart package.) Use ?rpart.plot::ptitanic to learn about this dataset. We will use logistic regression to help predict which passengers aboard the Titanic will survive based on various attributes.

For simplicity, we will remove any observations with missing data. Additionally, we will create a test and train dataset.

(a) Consider the model

log( p(x) / (1 − p(x)) ) = β0 + β1x1 + β2x2 + β3x3 + β4x4 + β5x3x4

where p(x) = P[Y = 1 | X = x] is the probability that a certain passenger survives given their attributes and

• x1 is a dummy variable that takes the value 1 if a passenger was 2nd class.

• x2 is a dummy variable that takes the value 1 if a passenger was 3rd class.

• x3 is a dummy variable that takes the value 1 if a passenger was male.

• x4 is the age in years of a passenger.

Fit this model to the training data and report its deviance.
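As a rough sketch of the fit in (a): the data frame titanic_trn below is a simulated stand-in, not the real training split, so that the block runs without the rpart.plot package. Its column names (survived, pclass, sex, age) are assumed to mirror ptitanic. With factor predictors, R's default dummy coding produces the indicators x1 (2nd class), x2 (3rd class), and x3 (male), and sex:age supplies the x3x4 term.

```r
set.seed(420)
# Hypothetical stand-in for the cleaned training data; values are simulated
n = 500
titanic_trn = data.frame(
  pclass = factor(sample(c("1st", "2nd", "3rd"), n, replace = TRUE)),
  sex    = factor(sample(c("female", "male"), n, replace = TRUE)),
  age    = round(runif(n, min = 1, max = 70))
)
eta = 2 - 1.0 * (titanic_trn$pclass == "2nd") - 2.0 * (titanic_trn$pclass == "3rd") -
  1.5 * (titanic_trn$sex == "male") - 0.02 * titanic_trn$age
titanic_trn$survived = factor(ifelse(rbinom(n, size = 1, prob = plogis(eta)) == 1,
                                     "survived", "died"),
                              levels = c("died", "survived"))

# The second factor level ("survived") is treated as the success class
fit = glm(survived ~ pclass + sex + age + sex:age,
          data = titanic_trn, family = binomial)

names(coef(fit))  # dummy variables created by R
deviance(fit)     # residual deviance of the fitted model
```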

(b) Use the model fit in (a) and an appropriate statistical test to determine if class played a significant role in surviving on the Titanic. Use α = 0.01. Report:

• The null hypothesis of the test

• The test statistic of the test

• The p-value of the test

• A statistical decision

• A practical conclusion
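Because class enters the model through two dummy variables, one reasonable choice for (b) is a likelihood ratio test comparing nested models with and without pclass. The sketch below uses simulated stand-in data with assumed ptitanic-style column names, so the numbers it prints are not answers to the exercise.

```r
set.seed(420)
# Simulated stand-in for the training data (columns named as in ptitanic)
n = 500
dat = data.frame(
  pclass = factor(sample(c("1st", "2nd", "3rd"), n, replace = TRUE)),
  sex    = factor(sample(c("female", "male"), n, replace = TRUE)),
  age    = round(runif(n, min = 1, max = 70))
)
eta = 2 - 1.0 * (dat$pclass == "2nd") - 2.0 * (dat$pclass == "3rd") -
  1.5 * (dat$sex == "male") - 0.02 * dat$age
dat$survived = factor(ifelse(rbinom(n, size = 1, prob = plogis(eta)) == 1,
                             "survived", "died"),
                      levels = c("died", "survived"))

# Full model from (a) and the null model under H0: beta_1 = beta_2 = 0
fit_full = glm(survived ~ pclass + sex + age + sex:age, data = dat, family = binomial)
fit_null = glm(survived ~ sex + age + sex:age, data = dat, family = binomial)

# Likelihood ratio test: the statistic is the drop in deviance, df = 2
lrt = anova(fit_null, fit_full, test = "LRT")
test_stat = lrt[2, "Deviance"]
p_value   = lrt[2, "Pr(>Chi)"]
c(test_stat = test_stat, p_value = p_value, reject_at_0.01 = p_value < 0.01)
```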

(c) Use the model fit in (a) and an appropriate statistical test to determine if an interaction between age and sex played a significant role in surviving on the Titanic. Use α = 0.01. Report:

• The null hypothesis of the test

• The test statistic of the test

• The p-value of the test

• A statistical decision

• A practical conclusion
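The interaction in (c) corresponds to a single coefficient, β5, so a Wald z-test read from the coefficient table is one option (a likelihood ratio test against the model without sex:age would be another). This is a sketch on simulated stand-in data with assumed column names; the coefficient label sexmale:age follows from R's default factor coding.

```r
set.seed(420)
# Simulated stand-in for the training data (columns named as in ptitanic)
n = 500
dat = data.frame(
  pclass = factor(sample(c("1st", "2nd", "3rd"), n, replace = TRUE)),
  sex    = factor(sample(c("female", "male"), n, replace = TRUE)),
  age    = round(runif(n, min = 1, max = 70))
)
eta = 2 - 1.0 * (dat$pclass == "2nd") - 2.0 * (dat$pclass == "3rd") -
  1.5 * (dat$sex == "male") - 0.02 * dat$age
dat$survived = factor(ifelse(rbinom(n, size = 1, prob = plogis(eta)) == 1,
                             "survived", "died"),
                      levels = c("died", "survived"))

fit = glm(survived ~ pclass + sex + age + sex:age, data = dat, family = binomial)

# Wald test for H0: beta_5 = 0 (estimate divided by its standard error)
wald_row = coef(summary(fit))["sexmale:age", ]
z_stat   = unname(wald_row["z value"])
p_value  = unname(wald_row["Pr(>|z|)"])
c(z_stat = z_stat, p_value = p_value, reject_at_0.01 = p_value < 0.01)
```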

(d) Use the model fit in (a) as a classifier that seeks to minimize the misclassification rate. Classify each of the passengers in the test dataset. Report the misclassification rate, the sensitivity, and the specificity of this classifier. (Use survived as the positive class.)
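The three quantities in (d) can be computed from the counts of true and false positives and negatives. The helper classifier_metrics below is a hypothetical name, and the label vectors are a toy example rather than Titanic predictions. A classifier that minimizes misclassification predicts "survived" when the estimated probability exceeds 0.5.

```r
# Misclassification rate, sensitivity, and specificity for a binary classifier.
# `actual` and `predicted` are vectors of class labels of equal length.
classifier_metrics = function(actual, predicted, positive = "survived") {
  tp = sum(predicted == positive & actual == positive)  # true positives
  tn = sum(predicted != positive & actual != positive)  # true negatives
  fp = sum(predicted == positive & actual != positive)  # false positives
  fn = sum(predicted != positive & actual == positive)  # false negatives
  c(misclassification = (fp + fn) / length(actual),
    sensitivity       = tp / (tp + fn),
    specificity       = tn / (tn + fp))
}

# With a fitted glm, predictions could be formed along these lines:
# predicted = ifelse(predict(fit, newdata = test_data, type = "response") > 0.5,
#                    "survived", "died")

# Toy example (not real data)
actual    = c("survived", "survived", "died", "died",     "died", "survived")
predicted = c("survived", "died",     "died", "survived", "died", "survived")
metrics = classifier_metrics(actual, predicted)
metrics
```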

Exercise 3 (Breast Cancer Detection)

For this exercise we will use data found in wisc-train.csv and wisc-test.csv, which contain train and test data, respectively. wisc.csv is provided but not used. This is a modification of the Breast Cancer Wisconsin (Diagnostic) dataset from the UCI Machine Learning Repository. Only the first 10 feature variables have been provided. (And these are all you should use.)

• UCI Page

• Data Detail

You should consider coercing the response to be a factor variable if it is not stored as one after importing the data.

(a) The response variable class has two levels: M if a tumor is malignant, and B if a tumor is benign. Fit three models to the training data.

• An additive model that uses radius, smoothness, and texture as predictors

• An additive model that uses all available predictors

• A model chosen via backwards selection using AIC. Use a model that considers all available predictors as well as their two-way interactions for the start of the search.

For each, obtain a 5-fold cross-validated misclassification rate using the model as a classifier that seeks to minimize the misclassification rate. Based on this, which model is best? Relative to the best, are the other two underfitting or overfitting? Report the test misclassification rate for the model you picked as the best.
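A 5-fold cross-validated misclassification rate could be computed by hand as sketched below. The data frame wisc_like is simulated and only borrows the column names class, radius, texture, and smoothness; cv_misclass is a hypothetical helper that assumes the response column is called class with levels B and M. For the third model, step() with direction = "backward" starting from a fit with all two-way interactions is the usual AIC search in base R.

```r
set.seed(42)
# Simulated stand-in for wisc-train.csv (values are not real measurements)
n = 200
wisc_like = data.frame(
  radius     = rnorm(n, mean = 14, sd = 3),
  texture    = rnorm(n, mean = 19, sd = 4),
  smoothness = rnorm(n, mean = 0.1, sd = 0.015)
)
eta = -0.5 + 0.5 * (wisc_like$radius - 14) + 0.1 * (wisc_like$texture - 19)
wisc_like$class = factor(ifelse(rbinom(n, size = 1, prob = plogis(eta)) == 1, "M", "B"),
                         levels = c("B", "M"))

# k-fold cross-validated misclassification rate for a logistic regression,
# classifying as "M" when the estimated probability exceeds 0.5
cv_misclass = function(formula, data, k = 5) {
  # Randomly assign each row to one of k folds of near-equal size
  folds = sample(rep(seq_len(k), length.out = nrow(data)))
  errs = numeric(k)
  for (j in seq_len(k)) {
    trn = data[folds != j, ]
    val = data[folds == j, ]
    fit = glm(formula, data = trn, family = binomial)
    pred = ifelse(predict(fit, newdata = val, type = "response") > 0.5, "M", "B")
    errs[j] = mean(pred != val$class)
  }
  mean(errs)
}

err_small = cv_misclass(class ~ radius + smoothness + texture, data = wisc_like)
err_small
```

Calling set.seed() with the same value before each call to cv_misclass gives every model the same fold assignment, which makes the comparison between models fairer.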

(b) In this situation, simply minimizing misclassifications might be a bad goal since false positives and false negatives carry very different consequences. Consider the M class as the “positive” label. Consider each of the probabilities stored in cutoffs in the creation of a classifier using the additive model fit in (a).

cutoffs = seq(0.01, 0.99, by = 0.01)

That is, consider each of the values stored in cutoffs as c. Obtain the sensitivity and specificity in the test set for each of these classifiers. Using a single graphic, plot both sensitivity and specificity as a function of the cutoff used to create the classifier. Based on this plot, which cutoff would you use? (0 and 1 have not been considered for coding simplicity. If you like, you can instead consider these two values.)
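One way to trace sensitivity and specificity across the cutoffs is sketched below. The classification rule, predicting M when the estimated probability of M exceeds c, is an assumption, since the rule itself is not written out above. The labels and probabilities are simulated placeholders; in practice prob_m would come from predict() with type = "response" on the test set.

```r
set.seed(7)
# Placeholder test-set labels and estimated probabilities of class "M"
n = 300
actual = factor(sample(c("B", "M"), n, replace = TRUE), levels = c("B", "M"))
prob_m = ifelse(actual == "M", rbeta(n, 4, 2), rbeta(n, 2, 4))

cutoffs = seq(0.01, 0.99, by = 0.01)

# Sensitivity: share of actual M cases classified as M at each cutoff
sens = sapply(cutoffs, function(cutoff) mean(prob_m[actual == "M"] > cutoff))
# Specificity: share of actual B cases classified as B at each cutoff
spec = sapply(cutoffs, function(cutoff) mean(prob_m[actual == "B"] <= cutoff))

# Both curves on a single graphic
plot(cutoffs, sens, type = "l", ylim = c(0, 1),
     xlab = "Cutoff", ylab = "Rate")
lines(cutoffs, spec, lty = 2)
legend("bottom", legend = c("Sensitivity", "Specificity"), lty = c(1, 2))
```

Raising the cutoff can only lower sensitivity and raise specificity, so the choice of c comes down to how costly a missed malignant tumor is relative to a false alarm.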
