(5/5)

Buy Now $35 USD

1407 Times Downloaded

Ask This Question To Be Solved By Our ExpertsGet A+ Grade Solution Guaranteed

Drop Files Here Or Click to Upload

Or Get Complete Course Help

Jim TavareLaw

(5/5)

582 Answers

Hire Me

Gianna SimsPsychology

(5/5)

936 Answers

Hire Me

Bryan ThomsonPsychology

(5/5)

527 Answers

Hire Me

StatAnalytica ExpertPolitical science

(5/5)

545 Answers

Hire Me

Applied Statistics

(5/5)

Use statistical software to create the interaction VerbMath, and the transformations Verb2 and Math2, and fit the full quadratic model

INSTRUCTIONS TO CANDIDATES

ANSWER ALL QUESTIONS

Assignment 4

Q1.

This exercise uses the COLLGPA dataset and covers influential points (see pages 189-199). First, you'll fit a basic model with no transformations in part (a), discover that this model is unsatisfactory, and then improve the model by adding an interaction term and two quadratic terms; you'll use this improved model for all remaining parts of the exercise. You'll then investigate whether there are any outliers or points with high leverage for this model. Finally, you'll use Cook's distance to give you some additional insight into how influential the most influential points are.

This problem is adapted from one in McClave et al. (2005). The COLLGPA data file contains data that can be used to determine whether college grade point average (GPA) for 40 students (Gpa) can be predicted from:

Verb = verbal score on a college entrance examination (percentile),

Math = mathematics score on a college entrance examination (percentile).

Admission decisions are often based on college entrance examinations (among other things), so this is a common use of regression modeling.

a. Use statistical software to fit the model E(Gpa) = b0 + b1Verb + b2Math.

Save the studentized residuals and draw a scatterplot with these studentized residuals on the vertical axis and Verb on the horizontal axis. Repeat with Math on the horizontal axis. Add a "loess" fitted line to each of your plots to help you assess the zero mean regression errors assumption [computer help #35, #15, and #36]. Do either of the plots suggest that this assumption is violated for this model?

b. Use statistical software to create the interaction VerbMath, and the transformations Verb2 and Math2, and fit the full quadratic model:

E(Gpa) = b0 + b1Verb + b2Math + b3VerbMath + b4Verb2 + b5Math2

Again save the studentized residuals, and in addition save leverages and Cook's distances [computer help #35, #37, and #38]. Draw a scatterplot of the studentized residuals on the vertical axis versus the (standardized) predicted values on the horizontal axis [computer help #15, #36, and #39]. This residual plot suggests that the zero mean, constant variance, and independence regression error assumptions are satisfied for this quadratic model. Briefly describe what you see (or fail to see) in this residual plot that leads you to this conclusion.

Note: Standardized predicted values are calculated by subtracting their sample mean and dividing by their sample standard deviation (so that they have a mean of 0 and standard deviation of 1). Some software packages can draw residual plots with standardized predicted values on the horizontal axis automatically. Such plots are essentially identical in appearance to residual plots with ordinary (unstandardized) predicted values on the horizontal axis (the only difference is the scale of the horizontal axis of the resulting plot). Plot whatever is easier to do in your particular software. In a real-life regression analysis you would also go on to check the zero mean, constant variance, and independent regression error assumptions in residual plots with Verb on the horizontal axis and also with Math on the horizontal axis. If you were to do that here, you would find that the assumptions are also satisfied in these residual plots.

c. Draw a histogram and QQ-plot of the studentized residuals for the quadratic model you fit in part (b) What do they suggest about the normality regression error assumption?

d. Are there any outliers for the quadratic model you fit in part (b)? (Remember to justify your answer).

e. Draw a scatterplot with the leverages from the quadratic model you fit in part (b) on the vertical axis and ID on the horizontal axis [computer help #15]. Which student has the highest leverage, and why? If you think you should investigate further, do so by seeing what happens if we exclude this student from the analysis.

Hint: Look at this student's predictor values to see why his/her leverage is high, and see if you should investigate further (if so, delete this student, refit the model, and see how much regression parameter estimates change).

f. Draw a scatterplot with Cook's distances from the quadratic model in part (b) on the vertical axis and ID on the horizontal axis. Which student has the highest Cook's distance, and why? If you think you should investigate further, do so by seeing what happens if we exclude this student from the analysis.

Hint: Look at this student's studentized residual and leverage to see why his/her Cook's distance is high, and consult Section 5.1.3 to see if you should investigate further.

Q2. (20 Points)

This exercise is an open-ended challenge to fit a multiple linear regression model to some data on restaurants. strive to come up with a "good" model (for this application, a "good" model should have an R-squared value of approximately 0.94 and a regression standard error, s, of approximately 10). You could potentially spend many hours on this exercise, but it should be possible to come up with a decent model within an hour or so; if you find yourself spending much more time than this, chances are you're on the wrong track or you're working too hard!

The following problem provides a challenging dataset that you can use to practice multiple linear regression model building. You've been asked to find out how profits for 120 restaurants in a particular restaurant chain are affected by certain characteristics of the restaurants. You would like to build a regression model for predicting Profit = annual profits (in thousands of dollars) from five potential predictor variables:

Cov = number of covers or customers served (in thousands)

Fco = food costs (in thousands of dollars)

Oco = overhead costs (in thousands of dollars)

Lco = labor costs (in thousands of dollars)

Region = geographical location (Mountain, Southwest, or Northwest)

Note that region is a qualitative (categorical) variable with three levels; the RESTAURANT data file contains two indicator variables to code the information in region: DSw = 1 for Southwest, 0 otherwise, and DNw = 1 for Northwest, 0 otherwise. Thus, the Mountain region is the reference level with 0 for both DSw and DNw. Build a suitable regression model and investigate the role of each of the predictors in the model through the use of predictor effect plots. You may want to consider the following topics in doing so:

• models with both quantitative and qualitative variables;

• polynomial transformations.

• interactions.

• comparing nested models.

You may use the following for terms in your model:

• Cov, Fco, Oco, Lco, DSw, DNw;

• interactions between each of the quantitative predictors and the indicator variables, such as DSwCov, DNwCov, etc.;

• quadratic terms, such as Cov2 (do not use terms like DSw2, however!);

• use Profit as the response variable [i.e., do not use loge(Profit) or any other transformation].

(5/5)

Attachments:

Instructions Files

Expert's Answer

Buy Now $35 USD

1407 Times Downloaded

Hurry, Grab up to 30% discount on the entire course

Ask This Question To Be Solved By Our ExpertsGet A+ Grade Solution Guaranteed

Our Experts

Jim TavareLaw

Gianna SimsPsychology

Bryan ThomsonPsychology

StatAnalytica ExpertPolitical science

Applied Statistics

Use statistical software to create the interaction VerbMath, and the transformations Verb2 and Math2, and fit the full quadratic model

ANSWER ALL QUESTIONS

Attachments:

Instructions Files

Expert's Answer

Related Questions

. The fundamental operations of create, read, update, and delete (CRUD) in either Python or Java

. Develop a program to emulate a purchase transaction at a retail store. This program will have two classes, a LineItem class and a Transaction class

. The following program contains five errors. Identify the errors and fix them

. Accepts the following from a user: Item Name Item Quantity Item Price Allows the user to create a file to store the sales receipt contents

. The final project will encompass developing a web service using a software stack and implementing an industry-standard interface. Regardless of whether you choose to pursue application development goals as a pure developer or as a software engineer

Other Services

Ask This Question To Be Solved By Our ExpertsGet A+ Grade Solution Guaranteed

Our Experts

Jim TavareLaw

Gianna SimsPsychology

Bryan ThomsonPsychology

StatAnalytica ExpertPolitical science

Applied Statistics

Use statistical software to create the interaction VerbMath, and the transformations Verb2 and Math2, and fit the full quadratic model

ANSWER ALL QUESTIONS

Attachments:

Instructions Files

Expert's Answer

Related Questions

. The fundamental operations of create, read, update, and delete (CRUD) in either Python or Java

. Develop a program to emulate a purchase transaction at a retail store. This program will have two classes, a LineItem class and a Transaction class

. The following program contains five errors. Identify the errors and fix them

. Accepts the following from a user: Item Name Item Quantity Item Price Allows the user to create a file to store the sales receipt contents

. The final project will encompass developing a web service using a software stack and implementing an industry-standard interface. Regardless of whether you choose to pursue application development goals as a pure developer or as a software engineer