Let us work with the same HERS dataset. We would like to check for influential cases. This time we will build a model to investigate how BMI relates to exercising at least 3 times per week, age, and race (raceth).
Q1. First, we will build the model.
Carry out a regression with BMI as the dependent variable and the other variables listed as predictor variables. Cut and paste your output in the box below. Provide the syntax. What is the value of R² for the model? (Report R² as a percentage with two decimal places.)
Hint: Make sure that you add “i.” in front of categorical variables like race. For instance, race should be entered as i.raceth.
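To illustrate the “i.” prefix and where Stata keeps R², here is a minimal sketch on Stata's built-in auto data rather than HERS. The variables price, mpg, and rep78 belong to that example dataset and are not part of this assignment; the display format is just one way to show a percentage with two decimals.

```stata
* Illustration on Stata's built-in auto data (NOT the HERS data).
sysuse auto, clear

* rep78 is a categorical repair rating, so it gets the i. prefix;
* mpg is continuous and is entered as-is.
regress price mpg i.rep78

* regress stores R-squared in e(r2); show it as a percentage
* with two decimal places.
display %6.2f 100*e(r2)
```

The same pattern applies to any model: put i. in front of each categorical predictor and read R² either from the output header or from e(r2).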
Q2. Interpret the relationship between exercise, age, race, and BMI at both the model and the predictor levels.
Q3. First, let us check for outliers on Y.
Remember, when we are looking at outliers on Y, we aren’t merely looking at whether a case has an unusually high or unusually low value (that’s actually expected). What we really want to know is whether a case's observed outcome (y_i) is very different from the outcome that the model would predict (ŷ_i).
Consider the following scenarios.
Remember that we will use the studentized residuals to identify cases (participants) for which our model does not predict the outcome well.
Data Prep
Before we just jump into diagnosing our model, we need to generate case identifiers—i.e., numbers that uniquely identify each participant. Many data sets will already include case identifiers, but HERS does not. So, let’s create this:
gen case_id = _n
Next, let’s generate the studentized residual (the difference between y_i and ŷ_i, scaled by its estimated standard error).
(Hint: look back at the lecture to see how we do this)
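For reference, one common way to write the (externally) studentized residual is sketched below. The symbols e_i, h_ii, and s_(i) are notation introduced here for illustration and may differ from the notation used in the lecture.

```latex
\[
  e_i = y_i - \hat{y}_i,
  \qquad
  r_i = \frac{e_i}{s_{(i)}\,\sqrt{1 - h_{ii}}}
\]
```

Here h_ii is the leverage of case i, and s_(i) is the root mean squared error from the regression refit without case i. Dividing by this quantity puts every residual on a common, standard-deviation-like scale, which is why cutoffs such as ±2.5 and ±3 make sense.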
Diagnostics on Y
Now, you might have noticed that when you created the residual, Stata gave you the following message: “(5 missing values generated)”.
Q3.1. Why might we have five missing values? (Select one)
This is an error in Stata, so we should go back and check our syntax.
Not all cases had complete data for all included variables. Cases without complete data have missing residuals.
Some cases have erroneous data, and this caused the missing values.
Gremlins. It’s always gremlins.
________________________________________
Now, we have a few ways to approach figuring out whether we have any problematic Y residuals.
First, let’s generate a histogram of the residuals to get a global sense of the distribution of the Y residuals. Remember, we start to get concerned if the residuals fall beyond ±2.5 standard deviations and really worried if they fall beyond ±3 standard deviations. Here is my histogram. (BTW, if you wonder why my histogram may look different than yours, it’s because I’m using the “set scheme” command. Stata has a ton of different visual schemes. Check out https://github.com/asjadnaqvi/Stata-schemes and find the one that speaks to you. My style is “white tableau”. There is even a scheme in honor of Taylor Swift’s Red album [huh?].)
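The workflow described above can be sketched as follows, again on Stata's built-in auto data rather than HERS. The variable name rstud is a hypothetical choice, and the commented-out scheme lines assume that the schemepack package on SSC provides the scheme under the name white_tableau.

```stata
* Illustration on Stata's built-in auto data (NOT the HERS data).
sysuse auto, clear
gen case_id = _n

regress price mpg weight i.foreign

* rstud is a hypothetical variable name for the studentized residuals.
predict rstud, rstudent

* Optional (assumption: schemepack from SSC supplies white_tableau):
* ssc install schemepack
* set scheme white_tableau

* Histogram with reference lines at the +/-2.5 and +/-3 cutoffs.
histogram rstud, xline(-3 -2.5 2.5 3)

* List the flagged cases. Stata treats a missing value as larger than
* any number, so without !missing() a case with no residual would
* also satisfy abs(rstud) > 2.5 and show up in the list.
list case_id price rstud if abs(rstud) > 2.5 & !missing(rstud)
```

Listing by case_id is what makes the identifier created earlier useful: the histogram shows whether extreme residuals exist, and the list shows which participants they belong to.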