I. True/False answers.
1. The standard error of the regression
The standard error of the regression (SER) helps a researcher judge the adequacy (“goodness-of-fit”) of an estimated regression model. It measures the typical dispersion of the Y-values around the regression line. The SER can be thought of as the standard deviation of the residuals, that is, the size of an average (typical) residual. Is it true or false?
Answer: True
Feedback:
None
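To make the idea concrete, here is a minimal Python sketch that fits an OLS line to a small hypothetical dataset and computes the SER as the square root of the sum of squared residuals divided by n – 2. The numbers are invented for illustration and are not from the survey data used below. The helper names `ols_fit` and `ser` are also hypothetical.

```python
import math

def ols_fit(x, y):
    """Return (intercept, slope) of the OLS line of y on x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope

def ser(x, y):
    """Standard error of the regression: sqrt(SSR / (n - 2))."""
    b0, b1 = ols_fit(x, y)
    residuals = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
    ssr = sum(u ** 2 for u in residuals)
    return math.sqrt(ssr / (len(x) - 2))

# Hypothetical toy data (illustration only)
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1, 12.2]
print(f"SER = {ser(x, y):.3f}")
```

Because the points lie close to a straight line, the SER is small relative to the spread of y. A perfect linear fit would give an SER of zero.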
2. The difference between the standard error of the regression and the standard error of the regression parameter estimator.
The standard error of the regression is the standard deviation of the regression errors (estimated from the residuals), whereas the estimated standard error of the slope coefficient is the standard deviation of the sampling distribution of the OLS estimator of the regression slope coefficient. Formally, under homoskedasticity the variance of the slope estimator can be estimated as $\widehat{Var}(b_1) = s_{\hat u}^2 / \sum_i (x_i - \bar{x})^2$, so that $SE(b_1) = \sqrt{s_{\hat u}^2 / \sum_i (x_i - \bar{x})^2}$, whereas the SER is the square root of the estimated variance of the error term, computed from the residuals $\hat{u}_i = y_i - \hat{y}_i$: $SER = s_{\hat u} = \sqrt{\sum_i (y_i - \hat{y}_i)^2 / (n-2)}$. Is it true or false?
Answer: True
Feedback:
None
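The contrast between the two quantities can be checked numerically. The sketch below is a minimal illustration on hypothetical toy data. It assumes the homoskedasticity-only formula given in this answer, and the function name `slope_and_standard_errors` is hypothetical. It computes the SER (spread of the residuals, in units of y) and SE(b1) (spread of the sampling distribution of the slope estimator) from the same regression.

```python
import math

def slope_and_standard_errors(x, y):
    """OLS slope of y on x, the SER, and the homoskedasticity-only SE of the slope."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    b0 = y_bar - b1 * x_bar
    ssr = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    s_u2 = ssr / (n - 2)               # estimated error variance
    ser = math.sqrt(s_u2)              # spread of residuals around the line (units of y)
    se_b1 = math.sqrt(s_u2 / sxx)      # spread of the sampling distribution of b1
    return b1, ser, se_b1

# Hypothetical toy data (illustration only)
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1, 12.2]
b1, ser_value, se_b1 = slope_and_standard_errors(x, y)
print(f"b1 = {b1:.3f}, SER = {ser_value:.3f}, SE(b1) = {se_b1:.4f}")
```

With more observations, or more spread in x, SE(b1) shrinks because the denominator grows. The SER, by contrast, settles near the standard deviation of the error term.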
3. What does i.i.d. random sampling mean?
Simple random sampling means that n objects are selected at random from a population and each member of the population is equally likely to be included in the sample. Because Y1,…,Yn are randomly drawn from the same population, the marginal distribution of Yi is the same for each i = 1,…,n; we then say that Y1,…,Yn are identically distributed. Example: had we selected 5 days at random to record the commuting time to work, we would have obtained a sample of 5 observations Y1,…,Y5; had we chosen 5 different days, we would have recorded 5 different values of the commuting-time variable Y. Before the days are drawn, Y1,…,Y5 are therefore random variables. Under simple random sampling, Y1 is distributed independently of Y2,…,Yn. In other words, under simple random sampling knowing Y1 provides no information about Y2, so the conditional distribution of Y2 given Y1 is the same as the marginal distribution of Y2. Is it true or false?
Answer: True
Feedback:
None
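A quick simulation can illustrate both parts of "i.i.d.". The minimal sketch below draws many (Y1, Y2) pairs from one hypothetical population of commuting times. It assumes a normal distribution with mean 30 minutes and standard deviation 5, purely for illustration. Both draws have the same mean, which is consistent with identical distributions, and their sample correlation is near zero. The mean of Y2 is also about the same whether Y1 is above or below 30, which is consistent with independence. The helper names are hypothetical.

```python
import math
import random
import statistics

def draw_pairs(num_samples, seed=0):
    """Draw (Y1, Y2) pairs by simple random sampling from one hypothetical population."""
    rng = random.Random(seed)
    # Hypothetical population: commuting times in minutes ~ Normal(30, 5)
    return [(rng.gauss(30, 5), rng.gauss(30, 5)) for _ in range(num_samples)]

def sample_correlation(a, b):
    """Pearson correlation computed directly from sums of cross-products."""
    ma, mb = statistics.mean(a), statistics.mean(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb)

pairs = draw_pairs(10_000)
y1 = [a for a, _ in pairs]
y2 = [b for _, b in pairs]
print(statistics.mean(y1), statistics.mean(y2))   # same marginal distribution
print(sample_correlation(y1, y2))                 # close to 0

# Conditional on Y1, the distribution of Y2 is unchanged
above = [b for a, b in pairs if a > 30]
below = [b for a, b in pairs if a <= 30]
print(statistics.mean(above), statistics.mean(below))  # both close to 30
```

A near-zero correlation alone does not prove independence. Here independence holds by construction, because each draw uses a fresh random number.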
4. The measures of goodness-of-fit.
Suppose we have estimated two simple regressions using the same data (i.e., the same observations on X and Y): Y = a0 + a1X and X = b0 + b1Y. We should expect the goodness-of-fit measures (SER and R2) to be the same in these two models. Is it true or false?
Answer: False
Feedback:
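The Root MSE reported by Stata is the SER: the square root of the sum of squared residuals divided by the degrees of freedom (n – 2). It differs between the two models because in the first model the residuals are measured as vertical distances between the observations on Y and the regression line, whereas in the second model they are measured as distances between the observations on X and the regression line (the axes are swapped). In addition, the SER is expressed in the units of the dependent variable, which differ between the two models.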
The coefficient of determination in the simple (i.e., one-regressor) regression framework is equal to the square of the correlation coefficient between the two variables, that is, the square of the covariance between the two variables divided by the product of their standard deviations. Since the correlation coefficient is symmetric in X and Y, R2 is the same in both models: it measures the degree of linear association between the two variables. Note that in the formula R2 = ESS/TSS = 1 – SSR/TSS, both the SSR and the TSS change when we swap the dependent and independent variables, but their ratio, and hence R2, is unaltered. Because the SER differs while R2 does not, the two goodness-of-fit measures are not both the same, so the statement is false.
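The symmetry argument can be verified numerically. The minimal sketch below runs both regressions, Y on X and X on Y, on a small hypothetical dataset and reports R2 and the SER for each. The data and the helper name `fit_stats` are invented for illustration.

```python
import math

def fit_stats(x, y):
    """Regress y on x by OLS; return (R^2, SER)."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    ssr = sum((yi - b0 - b1 * xi) ** 2 for xi, yi in zip(x, y))
    tss = sum((yi - y_bar) ** 2 for yi in y)
    return 1 - ssr / tss, math.sqrt(ssr / (n - 2))

# Hypothetical data: x in inches, y in thousands of dollars
x = [62.0, 65.0, 66.0, 68.0, 70.0, 73.0]
y = [21.0, 30.0, 26.0, 35.0, 33.0, 41.0]
r2_yx, ser_yx = fit_stats(x, y)   # Y = a0 + a1*X
r2_xy, ser_xy = fit_stats(y, x)   # X = b0 + b1*Y
print(r2_yx, r2_xy)     # equal up to floating-point rounding
print(ser_yx, ser_xy)   # different, and in different units
```

The equality of R2 does not depend on these particular numbers. In both regressions R2 equals the squared sample correlation, which is symmetric in its two arguments.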
II. Interpreting the regression output in Stata (Stata commands are NOT on the exam; however, you need to know how to interpret the descriptive statistics and regression output).
These data are taken from the US National Health Interview Survey for 1994. They are a subset of the data used in Anne Case and Christina Paxson’s paper “Stature and Status: Height, Ability, and Labor Market Outcomes,” Journal of Political Economy, 2008, 116(3): 499-532.
Earnings = annual salary, measured in USD (farming occupations only); Height = height without shoes, measured in inches.
a) The population model is Yi = a0 + a1·Heighti + ui, where Y is Earnings and u is the error term. The estimated model is Ŷi = â0 + â1·Heighti. Please write down the estimated equation using the regression output above.
From the Stata output we have: Ŷ = –37,768.04 + 1,049.201·Height
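Once the equation is written down, it can be used to compute fitted values. The minimal sketch below hard-codes the two coefficients from the estimated equation and evaluates it at a height of 68 inches. That height is chosen purely for illustration, and the function name is hypothetical. Fitted values are only sensible for heights within the range observed in the sample.

```python
# Coefficients from the estimated equation above
A0_HAT = -37_768.04   # intercept, $/year
A1_HAT = 1_049.201    # slope, $/year per inch of height

def predicted_earnings(height_inches):
    """Fitted annual earnings (USD) for a given height in inches."""
    return A0_HAT + A1_HAT * height_inches

print(round(predicted_earnings(68), 2))  # illustrative height of 68 inches
```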
b) How would you interpret the estimated intercept?
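The intercept â0 = –37,768.04 ($/year; the units of the dependent variable) has only a geometric interpretation: it is the vertical intercept of the fitted regression line, i.e., the predicted value of the dependent variable when the independent variable equals zero. Since a height of zero inches is far outside the range of the data, the intercept has no meaningful economic interpretation (indeed, it implies negative earnings).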
c) How would you interpret the estimated slope coefficient on Height?
The estimated slope, â1 = 1049.201 ($ per year per inch; the units of the dependent variable per unit of the independent variable), indicates that each additional inch of height is associated with an increase in annual earnings of about $1,049, on average.
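The interpretations in b) and c) can be read directly off the fitted line. The minimal sketch below uses the estimated coefficients and a hypothetical helper `predicted_earnings`. The fitted value at a height of zero equals the intercept. The difference in fitted earnings between two people one inch apart equals the slope, and a 12-inch difference scales it by 12. The specific heights are arbitrary illustration values.

```python
A0_HAT = -37_768.04   # estimated intercept, $/year
A1_HAT = 1_049.201    # estimated slope, $/year per inch

def predicted_earnings(height_inches):
    """Fitted annual earnings (USD) from the estimated linear equation."""
    return A0_HAT + A1_HAT * height_inches

at_zero = predicted_earnings(0)                              # the intercept
one_inch = predicted_earnings(70) - predicted_earnings(69)   # the slope
one_foot = predicted_earnings(72) - predicted_earnings(60)   # 12 times the slope
print(at_zero, round(one_inch, 3), round(one_foot, 3))
```

Because the model is linear, the one-inch difference is the same at every height. That is what "on average" refers to in the interpretation of â1.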
d) Is the coefficient for Height statistically significant at the 5 percent level of significance? Explain.
Statistical significance of a regression coefficient means that it is significantly different from zero in a statistical sense: the estimate is far enough from 0, in either direction and relative to its standard error, that a true value of zero is implausible. Thus this question can be formulated as a two-sided hypothesis test:
The null hypothesis is H0: a1 = 0 against the alternative hypothesis H1: a1 ≠ 0. There are two approaches to hypothesis testing: the critical-value approach and the p-value approach.
The first step is the same in both approaches: compute the test statistic, which in a test about the true value of a regression coefficient is called the t-statistic.
Here, the t-statistic is
t = [â1 – a1,H0]/SE(â1) = [â1 – 0]/SE(â1) = 1049.201/308.8158 ≈ 3.397.
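The arithmetic is simple, but writing it as a function makes the general form explicit: (estimate minus hypothesized value) divided by the standard error. In the minimal sketch below, the function name `t_statistic` is hypothetical. The inputs are the estimate and standard error used in the calculation above.

```python
def t_statistic(estimate, null_value, std_error):
    """t = (estimate - hypothesized value) / standard error."""
    return (estimate - null_value) / std_error

t = t_statistic(1049.201, 0.0, 308.8158)
print(round(t, 3))
```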
If the sample size is large (as a rule of thumb, more than 100 observations), the sampling distribution of the t-statistic under the null hypothesis is approximately standard normal, i.e., centered at 0 with a standard deviation of 1. This follows from the central limit theorem (CLT).
Since this distribution is (approximately) standard normal, we can use the standard normal table to determine whether the value of the t-statistic falls in the tail(s) of the distribution (i.e., the region of values of the t-statistic that would be unlikely if the null hypothesis were true).
Formally, the decision rule is to check if the t-statistic falls in the rejection region by comparing its value to the critical value (the percentile of the standard normal distribution determined by the chosen level of significance; here, α = 0.05).
Under the critical value approach, the rejection region and, hence, the critical values of the t-statistic, are pre-assigned by selecting the level of significance.
The critical values of the t-statistic for the two-sided test at the 5% level of significance are ±1.96 (i.e., the 0.025 and 1 – 0.025 = 0.975 percentiles of the standard normal distribution).
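Instead of a printed table, the critical values can be obtained from the inverse CDF of the standard normal distribution. The minimal sketch below uses Python's standard-library `statistics.NormalDist` (Python 3.8+). The helper name `two_sided_critical_value` is hypothetical.

```python
from statistics import NormalDist

def two_sided_critical_value(alpha):
    """Value c such that P(|Z| > c) = alpha for Z ~ N(0, 1)."""
    return NormalDist().inv_cdf(1 - alpha / 2)

print(round(two_sided_critical_value(0.05), 3))  # 5% level
print(round(two_sided_critical_value(0.01), 3))  # 1% level
```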
The decision rule depends on the alternative hypothesis:
• The alternative is given by H1: a1 ≠ constant: if the absolute value of the t-statistic exceeds the critical value (the absolute value of the α/2 percentile of the standard normal distribution), reject the null at the prespecified level of significance, α = 0.05 or α = 0.01. If the absolute value of the t-statistic is less than or equal to the critical value, fail to reject the null at the prespecified level of significance, α = 0.05 or α = 0.01.
• The alternative is given by H1: a1 > constant: if the t-statistic exceeds the critical value, the (1–α)th percentile of the standard normal distribution, reject the null at the prespecified level of significance, α = 0.05 or α = 0.01. If the value of the t-statistic is less than or equal to the critical value, fail to reject the null at the prespecified level of significance, α = 0.05 or α = 0.01.
• The alternative is given by H1: a1 < constant: if the t-statistic is less than the critical value, the αth percentile of the standard normal distribution, reject the null at the prespecified level of significance, α = 0.05 or α = 0.01. If the value of the computed t-statistic is greater than or equal to the critical value, fail to reject the null at the prespecified level of significance, α = 0.05 or α = 0.01.
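The three decision rules, together with the p-value approach mentioned in the answer to d), can be collected into a short sketch. It assumes the large-sample standard normal approximation described above and uses Python's `statistics.NormalDist`. The function names and the string labels for the alternatives ("two-sided", "greater", "less") are hypothetical choices. The example applies the rules to the t-statistic for Height.

```python
from statistics import NormalDist

Z = NormalDist()  # standard normal: large-sample distribution of t under H0

def reject_null(t, alpha, alternative):
    """Critical-value approach. alternative: 'two-sided', 'greater', or 'less'."""
    if alternative == "two-sided":
        return abs(t) > Z.inv_cdf(1 - alpha / 2)
    if alternative == "greater":
        return t > Z.inv_cdf(1 - alpha)
    if alternative == "less":
        return t < Z.inv_cdf(alpha)
    raise ValueError(f"unknown alternative: {alternative!r}")

def p_value(t, alternative):
    """p-value approach: reject H0 when the p-value is below alpha."""
    if alternative == "two-sided":
        return 2 * (1 - Z.cdf(abs(t)))
    if alternative == "greater":
        return 1 - Z.cdf(t)
    if alternative == "less":
        return Z.cdf(t)
    raise ValueError(f"unknown alternative: {alternative!r}")

t = 1049.201 / 308.8158   # t-statistic for H0: a1 = 0
for alpha in (0.05, 0.01):
    print(alpha, reject_null(t, alpha, "two-sided"))
print(f"two-sided p-value: {p_value(t, 'two-sided'):.5f}")
```

The two approaches always agree: |t| exceeds the two-sided critical value exactly when the two-sided p-value is below α.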