
INSTRUCTIONS TO CANDIDATES
ANSWER ALL QUESTIONS

# Question 1

In a recent, exciting, but also controversial *Science* article, [Tomasetti and Vogelstein](http://science.sciencemag.org/content/347/6217/78.full) attempt to explain why cancer incidence varies drastically across tissues (e.g., why one is much more likely to develop lung cancer than pelvic bone cancer). The authors show that a higher average lifetime risk for a cancer in a given tissue correlates with the rate of replication of stem cells in that tissue. The main inferential tool for their statistical analysis was a simple linear regression, which we will replicate here.

You can download the dataset as follows:

```{r}
tomasetti <- read.csv("http://people.math.binghamton.edu/qiao/data501/data/Tomasetti.csv")
head(tomasetti)
```

The dataset contains information about 31 tumour types. The `Lscd` (Lifetime stem cell divisions) column refers to the total number of stem cell divisions during the average lifetime, while `Risk` refers to the lifetime risk for cancer of that tissue type.

1. Fit a simple linear regression model to the data with `log(Risk)` as the response variable and `log(Lscd)` as the predictor variable.

```{r}
tomasetti.lm <- lm(... ~ ..., data = tomasetti)
```

2. Plot the estimated regression line and the data.

```{r}
plot(..., log(tomasetti$Risk))
```

3. Add upper and lower 95% **prediction** bands for the response to the plot, using `predict()`. That is, produce one line for the upper limits of the intervals over a sequence of `log(Lscd)` values, and one line for the lower limits of the intervals.

4. Interpret the above bands at `Lscd` $= 10^{10}$.

5. Add upper and lower 95% **confidence** bands for the conditional mean response to the plot, using `predict()`. That is, produce one line for the upper limits of the intervals over a sequence of `log(Lscd)` values, and one line for the lower limits of the intervals.

6. Interpret the above bands at `Lscd` $= 10^{10}$.

7. Test whether the slope in this regression is equal to 0 at level $\alpha=0.05$. State the null hypothesis, the alternative hypothesis, the conclusion, and the $p$-value.

```{r}
summary(tomasetti.lm)
```

```{r}
names(summary(tomasetti.lm))
```

8. What assumptions did you make for question (7) above?

9. Give a 95% confidence interval for the slope of the regression line. 

10. Interpret your interval in (9).

11. Report the $R^2$ of the model.

12. Report the adjusted $R^2$ of the model.

13. Report an estimate of the variance of the errors in the model.

14. Interpret the $R^2$ you calculated above, ideally in terms that a neighbor who does not know much about statistics would understand.

15. According to a [Reuters article](http://www.reuters.com/article/health-cancer-luck-idUSL1N0UE0VF20150101) "Plain old bad luck plays a major role in determining who gets cancer and who does not, according to researchers who found that two-thirds of cancer incidence of various types can be blamed on random mutations and not heredity or risky habits like smoking." Is this a correct interpretation of $R^2$?
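The band and interval steps in this question can be sketched on simulated data. This is only an illustration under stated assumptions: the data frame `sim`, the model `sim.lm`, and the grid `grid` are hypothetical names, and the numbers are random draws, not the Tomasetti data.

```{r}
# Simulated stand-in data (hypothetical; NOT the Tomasetti dataset)
set.seed(1)
sim <- data.frame(x = runif(31, 5, 13))
sim$y <- -8 + 0.5 * sim$x + rnorm(31, sd = 0.8)

sim.lm <- lm(y ~ x, data = sim)

# Grid of predictor values spanning the observed range
grid <- data.frame(x = seq(min(sim$x), max(sim$x), length.out = 100))

# Each call returns a matrix with columns fit, lwr, upr
pred.band <- predict(sim.lm, newdata = grid, interval = "prediction", level = 0.95)
conf.band <- predict(sim.lm, newdata = grid, interval = "confidence", level = 0.95)

# Data, fitted line, prediction bands (dashed), confidence bands (dotted)
plot(sim$x, sim$y, xlab = "x", ylab = "y")
abline(sim.lm)
lines(grid$x, pred.band[, "lwr"], lty = 2)
lines(grid$x, pred.band[, "upr"], lty = 2)
lines(grid$x, conf.band[, "lwr"], lty = 3)
lines(grid$x, conf.band[, "upr"], lty = 3)

# 95% confidence interval for the slope
slope.ci <- confint(sim.lm, "x", level = 0.95)
slope.ci
```

The prediction bands are always wider than the confidence bands, because they account for the error variance of a new observation in addition to the uncertainty in the fitted mean. If a model formula is written with `log()` applied to a variable inside `lm()`, `newdata` should supply that variable on its original scale, since `predict()` applies the transformation itself.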

# Question 2 

From our textbook **CH** page 51, Exercise 2.9.

Let $Y$ and $X$ denote the labor force participation rate of women in 1972 and 1968, respectively, in each of 19 cities in the United States. The regression output for this data set is shown in the following table. It was also found that $\text{SSR} = .0358$ and $\text{SSE} = .0544$. Suppose that the model $Y = \beta_{0} + \beta_{1}X + \epsilon$ satisfies the usual regression assumptions.

| Variable | Coefficient | s.e | t-Test | p-value |
|--|--|--|--|--|
| Constant | .203311 | .0976 | 2.08 | .0526 |
| X | .656040 | .1961 | 3.35 | $<.0038$ |
| n = 19 | $R^{2} = .397$ | $R^{2}_{a} = .362$ | $\hat{\sigma} = .0566$ | df = 17 |

In this table, **s.e** is the standard error of the estimate, **t-Test** is the value of the test statistic under the null hypothesis, and **p-value** is the p-value of the test.
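As a quick consistency check on how the columns of the table relate, the sketch below recomputes the t statistics, two-sided p-values, and $\hat{\sigma}$ from the reported coefficients, standard errors, and SSE. The variable names are hypothetical, and the results are expected to match the table only up to rounding.

```{r}
# Values copied from the regression output above
b0 <- 0.203311; se.b0 <- 0.0976
b1 <- 0.656040; se.b1 <- 0.1961
SSE <- 0.0544
n <- 19
df <- n - 2

# t statistics for H0: coefficient = 0
t.b0 <- b0 / se.b0
t.b1 <- b1 / se.b1

# Two-sided p-values from the t distribution with n - 2 degrees of freedom
p.b0 <- 2 * pt(abs(t.b0), df, lower.tail = FALSE)
p.b1 <- 2 * pt(abs(t.b1), df, lower.tail = FALSE)

# Residual standard error
sigma.hat <- sqrt(SSE / df)

round(c(t.b0 = t.b0, t.b1 = t.b1, p.b0 = p.b0, p.b1 = p.b1, sigma.hat = sigma.hat), 4)
```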

1. Compute $\widehat{\text{Var}}\left(Y\right)$ and $\widehat{\text{Cov}}\left(Y, X\right)$. Hint: for $\widehat{\text{Var}}\left(Y\right)$, check and compare the definitions of SST and sample variance. For $\widehat{\text{Cov}}\left(Y, X\right)$, (1) note that the correlation between $X$ and $Y$ can be computed from $R^2$; (2) compare the formulae for the sample correlation and the OLS estimate $\widehat\beta_1$; (3) note that $\widehat{\text{Var}}\left(Y\right)$ can be computed.

2. Suppose the participation rate of women in 1968 in a given city is $x=45\%$. What is the estimated participation rate of women in 1972, $y$, for the same city?

3. Suppose further that the mean and variance of the participation rate of women in 1968 (i.e., the sample mean and variance of the $x$ values) are 0.5 and 0.005, respectively. Construct the 95\% **confidence** interval for the estimate in (2). Hint: you may use either equation (2.37) or (2.40) in the textbook to calculate the standard error, which is needed in this confidence interval. First determine which formula to use.

4. Construct the 95\% confidence interval for the slope of the true regression line $\beta_{1}$. Hint: you may use (2.25) in the textbook to compute the standard error of the slope estimator $\widehat\beta_1$. Alternatively, it should have been reported in the above table.

5. Test the hypothesis: $\text{H}_{0}: \beta_{1} = 1$ versus $\text{H}_{a}: \beta_{1} > 1$ at the 5\% significance level. Hint: note that you must not use the t-Test reported in the above table since it is based on the null hypothesis that $\beta_1=0$, not $\beta_1=1$.

6. Compute the $R^{2}$ for this simple linear regression from the values of SSR and SSE.

7. If $X$ and $Y$ were reversed in the above regression, what would you expect $R^{2}$ to be?
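The mechanics behind parts (4) to (6) can be sketched directly from the reported summary numbers. This is a minimal sketch, not a model solution: the variable names are hypothetical, and it assumes the standard error of the slope is the one reported in the table.

```{r}
# Values copied from the regression output above
b1 <- 0.656040; se.b1 <- 0.1961
SSR <- 0.0358;  SSE <- 0.0544
df <- 17

# 95% confidence interval for the slope: estimate +/- t quantile * standard error
t.crit <- qt(0.975, df)
ci.b1 <- b1 + c(-1, 1) * t.crit * se.b1

# One-sided test of H0: beta1 = 1 versus Ha: beta1 > 1
# The statistic is centred at the hypothesised value 1, not at 0
t.stat <- (b1 - 1) / se.b1
p.value <- pt(t.stat, df, lower.tail = FALSE)

# R^2 from the sums of squares, using SST = SSR + SSE
R2 <- SSR / (SSR + SSE)

list(ci.b1 = ci.b1, t.stat = t.stat, p.value = p.value, R2 = R2)
```

Because the alternative is one-sided ($\beta_1 > 1$), only large positive values of the statistic count as evidence against the null hypothesis, which is why the upper-tail probability is used.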
