INSTRUCTIONS TO CANDIDATES
ANSWER ALL QUESTIONS

Statistical Methods for Biology

Part 1: Alzheimer's disease is a degenerative neurological condition that affects approximately 50 million people worldwide.  It has no cure and eventually leads to dementia, memory loss, and death. Early intervention and treatment may slow the disease's progression. However, because an Alzheimer's diagnosis requires patients to display symptoms of dementia, the disease is usually not recognized until it has become relatively advanced. There is therefore considerable interest in the development of blood tests or imaging tools that would allow patients to be diagnosed before symptoms appear.  

Physically, Alzheimer's disease is closely associated with a protein called amyloid-β (Aβ), which builds up and develops plaques in the hippocampus and other parts of the brain. Plaques begin to develop 10-20 years before patients show symptoms, so in theory, Aβ levels could serve as an early diagnostic marker for Alzheimer's. However, the proteins are not normally detectable in the blood. Under natural conditions they can only be measured by autopsy.

In 2001, DeMattos et al. (Proceedings of the National Academy of Sciences 98:8850) described a method to make Aβ detectable in blood samples from mice.  Their technique used a monoclonal antibody to Aβ.  When injected into the mice, the antibody made it easier for Aβ to enter the bloodstream and also slowed the proteins' degradation.  As a result, the researchers saw a large increase in detectable Aβ within 24 hours after injection.

In this homework, you will analyze data from a follow-up study (DeMattos et al. 2002, Science 5563:2264-2267) in which the researchers investigated the relationship between Aβ levels in blood plasma and Aβ loads in the brains of 49 mice.  In each mouse, the plasma concentration of Aβ was measured in pg/ml using ELISA (a standard laboratory method for quantifying proteins), and the Aβ load in the hippocampus was measured as a percentage of total area using immunofluorescent staining (see Fig. 2 in the paper for example images).

You can find the data on the website in the file demattos_et_al_2002_amyloid.csv.  In addition, you will find a PDF file containing the "original" data records (I read these data from a figure in the paper; they are not the authors' actual notes).  The data include two potential explanatory variables.  The variable brain gives the percentage Aβ load in the mouse's brain, and the variable severity classifies each mouse into one of four ordinal classes based on its Aβ load.  Please use these files to answer Questions 1–15.

We will analyze the data using both ANOVA and regression (or their non-parametric alternatives).  Our goals are to determine whether plasma concentrations of Aβ are related to brain Aβ load, and to describe the relationship if it exists.  Our methods will be somewhat different from those used by the authors.

1. [1 point] State the null and alternative hypotheses for the one-way ANOVA, using severity as the explanatory variable.  

Null hypothesis: There is no relationship between the plasma concentration of Aβ and brain Aβ load; that is, the mean plasma Aβ concentration is the same in all four severity classes.

Alternative hypothesis: There is a relationship between the plasma concentration of Aβ and brain Aβ load; that is, the mean plasma Aβ concentration differs in at least one severity class.

2. Load demattos_et_al_2002_amyloid.csv into R. Make sure that severity is a factor.  Then fit the model, obtain the residuals, and add them to your data frame as a new column (mutate() may be useful).  Use the residuals to complete the following preliminary tasks:

a. [1 point] Check for possible outliers. Present any graphs that you use (with captions), and briefly explain what you found.  If the plot(s) suggest that an outlier exists, (i) identify the row number for each potential outlier in the data, and (ii) explain how you have addressed the issue. If you find a mistake in the .csv file, correct it.  If you leave an outlier unchanged, say so, and explain your reasoning.

Some guidance:

Review Lecture 2.9.  Remember the following principles:

1. Outliers are only a problem if they cause the data to violate the model’s distributional assumptions. 

2. You should always check for outliers using the same values that you will use to check the distributional assumptions.  You may want to run these checks (in 2b) before you decide how to handle any outliers.

3. Consider the overall distribution of the data.  If a point looks like a possible outlier in a boxplot, but is not very extreme relative to the outermost point in the opposite direction, and the data appear to be normal, then the "outlier" may not have a substantial effect on the analysis.  

4. Data should never be removed from an analysis unless it is unambiguously flawed and you cannot fix it. 

5. If you change anything in the data, you must recheck all of the model assumptions, including rechecking for outliers. Your answer should discuss any follow-up checks that you run, but you only need to include your original plot(s).

The arrange(), which.min(), or which.max() functions can help you figure out which row the outlier is in.

If you need to fix a data-entry error, you can either correct the .csv file in Excel and reload it, or you can correct it in R using code similar to:
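The snippet that originally followed is not reproduced in this copy. As a stand-in, here is a minimal sketch of how locating and correcting a suspect value might look in base R. It runs on simulated data, not the real file: the planted typo in row 17, the corrected value of 1200, and the object names mice, fit, and bad_row are all hypothetical. Only the column names plasma and severity come from the assignment.

```r
# Simulated stand-in for demattos_et_al_2002_amyloid.csv (all values are made up)
set.seed(1)
mice <- data.frame(
  severity = rep(0:3, times = c(13, 12, 12, 12)),
  plasma   = round(rnorm(49, mean = 1000, sd = 200))
)
mice$severity <- factor(mice$severity)  # make sure severity is a factor
mice$plasma[17] <- 12000                # hypothetical typo planted for illustration

# Fit the one-way ANOVA model and store the residuals as a new column
# (dplyr's mutate(resid = residuals(fit)) would do the same thing)
fit <- lm(plasma ~ severity, data = mice)
mice$resid <- residuals(fit)

# Find the row with the most extreme residual and inspect it
bad_row <- unname(which.max(abs(mice$resid)))
mice[bad_row, ]

# Correct the value in place (1200 is a hypothetical corrected value),
# then refit and recompute the residuals before rechecking the assumptions
mice$plasma[bad_row] <- 1200
fit <- lm(plasma ~ severity, data = mice)
mice$resid <- residuals(fit)
```

A correction like this is only appropriate when the original records show that the value in the .csv file is a transcription mistake; otherwise the point should be left unchanged, as the guidance above explains.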

b. [1 point] Check the model's assumptions (you may assume independence and random sampling).  Please (i) clearly identify each assumption that you are checking, (ii) state whether or not it has been met, and (iii) present the evidence that you are using to check it. Evidence can take the form of plots, formal goodness-of-fit tests, or calculations based on statistics. Please only provide the evidence that you feel is really needed to verify the assumptions.   
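One possible way to gather this evidence is sketched below, using simulated data in place of the real file. The choice of a Shapiro-Wilk test for normality and Bartlett's test for equal variances is an assumption made for illustration; the course may prefer other tests or rely on plots alone. The group means, the standard deviation, and the object names are hypothetical.

```r
# Simulated stand-in for the real data (all values are made up)
set.seed(2)
n_per_class <- c(13, 12, 12, 12)
mice <- data.frame(
  severity = factor(rep(0:3, times = n_per_class)),
  plasma   = rnorm(49, mean = rep(c(800, 1000, 1300, 1500), times = n_per_class), sd = 200)
)
fit <- lm(plasma ~ severity, data = mice)
mice$resid <- residuals(fit)

# Assumption 1: residuals are normally distributed
sw <- shapiro.test(mice$resid)
sw

# Assumption 2: residual variance is equal across severity classes
bt <- bartlett.test(resid ~ severity, data = mice)
bt

# Plots are drawn only in an interactive session
if (interactive()) {
  qqnorm(mice$resid); qqline(mice$resid)   # normality
  boxplot(resid ~ severity, data = mice)   # spread within each class
}
```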

4. Suppose that we are interested in a planned contrast between severity class 0 and class 1.

a. [1 point] Using the output from summary() and confint(), report a point estimate, standard error, and 95% confidence interval for the contrast.
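The sketch below shows where these three quantities can be read from a fitted model, again on simulated data. It assumes R's default treatment contrasts with class 0 as the reference level, in which case the coefficient named severity1 estimates the class 1 mean minus the class 0 mean. The simulated values and object names are hypothetical.

```r
# Simulated stand-in for the real data (all values are made up)
set.seed(3)
n_per_class <- c(13, 12, 12, 12)
mice <- data.frame(
  severity = factor(rep(0:3, times = n_per_class)),
  plasma   = rnorm(49, mean = rep(c(800, 1000, 1300, 1500), times = n_per_class), sd = 200)
)
fit <- lm(plasma ~ severity, data = mice)

# Point estimate and standard error of the class 1 vs. class 0 contrast
est_se <- summary(fit)$coefficients["severity1", c("Estimate", "Std. Error")]
est_se

# 95% confidence interval for the same contrast
ci <- confint(fit, parm = "severity1", level = 0.95)
ci
```

If a different class were set as the reference level (for example with relevel()), the coefficient names and their meanings would change accordingly.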

6. There are a total of 6 possible pairwise comparisons (or unplanned contrasts) that we can make among the four treatment levels in this study.  This exercise will illustrate how our choice of multiple testing correction can affect the results of these comparisons.  The cheat sheet titled Methods to control error inflation in multiple comparisons (available on the website) may be helpful.

b. [0.5 point] Which method is the best choice for the current analysis, and why? Your answer should explain the 𝑃-values that you got in 6a, but should not be based on those 𝑃-values.

c. [0.5 point] Suppose that instead of an ANOVA, we had decided to run a non-parametric Kruskal-Wallis test. Which multiple-testing method would be preferable in this scenario, and why?
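Part 6a is not shown in this copy, so the methods it asks for are unknown. As an illustration only, the sketch below compares unadjusted, Bonferroni, Holm, and Tukey results for all pairwise comparisons on simulated data, and shows one rank-based option that could follow a Kruskal-Wallis test. The set of methods, the simulated values, and the object names are assumptions, not the assignment's required answer.

```r
# Simulated stand-in for the real data (all values are made up)
set.seed(4)
n_per_class <- c(13, 12, 12, 12)
mice <- data.frame(
  severity = factor(rep(0:3, times = n_per_class)),
  plasma   = rnorm(49, mean = rep(c(800, 1000, 1300, 1500), times = n_per_class), sd = 200)
)

# Number of pairwise comparisons among four classes
n_pairs <- choose(4, 2)

# Pairwise t-tests with different corrections
p_none <- pairwise.t.test(mice$plasma, mice$severity, p.adjust.method = "none")$p.value
p_bonf <- pairwise.t.test(mice$plasma, mice$severity, p.adjust.method = "bonferroni")$p.value
p_holm <- pairwise.t.test(mice$plasma, mice$severity, p.adjust.method = "holm")$p.value

# Tukey's HSD on the one-way ANOVA
tukey <- TukeyHSD(aov(plasma ~ severity, data = mice))$severity

# Rank-based pairwise comparisons, as one option after a Kruskal-Wallis test
kw       <- kruskal.test(plasma ~ severity, data = mice)
p_wilcox <- pairwise.wilcox.test(mice$plasma, mice$severity, p.adjust.method = "holm")$p.value

p_none; p_bonf; p_holm; tukey[, "p adj"]; p_wilcox
```

Printing the matrices side by side makes it easy to see how each correction inflates the unadjusted P-values by a different amount.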

8. [1 point] In these data, what proportion of the variation in plasma Aβ concentration is accounted for by differences among the means for the different severity classes (i.e., by the effect of plaque severity)? Please identify the statistic used to find this answer.
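A proportion of this kind is commonly reported as R², the ratio of the among-group sum of squares to the total sum of squares. The sketch below, on simulated data with hypothetical values and object names, shows two equivalent ways to obtain it from a fitted model.

```r
# Simulated stand-in for the real data (all values are made up)
set.seed(5)
n_per_class <- c(13, 12, 12, 12)
mice <- data.frame(
  severity = factor(rep(0:3, times = n_per_class)),
  plasma   = rnorm(49, mean = rep(c(800, 1000, 1300, 1500), times = n_per_class), sd = 200)
)
fit <- lm(plasma ~ severity, data = mice)

# R-squared as reported by summary()
r2_summary <- summary(fit)$r.squared

# The same quantity from the ANOVA table: SS(severity) / SS(total)
ss       <- anova(fit)[["Sum Sq"]]   # first entry: severity, second: residuals
r2_anova <- ss[1] / sum(ss)

c(r2_summary, r2_anova)
```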

9. [1 point] Briefly explain your biological interpretation of the results.  For evidence, your discussion can cite the ANOVA table results, hypothesis testing results for pairwise comparisons, and/or the graph in question 8.  You do not need to repeat any statistics here. Simply explain what they mean, biologically.  In particular, consider the goals of the analysis.

10. [1 point] In general terms, how confident are you in the repeatability of these results?  Please explain your reasoning.

11. [1 point] Fit a new linear model for plasma, this time using the percent coverage of plaques in the brain column as your explanatory variable.  Because brain is numeric, lm() will fit a regression instead of a one-way ANOVA.  As you did in Question 2, add the residuals for the new model to the dataset and use them to check the model assumptions.  Provide any plots that you use, along with any test results and your conclusions. Note that you should be able to reuse most of your code from question 2 here.
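A minimal sketch of the regression version follows, on simulated data. The simulated slope, intercept, and noise level are made up, and the object names fit_reg, resid_reg, and fitted_reg are hypothetical; only the column names plasma and brain come from the assignment. The Shapiro-Wilk test is one possible normality check, assumed here for illustration.

```r
# Simulated stand-in for the real data (all values are made up)
set.seed(6)
mice <- data.frame(brain = runif(49, min = 0, max = 60))
mice$plasma <- 800 + 15 * mice$brain + rnorm(49, sd = 200)

# brain is numeric, so lm() fits a simple linear regression
fit_reg <- lm(plasma ~ brain, data = mice)
mice$resid_reg  <- residuals(fit_reg)
mice$fitted_reg <- fitted(fit_reg)

# Normality of the residuals
sw_reg <- shapiro.test(mice$resid_reg)
sw_reg

# Plots are drawn only in an interactive session
if (interactive()) {
  # Linearity and equal variance: look for curvature or a funnel shape
  plot(mice$fitted_reg, mice$resid_reg, xlab = "Fitted values", ylab = "Residuals")
  abline(h = 0, lty = 2)
  # Normality
  qqnorm(mice$resid_reg); qqline(mice$resid_reg)
}
```

Compared with the one-way ANOVA checks, the main addition is the residuals-versus-fitted plot, which is used to judge linearity as well as constant variance.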
