STAT 503 – Statistical Methods for Biology
Part 1: Alzheimer's disease is a degenerative neurological condition that affects approximately 50 million people worldwide. It has no cure and eventually leads to dementia, memory loss, and death. Early intervention and treatment may slow the disease's progression. However, because an Alzheimer's diagnosis requires patients to display symptoms of dementia, the disease is usually not recognized until it has become relatively advanced. There is therefore considerable interest in the development of blood tests or imaging tools that would allow patients to be diagnosed before symptoms appear.
Physically, Alzheimer's disease is closely associated with a protein called amyloid-β (Ab), which builds up and forms plaques in the hippocampus and other parts of the brain. Plaques begin to develop 10-20 years before patients show symptoms, so in theory, Ab levels could serve as an early diagnostic marker for Alzheimer's. However, the proteins are not normally detectable in the blood. Under natural conditions they can only be measured by autopsy.
In 2001, DeMattos et al. (Proceedings of the National Academy of Sciences 98:8850) described a method to make Ab detectable in blood samples from mice. Their technique used a monoclonal antibody to Ab. When injected into the mice, the antibody made it easier for Ab to enter the bloodstream and also slowed the proteins' degradation. As a result, the researchers saw a large increase in detectable Ab within 24 hours of injection.
In this homework, you will analyze data from a follow-up study (DeMattos et al. 2002, Science 295(5563):2264-2267) in which the researchers investigated the relationship between Ab levels in blood plasma and Ab loads in the brains of 49 mice. In each mouse, the plasma concentration of Ab was measured in pg/ml using ELISA (a standard laboratory method for quantifying proteins), and the Ab load in the hippocampus was measured as a percentage of total area using immunofluorescent staining (see Fig. 2 in the paper for example images).
You can find the data on the website in the file demattos_et_al_2002_amyloid.csv. In addition, you will find a PDF file containing the "original" data records (I read these data from a figure in the paper; they are not the authors' actual notes). The data include two potential explanatory variables. The variable brain gives the percentage Ab load in the mouse's brain, and the variable severity classifies each mouse into one of four ordinal classes based on its Ab load. Please use these files to answer Questions 1–15.
We will analyze the data using both ANOVA and regression (or their non-parametric alternatives). Our goals are to determine whether plasma concentrations of Ab are related to brain Ab load, and to describe the relationship if it exists. Our methods will be somewhat different from those used by the authors.
1. [1 point] State the null and alternative hypotheses for the one-way ANOVA, using severity as the explanatory variable.
Null hypothesis: The mean plasma concentration of Ab is the same in all four severity classes.
Alternative hypothesis: The mean plasma concentration of Ab differs for at least one severity class.
2. Load demattos_et_al_2002_amyloid.csv into R. Make sure that severity is a factor. Then fit the model, obtain the residuals, and add them to your data frame as a new column (mutate() may be useful). Use the residuals to complete the following preliminary tasks:
a. [1 point] Check for possible outliers. Present any graphs that you use (with captions), and briefly explain what you found. If the plot(s) suggest that an outlier exists, (i) identify the row number for each potential outlier in the data, and (ii) explain how you have addressed the issue. If you find a mistake in the .csv file, correct it. If you leave an outlier unchanged, say so, and explain your reasoning.
Some guidance:
· Review Lecture 2.9. Remember the following principles:
1. Outliers are only a problem if they cause the data to violate the model’s distributional assumptions.
2. You should always check for outliers using the same values that you will use to check the distributional assumptions. You may want to run these checks (in 2b) before you decide how to handle any outliers.
3. Consider the overall distribution of the data. If a point looks like a possible outlier in a boxplot, but is not very extreme relative to the outermost point in the opposite direction, and the data appear to be normal, then the "outlier" may not have a substantial effect on the analysis.
4. Data should never be removed from an analysis unless they are unambiguously flawed and you cannot fix them.
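The steps in Question 2 (make severity a factor, fit the model, attach the residuals, screen them for outliers) could be sketched in R roughly as follows. This is only an illustration on synthetic data, not the real measurements. The response column name plasma, the class labels c1 to c4, and the extreme value planted in row 7 are all assumptions made for the example; the real file may name and code things differently. Base R assignment is used to add the residual column, and the dplyr mutate() equivalent is shown in a comment.

```r
# Synthetic stand-in for the course data (NOT the real measurements).
# With the real file you would start from something like:
#   amyloid <- read.csv("demattos_et_al_2002_amyloid.csv")
set.seed(503)
classes <- c("c1", "c2", "c3", "c4")   # hypothetical labels for the four classes
amyloid <- data.frame(
  severity = rep(classes, length.out = 49),
  plasma   = rnorm(49, mean = 1000, sd = 200)   # assumed column name, made-up values
)
amyloid$plasma[7] <- 5000   # planted extreme value, for illustration only

# Make sure severity is a factor with the classes in their ordinal order
amyloid$severity <- factor(amyloid$severity, levels = classes)

# Fit the one-way ANOVA model and store the residuals as a new column
fit <- aov(plasma ~ severity, data = amyloid)
amyloid$resid <- residuals(fit)
# dplyr alternative: amyloid <- dplyr::mutate(amyloid, resid = residuals(fit))

# Graphical checks on the residuals
boxplot(resid ~ severity, data = amyloid,
        xlab = "Severity class", ylab = "Residual (pg/ml)")
qqnorm(amyloid$resid)
qqline(amyloid$resid)

# Row numbers of the points that the boxplot rule flags within each class
is_flagged <- function(r) r %in% boxplot.stats(r)$out
flagged <- as.logical(ave(amyloid$resid, amyloid$severity, FUN = is_flagged))
flagged_rows <- which(flagged)
amyloid[flagged_rows, ]
```

A flagged row is only a candidate. Following the guidance above, compare it with the "original" data records before changing anything, and judge it against the overall distribution of the residuals rather than the boxplot alone.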