BRFSS Week 2 Assignment
This week you will continue to work on your analysis by cleaning your BRFSS data. Data cleaning is always the most tedious part of the analysis process, and it often takes much longer than expected. Starting the data cleaning process well ahead of the deadline is an important practice for avoiding binge analysis, which tends to lead to mistakes.
To Submit for Week 2:
(1) Your Week 2 DO file.
(2) A Word document that includes the following, in this order:
- The total number of observations left in your subsample for the three variables listed below, after updating the include variable in the data cleaning process. Don’t forget to only include the observations where include == 1. Please write this in the following format: “Total observations after data cleaning: N=XXXXX”
(The XXs are where you fill in the number of observations)
- Screenshots of your output from the summarize command for BMI, ssbweekcat, and evercvd that you ran at the end of the data cleaning process. Don’t forget to only include the observations where include == 1. Note: the number of observations should be the same for all three variables. (Screenshots are preferable to copying and pasting the output directly, because pasted output will make the file very large.)
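A minimal sketch of how the final count and summary might be produced in Stata is shown here. It assumes the variables are named exactly as in this assignment (BMI, ssbweekcat, evercvd, include) and that include has already been created earlier in your DO file; adjust the names if yours differ.

```stata
* Assumption: include already exists and equals 1 for observations still in the subsample.
* Drop any observation that is missing at least one of the three variables,
* so that all three report the same number of observations.
replace include = 0 if missing(BMI, ssbweekcat, evercvd)

* Total observations after data cleaning (the N to report in the Word document)
count if include == 1

* Summary output to screenshot; the Obs column should match for all three variables
summarize BMI ssbweekcat evercvd if include == 1
```

If the Obs column differs across the three variables, at least one of them still has missing values among the observations where include == 1.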
__________________
The demographic variables selected as covariates for this analysis are factors that are typically adjusted for in regression analysis as they represent some of the individual characteristics that may increase risk for obesity and are common confounders in a variety of diet-disease relationships. As we discuss in the course materials for Week 2, these variables are important risk factors to evaluate, and they are especially important to address as potential confounders in a data analysis project. Without removing the effect of sex, age, race, and the other factors detailed below, we can’t be sure that the relationship we observe between SSBs and BMI is a true effect.
Adjusting for them in the multiple linear regression will remove their confounding influence from the model and allow us to see the true relationship between SSBs and BMI more clearly.
One thing to note is that the 2016 BRFSS data does not include other dietary variables besides SSB consumption. As we’ll discuss over the coming weeks, diet plays a critical role in the risk for developing overweight and obesity. The lack of other dietary data, such as from a 24-hour recall or food frequency questionnaire, or even just vegetable consumption that might serve as a proxy for the rest of the individual’s diet, is a limitation of this dataset. In a later assignment, you will be asked to discuss this limitation and its potential effect on the results.
However, right now we need to clean the data to make sure our regression model will be based on data that meets the assumptions of linear regression.
Data cleaning means looking at each variable, examining the distribution for the presence of outliers or errors, and then making a decision about each of these data points. Do you retain or exclude? This assignment leads you step-by-step through the data cleaning process for each of the variables we’ll be using in the crude and adjusted analyses.
Data cleaning is a process that involves a lot of individual decisions, and sometimes the decisions are not clear-cut. There may not be any black and white answers. Knowing how to clean data is a process that requires practice and experience. In the end, there is always a component of subjective judgement on the part of the analyst. Our process here represents one way to handle the data cleaning, based on the accepted principles that:
1. Outliers may represent unusual circumstances or exposures that may be so different from the rest of the data that the same relationships we see in most of the data may not hold.
2. We want to describe the majority of the data points with our regression analysis.
3. How we define outliers does depend to some degree on what the research questions are.
4. Individual decisions about specific variables are described in our data analysis plan (see Lecture 2.4).
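The general pattern of examining one variable and then recording an exclusion decision can be sketched in Stata as follows. This is only an illustration: the variable names BMI and include are taken from this assignment, and the cutoff value is a made-up placeholder, not the value from the data analysis plan.

```stata
* Look at the distribution of one variable among observations still included
summarize BMI if include == 1, detail
histogram BMI if include == 1

* HYPOTHETICAL cutoff for illustration only; replace it with the value
* specified in the data analysis plan.
local bmi_upper 70

* Stata treats missing values as larger than any number, so the
* !missing() condition keeps this rule from also flagging missing BMI.
replace include = 0 if BMI > `bmi_upper' & !missing(BMI)

* Check how many observations remain after the decision
count if include == 1
```

Setting include to 0 rather than dropping the observations keeps the decision reversible, so the cutoff can be revisited later without reloading the raw data. Run the local definition and the replace line together, since a local macro only exists within the same run of the DO file.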