INSTRUCTIONS TO CANDIDATES
ANSWER ALL QUESTIONS

BRFSS Week 2 Assignment

This week you will continue your analysis by cleaning your BRFSS data. Data cleaning is always the most tedious part of the analysis process and often takes much longer than expected. Starting it well ahead of the deadline helps you avoid binge analysis, which tends to lead to mistakes.

 

To Submit for Week 2:

(1) Your Week 2 DO file.

 

(2) A Word document that includes the following, in this order:

 

- The total observations left in your subsample for the three variables listed below, after updating the include variable in the data cleaning process. Don’t forget to only include the observations where include == 1. Please write this in the following format: “Total observations after data cleaning: N=XXXXX”

(The XXs are where you fill in the number of observations)

- Screenshots of your output from the summarize command above for BMI, ssbweekcat, and evercvd that you ran at the end of the data cleaning process. Don’t forget to only include the observations where include == 1. Note: the observations for all three variables should be the same.  (Screenshots are preferable to copying and pasting the graphs directly because that will make the file very large)
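
The counting rule in the two items above can be sketched as follows. This is a Python illustration of the logic only, not the Stata DO file the assignment asks for. The toy records are made up, the lowercase key `bmi` stands in for BMI, and only the names BMI, ssbweekcat, evercvd, and include come from the assignment.

```python
# Toy records with made-up values; None marks a missing value.
records = [
    {"bmi": 24.1, "ssbweekcat": 1, "evercvd": 0, "include": 1},
    {"bmi": 31.7, "ssbweekcat": 3, "evercvd": 1, "include": 1},
    {"bmi": None, "ssbweekcat": 2, "evercvd": 0, "include": 0},
    {"bmi": 27.3, "ssbweekcat": 0, "evercvd": 0, "include": 1},
    {"bmi": 22.8, "ssbweekcat": None, "evercvd": 0, "include": 0},
]

VARIABLES = ("bmi", "ssbweekcat", "evercvd")


def count_included(rows, variable):
    """Count non-missing values of `variable` among rows with include == 1."""
    return sum(1 for r in rows if r["include"] == 1 and r[variable] is not None)


counts = {v: count_included(records, v) for v in VARIABLES}

# After cleaning, the three counts should agree; if they do not,
# some included row still has a missing value.
same_n = len(set(counts.values())) == 1
if same_n:
    print(f"Total observations after data cleaning: N={counts['bmi']}")
else:
    print(f"Counts differ across variables: {counts}")
```

If the counts differ, at least one row with include == 1 still has a missing value in one of the three variables, which is worth resolving before reporting N.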

__________________

The demographic variables selected as covariates for this analysis are factors that are typically adjusted for in regression analysis as they represent some of the individual characteristics that may increase risk for obesity and are common confounders in a variety of diet-disease relationships. As we discuss in the course materials for Week 2, these variables are important risk factors to evaluate, and they are especially important to address as potential confounders in a data analysis project. Without removing the effect of sex, age, race, and the other factors detailed below, we can’t be sure that the relationship we observe between SSBs and BMI is a true effect.

 

Adjusting for them in the multiple linear regression will remove their confounding influence from the estimated SSB effect and allow us to see the relationship between SSBs and BMI more clearly.
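
As a rough sketch of what adjustment means here, the crude and adjusted models can be written side by side. The notation is an assumption made for illustration: SSB is shown as a single term for simplicity (a categorical variable such as ssbweekcat or race would enter as a set of indicator terms), and the trailing dots stand for the remaining covariates, which are not listed in this section.

```latex
\begin{align}
\text{Crude:}\quad    \mathrm{BMI}_i &= \beta_0 + \beta_1\,\mathrm{SSB}_i + \varepsilon_i \\
\text{Adjusted:}\quad \mathrm{BMI}_i &= \beta_0^{*} + \beta_1^{*}\,\mathrm{SSB}_i
  + \beta_2\,\mathrm{sex}_i + \beta_3\,\mathrm{age}_i + \beta_4\,\mathrm{race}_i + \dots + \varepsilon_i^{*}
\end{align}
```

Comparing the crude coefficient on SSB with the adjusted one gives a sense of how much of the crude association was attributable to the covariates.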

 

One thing to note is that the 2016 BRFSS data does not include other dietary variables besides SSB consumption. As we’ll discuss over the coming weeks, diet plays a critical role in the risk for developing overweight and obesity. The lack of other dietary data, such as from a 24-hour recall or food frequency questionnaire, or even just vegetable consumption that might serve as a proxy for the rest of the individual’s diet, is a limitation of this dataset. In a later assignment, you will be asked to discuss this limitation and its potential effect on the results.

However, right now we need to clean the data to make sure our regression model will be based on data that meets the assumptions of linear regression.

Data cleaning means looking at each variable, examining the distribution for the presence of outliers or errors, and then making a decision about each of these data points. Do you retain or exclude? This assignment leads you step-by-step through the data cleaning process for each of the variables we’ll be using in the crude and adjusted analyses.
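
One way to picture the retain-or-exclude decision is a flag that is switched off for values outside a plausible range. The sketch below is in Python rather than Stata, and the BMI cutoffs (12 and 70) are hypothetical placeholders chosen for illustration, not values from the data analysis plan.

```python
# Hypothetical plausible range for BMI; the real limits belong in the analysis plan.
BMI_MIN, BMI_MAX = 12.0, 70.0


def update_include(row):
    """Return 1 if the row stays in the subsample, 0 otherwise."""
    if row["include"] != 1:
        return 0  # already excluded earlier; never re-include
    bmi = row["bmi"]
    if bmi is None or not (BMI_MIN <= bmi <= BMI_MAX):
        return 0  # missing or outside the plausible range
    return 1


# Made-up rows: one plausible value, one implausible, one missing, one pre-excluded.
rows = [
    {"bmi": 23.5, "include": 1},
    {"bmi": 98.2, "include": 1},
    {"bmi": None, "include": 1},
    {"bmi": 29.0, "include": 0},
]
for row in rows:
    row["include"] = update_include(row)

kept = [row["bmi"] for row in rows if row["include"] == 1]
print(kept)
```

Turning a flag off, rather than deleting rows, keeps the original data intact, so a cutoff can be revisited later without reloading the dataset.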

 

Data cleaning involves a lot of individual decisions, and sometimes the decisions are not clear-cut. There may not be any black-and-white answers. Learning to clean data well requires practice and experience. In the end, there is always a component of subjective judgement on the part of the analyst. Our process here represents one way to handle the data cleaning, based on the accepted principles that:

 

 

1. Outliers may represent unusual circumstances or exposures that may be so different from the rest of the data that the same relationships we see in most of the data may not hold.

2. We want to describe the majority of the data points with our regression analysis.

3. How we define outliers does depend to some degree on what the research questions are.

4. Individual decisions about specific variables are described in our data analysis plan (see Lecture 2.4).

 
