Purpose:
The purpose of this assignment is to demonstrate your understanding and application of logistic and linear regression, including statistical analysis and interpretation of analytics and results, and your ability to visually represent linear regression output. In the following class session, we’ll work on standardization, causation, and mean differences.
Instructions:
To complete this in-class assignment, you will analyze the BRFSS dataset using Stata in Apporto.
I would suggest creating a Word document to upload to Canvas that contains your answers to the questions below. Only Word documents or PDFs are accepted. I’d advise using Excel to format any output tables you produce.
Be sure to follow all instructions and read questions carefully.
Part I:
We are interested in determining if self-reported number of days in poor health influences daily drinking events.
For this portion of the assignment, you’ll need to work with three variables in the BRFSS dataset: “sexvar”, “drocdy3”, and “poorhlth”.
Make sure to conduct all cleanup needed before starting.
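Cleanup matters here because BRFSS stores non-responses as numeric codes rather than as missing values, so an uncleaned "days" variable can contain values that are not days at all. The sketch below shows the recoding logic in Python purely as a language-neutral illustration; in Stata the same mapping would normally be written with `recode`. The specific codes (88 for "none", 77 for "don't know", 99 for "refused", valid range 1 to 30) are assumptions to confirm against the BRFSS codebook for the survey year in use, and `clean_poorhlth` is a hypothetical helper name.

```python
from typing import Optional

# Assumed BRFSS-style special codes for a "number of days" item.
# Confirm these against the codebook before relying on them.
NONE_CODE = 88             # respondent reported zero days
MISSING_CODES = {77, 99}   # don't know / refused


def clean_poorhlth(raw: Optional[int]) -> Optional[int]:
    """Return days in poor health (0-30), or None when the value is missing."""
    if raw is None or raw in MISSING_CODES:
        return None
    if raw == NONE_CODE:
        return 0
    if 1 <= raw <= 30:
        return raw
    # Anything else is outside the assumed valid range; treat as missing.
    return None


raw_values = [5, 88, 77, 30, 99, None, 45]
cleaned = [clean_poorhlth(v) for v in raw_values]
print(cleaned)
```

Each variable has its own coding scheme, so the drinking variables need their own checks rather than a copy of this mapping.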
Linear Regression
1. Create a table that shows number of days in poor health and number of daily drinking events by sex. Make sure to include the mean, standard deviation, median, and mode for women and men.
2. Assess linearity by creating scatterplots for the relationship between number of days in poor health and drinking events by sex. You should have two scatterplots: one for women and one for men. Interpret the visual linear relationship between these variables.
3. Run the linear regression (make sure to include sex as a categorical variable).
4. Check for homoscedasticity, linearity, and normality of residuals, and check for outliers using at least three outlier checks. Make sure to comment on observations that have undue influence on your results.
5. Create a table showing regression results and appropriate graphics that explain your results.
6. Briefly interpret your findings in light of your regression results and assumption checks. Comment on any differences you see in your models for women vs. men.
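One way to write down the model implied by steps 3 and 4 is sketched here; it is not the required specification. The symbols are assumptions introduced for illustration: Female is a hypothetical 0/1 indicator built from sexvar, n is the number of observations, and k is the number of predictors. The cutoffs are common rules of thumb rather than fixed standards, and other conventions exist.

```latex
\[
\text{drocdy3}_i = \beta_0 + \beta_1\,\text{poorhlth}_i + \beta_2\,\text{Female}_i + \varepsilon_i,
\qquad \varepsilon_i \sim N(0, \sigma^2)
\]

\begin{align*}
\text{Cook's distance:} \quad & D_i > \frac{4}{n} \\
\text{Leverage:} \quad & h_{ii} > \frac{2(k+1)}{n} \\
\text{Studentized residual:} \quad & |r_i| > 2
\end{align*}
```

Observations flagged by more than one of these checks are the usual candidates for the discussion of undue influence. Adding an interaction term between poorhlth and Female would allow the slope to differ by sex, which is one way to compare women and men within a single model.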
Include in your document
• Your sample description table
• Your scatterplots, linearity test, homoscedasticity test, and normality-of-residuals graphics
• Two tables and two scatterplots showing regression results
• Your interpretation of your results in plain words, including incorporation of assumptions.
• Based on your results, what conclusions can you derive about causation in your results? What could be done to infer causation if it cannot be inferred from your conclusions?
Part II:
Given our results from our previous work, we’ve decided it might be more appropriate to examine our outcome, number of daily drinking events, in terms of categorizing binge drinkers vs. non-binge-drinkers.
For this, we’ll be changing our outcome variable from “drocdy3” to “rfbing5”.
Logistic Regression
1. Conduct appropriate data cleaning steps for the new outcome variable.
2. Find the mean and SD of the number of poor health days for individuals classified as binge drinkers versus those who were not.
3. Calculate the odds ratio for binge drinking (exposed) versus not binge drinking (unexposed) for men and women.
4. Run the logistic regression.
5. Create a table with the logistic regression results and interpret using fit statistics (Pseudo R² and Count R²).
6. Interpret the odds ratio for each predictor.
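The quantities named in steps 4 through 6 can be written out as follows, as a sketch under stated assumptions. Here binge is a hypothetical 0/1 recode of rfbing5 (check the codebook for the original coding), Female is a hypothetical 0/1 indicator from sexvar, and the Pseudo R² shown is McFadden's version, which is the one Stata prints in its logistic regression output. Count R² is usually computed by classifying an observation as a predicted binge drinker when its fitted probability exceeds 0.5; that cutoff is a convention, not a requirement.

```latex
\begin{align*}
\log\frac{p_i}{1 - p_i} &= \beta_0 + \beta_1\,\text{poorhlth}_i + \beta_2\,\text{Female}_i,
  \qquad p_i = \Pr(\text{binge}_i = 1) \\
\text{OR}_j &= e^{\beta_j} \\
\text{Pseudo } R^2 &= 1 - \frac{\ln L_{\text{model}}}{\ln L_{\text{null}}} \\
\text{Count } R^2 &= \frac{\text{number of correctly classified observations}}{n}
\end{align*}
```

Because each odds ratio is an exponentiated coefficient, an OR above 1 corresponds to a positive coefficient and higher odds of binge drinking per one-unit increase in that predictor, holding the others fixed.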
Include in your document
• Your descriptives table(s), including IRs for binge drinking in men and women
• A table with your logistic regression results, including ORs
• Your interpretation of your final results, including your interpretation of the odds ratios calculated by logistic regression vs. hand-calculation of ORs for men and women.
• Based on your results, what conclusions can you derive about causation in your results? What could be done to infer causation if it cannot be inferred from your conclusions?
• How do your logistic regression and OR information differ and/or add information compared to your initial OR calculations?
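For the hand calculation, the arithmetic can be cross-checked with a short sketch like the one below. It is written in Python only as an illustration of the formula, the function names are hypothetical, and the counts are made up rather than taken from BRFSS. The confidence interval uses the standard large-sample approximation based on the standard error of log(OR).

```python
import math
from typing import Tuple


def odds_ratio(a: int, b: int, c: int, d: int) -> float:
    """Odds ratio for a 2x2 table.

    a = exposed with outcome,   b = exposed without outcome
    c = unexposed with outcome, d = unexposed without outcome
    """
    if min(a, b, c, d) <= 0:
        raise ValueError("all four cells must be positive for this sketch")
    return (a * d) / (b * c)


def woolf_ci(a: int, b: int, c: int, d: int, z: float = 1.96) -> Tuple[float, float]:
    """Approximate 95% CI using the standard error of log(OR)."""
    log_or = math.log(odds_ratio(a, b, c, d))
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    return math.exp(log_or - z * se), math.exp(log_or + z * se)


# Hypothetical counts, for illustration only (not BRFSS results).
a, b, c, d = 20, 80, 10, 90
print(odds_ratio(a, b, c, d))   # 2.25
low, high = woolf_ci(a, b, c, d)
print(round(low, 2), round(high, 2))
```

A hand-calculated OR of this kind is unadjusted, whereas the OR from a logistic regression is adjusted for the other predictors in the model, which is one reason the two can differ.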
Part III:
Now we’d like to examine differences in daily drinking events. We hypothesize a few variables will predict differences in drinking, including region of residence (Northeast, Southeast, Southwest, Midwest, and West), being told one has a depressive disorder (ADDEPEV3), marital status (MARITAL), and employment status (EMPLOY1).
ANOVA
1. Conduct the appropriate data cleaning steps for the new predictor variables. Note that you will need to create a new variable indicating region of residence.
2. Find the mean, SD, median, and range of daily drinking events for each of the groups listed, and put these into a demographics table that includes the n of respondents in each group who have a daily drinking event count.
3. Create a visualization showing differences in number of daily drinking events by group.
4. Conduct the ANOVA and appropriate post-hoc tests.
5. Create a table showing ANOVA results and write a brief interpretation paragraph for your results.
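To make the ANOVA output easier to interpret, it can help to see how the F statistic is built from between-group and within-group variation. The sketch below computes it by hand in Python as an illustration only; `one_way_anova` is a hypothetical helper, and the three groups and their values are made up rather than drawn from BRFSS.

```python
from statistics import fmean
from typing import Dict, List, Tuple


def one_way_anova(groups: Dict[str, List[float]]) -> Tuple[float, int, int]:
    """Return (F, df_between, df_within) for a one-way ANOVA."""
    all_values = [x for values in groups.values() for x in values]
    grand_mean = fmean(all_values)
    # Between-group sum of squares: group size times squared distance
    # of each group mean from the grand mean.
    ss_between = sum(
        len(v) * (fmean(v) - grand_mean) ** 2 for v in groups.values()
    )
    # Within-group sum of squares: squared distance of each value
    # from its own group mean.
    ss_within = sum(
        (x - fmean(v)) ** 2 for v in groups.values() for x in v
    )
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Hypothetical drinking-event counts for three made-up groups.
example = {
    "group_a": [1, 2, 3],
    "group_b": [2, 3, 4],
    "group_c": [5, 6, 7],
}
f_stat, df_b, df_w = one_way_anova(example)
print(round(f_stat, 2), df_b, df_w)
```

A large F indicates that group means differ more than within-group variation alone would suggest, but it does not say which groups differ; that is the job of the post-hoc tests. In Stata, the `oneway` command reports this F statistic and offers Bonferroni, Scheffé, and Šidák multiple-comparison options.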
Include in your document
• Your descriptives table(s)
• A table showing your ANOVA and post-hoc test results
• Your interpretation paragraph
• Based on your results, what conclusions can you derive about causation in your results? What could be done to infer causation if it cannot be inferred from your conclusions?
• Your “do” file with all relevant calculations from all three parts