Introduction to SAS for Public Health Students
Datasets:
The National Health and Nutrition Examination Survey (NHANES) 4-Year (2011-2014) Data
• nhanes_4yr.sas7bdat
2012-2014 National Health Interview Survey (NHIS) person files
• personsx12.sas7bdat
• personsx13.sas7bdat
• personsx14.sas7bdat
2012-2014 National Health Interview Survey (NHIS) family files
• familyxx12.sas7bdat
• familyxx13.sas7bdat
• familyxx14.sas7bdat
2012-2014 National Health Interview Survey (NHIS) injury/poisoning files
• injpoiep12.sas7bdat
• injpoiep13.sas7bdat
• injpoiep14.sas7bdat
Question 1 is based on the NHANES 4-year data. You can find the description of the variables in its codebook (attached with the final project in Carmen).
1. (20 points)
(a) (i) Read the dataset ‘nhanes_4yr.sas7bdat’ into SAS.
(ii) Create the variables BMICAT and HYPTS using the following conditions.
For non-missing values of body mass index (BMI), BMICAT is:
• “Underweight” if BMI is less than 18.5
• “Healthy” if BMI is at least 18.5 but less than 25
• “Overweight” if BMI is at least 25 but less than 30
• “Obese” if BMI is 30 or above
For non-missing values of blood pressure, HYPTS is equal to 1 if systolic blood pressure is greater than or equal to 120 or diastolic blood pressure is greater than or equal to 80. HYPTS is equal to 0 if systolic blood pressure is less than 120 and diastolic blood pressure is less than 80.
Also, create labels for the variables BMICAT and HYPTS as follows:
BMICAT ~ BMI Categories
HYPTS ~ Hypertension Status
Print out the first 20 observations for only the variables SEQN, BMICAT, and HYPTS. To obtain full credit, ensure that your output contains the labels for each variable (and not the variable names).
Copy and paste your output (with an appropriate title) here.
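The rules in part (a) can be sketched as a single DATA step followed by PROC PRINT. This is a minimal sketch under stated assumptions, not a model answer: the folder in the LIBNAME statement, the output dataset name nhanes_mod, and the blood pressure variable names SBP and DBP are placeholders (check the codebook for the real names), and the sketch assumes HYPTS should stay missing unless both blood pressure readings are present.

```sas
/* Hypothetical folder that holds nhanes_4yr.sas7bdat */
libname proj "C:\sas_project";

data nhanes_mod;                      /* nhanes_mod is a placeholder name */
    set proj.nhanes_4yr;
    length BMICAT $ 11;               /* long enough for "Underweight" */

    /* BMICAT stays blank when BMXBMI is missing */
    if not missing(BMXBMI) then do;
        if BMXBMI < 18.5 then BMICAT = "Underweight";
        else if BMXBMI < 25 then BMICAT = "Healthy";
        else if BMXBMI < 30 then BMICAT = "Overweight";
        else BMICAT = "Obese";
    end;

    /* SBP and DBP are placeholder names for systolic and diastolic pressure. */
    /* The comparison returns 1 (true) or 0 (false); HYPTS stays missing      */
    /* unless both readings are non-missing (an assumption of this sketch).   */
    if nmiss(SBP, DBP) = 0 then HYPTS = (SBP >= 120 or DBP >= 80);

    label BMICAT = "BMI Categories"
          HYPTS  = "Hypertension Status";
run;

title "First 20 Observations: BMI Categories and Hypertension Status";
proc print data=nhanes_mod(obs=20) label noobs;
    var SEQN BMICAT HYPTS;
run;
title;
```

The explicit missing-value checks matter because SAS treats a missing numeric value as smaller than any number, so a bare comparison such as BMXBMI < 18.5 would classify missing BMI as “Underweight”.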
(b) Create formats for the age variable RIDAGEYR according to the following scheme.
RIDAGEYR (in years) 13-18 ~ Teen
19-64 ~ Adults
65 and above ~ Elderly
For each of the age categories above (Teen, Adults, Elderly), perform appropriate procedures to generate the summary statistics listed below (rounded to 3 decimal places) for the variables ‘RIDAGEYR’, ‘BMXWT’, ‘BMXHT’, ‘BMXBMI’:
Number of non-missing values, number of missing values, mean, median, standard deviation, minimum value, maximum value, and 97% confidence interval for the population mean.
Note: Here, exclude respondents aged 12 years and below from the analysis.
Copy and paste your output (with an appropriate title) here.
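One possible way to approach part (b) is a user-defined format plus PROC MEANS, sketched below. The names agefmt, AGEGRP, nhanes_age, and the input dataset nhanes_mod (standing for the modified data from part (a)) are all hypothetical. The sketch copies RIDAGEYR into AGEGRP so that age can serve as the grouping variable and as an analysis variable at the same time, and it assumes RIDAGEYR is recorded in whole years.

```sas
proc format;
    value agefmt 13 - 18   = "Teen"
                 19 - 64   = "Adults"
                 65 - high = "Elderly";
run;

data nhanes_age;                 /* placeholder name */
    set nhanes_mod;              /* placeholder for the dataset from part (a) */
    where RIDAGEYR >= 13;        /* drops ages 12 and below, and missing ages */
    AGEGRP = RIDAGEYR;           /* copy used only for grouping */
    format AGEGRP agefmt.;
    label AGEGRP = "Age Group";
run;

title "Summary Statistics by Age Group (Ages 13 and Above)";
/* CLM with ALPHA=0.03 requests 97% confidence limits for the mean */
proc means data=nhanes_age n nmiss mean median std min max clm
           alpha=0.03 maxdec=3;
    class AGEGRP;
    var RIDAGEYR BMXWT BMXHT BMXBMI;
run;
title;
```

PROC MEANS groups CLASS variables by their formatted values, which is why attaching agefmt. to AGEGRP is enough to produce one block of statistics per age category.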
(c) (i) Create formats for the variable HYPTS you created in part-(a)-(ii) according to the following scheme.
HYPTS 0 ~ No hypertension
1 ~ Pre hypertension/Stage 1 or 2 HTN
Perform appropriate procedures to answer the following questions.
(ii) Using the formats created in part-(b) and part-(c)-(i), generate two-way contingency tables (without the total percentages) for the variables:
RIDAGEYR and BMICAT
RIDAGEYR and HYPTS
Note: Here, exclude respondents aged 12 years and below from the analysis.
Copy and paste your output (with an appropriate title) here.
(iii) Based on your output in part-(c)-(ii), determine and report the percentage of Teens considered to be overweight or obese in this dataset.
(iv) Based on your output in part-(c)-(ii), which age group has the largest proportion among people with pre hypertension or stage 1 or 2 HTN in this dataset?
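The tables in part (c) could be produced along the following lines. This is a sketch only: the format names agefmt and hypfmt and the dataset name nhanes_mod (standing for the modified data from part (a)) are hypothetical, and the age format is repeated here so the block runs on its own.

```sas
proc format;
    value agefmt 13 - 18   = "Teen"
                 19 - 64   = "Adults"
                 65 - high = "Elderly";
    value hypfmt 0 = "No hypertension"
                 1 = "Pre hypertension/Stage 1 or 2 HTN";
run;

title "Age Group by BMI Category and by Hypertension Status (Ages 13 and Above)";
proc freq data=nhanes_mod;       /* placeholder for the dataset from part (a) */
    where RIDAGEYR >= 13;
    /* NOPERCENT drops the overall (total) percentages from each cell */
    tables RIDAGEYR*BMICAT RIDAGEYR*HYPTS / nopercent;
    format RIDAGEYR agefmt. HYPTS hypfmt.;
run;
title;
```

With RIDAGEYR as the row variable, the row percentages describe the distribution within each age group (useful for part (iii)), while the column percentages describe the age composition within each BMI or hypertension category (useful for part (iv)).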
Question 2 is based on your modified NHANES 4-year data (from question 1, part (a)). You can find the description of the variables in its codebook (attached with the final project in Carmen).
2. (a) (8 points)
(i) Use PROC UNIVARIATE to perform a two-sided one-sample t-test (using alpha = 0.01) of the null hypothesis that the mean weight equals 65 kg.
Make sure to produce the following graphs/statistics:
Histogram with KERNEL density plot, mean, median, standard deviation, 25% quantile, 75% quantile, 90th percentile (Note: place the summary statistics with an appropriate header at the middle-right portion of the plot window).
Copy and paste your output (with an appropriate title) here. Only include results for the t-test and the histogram with the KERNEL density plot and summary statistics.
(ii) Is there evidence to reject the null hypothesis? Briefly give your reasons and clearly state your conclusions.
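A sketch of how the t-test and annotated histogram in part (a) might be requested is shown below. The dataset name nhanes_mod is a placeholder for the modified data, and the sketch assumes BMXWT is the weight variable in kilograms. The ODS object names in the ODS SELECT statement are given as recalled; running ODS TRACE ON first is a reliable way to confirm them.

```sas
ods graphics on;
/* Keep only the location tests and the histogram in the output */
ods select TestsForLocation Histogram;

title "One-Sample t-Test of Mean Weight (H0: Mean = 65 kg)";
proc univariate data=nhanes_mod mu0=65 alpha=0.01;   /* placeholder dataset */
    var BMXWT;
    histogram BMXWT / kernel;
    /* INSET must follow the HISTOGRAM statement it annotates */
    inset mean median std q1 q3 p90 /
          header   = "Summary Statistics"
          position = e;          /* e = east, the middle-right of the plot */
run;
title;
ods select all;
```

MU0= sets the hypothesized mean, and the Student's t row of the Tests for Location table reports the two-sided p-value, which is then compared with alpha = 0.01.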
(b) (10 points)
(i) Perform an ANOVA test (using alpha = 0.05) to examine whether the mean body mass index differs by race and Hispanic origin. Create a format for the race/Hispanic origin variable ‘RIDRETH1’ as below and apply it in your analysis so that your output is easy to read and understand.
RIDRETH1 1, 2 ~ Hispanic
3 ~ Non-Hispanic White
4 ~ Non-Hispanic Black
5 ~ Other Race
Copy and paste your output (with an appropriate title) here. Only include results for the ANOVA test and the graph of boxplots.
(ii) Report the F-statistic and the p-value for the F-test.
(iii) Is there evidence that the mean body mass index differs by race and Hispanic origin? Briefly give your reasons.
(iv) If your answer to part-(b)-(iii) is YES, then use Tukey’s method to determine where there are pairwise differences.
Copy and paste your output here. Include ONLY the table for the comparisons.
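One way the ANOVA and Tukey comparisons in part (b) might be set up is sketched below, using PROC GLM as one reasonable choice of procedure. The format name racefmt and the dataset name nhanes_mod are hypothetical. With ODS Graphics enabled, PROC GLM normally draws a box plot of the response by group for a one-way model.

```sas
proc format;
    value racefmt 1, 2 = "Hispanic"
                  3    = "Non-Hispanic White"
                  4    = "Non-Hispanic Black"
                  5    = "Other Race";
run;

ods graphics on;
title "One-Way ANOVA of BMI by Race and Hispanic Origin";
proc glm data=nhanes_mod;        /* placeholder for the modified dataset */
    class RIDRETH1;
    model BMXBMI = RIDRETH1;
    /* Tukey-adjusted pairwise differences with confidence limits */
    means RIDRETH1 / tukey cldiff alpha=0.05;
    format RIDRETH1 racefmt.;
run;
quit;
title;
```

Because CLASS levels are formed from formatted values, codes 1 and 2 are pooled into a single “Hispanic” group without recoding the data.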
(c) (14 points) Investigators want to examine the linear relationship between body mass index (BMI) and HDL-Cholesterol. They decided to regress BMI on HDL-Cholesterol.
Note: Here, exclude respondents aged 12 years and below from the analysis.
(i) Use the appropriate procedure to estimate the intercept (β0) and slope (β1) of the least squares (or regression) line. Also, produce a scatter plot (with the regression line, and confidence and prediction bands) and a diagnostic plot.
Copy and paste your output (with an appropriate title) here.
(ii) From your diagnostic plot, what can you say about the normality assumption for the error term?
(iii) Give practical interpretations of the estimates of the intercept (β0) and the slope (β1).
Is the interpretation of the intercept (β0) practically meaningful? Briefly explain.
(iv) Write down the equation of the least squares (regression) line.
(v) What is the expected BMI for a participant with an HDL-Cholesterol of 85 mg/dL in the study? (Round your answer to 2 decimal places.)
(vi) Is there a significant (linear) relationship between BMI and HDL-Cholesterol? Use the appropriate results in the output to briefly explain.
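The regression in part (c) could be fitted with PROC REG roughly as follows. This is a sketch under assumptions: HDL is a placeholder for the HDL-Cholesterol variable (look up its actual name in the codebook) and nhanes_mod is a placeholder for the modified dataset. The MAXPOINTS=NONE option is included because PROC REG can suppress or alter point-based plots for large datasets.

```sas
ods graphics on;
title "Simple Linear Regression of BMI on HDL-Cholesterol (Ages 13 and Above)";
proc reg data=nhanes_mod plots(maxpoints=none)=(fitplot diagnostics);
    where RIDAGEYR >= 13;        /* excludes ages 12 and below */
    /* HDL is a placeholder variable name for HDL-Cholesterol */
    model BMXBMI = HDL;
run;
quit;
title;
```

The fit plot shows the regression line with confidence and prediction bands, and the Parameter Estimates table gives the intercept and slope together with the t-test for the slope. The expected BMI at an HDL-Cholesterol of 85 mg/dL is then the estimated intercept plus 85 times the estimated slope.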
(d) (18 points) Another researcher is also interested in how variables, such as RIDAGEYR, BMICAT, RIDRETH1, LBDLDL, and RIAGENDR, affect hypertension status (HYPTS).