This assignment is due after Module 14, on Wednesday, April 28 at 11:59pm. Please copy and paste all of your SAS code, clean log, and output into one document, fill out the table below, and submit the document through Canvas.
The objective is to determine whether there are differences in selected demographic or clinical characteristics among 2019 National Health Interview Survey respondents who ever used e-cigarettes. You must use the following PROC FORMAT code (attachments do not work):
PROC FORMAT;
/* Age in years */
VALUE SA002X
18-84 = "18-84 years"
85 = "85 85+ years"
97 = "97 Refused"
98 = "98 Not Ascertained"
99 = "99 Don't Know";
/* Sex */
VALUE SA115X
1 = "1 Male"
2 = "2 Female"
7 = "7 Refused"
8 = "8 Not Ascertained"
9 = "9 Don't Know";
/* Height in inches */
VALUE SA005X
59-76 = "59-76 inches"
96 = "96 Not available"
97 = "97 Refused"
98 = "98 Not Ascertained"
99 = "99 Don't Know";
/* Weight in pounds */
VALUE SA031X
100-299 = "100-299 pounds"
996 = "996 Not available"
997 = "997 Refused"
998 = "998 Not Ascertained"
999 = "999 Don't Know";
/* Yes/No items */
VALUE SA125X
1 = "1 Yes"
2 = "2 No"
7 = "7 Refused"
8 = "8 Not Ascertained"
9 = "9 Don't Know";
/* Education level */
VALUE SA082X
00 = "00 Never attended/kindergarten only"
01 = "01 Grade 1-11"
02 = "02 12th grade, no diploma"
03 = "03 GED or equivalent"
04 = "04 High School Graduate"
05 = "05 Some college, no degree"
06 = "06 Associate degree: occupational, technical, or vocational program"
07 = "07 Associate degree: academic program"
08 = "08 Bachelor's degree (Example: BA, AB, BS, BBA)"
09 = "09 Master's degree (Example: MA, MS, MEng, MEd, MBA)"
10 = "10 Professional School degree (Example: MD, DDS, DVM, JD)"
11 = "11 Doctoral degree (Example: PhD, EdD)"
97 = "97 Refused"
98 = "98 Not Ascertained"
99 = "99 Don't Know";
RUN;
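Once the formats above have been created in a PROC FORMAT step, one way to use them is to attach them in a frequency table so that invalid codes (Refused, Not Ascertained, Don't Know) stand out. The following is only a sketch: the dataset name work.ecig_all and the variable names sex, educ, and stroke are hypothetical placeholders, and the real names should be taken from the Codebook.

```sas
/* Sketch only: work.ecig_all, sex, educ, and stroke are hypothetical names. */
/* Substitute the dataset and variable names listed in the Codebook.         */
proc freq data=work.ecig_all;
    tables sex educ stroke / missing;   /* MISSING shows missing values as a row */
    format sex    SA115X.               /* Male / Female / invalid codes         */
           educ   SA082X.               /* Education levels / invalid codes      */
           stroke SA125X.;              /* Yes / No / invalid codes              */
run;
```

The counts in the labeled rows for codes 7, 8, 9 (or 97, 98, 99) can be used to answer the report question about how many observations had invalid values.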
• Bring data into SAS (2 points)
• Combine datasets for former and current e-cigarette users (2 points)
o Make sure that in the combined dataset, there is a variable that will allow you to determine if an observation is a current e-cigarette user or not.
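One common way to stack two datasets while keeping track of where each observation came from is the IN= dataset option on the SET statement. The sketch below assumes the two imported datasets are called work.former and work.current; these names, and the flag name current_user, are hypothetical.

```sas
/* Sketch only: work.former, work.current, and current_user are hypothetical names. */
data work.ecig_all;
    set work.former  (in=in_former)
        work.current (in=in_current);
    /* IN= variables are temporary 0/1 flags, so copy one into a kept variable: */
    /* 1 = current e-cigarette user, 0 = former e-cigarette user                */
    current_user = in_current;
run;
```

A quick `proc freq` on current_user after this step should show counts that match the number of observations in each source dataset.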
• Prepare and clean combined data set (15 points)
o Please check variable names and values (also listed in Codebook). All included variables are numeric.
o Use the provided stroke, heart attack, angina and coronary heart disease variables to create a new composite variable for any cardiovascular disease (named any_cvd)
Participants who reported that they had at least one of these conditions would be coded as “Yes”
Participants who answered “No” to all of the questions asking if they had any of these conditions would be coded as “No”
All other participants should be coded as missing
o Exclude anyone from the dataset if the person has a missing value for any_cvd or invalid (Refused, Not Available, Don’t Know, Not Ascertained) value for one or more of the other variables.
o Recode the education variable into a new categorical variable (named educ_cat). The new categories are listed below, with the values of the original variable in parentheses:
<HS Graduate (00, 01, 02)
HS Graduate/GED (03, 04)
Some College (05)
Associate Degree (06, 07)
Bachelor's Degree or Higher (08, 09, 10, 11)
o Use the height and weight variables and the following equation to calculate body mass index (BMI) (named bmi_calc), rounded to the nearest tenth.
Note: SAS expression for exponentiation to the nth power: variable**n
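The cleaning steps above could be sketched in a single DATA step as follows. Several things here are assumptions rather than part of the assignment: the input dataset work.ecig_all and the variable names (stroke, heart_attack, angina, chd, educ, age, sex, height_in, weight_lb) are hypothetical and should be replaced with the Codebook names; any_cvd is stored numerically as 1 = Yes and 2 = No so that the SA125X format can label it; and the BMI line uses the standard formula for pounds and inches, BMI = 703 × weight / height², which should be checked against the equation given with the assignment.

```sas
/* Sketch only: dataset and variable names are hypothetical placeholders. */
data work.ecig_clean;
    set work.ecig_all;

    /* any_cvd: Yes (1) if any condition is Yes; No (2) only if all four are No; */
    /* otherwise missing                                                         */
    if stroke = 1 or heart_attack = 1 or angina = 1 or chd = 1 then any_cvd = 1;
    else if stroke = 2 and heart_attack = 2 and angina = 2 and chd = 2 then any_cvd = 2;
    else any_cvd = .;

    /* educ_cat: collapse the 12 education codes into 5 categories */
    if educ in (0, 1, 2) then educ_cat = 1;              /* <HS Graduate               */
    else if educ in (3, 4) then educ_cat = 2;            /* HS Graduate/GED            */
    else if educ = 5 then educ_cat = 3;                  /* Some College               */
    else if educ in (6, 7) then educ_cat = 4;            /* Associate Degree           */
    else if educ in (8, 9, 10, 11) then educ_cat = 5;    /* Bachelor's Degree or Higher */
    else educ_cat = .;                                   /* 97, 98, 99 or missing      */

    /* bmi_calc: computed only for valid height and weight, rounded to one decimal. */
    /* ASSUMPTION: standard imperial formula, 703 * pounds / inches squared.        */
    if 59 <= height_in <= 76 and 100 <= weight_lb <= 299 then
        bmi_calc = round(703 * weight_lb / height_in**2, 0.1);

    /* Exclude observations with a missing any_cvd or an invalid value elsewhere */
    if missing(any_cvd) or missing(educ_cat) or missing(bmi_calc)
       or sex not in (1, 2)
       or not (18 <= age <= 85) then delete;
run;
```

Because bmi_calc is left missing whenever height or weight holds an invalid code, the single `missing(bmi_calc)` check also covers both of those variables. Comparing the observation counts in the log before and after this step gives the number excluded.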
• Save final cleaned dataset as a permanent dataset (1 point)
• Produce descriptive statistics (2.5 points, 0.5 point for each characteristic)
• Produce p-values (2.5 points, 0.5 point for each characteristic)
• Complete final report (5 points). The final report should address the following:
o Describe the process you went through to clean/recode the data. Essentially, your report should allow someone else to replicate your work, based on information provided in the report, and come to the same findings/conclusions.
For each original variable:
• How many observations had missing or invalid values (e.g., Don’t Know, Refused)? How did you handle this?
• Did you have to recode this variable? If yes, how did you do it?
• Did you use this variable to create another variable?
• What is the impact of the approaches you used to clean the data?
What are the attributes of the cleaned data set that you saved permanently?
• How many total included observations?
• Number of observations having missing/invalid/implausible values for one or more variables
• Distribution of each variable
What procedures did you use to obtain the descriptive statistics and the p-values?
• Why?
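As a rough illustration of the save, descriptive statistics, and p-value steps, the sketch below writes the cleaned data to a permanent library and then compares current and former users. The library path, the dataset work.ecig_clean, and the variable names (current_user, sex, educ_cat, any_cvd, age, bmi_calc) are all hypothetical. The choice of a chi-square test for categorical characteristics and a two-sample t-test for continuous ones is an assumption, not a requirement stated in the assignment.

```sas
/* Sketch only: path, dataset, and variable names are hypothetical. */
libname proj "/path/to/project";     /* replace with a real, writable folder */

/* Save the cleaned data as a permanent dataset */
data proj.ecig_clean;
    set work.ecig_clean;
run;

/* Categorical characteristics: counts, percentages, and chi-square p-values */
proc freq data=proj.ecig_clean;
    tables (sex educ_cat any_cvd) * current_user / chisq;
run;

/* Continuous characteristics: descriptive statistics by user group */
proc means data=proj.ecig_clean n mean std;
    class current_user;
    var age bmi_calc;
run;

/* Continuous characteristics: two-sample t-test p-values */
proc ttest data=proj.ecig_clean;
    class current_user;
    var age bmi_calc;
run;
```

If a continuous variable looks strongly skewed, a rank-based test such as PROC NPAR1WAY with the WILCOXON option may be a better fit than the t-test. Whichever procedure is chosen, the report should state why.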
The data for the extra credit question come from the 2020 Pew Research Center’s American Trends Panel: Wave 67 Climate and Coronavirus, conducted April 29 – May 5, 2020. Please complete the following:
1. Import the data set (1 point)
2. Please use ARRAY and DO loops to recode variables corresponding to the following questions in the questionnaire (3 points):
a. COVID_SCI6_a_W67, COVID_SCI6_b_W67, COVID_SCI6_c_W67, COVID_SCI6_d_W67
b. Recode “Definitely will happen” and “Probably will happen” to “Likely to Happen” (create new variables)
c. Recode “Probably will NOT happen” and “Definitely will NOT happen” to “Unlikely to happen” (create new variables).
d. Please create 4 frequency tables, one for each original variable cross-tabulated against its recoded version.
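The recode in step 2 could be sketched with two parallel arrays, one holding the original variables and one holding the new ones. This sketch rests on assumptions that should be verified against the questionnaire and the imported data: that the imported dataset is called work.w67, and that the original variables are numeric with 1 = Definitely will happen, 2 = Probably will happen, 3 = Probably will NOT happen, 4 = Definitely will NOT happen. The new variable names (sci6_a_r and so on) and the format name likelyf are hypothetical.

```sas
/* Sketch only: work.w67, the numeric coding 1-4, the new variable names, */
/* and the format name are assumptions to check against the actual data.  */
proc format;
    value likelyf
        1 = "Likely to Happen"
        2 = "Unlikely to happen";
run;

data work.w67_recode;
    set work.w67;

    /* Parallel arrays: element i of recoded corresponds to element i of orig */
    array orig{4}    COVID_SCI6_a_W67 COVID_SCI6_b_W67
                     COVID_SCI6_c_W67 COVID_SCI6_d_W67;
    array recoded{4} sci6_a_r sci6_b_r sci6_c_r sci6_d_r;

    do i = 1 to 4;
        if orig{i} in (1, 2) then recoded{i} = 1;        /* Definitely/Probably will happen     */
        else if orig{i} in (3, 4) then recoded{i} = 2;   /* Probably/Definitely will NOT happen */
        else recoded{i} = .;                             /* any other code stays missing        */
    end;
    drop i;

    format sci6_a_r sci6_b_r sci6_c_r sci6_d_r likelyf.;
run;

/* Four cross-tabulations: each original variable against its recoded version */
proc freq data=work.w67_recode;
    tables COVID_SCI6_a_W67 * sci6_a_r
           COVID_SCI6_b_W67 * sci6_b_r
           COVID_SCI6_c_W67 * sci6_c_r
           COVID_SCI6_d_W67 * sci6_d_r / missing norow nocol nopercent;
run;
```

In each table, every original value should fall into exactly one column of the recoded variable, which confirms the recode worked as intended.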