INSTRUCTIONS TO CANDIDATES
ANSWER ALL QUESTIONS

Covid Assignment

 

General Issues:

1.  Please include both R-code and output in your answers.

2.  Please keep your output to the minimum needed to answer the questions.

3.  Please shrink your plots without compromising their legibility.

4.  Please include commonly used R functions such as “select,” “rename,” “recode,” etc., even when they are not part of the question but are necessary for you to obtain your answers.

5.  There is no need to elaborate on the social, economic, political, or cultural implications of your findings. Please keep your answers short and relevant.

This project is based on the data set named “causes_of_death_19_20.csv.” It contains information on the causes of death in the U.S. in 2019 and 2020 (up to August 15th, 2020), by week and state. However, the “Jurisdiction of Occurrence” variable contains a few categories that are not states, e.g., New York City, DC, and the United States. Moreover, there are two variables that measure the number of COVID-19 deaths, as listed in the table below. We will focus on the one named “COVID-19 (U071, Multiple Cause of Death).”

Here is a description of the variables included:

Column Name | Description
Jurisdiction of Occurrence | Jurisdiction of Occurrence
MMWR Year | MMWR Year (MMWR stands for Morbidity and Mortality Weekly Report from CDC)
MMWR Week | MMWR Week
Week Ending Date | Week Ending Date
All Cause | All Cause
Natural Cause | Natural Cause
Septicemia (A40-A41) | Septicemia (A40-A41)
Malignant neoplasms (C00-C97) | Malignant neoplasms (C00-C97)
Diabetes mellitus (E10-E14) | Diabetes mellitus (E10-E14)
Alzheimer disease (G30) | Alzheimer disease (G30)
Influenza and pneumonia (J09-J18) | Influenza and pneumonia (J09-J18)
Chronic lower respiratory diseases (J40-J47) | Chronic lower respiratory diseases (J40-J47)
Other diseases of respiratory system (J00-J06,J30-J39,J67,J70-J98) | Other diseases of respiratory system (J00-J06,J30-J39,J67,J70-J98)
Nephritis, nephrotic syndrome and nephrosis (N00-N07,N17-N19,N25-N27) | Nephritis, nephrotic syndrome and nephrosis (N00-N07,N17-N19,N25-N27)
Symptoms, signs and abnormal clinical and laboratory findings, not elsewhere classified (R00-R99) | Symptoms, signs and abnormal clinical and laboratory findings, not elsewhere classified (R00-R99)
Diseases of heart (I00-I09,I11,I13,I20-I51) | Diseases of heart (I00-I09,I11,I13,I20-I51)
Cerebrovascular diseases (I60-I69) | Cerebrovascular diseases (I60-I69)
COVID-19 (U071, Multiple Cause of Death) | COVID-19 (U071, Multiple Cause of Death)
COVID-19 (U071, Underlying Cause of Death) | COVID-19 (U071, Underlying Cause of Death)

 

1.      Create a data frame in R that includes only information on the state, year, week, week ending date, deaths of all causes, and the variable “COVID-19 (U071, Multiple Cause of Death).” (2 points)
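One way to approach question 1 is sketched below. It assumes the dplyr package is installed and uses a tiny made-up data frame in place of the real file; only the column names are taken from the variable table above. With the real data you would presumably start from something like read.csv("causes_of_death_19_20.csv", check.names = FALSE) so that the long names survive unchanged.

```r
library(dplyr)

# Toy stand-in for the real CSV. The numbers are invented for illustration;
# only the column names come from the variable table in the assignment.
deaths_raw <- data.frame(
  `Jurisdiction of Occurrence` = c("Alabama", "Alabama"),
  `MMWR Year` = c(2019, 2020),
  `MMWR Week` = c(1, 1),
  `Week Ending Date` = c("01/05/2019", "01/04/2020"),
  `All Cause` = c(1000, 1100),
  `Natural Cause` = c(900, 1000),
  `COVID-19 (U071, Multiple Cause of Death)` = c(0, 0),
  check.names = FALSE
)

# Keep only the six requested columns. Backticks are required because
# the names contain spaces and punctuation.
deaths <- deaths_raw %>%
  select(
    `Jurisdiction of Occurrence`,
    `MMWR Year`,
    `MMWR Week`,
    `Week Ending Date`,
    `All Cause`,
    `COVID-19 (U071, Multiple Cause of Death)`
  )

str(deaths)
```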

2.      Rename variables and recode some variables as you see fit. (2 points) (Hint: Most of the variables have long names.)
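A possible renaming and recoding step is sketched here. The short names (state, year, week, week_end, all_deaths, covid_deaths) and the choice to recode "United States" to "US" are illustrative assumptions, not names prescribed by the assignment; the toy values are made up.

```r
library(dplyr)

# Toy data with the original long column names (values are invented).
deaths <- data.frame(
  `Jurisdiction of Occurrence` = c("New York", "New York City", "United States"),
  `MMWR Year` = c(2020, 2020, 2020),
  `MMWR Week` = c(15, 15, 15),
  `Week Ending Date` = c("04/11/2020", "04/11/2020", "04/11/2020"),
  `All Cause` = c(4000, 7000, 79000),
  `COVID-19 (U071, Multiple Cause of Death)` = c(1500, 5000, 16000),
  check.names = FALSE
)

deaths_short <- deaths %>%
  # rename() takes new_name = old_name pairs
  rename(
    state = `Jurisdiction of Occurrence`,
    year = `MMWR Year`,
    week = `MMWR Week`,
    week_end = `Week Ending Date`,
    all_deaths = `All Cause`,
    covid_deaths = `COVID-19 (U071, Multiple Cause of Death)`
  ) %>%
  # recode() takes old_value = new_value pairs; unmatched values are kept
  mutate(state = recode(state, "United States" = "US"))

names(deaths_short)
```

Short, syntactically valid names avoid the need for backticks in every later step.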

3.      Convert the “week ending date” variable into a “Date” data type. Check whether the conversion succeeded. (1 point) (Hint: the format argument, some variation of “mm-dd-yyyy,” must describe the format used in the original data, not the format you want to end up with.)
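The hint can be illustrated with base R's as.Date. The sketch assumes the raw file stores dates as text in the form mm/dd/yyyy; if the file uses dashes or another order, the format string has to change accordingly. The vector name week_end is hypothetical.

```r
# Dates as they are assumed to appear in the raw file (mm/dd/yyyy text).
week_end <- c("01/05/2019", "08/15/2020")

# The format argument describes the INPUT text, not the desired output:
# %m = month, %d = day, %Y = four-digit year.
week_end_date <- as.Date(week_end, format = "%m/%d/%Y")

# Checks: the class should be "Date" and no value should have become NA.
class(week_end_date)
anyNA(week_end_date)
week_end_date
```

A wrong format string does not raise an error; it silently yields NA values, which is why the anyNA check is worth running.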

4.      (5 points)

a.      Create a subset that contains only information of 2020 and includes only variables of the state, the week, and "covid-19 Multiple Cause of Death."

b.      Then take a further step and select only the states (and city) of Arizona, California, Florida, New Jersey, New York, New York City, and Texas. (NYC and DC appear in the data as non-state jurisdictions.)

c.       As a last step, "spread" the data frame by week (making "week" into columns) on the values of the variable "covid-19 Multiple Cause of Death."

d.      In which week did covid-19 deaths start to occur in most of the 6 states and NYC?
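Steps 4a to 4d could be sketched as follows, assuming dplyr and tidyr are installed and the columns were already renamed to the hypothetical short names state, year, week, and covid_deaths. The toy counts are invented, so the weeks they produce say nothing about the real data.

```r
library(dplyr)
library(tidyr)

# Toy data: one 2019 row plus three 2020 weeks for three jurisdictions.
deaths <- data.frame(
  state = c("Arizona", rep(c("Arizona", "New York City", "Ohio"), each = 3)),
  year = c(2019, rep(2020, 9)),
  week = c(11, rep(11:13, times = 3)),
  covid_deaths = c(0, 0, 1, 5, 2, 40, 300, 0, 0, 3)
)

focus <- c("Arizona", "California", "Florida", "New Jersey",
           "New York", "New York City", "Texas")

# 4a: 2020 only, three variables
covid_2020 <- deaths %>%
  filter(year == 2020) %>%
  select(state, week, covid_deaths)

# 4b: keep only the six states and NYC
covid_focus <- covid_2020 %>%
  filter(state %in% focus)

# 4c: one column per week, filled with the death counts
covid_wide <- covid_focus %>%
  spread(key = week, value = covid_deaths)

# 4d: first week with at least one recorded death, per jurisdiction
first_week <- covid_focus %>%
  filter(covid_deaths > 0) %>%
  group_by(state) %>%
  summarise(first_week = min(week))

covid_wide
first_week
```

spread() is the function the question names; newer tidyr versions recommend pivot_wider(names_from = week, values_from = covid_deaths) for the same reshaping.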

5.      (6 points)

a.      Use the data frame from step 4 (the one that contains only 2020 information) to plot the number of covid-19 deaths (Multiple Cause of Death) by the week; and add a dimension of "state" by color.

b.      Which state (or city) saw deaths peak the earliest? At which week?

c.       Which state(s) (or city) saw a second wave of deaths? During which weeks?
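For question 5, a line plot with ggplot2 is one reasonable choice. The sketch assumes ggplot2 and dplyr are installed and that the long-format 2020 data frame uses the hypothetical column names state, week, and covid_deaths; the counts are invented.

```r
library(dplyr)
library(ggplot2)

# Toy long-format data (invented counts) standing in for the 2020 subset.
covid_focus <- data.frame(
  state = rep(c("Arizona", "New York City"), each = 4),
  week = rep(14:17, times = 2),
  covid_deaths = c(5, 10, 20, 60, 500, 1000, 700, 300)
)

# 5a: weekly deaths, one colored line per jurisdiction
p <- ggplot(covid_focus, aes(x = week, y = covid_deaths, color = state)) +
  geom_line() +
  labs(x = "MMWR week (2020)", y = "COVID-19 deaths", color = "State")
print(p)

# 5b: the peak week per jurisdiction can also be read off numerically
peaks <- covid_focus %>%
  group_by(state) %>%
  slice_max(covid_deaths, n = 1, with_ties = FALSE) %>%
  ungroup()
peaks
```

A second wave (5c) would show up in the plot as a second rise after an earlier decline in the same line.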

6.      (7 points)

a.      (Use the data frame from step 5.) Use the "summarise" function in "dplyr" to create a data set that contains only the total number of covid deaths (Multiple Cause of Death) and the state (city) name. Then "merge" the new data frame with "population_by_state_2019.csv" (available on Canvas) to combine the death toll with information on state population. You should end up with a data frame containing 3 variables: state, total covid deaths, and state population.

b.      Obtain a bar-chart to see which state has the highest covid death toll.

c.       Calculate a new variable, “percentage of covid deaths in the total population,” and save it to the new (small) data frame. (Hint: you can use either the mutate function or a simple base R function.)

d.      Please plot the percentage you obtain in the previous step by the state. Which state has the highest percentage of covid death in the total population? 
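The steps of question 6 might look like the sketch below. The layout of "population_by_state_2019.csv" is not given in the assignment, so the population data frame and its column names (state, population) are assumptions, and all numbers are invented.

```r
library(dplyr)
library(ggplot2)

# Toy 2020 data in long format (invented counts).
covid_focus <- data.frame(
  state = rep(c("Arizona", "New York City"), each = 2),
  week = rep(14:15, times = 2),
  covid_deaths = c(10, 30, 500, 1500)
)

# Assumed layout of the population file; the real column names may differ.
population <- data.frame(
  state = c("Arizona", "New York City", "Texas"),
  population = c(7000000, 8000000, 29000000)
)

# 6a: total deaths per jurisdiction, then an inner join on the state name
covid_totals <- covid_focus %>%
  group_by(state) %>%
  summarise(total_covid = sum(covid_deaths, na.rm = TRUE))
covid_pop <- merge(covid_totals, population, by = "state")

# 6c: deaths as a percentage of the population
covid_pop <- covid_pop %>%
  mutate(pct_covid = 100 * total_covid / population)

# 6b and 6d: bar charts of the totals and of the percentages
p_total <- ggplot(covid_pop, aes(x = reorder(state, -total_covid), y = total_covid)) +
  geom_col() +
  labs(x = "State", y = "Total COVID-19 deaths")
p_pct <- ggplot(covid_pop, aes(x = reorder(state, -pct_covid), y = pct_covid)) +
  geom_col() +
  labs(x = "State", y = "COVID-19 deaths (% of population)")
print(p_total)
print(p_pct)
```

merge() keeps only jurisdictions present in both data frames by default, so a spelling mismatch in a state name silently drops that row; checking the row count after the merge is a cheap safeguard.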

7.      (7 points)

a.      Formulate a data frame for 2020 that contains only

1.      the state,

2.      covid death (Multiple Cause of Death),

3.      and deaths of all causes (basically the number of all people who died in the first 33 weeks in 2020)  

This data frame should contain only these 3 variables but keep all cases (rows).

b.      Then use the "summarise" function to create a smaller data frame with aggregated information that contains

1.      total covid deaths and

2.      total deaths of all causes

c.       Next, reduce the data frame to include only the 6 states above and NYC, as we did in the previous questions.

d.      Then create a variable: the percentage of covid deaths in total deaths; plot this percentage by state. Which state (city) has the highest percentage of covid deaths of all deaths?
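Question 7 follows the same pattern as question 6, with all-cause deaths as the denominator. The sketch assumes the hypothetical short column names state, year, covid_deaths, and all_deaths, and uses invented numbers.

```r
library(dplyr)
library(ggplot2)

# Toy data (invented counts), including a 2019 row and a non-focus state.
deaths <- data.frame(
  state = c("Arizona", "Arizona", "New York City", "New York City", "Ohio", "Arizona"),
  year = c(2020, 2020, 2020, 2020, 2020, 2019),
  covid_deaths = c(10, 30, 500, 1500, 5, 0),
  all_deaths = c(1000, 1000, 2000, 3000, 2000, 900)
)

focus <- c("Arizona", "California", "Florida", "New Jersey",
           "New York", "New York City", "Texas")

pct_by_state <- deaths %>%
  filter(year == 2020) %>%                      # 7a: 2020 only
  select(state, covid_deaths, all_deaths) %>%   # 7a: three variables
  group_by(state) %>%
  summarise(                                    # 7b: totals per jurisdiction
    total_covid = sum(covid_deaths, na.rm = TRUE),
    total_all = sum(all_deaths, na.rm = TRUE)
  ) %>%
  filter(state %in% focus) %>%                  # 7c: six states and NYC
  mutate(pct_covid = 100 * total_covid / total_all)  # 7d: share of all deaths

p <- ggplot(pct_by_state, aes(x = reorder(state, -pct_covid), y = pct_covid)) +
  geom_col() +
  labs(x = "State", y = "COVID-19 deaths (% of all deaths)")
print(p)

pct_by_state
```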
