Ask This Question To Be Solved By Our ExpertsGet A+ Grade Solution Guaranteed

Drop Files Here Or Click to Upload

Or Get Complete Course Help

Cole GrayyEconomics

(5/5)

534 Answers

Hire Me

Kevin ConnellyComputer science

(4/5)

792 Answers

Hire Me

Cesar MeyerStatistics

(5/5)

813 Answers

Hire Me

StatAnalytica ExpertScience

(5/5)

617 Answers

Hire Me

R Programming

(5/5)

Design Logistic and univariate time series models of cars and air pollution data, respectively.

INSTRUCTIONS TO CANDIDATES

ANSWER ALL QUESTIONS

Project # 2

Modeling Categorical and Univariate Time Series

Assignment: Design Logistic and univariate time series models of cars and air pollution data, respectively. The data set for this project is on the class web page under AirQualityUCI.csv and cars data.csv. To complete this assignment, answer the questions and follow the instructions given below. Submit your assignment before 11:59 pm on August 5, 2020. Note: No extension possible

Assignment Objective: In this assignment, you will demonstrate your ability to model categorical and time-series data.

Data:

1. The car's data is a modified version of the UCI Auto MPG Data Set present at https://archive.ics.uci.edu/ml/datasets/Auto+MPG.

2. The data in AirQualityUCI.csv provides hourly observations of ambient pollutant concentrations in an Italian city from March 2004-February 2005. You will first need to impute missing values (currently -200). You will then need to aggregate the hourly data to daily maxima using the ‘aggregate’ function in R. You will use this dataset to build models. of ambient daily maximum carbon monoxide (CO), and nitrogen dioxide (NO2) concentrations

Instructions:

1. You may discuss this assignment with other students in the class. However, work with your group and reference any contributions from others. You must pledge your submission.

2. Use the assignment page on the Collab site for submission.

3. For this assignment, you will turn in a detailed R Markdown notebook in both Rmd and PDF formats with your analysis and responses to the questions below.

4. Make sure you label all your answers appropriately in the R Markdown file.

Assignment:

1. Modelling categorical data: Explore the cars data (cars data.csv on collab) and address the following items (65 points)

(a) Create a binary variable, mpg01, that contains a 1 if mpg contains a value above its median, and a 0 if mpg contains a value below its median. Make Honda as a base case in brand variable. ( 5 (= 2.5 + 2.5))

(b) Explore the data graphically in order to investigate the association between mpg01 and the other features. Which of the other features seem most likely to be useful in predicting mpg01? Scatterplots and boxplots may be useful tools to answer this question. Describe your findings. ( 10 (= 5 + 5))

(d) Perform logistic regression on the training data in order to predict mpg01 using the variables that seemed most associated with mpg01 in (b). Compute confusion matrix at a threshold of 0.5 . (5 (=3+2))

(e) Perform the logistic regression after Log transforming your continuous predictors. Use categorical features as such, if any. Follow 80:20 data split rule. Perform prediction and compute confusion matrix at a threshold of 0.5. ( 10 (= 5 + 5))

(f) Perform Principal Components Regression using all the predictors as the input such that the principal components account for 95% of the variance. Make sure you follow the 80:20 rule of data division. Com- pute the confusion matrix at a threshold of 0.5. ( 10 (= 5 + 5))

(g) Compute Confusion Matrices for (f) at thresholds of 0.2, 0.5, and 0.8. Describe your findings (5 = (2+ 3))

(h) Draw ROC curves using the models obtained in (d), (e), and (f). Describe your findings. (5 = (2+ 3))

(i) Identify the interactions between the predictors. (10 points)

i. Use graphical approach to identify interactions. Describe each plot. (5)

ii. Use aov() function as well. Describe each result. (5)

2. Building Univariate Time Series Models: Build two univariate time series models of daily maximum carbon monoxide (CO) and nitrogen diox- ide (NO2) concentrations (one model for each pollutant). Make sure to address the following items: (30 points)

(a) How you discovered and modeled any seasonal components, if appli- cable. (5 )

(b) How you discovered and modeled any trends, if applicable. (5)

(d) How you assessed your models (e.g. adjusted R2, AIC, diagnostics, etc.) to select one model for each pollutant. Assessments should dis- cuss diagnostics and at least one metric. Show and discuss diagnostics of both the linear models of trends and seasonality and the ARIMA models of the residuals. (10)

(e) What problems, if any, remain in the diagnostics of the selected models. (5)

Formatting (5 points) will be based on organization and readability of knitted R markdown file (in .pdf form) and responses. Make sure the text wraps neatly, is near figures and models being discussed, and all group members’ names are listed on the assignment.

(5/5)

Hurry, Grab up to 30% discount on the entire course

Ask This Question To Be Solved By Our ExpertsGet A+ Grade Solution Guaranteed

Our Experts

Cole GrayyEconomics

Kevin ConnellyComputer science

Cesar MeyerStatistics

StatAnalytica ExpertScience

R Programming

Design Logistic and univariate time series models of cars and air pollution data, respectively.

ANSWER ALL QUESTIONS

Attachments:

Instructions Files

Related Questions

. The fundamental operations of create, read, update, and delete (CRUD) in either Python or Java

. Develop a program to emulate a purchase transaction at a retail store. This program will have two classes, a LineItem class and a Transaction class

. The following program contains five errors. Identify the errors and fix them

. Accepts the following from a user: Item Name Item Quantity Item Price Allows the user to create a file to store the sales receipt contents

. The final project will encompass developing a web service using a software stack and implementing an industry-standard interface. Regardless of whether you choose to pursue application development goals as a pure developer or as a software engineer

Other Services

Ask This Question To Be Solved By Our ExpertsGet A+ Grade Solution Guaranteed

Our Experts

Cole GrayyEconomics

Kevin ConnellyComputer science

Cesar MeyerStatistics

StatAnalytica ExpertScience

R Programming

Design Logistic and univariate time series models of cars and air pollution data, respectively.

ANSWER ALL QUESTIONS

Attachments:

Instructions Files

Related Questions

. The fundamental operations of create, read, update, and delete (CRUD) in either Python or Java

. Develop a program to emulate a purchase transaction at a retail store. This program will have two classes, a LineItem class and a Transaction class

. The following program contains five errors. Identify the errors and fix them

. Accepts the following from a user: Item Name Item Quantity Item Price Allows the user to create a file to store the sales receipt contents

. The final project will encompass developing a web service using a software stack and implementing an industry-standard interface. Regardless of whether you choose to pursue application development goals as a pure developer or as a software engineer