R Programming

INSTRUCTIONS TO CANDIDATES
ANSWER ALL QUESTIONS

The chapter on linear models (“Lining Up Our Models”) introduces linear predictive modeling using the tool known as multiple regression. The term “multiple regression” has an odd history, dating back to an early scientific observation of a phenomenon called “regression to the mean.” These days, multiple regression is just an interesting name for using linear modeling to assess the connection between one or more predictor variables and an outcome variable. 

In this exercise, you will predict Ozone air levels from three predictors.

Please make sure you have included an attribution statement (see syllabus if you have questions).

A. We will be using the airquality data set available in R. Copy it into a dataframe called air and use the appropriate functions to summarize the data. 

B. In the analysis that follows, Ozone will be considered as the outcome variable, and Solar.R, Wind, and Temp as the predictors. Add a comment to briefly explain the outcome and predictor variables in the dataframe using ?airquality.

C. Inspect the outcome and predictor variables – are there any missing values? Show the code you used to check for that.
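One possible way to run the check in Step C is sketched below. It assumes air is an unmodified copy of the built-in airquality data set; the names vars, na_counts, and has_missing are illustrative only.

```r
# Assumption: `air` is a plain copy of the built-in airquality data set.
air <- airquality

# Count missing values in the outcome and the three predictors.
vars <- c("Ozone", "Solar.R", "Wind", "Temp")
na_counts <- colSums(is.na(air[, vars]))
print(na_counts)

# anyNA() gives a quick yes/no answer for the same columns.
has_missing <- anyNA(air[, vars])
```

summary(air) reports the same counts in its "NA's" row, so either approach can be cited in the answer.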

D. Use the na_interpolation() function from the imputeTS package from HW 6 to fill in the missing values in each of the 4 columns. Make sure there are no more missing values using the commands from Step C.
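To see what linear interpolation does to a gap, the sketch below uses base R's approx() as a stand-in. It is not the imputeTS function that Step D asks for, and fill_linear is a hypothetical helper written only for this illustration; results from na_interpolation() may differ depending on the options chosen.

```r
air <- airquality

# Hypothetical helper: fill NAs by linear interpolation between neighbours.
# rule = 2 carries the nearest observed value into leading/trailing gaps.
fill_linear <- function(v) {
  idx <- seq_along(v)
  ok <- !is.na(v)
  approx(x = idx[ok], y = v[ok], xout = idx, rule = 2)$y
}

vars <- c("Ozone", "Solar.R", "Wind", "Temp")
air[vars] <- lapply(air[vars], fill_linear)

# Re-run the Step C check: every count should now be zero.
colSums(is.na(air[, vars]))
```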

E. Create 3 bivariate (X-Y) scatterplots, one for each predictor against the outcome. Hint: In each case, put Ozone on the Y-axis and a predictor on the X-axis. Add a comment to each, describing the plot and explaining whether there appears to be a linear relationship between the outcome variable and the respective predictor.

F. Next, create a simple regression model predicting Ozone based on Wind. Refer to page 202 in the text for syntax and explanations of the lm( ) command. In a comment, report the coefficient (aka slope or beta weight) of Wind in the regression output and, if it is statistically significant, interpret it with respect to Ozone. Report the adjusted R-squared of the model and try to explain what it means. 
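A minimal sketch of Step F follows. It assumes air has not been imputed, in which case lm( ) silently drops rows with missing values; with the interpolated data from Step D the numbers will differ. The names simple_fit, wind_slope, wind_p, and adj_r2 are illustrative.

```r
air <- airquality  # assumption: not yet imputed, so lm() drops rows with NA

simple_fit <- lm(Ozone ~ Wind, data = air)
fit_summary <- summary(simple_fit)

# Coefficient table: estimate, standard error, t value and p-value per term.
wind_slope <- fit_summary$coefficients["Wind", "Estimate"]
wind_p     <- fit_summary$coefficients["Wind", "Pr(>|t|)"]

# Adjusted R-squared: share of variance in Ozone explained by the model,
# penalised for the number of predictors.
adj_r2 <- fit_summary$adj.r.squared
```

A slope is read as the expected change in Ozone for a one-unit increase in Wind, which is why the sign of wind_slope matters for the interpretation.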

G. Create a multiple regression model predicting Ozone based on Solar.R, Wind, and Temp. Make sure to include all three predictors in one model – NOT three different models each with one predictor.

H. Report the adjusted R-Squared in a comment – how does it compare to the adjusted R-squared from Step F? Is this better or worse? Which of the predictors are statistically significant in the model? In a comment, report the coefficient of each predictor that is statistically significant. Do not report the coefficients for predictors that are not significant.
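The comparison in Steps G and H could be sketched as follows. As an assumption, the example keeps only complete rows so that both models are fitted on the same observations; the assignment itself uses the interpolated data instead. All object names are illustrative.

```r
# Assumption: use only complete rows so both models see the same data.
air <- airquality[complete.cases(airquality), ]

simple_fit <- lm(Ozone ~ Wind, data = air)
multi_fit  <- lm(Ozone ~ Solar.R + Wind + Temp, data = air)

adj_simple <- summary(simple_fit)$adj.r.squared
adj_multi  <- summary(multi_fit)$adj.r.squared

# Keep only predictors whose p-value is below the conventional 0.05 cutoff.
coefs <- summary(multi_fit)$coefficients
predictors <- coefs[rownames(coefs) != "(Intercept)", , drop = FALSE]
significant <- predictors[predictors[, "Pr(>|t|)"] < 0.05, "Estimate", drop = FALSE]
```

The 0.05 threshold is a convention, not something the assignment specifies; use whatever cutoff the course text prescribes.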

I. Create a one-row data frame like this: 

predDF <- data.frame(Solar.R=290, Wind=13, Temp=61) and use it with the predict( ) function to predict the expected value of Ozone.
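A short sketch of how predDF might be passed to predict( ) is given below. It assumes a three-predictor model named multi_fit (an illustrative name) fitted on un-imputed data, so the predicted value will not match one obtained from the interpolated data frame.

```r
air <- airquality  # assumption: lm() drops incomplete rows if not imputed
multi_fit <- lm(Ozone ~ Solar.R + Wind + Temp, data = air)

# Column names in predDF must match the predictor names used in the formula.
predDF <- data.frame(Solar.R = 290, Wind = 13, Temp = 61)
predicted_ozone <- predict(multi_fit, newdata = predDF)
```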

J. Create an additional multiple regression model, with Temp as the outcome variable, and the other 3 variables as the predictors. Review the quality of the model by commenting on its adjusted R-Squared.  

Association mining can be applied to many data problems beyond the well-known example of finding relationships between different products in customer shopping data. In this homework assignment, we will explore real data from the banking sector and look for patterns associated with the likelihood of responding positively to a direct marketing campaign and signing up for a term deposit with the bank (stored in the variable “y”). You can find out more about the variables in this dataset here: https://archive.ics.uci.edu/ml/datasets/bank+marketing

Please make sure you have included an attribution statement (see syllabus if you have questions).

Part 1: Explore Data Set

A. Copy the contents of the following URL to a dataframe called bank:

Hint: Even though this is a .csv file, chances are R won’t be able to read it in correctly using the read_csv() function. If you take a closer look at the contents of the URL file, you may notice each field is separated by a semicolon (;) rather than a comma. In situations like this, consider using something like this:

bank <- read.table(url, sep=";", header = TRUE)

Make sure there are 41,188 rows and 21 columns in your bank df.
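The effect of sep=";" and the dimension check can be tried on a small inline sample first. The three-column text below is made up for illustration and stands in for the real file; read.table( ) accepts it through its text argument.

```r
# Assumption: a tiny inline sample stands in for the real file,
# whose URL is not reproduced here.
sample_text <- 'age;job;y
56;housemaid;no
37;services;yes'

bank_sample <- read.table(text = sample_text, sep = ";", header = TRUE)

# dim() returns c(rows, columns); the real bank df should give 41188 and 21.
dim(bank_sample)
```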

B. Next, we will focus on some key factor variables from the dataset, and convert a few numeric ones to factor variables. Execute the following commands and write a comment describing how the conversion for each numeric variable works and what the variables in the resulting dataframe are.

bank_new <- data.frame(job=bank$job,
                     marital=bank$marital,
                     housing_loan=bank$housing,
                     young=as.factor((bank$age<median(bank$age))),
                     contacted_more_than_once=as.factor((bank$campaign>1)),
                     contacted_before_this_campaign=as.factor((bank$previous>0)),
                     success=(bank$y))
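The numeric-to-factor conversion can be seen on a few made-up values. The vectors age and campaign below are hypothetical stand-ins for the corresponding bank columns.

```r
# Hypothetical toy values standing in for bank$age and bank$campaign.
age <- c(25, 40, 31, 58, 38)
campaign <- c(1, 3, 1, 2, 1)

# The comparison yields TRUE/FALSE per row; as.factor() turns that logical
# vector into a factor with levels "FALSE" and "TRUE".
young <- as.factor(age < median(age))
contacted_more_than_once <- as.factor(campaign > 1)

table(young)
```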

C. Count the number of successful term deposit sign-ups, using the table( ) command on the success variable.

D. Express the results of Step C as percentages by sending the results of the table( ) command into the prop.table( ) command.

E. Using the same techniques, show the percentages for the marital and housing_loan variables as well.
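The table( ) and prop.table( ) pattern from Steps C through E looks roughly like this on a made-up vector. The success values below are hypothetical and do not reflect the real data.

```r
# Hypothetical toy outcome standing in for bank_new$success.
success <- c("no", "no", "yes", "no", "yes", "no", "no", "no")

counts <- table(success)        # raw counts per level
shares <- prop.table(counts)    # same table rescaled to sum to 1
round(100 * shares, 1)          # expressed as percentages
```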

Part 2: Coerce the data frame into transactions

F. Install and load (with library( )) two packages: arules and arulesViz.

G. Coerce the bank_new data frame into a sparse transactions matrix called bankX.

H. Use the itemFrequency( ) and itemFrequencyPlot( ) commands to explore the contents of bankX. What do you see?

I. This is a fairly large dataset, so we will explore only the first 10 observations in the bankX transaction matrix: inspect(bankX[1:10]) 

Explain the difference between bank_new and bankX in a block comment.

Part 3: Use arules to discover patterns

Support is the proportion of times that a particular set of items occurs relative to the whole dataset. Confidence is the proportion of times that the consequent occurs when the antecedent is present.
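Both definitions can be worked through by hand in base R. The ten-row example below is made up for illustration and computes support and confidence for the rule {young} => {success}.

```r
# Hypothetical toy data: 10 customers, two yes/no items.
young   <- c(TRUE, TRUE, TRUE, TRUE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE)
success <- c(TRUE, TRUE, FALSE, FALSE, TRUE, FALSE, FALSE, FALSE, FALSE, FALSE)

# Support of {young, success}: share of all rows containing both items.
support_rule <- mean(young & success)

# Confidence of {young} => {success}: among rows with the antecedent,
# the share that also contain the consequent.
confidence_rule <- sum(young & success) / sum(young)
```

Here two of ten rows contain both items, so support is 0.2, and two of the four young rows are successes, so confidence is 0.5.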

J. Use apriori to generate a set of rules with support over 0.005 and confidence over 0.3 that try to predict who successfully signed up for a term deposit. Hint: You need to define the right-hand side (rhs) of the rule.
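A rough sketch of fixing the right-hand side is shown below on a made-up eight-row data frame. It runs only if the arules package is installed, uses a higher support threshold than Step J because the toy data is so small, and assumes the item label for a successful sign-up is "success=yes" (column name, equals sign, factor level).

```r
# Runs only if arules is installed; toy data stands in for bank_new.
if (requireNamespace("arules", quietly = TRUE)) {
  suppressPackageStartupMessages(library(arules))

  toy <- data.frame(
    young   = factor(c(TRUE, TRUE, TRUE, TRUE, FALSE, FALSE, FALSE, FALSE)),
    success = factor(c("yes", "yes", "yes", "no", "yes", "no", "no", "no"))
  )
  toyX <- as(toy, "transactions")

  # appearance pins the consequent to success=yes; every other item
  # may appear only on the left-hand side.
  rules <- apriori(
    toyX,
    parameter  = list(support = 0.1, confidence = 0.3),
    appearance = list(rhs = "success=yes", default = "lhs")
  )
  inspect(rules)
}
```

For the real data, the same call would take bankX with support = 0.005 and confidence = 0.3, provided the success column in bank_new is a factor whose positive level is "yes".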

K. Use inspect( ) to review the ruleset.

L. Use the output of inspect( ) or inspectDT( ) and describe any 2 rules the algorithm found.  
