
INSTRUCTIONS TO CANDIDATES
ANSWER ALL QUESTIONS

Assignment 2: Modeling, Uncertainties and Feature Importance

M Loecher

source("utils.R")

1 Introductory Comments

This assignment serves two purposes. First, it will train your modeling and data manipulation skills, in particular, linear regression and summarizing data the dplyr way. Second, it reinforces your computational statistics (or: understanding statistics by simulation) competencies by comparing theoretical standard errors to those obtained by simulation. And as a bonus, you get to explore a cool data set and learn about feature importance.

You should work in groups (ideally about 3 students per group). Each group must submit at least one R file containing well-documented functions and test cases for those functions. You may use two files (one for the functions and another for the test cases), but this is not necessary. Write your answers and explanations as comments in the R file. We strongly encourage you to submit an Rmd file (plus its compiled version) instead of an R file.

2 Data

Download the Bike Sharing Dataset from the UCI Machine Learning Repository (http://archive.ics.uci.edu/ml/datasets/Bike+Sharing+Dataset), read in the day.csv data, and preprocess it as follows:

bikes <- read.csv("Bike-Sharing-Dataset/day.csv", stringsAsFactors = FALSE)

#bikes$days_since_2011 = as.numeric(as.Date(bikes$dteday)-as.Date("2011-01-01"))

bike.features.of.interest = c("season","holiday","workingday", "weathersit","temp", "hum", "windspeed", "days_since_2011", "cnt") # colnames(bike)[c(1,4,6,7,8,9,10,12)]

bikes = clean.bike.data(bikes)[,bike.features.of.interest]

options(digits=2)

#datatable(bikes[1:50,c(bike.features.of.interest, "cnt")])

kable(bikes[1:5,])

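| season | holiday    | workingday     | weathersit | temp | hum | windspeed | days_since_2011 | cnt  |
|--------|------------|----------------|------------|------|-----|-----------|-----------------|------|
| WINTER | NO HOLIDAY | NO WORKING DAY | MISTY      | 8.2  | 81  | 11        | 0               | 985  |
| WINTER | NO HOLIDAY | NO WORKING DAY | MISTY      | 9.1  | 70  | 17        | 1               | 801  |
| WINTER | NO HOLIDAY | WORKING DAY    | GOOD       | 1.2  | 44  | 17        | 2               | 1349 |
| WINTER | NO HOLIDAY | WORKING DAY    | GOOD       | 1.4  | 59  | 11        | 3               | 1562 |
| WINTER | NO HOLIDAY | WORKING DAY    | GOOD       | 2.7  | 44  | 13        | 4               | 1600 |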

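Create a random subset of the data for training, which leaves a "hold-out data set" for testing:

set.seed(123)                             # make the random split reproducible

nTrain = round(nrow(bikes)/2)             # half of the rows go into the training set

ranRows = sample(nrow(bikes), nTrain)     # row indices drawn without replacement

train = bikes[ranRows, ]

test = bikes[-ranRows, ]                  # all remaining rows form the hold-out set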

3 Data Summaries

(2 points) Using dplyr (group_by(), summarise()) and ggplot2, compute the average bike rental counts as a function of

- weathersit
- workingday
- all combinations of weathersit and workingday

(2 points) Repeat the above using the function lm() only.
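The link between the two tasks can be illustrated on a small example. The sketch below is not a solution for the bikes data: the data frame toy and its values are made up for illustration, and only the column names mirror the assignment. It assumes dplyr (version 1.0.0 or later, for the .groups argument) is installed. A linear model with no intercept and one dummy per group reproduces the group means that group_by() and summarise() compute.

```r
library(dplyr)

# Hypothetical toy data; the values are invented, not taken from day.csv
toy <- data.frame(
  weathersit = c("GOOD", "GOOD", "MISTY", "MISTY", "GOOD", "MISTY"),
  workingday = c("WORKING DAY", "NO WORKING DAY", "WORKING DAY",
                 "NO WORKING DAY", "WORKING DAY", "WORKING DAY"),
  cnt        = c(1500, 1200, 900, 800, 1700, 1100)
)

# Group means the dplyr way: one row per weathersit x workingday cell
means_dplyr <- toy %>%
  group_by(weathersit, workingday) %>%
  summarise(avg_cnt = mean(cnt), .groups = "drop")

# The same means from lm(): dropping the intercept (0 + ...) and using the
# interaction of both factors gives one coefficient per cell, and each
# least-squares coefficient equals the mean of cnt in that cell
fit <- lm(cnt ~ 0 + interaction(weathersit, workingday), data = toy)
means_lm <- unname(coef(fit))

print(means_dplyr)
print(means_lm)
```

Note that the two results list the cells in a different order, so compare them cell by cell rather than position by position. With an intercept in the model, the coefficients would instead be differences from a reference group.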

4 Standard Errors in Linear Regression

In this example, we use a linear regression model to predict the number of rented bikes on a particular day, given weather and calendar information. For the interpretation, we examine the estimated regression weights. The features are a mix of numerical and categorical variables. For each feature, the table shows the estimated weight, the standard error of the estimate (SE), and the absolute value of the t-statistic (|t|).

#data(bike)

mod = lm(cnt ~ ., data = train, x = TRUE)

lm_summary = summary(mod)$coefficients

lm_summary[,'t value'] = abs(lm_summary[,'t value'])

rownames(lm_summary) = pretty_rownames(rownames(lm_summary))

kable(lm_summary[,c('Estimate', 'Std. Error', 't value')], digits = 1, col.names = c('Weight', 'SE', "|t|"))
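To make the comparison of theoretical and simulated standard errors concrete, here is a minimal sketch on simulated data. Everything in it is an assumption made for illustration: the data, the sample size, the number of resamples B, and the choice of a nonparametric bootstrap as the simulation method (the assignment may intend a different scheme). The theoretical SEs are the square roots of the diagonal of sigma^2 (X'X)^{-1}; passing x = TRUE to lm(), as in the model above, stores the design matrix X needed for that formula.

```r
set.seed(1)

# Simulated stand-in data (hypothetical, not the bikes data)
n <- 200
d <- data.frame(x = rnorm(n))
d$y <- 2 + 3 * d$x + rnorm(n)

fit <- lm(y ~ x, data = d, x = TRUE)   # x = TRUE keeps the design matrix

# Theoretical SE: sqrt(diag(sigma^2 (X'X)^{-1})), with sigma^2 estimated
# from the residuals using n - p degrees of freedom
X <- fit$x
sigma2 <- sum(residuals(fit)^2) / (n - ncol(X))
se_theory <- sqrt(diag(sigma2 * solve(t(X) %*% X)))

# Simulated SE: refit the model on B bootstrap resamples of the rows and
# take the standard deviation of each coefficient across the refits
B <- 1000
boot_coefs <- replicate(B, {
  idx <- sample(n, replace = TRUE)
  coef(lm(y ~ x, data = d[idx, ]))
})
se_boot <- apply(boot_coefs, 1, sd)

print(rbind(theory = se_theory, bootstrap = se_boot))
```

The theoretical values agree with the Std. Error column of summary(fit)$coefficients, and the bootstrap values should come out close to them when the model assumptions hold.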
