pizza

For this assignment, name your R file pizza.R

• For all questions you should load tidyverse. You should not need to use any other libraries.

o Load tidyverse with suppressPackageStartupMessages(library(tidyverse))

• Download the pizza.csv file from Brightspace and place it in the same folder/directory as your script file. Then in RStudio, set your Working Directory to your Source File location (Session > Set Working Directory > To Source File Location).

• Load the pizza.csv file like this:

pizza <- read_csv('pizza.csv')

• Round all float/dbl values to two decimal places.

o If your rounding does not work the way you expect, convert the tibble to a dataframe by using as.data.frame()

• All statistics should be run with variables in the order I state

o E.g., “Run a regression predicting mileage from mpg, make, and type” would be:

lm(mileage ~ mpg + make + type...)

• In each of these you must use at least two dplyr functions. You may use Google to look up how to do certain aspects.
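The rounding note above can be sketched on a small made-up tibble. The `toy` data and its column names are assumptions for illustration only and are not part of pizza.csv. A tibble prints only a few significant digits, so correctly rounded values can look wrong on screen; converting with as.data.frame() shows what is actually stored.

```r
suppressPackageStartupMessages(library(tidyverse))

# Hypothetical data; the column names are made up for this sketch.
toy <- tibble(
  item  = c("a", "b", "c"),
  price = c(12.3456, 7.891, 100.129)
)

# Two dplyr functions: mutate() and across().
# Round every double column to two decimal places.
rounded <- toy %>%
  mutate(across(where(is.double), ~ round(.x, 2)))

# Convert to a plain data frame so the stored values print in full.
rounded_df <- as.data.frame(rounded)
print(rounded_df)
```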

Before attempting to answer these, or if you lose points upon an attempt, please review all CodeGrade information provided in the CodeGrade overview submodule - if you do not you are likely to lose points.

1. Create a dataframe containing driver names of instances where free_wine = 1, discount_customer = 1, and the order contained more than 4 pizzas. (There will be repeated names).

• The answers should look like the following:

1 [value]

2 [value]

3 [value]

4 [value]

5 [value]

6 [value]

7 [value]

8 [value]

9 [value]

• If your CodeGrade output is <fct> instead of <chr>, you can use as.character(driver) to convert it

• Assign that to Q1
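As a rough illustration of the filter-then-select pattern this kind of question calls for, here is a sketch on made-up data. The `orders` tibble and its columns (`free_item`, `member`, `qty`) are hypothetical stand-ins, not the real pizza.csv columns.

```r
suppressPackageStartupMessages(library(tidyverse))

# Hypothetical orders; names and values are invented for this sketch.
orders <- tibble(
  driver    = c("Ana", "Ben", "Ana", "Cy"),
  free_item = c(1, 0, 1, 1),
  member    = c(1, 1, 1, 0),
  qty       = c(5, 6, 7, 8)
)

result <- orders %>%
  filter(free_item == 1, member == 1, qty > 4) %>%  # all three conditions must hold
  select(driver) %>%                                # keep only the name column
  as.data.frame()

print(result)
```

Repeated names are kept because nothing in the pipeline removes duplicate rows.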

2. Create a variable that is the ratio of bill to pizzas, called ratio. What is the mean of that value (call the value mean_ratio)?

• Assign this to Q2

3. For each day of the week, what is the variance in pizzas?

• The created values should be called var_pizzas.

• The answer should be assigned to Q3 and should look like the following:

1 Friday [value]

2 Monday [value]

3 Saturday [value]

4 Sunday [value]

5 Thursday [value]

6 Tuesday [value]

7 Wednesday [value]
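A per-group statistic like this is typically built with group_by() and summarize(). The sketch below uses made-up data (the `toy` tibble and the `qty` column are assumptions). Grouping on a character column returns the groups in alphabetical order, which is consistent with the Friday-to-Wednesday order shown above.

```r
suppressPackageStartupMessages(library(tidyverse))

# Hypothetical data for illustration only.
toy <- tibble(
  day = c("Mon", "Mon", "Mon", "Tue", "Tue", "Tue"),
  qty = c(1, 2, 3, 2, 4, 6)
)

by_day <- toy %>%
  group_by(day) %>%                          # one group per day
  summarize(var_qty = round(var(qty), 2)) %>% # sample variance within each group
  as.data.frame()

print(by_day)
```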

4. Which operator had the higher average bill?

• The answer should be assigned to Q4.

5. What was the highest amount of free wine given by day/driver combination? (For instance, Friday Bruno was 13, while Wednesday Salvator was 12)

• The answer should be assigned to Q5 and look like the following:

# A tibble: 1 x 3

# Groups: day, driver [1]

day driver n

<chr> <chr> <int>

1 [day] [name] [value]

• Depending on how you do this, you might need to convert a <dbl> to <int>. You can convert a variable using as.integer().
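One possible way to find the largest total per two-way group, with the as.integer() conversion mentioned above, is sketched here on made-up data (`toy` and `gift` are hypothetical names). This sketch drops the grouping before picking the top row, so its printed header would not show the "Groups:" line from the expected output; whether the result must stay grouped is not something this example tries to settle.

```r
suppressPackageStartupMessages(library(tidyverse))

# Hypothetical data for illustration only.
toy <- tibble(
  day    = c("Mon", "Mon", "Mon", "Tue", "Tue"),
  driver = c("Ana", "Ana", "Ben", "Ana", "Ben"),
  gift   = c(1, 1, 1, 0, 1)
)

top <- toy %>%
  group_by(day, driver) %>%
  # sum() on a double column returns a double, so convert to integer
  summarize(n = as.integer(sum(gift)), .groups = "drop") %>%
  slice_max(n, n = 1)   # keep the row(s) with the largest total

print(top)
```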

nycflights13Stats

For this assignment, name your R file nycflights13stats.R

• For all questions you should load tidyverse, lm.beta, and nycflights13. You should not need to use any other libraries.

o If the tidyverse package is not installed, you’ll need to do a one-time installation from the Console Window in RStudio like this:

install.packages("tidyverse")

o If the lm.beta package is not installed, you’ll need to do a one-time installation from the Console Window in RStudio like this:

install.packages("lm.beta")

o If the nycflights13 package is not installed, you’ll need to do a one-time installation from the Console Window in RStudio like this:

install.packages("nycflights13")

o Load tidyverse with: suppressPackageStartupMessages(library(tidyverse))

o Load nycflights13 with:

suppressPackageStartupMessages(library(nycflights13))

o Load lm.beta with:

suppressPackageStartupMessages(library(lm.beta))

o The actual data set is called flights.

See the nycflights13 package page and chapter 5 from the textbook for more info.

o Do not attempt to install packages in CodeGrade.

• Round all float/dbl values to two decimal places.

o If your rounding does not work the way you expect, convert the tibble to a dataframe by using as.data.frame()

• All statistics should be run with variables in the order I state

o E.g., “Run a regression predicting mileage from mpg, make, and type” would be:

lm(mileage ~ mpg + make + type...)

Before attempting to answer these, or if you lose points upon an attempt, please review all CodeGrade information provided in the CodeGrade overview submodule - if you do not you are likely to lose points.

1. Address the outliers for departure delay as described in the outliers lectures, using 0.997 and 0.003 as the cutoffs. What percentage of data remains following the removal of these outliers?

• The answer should be assigned to Q1.
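The lecture method is not reproduced in this handout, so the following is only a sketch of one common reading of "0.997 and 0.003 as the cutoffs": keep the values that fall between the 0.3rd and 99.7th percentiles. The simulated `toy` data, the inclusive comparisons, and the variable names are all assumptions; follow the lecture method for the actual submission.

```r
suppressPackageStartupMessages(library(tidyverse))

# Simulated delays with two extreme values; not the real flights data.
set.seed(1)
toy <- tibble(delay = c(rnorm(998), -50, 50))

# Percentile cutoffs (assumed interpretation of 0.003 and 0.997).
lower <- quantile(toy$delay, 0.003, na.rm = TRUE)
upper <- quantile(toy$delay, 0.997, na.rm = TRUE)

# Keep rows inside the cutoffs; filter() also drops rows where delay is NA.
trimmed <- toy %>%
  filter(delay >= lower, delay <= upper)

# Percentage of the original rows that remain.
pct_remaining <- round(nrow(trimmed) / nrow(toy) * 100, 2)
print(pct_remaining)
```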

Answer the following questions using the nycflights13 dataset without outliers.

2. Run cor.test for the relationship between departure delay and distance.

• This answer should be assigned to Q2. You should not round.

3. Create a regression predicting departure delay from distance.

• The summary of the model should be assigned to Q3. You should not round.

4. Calculate standardized regression coefficients with lm.beta for the regression from Q3.

• Assign it to Q4. You should not round.

5. Create another regression, this time adding carrier to the regression from Q3.

• The summary of the model should be assigned to Q5. You should not round.

• NOTE: if you use a different method of removing outliers (see Q1) than what Dr. Longo uses in his videos you risk upsetting CodeGrade.
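The sequence in Q2 through Q4 (correlation test, regression summary, standardized coefficients) can be sketched on the built-in mtcars data, which stands in for the flights data here. lm.beta() is documented as taking a fitted lm object, so this sketch passes the model rather than its summary; the variable choices are assumptions for illustration.

```r
suppressPackageStartupMessages(library(lm.beta))

# cor.test(x, y): variables are given in the order they are stated.
ct <- cor.test(mtcars$mpg, mtcars$wt)

# Regression "predicting mpg from wt": outcome on the left of ~.
model <- lm(mpg ~ wt, data = mtcars)
model_summary <- summary(model)

# Standardized coefficients from the fitted model.
std <- lm.beta(model)
print(std$standardized.coefficients)
```

With a single predictor, the standardized slope equals the Pearson correlation, which is a quick sanity check on the output.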

msleepStats

For this assignment, name your R file msleepStats.R

• For all questions you should load tidyverse and lm.beta. You should not need to use any other libraries.

o Load tidyverse with suppressPackageStartupMessages(library(tidyverse))

o If the lm.beta package is not installed, you’ll need to do a one-time installation from the Console Window in RStudio like this:

install.packages("lm.beta")

o Load lm.beta with:

suppressPackageStartupMessages(library(lm.beta))

o The actual data set is called msleep

• Round all float/dbl values to two decimal places unless otherwise specified.

• All statistics should be run with variables in the order I state.

o E.g., “Run a regression predicting mileage from mpg, make, and type” would be:

lm(mileage ~ mpg + make + type...)

Before attempting to answer these, or if you lose points upon an attempt, please review all CodeGrade information provided in the CodeGrade overview submodule - if you do not you are likely to lose points.

1. Run cor.test for the relationship between total sleep and body weight. You should not round these values.

• The answer should be assigned to Q1

2. Create a correlation matrix for the relations among total sleep, rem sleep, brain weight, and body weight. Make sure to remove missing values.

• The matrix should be assigned to Q2
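A correlation matrix with missing values removed might be built as in this sketch. The `toy` tibble is made up; drop_na() keeps only rows that are complete across the selected columns (na.omit() would behave the same way on this data).

```r
suppressPackageStartupMessages(library(tidyverse))

# Hypothetical data with a few missing values.
toy <- tibble(
  x = c(1, 2, 3, 4, NA),
  y = c(2, 4, 6, 8, 10),
  z = c(4, 3, 2, NA, 0)
)

cor_matrix <- toy %>%
  select(x, y, z) %>%   # columns in the stated order
  drop_na() %>%         # remove rows with any missing value
  cor() %>%             # Pearson correlations by default
  round(2)

print(cor_matrix)
```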

3. Run a regression predicting body weight by vore.

• Assign the coefficients to Q3

4. Create a regression predicting bodywt by vore and REM sleep. Compared to the model in Q3, which one has the better AIC?

• Assign the better AIC value to Q4
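Comparing two models by AIC can be sketched with the built-in mtcars data (a stand-in, not the assignment data). A lower AIC indicates the better-fitting model after the penalty for extra parameters; `k = 2` is the default penalty in R's AIC() and is written out here only for clarity.

```r
# Two nested models on a built-in data set.
m1 <- lm(mpg ~ wt, data = mtcars)
m2 <- lm(mpg ~ wt + hp, data = mtcars)

# Classical AIC for each model.
aic_values <- c(m1 = AIC(m1, k = 2), m2 = AIC(m2, k = 2))

# The better model is the one with the smaller AIC.
better_model <- names(which.min(aic_values))
better_aic   <- round(min(aic_values), 2)

print(aic_values)
print(better_model)
```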

5. Create a logistic regression predicting whether or not an animal is a carnivore or herbivore based on sleep total. Assign the model to Q5.

• You’ll need to filter out omnivores and insectivores:

filter(vore != "omni" & vore != "insecti")

• You will need to use the following code to create the variable you are predicting:

mutate(vorebin = ifelse(vore == 'carni', 0, 1))

• After the filter above, only carnivores (coded 0) and herbivores (coded 1) remain, which is what we want.

• You should not round these values.
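The filter, recode, and fit steps described above follow a general pattern that can be sketched on the built-in mtcars data. The choice of `gear` as the category and `mpg` as the predictor is an assumption made purely for illustration.

```r
suppressPackageStartupMessages(library(tidyverse))

# Drop the category that is not part of the comparison,
# then recode the remaining two categories as 0 and 1.
toy <- mtcars %>%
  filter(gear != 5) %>%
  mutate(gearbin = ifelse(gear == 3, 0, 1))

# Logistic regression: binary outcome, binomial family.
logit_model <- glm(gearbin ~ mpg, data = toy, family = binomial)

print(coef(logit_model))
```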

fastfoodStats

For this assignment, name your R file fastfoodStats.R

• For all questions you should load tidyverse, lm.beta and openintro. You should not need to use any other libraries.

o If the openintro package is not installed, you’ll need to do a one-time installation from the Console Window in RStudio like this:

install.packages("openintro")

o If the lm.beta package is not installed, you’ll need to do a one-time installation from the Console Window in RStudio like this:

install.packages("lm.beta")

o Load libraries with

suppressPackageStartupMessages(library(tidyverse))

suppressPackageStartupMessages(library(lm.beta))

suppressPackageStartupMessages(library(openintro))

• Round all float/dbl values to two decimal places.

• All statistics should be run with variables in the order I state

o E.g., “Run a regression predicting mileage from mpg, make, and type” would be:

lm(mileage ~ mpg + make + type...)

To access the fastfood data, run the following:

fastfood <- openintro::fastfood

1. Create a correlation matrix for the relations between calories, total_fat, sugar, and calcium for all items at Sonic, Subway, and Taco Bell, omitting missing values with na.omit().

• Assign the matrix to Q1

2. Create a regression predicting whether or not a restaurant is McDonalds or Subway based on calories, sodium, and protein. (McDonalds should be 1, Subway 0)

• Save the coefficients to Q2

3. Run the same regression as in #2 but remove sodium as a predictor.

• Which model fits better? Save the AIC of the better model to Q3.

4. Run a regression predicting calories from saturated fat, fiber, and sugar. Based on standardized regression coefficients, identify the strongest predictor.

• Assign the unstandardized regression coefficient of the strongest predictor to Q4. (You can access the coefficients by indexing the model object)
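One way to pick the strongest predictor from standardized coefficients and then index the matching unstandardized coefficient is sketched below on mtcars (a stand-in data set; the predictors are chosen only for illustration). "Strongest" is taken here to mean the largest absolute standardized coefficient, which is an assumption about the intended definition.

```r
suppressPackageStartupMessages(library(lm.beta))

model <- lm(mpg ~ wt + hp + qsec, data = mtcars)

# Standardized coefficients, with the intercept entry left out.
std <- lm.beta(model)$standardized.coefficients
std_predictors <- std[names(std) != "(Intercept)"]

# Name of the predictor with the largest absolute standardized value.
strongest <- names(which.max(abs(std_predictors)))

# Index the model's unstandardized coefficients by that name.
unstd_coef <- round(coef(model)[strongest], 2)

print(strongest)
print(unstd_coef)
```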

5. For this question, use data from only restaurants with between 50 and 60 items in the data set. Predict total fat from cholesterol, total carbs, vitamin a, and restaurant. Remove any nonsignificant predictors and run again.

• Assign the strongest standardized regression coefficient to Q5.
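Keeping only the groups whose row count falls in a range can be sketched with group_by() and filter(). The `toy` tibble and the 4-to-6 range are made up, and treating "between" as inclusive at both ends is an assumption.

```r
suppressPackageStartupMessages(library(tidyverse))

# Hypothetical data: shop A has 3 rows, B has 5, C has 8.
toy <- tibble(
  shop  = c(rep("A", 3), rep("B", 5), rep("C", 8)),
  value = 1:16
)

kept <- toy %>%
  group_by(shop) %>%
  filter(between(n(), 4, 6)) %>%  # n() is the row count of the current group
  ungroup()

print(kept)
```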

pizzaStats

For this assignment, name your R file pizzaStats.R

• For all questions you should load tidyverse and lm.beta. You should not need to use any other libraries.

o Load tidyverse with suppressPackageStartupMessages(library(tidyverse))

o Load lm.beta with

suppressPackageStartupMessages(library(lm.beta))

• Download the pizza.csv file from Brightspace and place it in the same folder/directory as your script file. Then in RStudio, set your Working Directory to your Source File location (Session > Set Working Directory > To Source File Location).

• Load the pizza.csv file like this:

pizza <- read_csv('pizza.csv')

• Round all float/dbl values to two decimal places unless otherwise specified.

• All statistics should be run with variables in the order I state

o E.g., “Run a regression predicting mileage from mpg, make, and type” would be:

lm(mileage ~ mpg + make + type...)

• For the CodeGrade submission you will need to use pizza <- read_csv('pizza.csv'). This might be different from how you run it on your local computer, so please note the difference.

1. Create a correlation matrix for temperature, bill, pizzas, and got_wine.

• Your matrix should be assigned to Q1.

2. Create a correlation matrix of the relationships between time, temperature, bill, and pizzas for Laura in the East branch.

• Your matrix should be assigned to Q2.

3. Run a regression predicting whether or not wine was ordered from temperature, bill, and pizzas.

• Assign the coefficients to Q3.

4. Run a regression predicting bill from temperature, pizzas, and got_wine.

• Assign the standardized regression coefficients to Q4 by using the lm.beta() function. You should not round these values.

5. Add operator to the regression from Q4. Which is the better model?

• Assign the better model’s AIC to Q5.

• Use the classical AIC (k=2).
