R Programming

You are asked to create reports containing visualizations of the Boston Housing data

INSTRUCTIONS TO CANDIDATES
ANSWER ALL QUESTIONS

Instructions

This PDF contains a long-form version of the questions, as well as a data appendix which is required to answer a few of the questions. No outside data is needed to answer these questions.

Visualizing Data

This section assesses the ability to interpret and communicate insights. The Boston Housing data in the appendix was used to make the visuals in this section.

Question 1: You are asked to create reports containing visualizations of the Boston Housing data. What tools, software, packages, and/or libraries would you use to create an interactive report? What if the document is required to be printed?

Question 2: Business leaders who will review this report want to understand median home values, Charles River proximity, and the relationship between the two. The following four visuals are being considered. Which would you recommend including in the report, and why? (Remember to refer to the appendix for data details.)

Question 3: What changes would you make to the visual you chose in the previous question to make it more interpretable and visually appealing? In your report to business leaders, how would you describe the plot in one sentence?

Modeling Data

This section assesses the ability to think critically about variables and how they can be used to predict a desired outcome. Core competencies include understanding distributions in data, making appropriate data transformations, and selecting an appropriate model. The Boston Housing Data Exploration in the Appendix should be used for this section.

Question 4: Predicting River Proximity: Using the Boston Housing data, you want to predict which tracts are adjacent to the Charles River (as denoted by the chas variable). Based on the data summary in the appendix, what data cleaning, transformations, and/or feature engineering can be used to prepare the data prior to model training?

Question 5: Predicting River Proximity: Using the Boston Housing data, you want to predict which tracts are adjacent to the Charles River (as denoted by the chas variable). Propose a model to investigate the relationship between the covariates and the chas variable. Explain how you would use the model and its output to provide evidence of the strength of, and confidence in, the relationship.
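Because chas is a 0/1 indicator, logistic regression is one candidate model for Question 5. The sketch below is only an illustration on simulated data: the data frame toy, the covariate x, and the coefficients used to simulate y are all made up, and nothing here comes from the Boston Housing appendix. It shows which pieces of output speak to strength (the coefficient and its odds ratio) and to confidence (the standard error, p-value, and interval).

```r
# Simulated stand-in data; none of these values come from the Boston Housing data.
set.seed(42)                                  # arbitrary seed for reproducibility
n <- 200
x <- rnorm(n)                                 # hypothetical standardized covariate
y <- rbinom(n, size = 1, prob = plogis(-1 + 0.8 * x))  # simulated 0/1 outcome
toy <- data.frame(y = y, x = x)

# Logistic regression: models the log-odds of y = 1 as a linear function of x
fit <- glm(y ~ x, data = toy, family = binomial)

# Strength: coefficient on the log-odds scale, and as an odds ratio
coef_table <- summary(fit)$coefficients       # estimate, std. error, z value, p-value
odds_ratio <- exp(coef(fit))

# Confidence: Wald 95% intervals on the log-odds scale
wald_ci <- confint.default(fit)

print(coef_table)
print(odds_ratio)
print(wald_ci)
```

With the real data, the formula would list the actual covariates in place of x, and an interval for a coefficient that excludes zero would be evidence of a relationship with chas.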

Question 6: The realtor thinks that a tract on the Charles River increases median home values by $5,000. To test the realtor’s hypothesis, you created a linear regression model with chas as a covariate (note that medv is in thousands of dollars). The coefficient associated with chas was 2.87 with a standard error of 0.86. Is there evidence to reject the realtor’s claim? Additionally, how would you defend this analysis to the realtor (who has no knowledge of linear models)?
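One way to approach Question 6 is to compare the estimate with the claimed value of 5 (that is, $5,000 in units of $1,000) using a test statistic and a confidence interval. The sketch below uses only the two numbers given in the question. It assumes a normal approximation, because the question does not state the residual degrees of freedom; with those in hand, pt and qt would replace pnorm and qnorm.

```r
# Values taken from the question text
beta_hat <- 2.87   # estimated chas coefficient (medv is in $1,000s)
se       <- 0.86   # standard error of that coefficient
beta_0   <- 5      # realtor's claim: $5,000 expressed in $1,000s

# Test statistic for H0: beta = beta_0
z <- (beta_hat - beta_0) / se

# Two-sided p-value under a normal approximation (assumption: large sample)
p_value <- 2 * pnorm(-abs(z))

# Approximate 95% confidence interval for the coefficient
ci <- beta_hat + c(-1, 1) * qnorm(0.975) * se

print(round(z, 2))    # about -2.48
print(p_value)
print(round(ci, 2))   # roughly 1.18 to 4.56
```

Under these assumptions the interval does not contain 5, which is the kind of evidence the question asks about. For a reader without a modeling background, the interval can be described as the range of river premiums that is consistent with the data.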

Question 7: Predicting Home Value: A local realtor wants to use the data to provide accurate house pricing estimates for clients. Below is the code a data scientist used to fit a random forest model using the Boston Housing data. Critique the approach, point out any errors, and make recommendations for how it can be improved.

# This is R language

data = BostonHousing

# Normalize Variables to improve model performance
vars_to_normalize = c('crim', 'zn', 'indus', 'nox', 'rm',
                      'rad', 'age', 'dis', 'medv')

normalize = function(x){
  return (x - mean(x, na.rm=TRUE))/sd(x, na.rm=TRUE)
}

for (var in vars_to_normalize){
  data[[var]] = normalize(data[[var]])
}

set.seed(1337)

# runif produces a random number between 0 and 1
data["in_test"] = runif(nrow(data)) > .7

get_error = function(model, data){
  data["prediction"] = predict(model, newdata=data)
  return(mean(abs(data[["prediction"]] - data[["medv"]])))
}

# medv ~. is R shorthand for use medv as the response and all variables as covariates
rf1 = randomForest(medv ~., data = data, ntree = 50)
rf2 = randomForest(medv ~., data = data, ntree = 100)
rf3 = randomForest(medv ~., data = data, ntree = 250)
rf4 = randomForest(medv ~., data = data, ntree = 500)
rf5 = randomForest(medv ~., data = data, ntree = 1000)

get_error(rf1, data)
> 0.9724985
get_error(rf2, data)
> 0.9945239
get_error(rf3, data)
> 0.9599333
get_error(rf4, data)
> 0.9648902
get_error(rf5, data)
> 0.9622595

rf31 = randomForest(medv ~., data = data, ntree = 250, mtry=3)
rf32 = randomForest(medv ~., data = data, ntree = 250, mtry=4)
rf33 = randomForest(medv ~., data = data, ntree = 250, mtry=5)
rf34 = randomForest(medv ~., data = data, ntree = 250, mtry=6)
rf35 = randomForest(medv ~., data = data, ntree = 250, mtry=7)

rf351 = randomForest(medv ~., data = data, ntree = 250, mtry=7, nodesize=3)
rf352 = randomForest(medv ~., data = data, ntree = 250, mtry=7, nodesize=5)
rf353 = randomForest(medv ~., data = data, ntree = 250, mtry=7, nodesize=7)
rf354 = randomForest(medv ~., data = data, ntree = 250, mtry=7, nodesize=9)
rf355 = randomForest(medv ~., data = data, ntree = 250, mtry=7, nodesize=11)

get_error(rf351, data)
> 0.822931
get_error(rf352, data)
> 0.8958512
get_error(rf353, data)
> 1.003511
get_error(rf354, data)
> 1.085806
get_error(rf355, data)
> 1.14492

# Final Error Metric:
get_error(rf351, data[data[["in_test"]],])
> 0.7603227
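One point a critique of the Question 7 code might raise is the normalize function. In R, return is itself a function call, so return (a)/b exits with a and the division never runs. The sketch below reproduces that pattern on a small made-up vector; the names normalize_as_written and normalize_fixed are introduced here for illustration only.

```r
# Same structure as the normalize function in the question
normalize_as_written <- function(x) {
  return (x - mean(x, na.rm = TRUE)) / sd(x, na.rm = TRUE)
}

# Whole expression inside the return() call, so the division is applied
normalize_fixed <- function(x) {
  return((x - mean(x, na.rm = TRUE)) / sd(x, na.rm = TRUE))
}

x <- c(2, 4, 6)                     # made-up values: mean 4, sd 2

buggy <- normalize_as_written(x)    # centered only: -2 0 2
fixed <- normalize_fixed(x)         # centered and scaled: -1 0 1

print(buggy)
print(fixed)
```

This is one issue among several that a full critique could cover; it is shown here because it is easy to verify in isolation.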

Question 8: What model performance metric was used in the model above to evaluate performance (Question 7)? Propose another function/way of measuring performance and discuss the benefits/drawbacks of each.
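The get_error function in Question 7 averages the absolute differences between predictions and medv, which is the mean absolute error (MAE). Root mean squared error (RMSE) is one common alternative. The sketch below compares the two on a small made-up example; the vectors actual and predicted are illustrative and are not taken from the Boston Housing data.

```r
# Mean absolute error: every unit of error counts the same
mae <- function(actual, predicted) {
  mean(abs(actual - predicted))
}

# Root mean squared error: squaring gives large misses more weight
rmse <- function(actual, predicted) {
  sqrt(mean((actual - predicted)^2))
}

# Made-up values with one large miss (40 vs 50)
actual    <- c(10, 20, 30, 40)
predicted <- c(12, 18, 30, 50)

mae_value  <- mae(actual, predicted)    # (2 + 2 + 0 + 10) / 4 = 3.5
rmse_value <- rmse(actual, predicted)   # sqrt((4 + 4 + 0 + 100) / 4) = sqrt(27)

print(mae_value)
print(rmse_value)
```

The single large miss pulls RMSE well above MAE, which illustrates the trade-off: MAE is easier to explain and less sensitive to outliers, while RMSE penalizes occasional large pricing errors more heavily.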

Question 9: If you could augment the Boston Housing data with external datasets, what additional data would you want to include, and how would you obtain and integrate it?
