PROBLEMS
6.1 Predicting Boston Housing Prices. The file BostonHousing.xls contains in- formation collected by the US Bureau of the Census concerning housing in the area of Boston, Massachusetts. The dataset includes information on 506 census housing tracts in the Boston area. The goal is to predict the median house price in new tracts based on information such as crime rate, pollution, and number of rooms. The dataset contains 12 predictors, and the response is the median house price (MEDV). Table 6.3 describes each of the predictors and the response.
TABLE 6.3
CRIM
ZN
INDUS
CHAS
NOX
RM
AGE
DIS
RAD
TAX
PTRATIO
LSTAT MEDV
DESCRIPTION OF VARIABLES FOR BOSTON HOUSING EXAMPLE
Per capita crime rate by town
Proportion of residential land zoned for lots over 25,000 ft2
Proportion of nonretail business acres per town
Charles River dummy variable (= 1 if tract bounds river; 0 otherwise) Nitric oxide concentration (parts per 10 million)
Average number of rooms per dwelling
Proportion of owner-occupied units built prior to 1940
Weighted distances to five Boston employment centers
Index of accessibility to radial highways
Full-value property-tax rate per $10,000 Pupil/teacher ratio by town
% Lower status of the population
Median value of owner-occupied homes in $1000s
a. Why should the data be partitioned into training and validation sets? What will the training set be used for? What will the validation set be used for?
b. Fit a multiple linear regression model to the median house price (MEDV) as a function of CRIM, CHAS, and RM. Write the equation for predicting the median house price from the predictors in the model.
c. Using the estimated regression model, what median house price is predicted for a tract in the Boston area that does not bound the Charles River, has a crime rate of 0.1, and where the average number of rooms per house is 6? What is the prediction error?
d. Reduce the number of predictors:
i. Which predictors are likely to be measuring the same thing among the 13 predictors? Discuss the relationships among INDUS, NOX, and TAX.
ii. Compute the correlation table for the 12 numerical predictors and search for highly correlated pairs. These have potential redundancy and can cause multi- collinearity. Choose which ones to remove based on this table.
iii. Use an exhaustive search to reduce the remaining predictors as follows: First, choose the top three models. Then run each of these models separately on the training set, and compare their predictive accuracy for the validation set. Compare RMSE and average error, as well as lift charts. Finally, describe the
best model.
CS 340 Milestone One Guidelines and Rubric Overview: For this assignment, you will implement the fundamental operations of create, read, update,
Retail Transaction Programming Project Project Requirements: Develop a program to emulate a purchase transaction at a retail store. This
7COM1028 Secure Systems Programming Referral Coursework: Secure
Create a GUI program that:Accepts the following from a user:Item NameItem QuantityItem PriceAllows the user to create a file to store the sales receip
CS 340 Final Project Guidelines and Rubric Overview The final project will encompass developing a web service using a software stack and impleme