Predicting Home Values in the West Roxbury Neighborhood

Textbook Section 2.6 describes an end-to-end example of building a predictive model to predict home value prices in the West Roxbury neighborhood using multiple linear regression. It is repeated here verbatim for completeness and convenience. The data file WestRoxbury.csv is provided to you.

The Internet has revolutionized the real estate industry. Realtors now list houses and their prices on the web, and estimates of house and condominium prices have become widely available, even for units not on the market. Zillow is the most popular online real estate information site, and in 2014 they purchased their major rival, Trulia. By 2015, Zillow had become the dominant platform for checking house prices and, as such, the dominant online advertising venue for realtors. What used to be a comfortable 6% commission structure for realtors, affording them a handsome surplus (and an oversupply of realtors), was being rapidly eroded by an increasing need to pay for advertising on Zillow. (This, in fact, is the key to Zillow's business model: redirecting the 6% commission away from realtors and to itself.)

Zillow gets much of the data for its “Zestimates” of home values directly from publicly available city housing data, used to estimate property values for tax assessment. A competitor seeking to get into the market would likely take the same approach. So would realtors seeking to develop an alternative to Zillow.

A simple approach would be a naive, model-less method: just use the assessed values as determined by the city. Those values, however, do not necessarily include all properties, and they might not include changes warranted by remodeling, additions, and the like. Moreover, the assessment methods used by cities may not be transparent or always reflect true market values.
However, the city property data can be used as a starting point to build a model, to which additional data (e.g., that collected by large realtors) can be added later.

Let's look at how Boston property assessment data, available from the city of Boston, might be used to predict home values. The “WestRoxbury.csv” file includes information on single-family owner-occupied homes in West Roxbury, a neighborhood in southwest Boston, in 2014. The data include values for various predictor variables and for a target variable, the assessed home value (“total value”). This dataset has 14 variables, and a description of each variable is given in the table below. (The full data dictionary provided by the City of Boston is available here; we have modified a few variable names.)

TOTAL VALUE: Total assessed value for property, in thousands of USD
TAX: Tax bill amount based on total assessed value multiplied by the tax rate, in USD
LOT SQ FT: Total lot size of parcel in square feet
YR BUILT: Year property was built
GROSS AREA: Gross floor area
LIVING AREA: Total living area for residential properties (ft²)
FLOORS: Number of floors
ROOMS: Total number of rooms
BEDROOMS: Total number of bedrooms
FULL BATH: Total number of full baths
HALF BATH: Total number of half baths
KITCHEN: Total number of kitchens
FIREPLACE: Total number of fireplaces
REMODEL: When house was remodeled (Recent/Old/None)

The dataset includes 5802 homes. A sample of the data is shown in the table below. Below the header row, each row in the data represents a home. For example, the first home was assessed at a total value of $344.2 thousand (TOTAL VALUE). Its tax bill was $4330. It has a lot size of 9965 square feet (ft²), was built in 1880, has two floors, 6 rooms, and so on.

An analysis pipeline is demonstrated in the video in the Week 2 folder.
Note that you have been provided a .csv file rather than the .xlsx file shown in the video demonstration. (This should not make a difference in importing the data, but the prompts will be slightly different.) Instead of following the exact steps in that video, follow the modified steps below. I recommend making a copy of the RapidMiner process “06-06-ToyotaCorolla Linear Stepwise Regression.rmp” in the Week 4 folder and then making the following changes in that process:

• Import the data (WestRoxbury.csv) into your repository and then retrieve it in your process.
• Change the Data Preprocessing steps as follows:
  o Use the Filter Examples operator (with the invert filter option) to remove rows matching the “YR BUILT = 0” condition.
  o Use the Generate ID operator to add an id attribute. No parameters are needed.
  o Use the Set Role operator to set the role of the TOTAL VALUE attribute to label and the role of the id attribute to id.
  o Use the Generate Attributes operator to add a new attribute AGE with the following function expression: date_get(date_now(), DATE_UNIT_YEAR)-[YR BUILT]
  o Use the Select Attributes operator to select a subset of attributes: all attributes except TAX and YR BUILT. Make sure to check the box “include special attributes”. We remove YR BUILT because we have now added the AGE attribute. We remove TAX because the tax bill is computed from TOTAL VALUE, which is the value to be predicted, so including TAX would result in target leakage.
  o Use the Nominal to Numerical operator with the dummy coding option for the REMODEL attribute. Select the “use comparison groups” option as well, and set the comparison group for REMODEL to None (by typing it in). The reason is that, given the dummy variables for the categories “Old” and “Recent”, the algorithm can infer the value of the “None” variable (e.g., when both the “Old” and “Recent” variables have the value 0, the “None” variable must have the value 1). To avoid this collinearity in our data, we use one of the categories (“None” in our case) as the comparison group. See the screenshot below for the settings.
• In the “06-06-ToyotaCorolla Linear Stepwise Regression.rmp” process, the Filter Example Range operator is used to select the first 1000 rows. Remove this operator.
• In the Generate Attributes operator, Residual should have the following formula: [TOTAL VALUE ] - [prediction(TOTAL VALUE )]
• In the “Calc Pred Metrics (V)” and “Gains and Lift” subprocesses, modify the Rename operator: rename prediction(TOTAL VALUE ) to pred, and rename TOTAL VALUE to label.
• In the Linear Regression operator, make sure that the backward and forward alpha are both set to 0.1. With these settings, you are using the variable selection strategy of “stepwise regression” with the Iterative T-test feature selection option in the Linear Regression operator.

Report the following results in your response:

1. Based on your validation results, provide screenshots of the Data tab of the “Linear Regression” results tab.
2. Write the linear regression model (equation) generated, including the intercept and the coefficients for all predictors, using the Description section of the “Linear Regression” results tab.
3. Provide and interpret each of the following metrics: root mean squared error (RMSE), absolute error (MAE), relative error (MAPE), squared error (MSE), squared correlation (R²), mean percent error (MPE), and mean error.
4. Plot the lift chart using a line plot with the following settings, using the ExampleSet obtained from the Gains and Lift computations. This should be similar to the lift chart shown in the textbook Figure 5.3 (left subfigure).
5. Plot the decile lift chart using a bar plot with the following settings, using the ExampleSet obtained from the Gains and Lift computations. This should be similar to the lift chart shown in the textbook Figure 5.3 (right subfigure).
6. After reading the lift chart subsection in Section 5.2 of your textbook, interpret the charts obtained in parts (4) and (5) above.
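
For readers who want to check their understanding of the preprocessing steps above outside RapidMiner, the following is a minimal Python sketch of the same logic, not the RapidMiner process itself. The rows are made up for illustration and show only a few of the 14 columns; the function name preprocess and the dummy column names REMODEL_Old and REMODEL_Recent are assumptions chosen for this example. The Set Role step has no counterpart here.

```python
from datetime import date

# Made-up rows for illustration; real records would be read from WestRoxbury.csv.
# Only a few of the 14 columns are shown.
SAMPLE_ROWS = [
    {"TOTAL VALUE": 344.2, "TAX": 4330, "YR BUILT": 1880, "ROOMS": 6, "REMODEL": "None"},
    {"TOTAL VALUE": 412.6, "TAX": 5190, "YR BUILT": 1945, "ROOMS": 8, "REMODEL": "Recent"},
    {"TOTAL VALUE": 330.1, "TAX": 4152, "YR BUILT": 0, "ROOMS": 7, "REMODEL": "Old"},
]


def preprocess(rows, current_year=None):
    """Filter bad rows, add id and AGE, drop TAX and YR BUILT, dummy-code REMODEL."""
    if current_year is None:
        current_year = date.today().year
    cleaned = []
    for row in rows:
        # Filter Examples: drop records whose build year is recorded as 0.
        if row["YR BUILT"] == 0:
            continue
        out = {"id": len(cleaned) + 1}  # Generate ID (1, 2, 3, ...)
        out["AGE"] = current_year - row["YR BUILT"]  # Generate Attributes
        # Select Attributes: keep everything except TAX (target leakage) and
        # YR BUILT (replaced by AGE). REMODEL is dummy-coded below.
        for name, value in row.items():
            if name not in ("TAX", "YR BUILT", "REMODEL"):
                out[name] = value
        # Dummy coding with "None" as the comparison group: two indicator
        # columns are enough to encode three categories.
        out["REMODEL_Old"] = 1 if row["REMODEL"] == "Old" else 0
        out["REMODEL_Recent"] = 1 if row["REMODEL"] == "Recent" else 0
        cleaned.append(out)
    return cleaned


if __name__ == "__main__":
    for record in preprocess(SAMPLE_ROWS, current_year=2024):
        print(record)
```

A home with REMODEL equal to None has both indicator columns set to 0, which is why a third indicator column would add no information and would only introduce collinearity.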
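
To help with interpreting the metrics requested in part (3), the sketch below computes each of them from a list of actual and predicted values, using the residual definition given above (actual minus predicted). The formulas are the standard textbook definitions; it is an assumption that RapidMiner's performance operators follow exactly the same conventions (for example, the sign of the mean error), so compare against your own output rather than relying on this. The function name regression_metrics and the four data points are made up for illustration.

```python
import math


def regression_metrics(actual, predicted):
    """Return common error metrics, with residual = actual - predicted."""
    n = len(actual)
    residuals = [a - p for a, p in zip(actual, predicted)]
    mean_error = sum(residuals) / n
    mae = sum(abs(r) for r in residuals) / n
    mse = sum(r * r for r in residuals) / n
    rmse = math.sqrt(mse)
    # Percentage errors are relative to the actual value (assumes no actual is 0).
    mpe = 100 * sum(r / a for r, a in zip(residuals, actual)) / n
    mape = 100 * sum(abs(r / a) for r, a in zip(residuals, actual)) / n
    # Squared Pearson correlation between actual and predicted values.
    mean_a = sum(actual) / n
    mean_p = sum(predicted) / n
    cov = sum((a - mean_a) * (p - mean_p) for a, p in zip(actual, predicted))
    var_a = sum((a - mean_a) ** 2 for a in actual)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    r_squared = cov * cov / (var_a * var_p)
    return {
        "mean_error": mean_error,
        "MAE": mae,
        "MSE": mse,
        "RMSE": rmse,
        "MPE": mpe,
        "MAPE": mape,
        "R2": r_squared,
    }


if __name__ == "__main__":
    # Made-up values, in thousands of USD.
    actual = [100, 200, 300, 400]
    predicted = [110, 190, 310, 380]
    for name, value in regression_metrics(actual, predicted).items():
        print(f"{name}: {value:.4f}")
```

Because the mean error and MPE keep the sign of each residual, they indicate whether the model over- or under-predicts on average, whereas MAE, MAPE, MSE, and RMSE measure only the size of the errors.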
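
The two charts in parts (4) and (5) can be understood from the numbers behind them. The sketch below assumes the usual construction: records are sorted by predicted value in descending order, the line chart plots the cumulative sum of actual values, and the decile chart plots each decile's mean actual value divided by the overall mean. The function names and the toy data are assumptions for illustration; the “Gains and Lift” subprocess in RapidMiner may organize its output differently.

```python
def rank_by_prediction(predicted):
    """Return record indices ordered from highest to lowest predicted value."""
    return sorted(range(len(predicted)), key=lambda i: predicted[i], reverse=True)


def cumulative_gains(actual, predicted):
    """Cumulative sum of actual values after sorting by prediction (descending)."""
    gains = []
    running = 0.0
    for i in rank_by_prediction(predicted):
        running += actual[i]
        gains.append(running)
    return gains


def decile_lift(actual, predicted, n_bins=10):
    """Mean actual value in each bin divided by the overall mean actual value."""
    n = len(actual)
    if n < n_bins:
        raise ValueError("need at least one record per bin")
    order = rank_by_prediction(predicted)
    overall_mean = sum(actual) / n
    lifts = []
    for b in range(n_bins):
        # Integer arithmetic splits the records into n_bins nearly equal groups.
        group = [actual[i] for i in order[b * n // n_bins:(b + 1) * n // n_bins]]
        lifts.append((sum(group) / len(group)) / overall_mean)
    return lifts


if __name__ == "__main__":
    # Toy data: 20 records whose predictions rank them perfectly.
    actual = list(range(1, 21))
    predicted = list(range(1, 21))
    print(cumulative_gains(actual, predicted))
    print([round(x, 2) for x in decile_lift(actual, predicted)])
```

A decile lift above 1 means that the model's top-ranked records have a higher average actual value than a randomly chosen group of the same size, and a cumulative gains line that rises above the straight baseline (number of records times the overall mean) tells the same story.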