Instructions
This PDF contains a long-form version of the questions, as well as a data appendix that is required to answer a few of them. No outside data is needed to answer these questions.
1 VISUALIZING DATA
This section assesses the ability to interpret and communicate insights. The Boston Housing data in the appendix was used to make the visuals in this section.
Question 1: You are asked to create reports containing visualizations of the Boston Housing data. What tools, software, packages, and/or libraries would you use to create an interactive report? What if the document is required to be printed?
Question 2: Business leaders who will review this report want to understand median home values, Charles River proximity, and the relationship between the two. The following four visuals are being considered. Which would you recommend including in the report, and why? (Remember to refer to the appendix for data details.)
Question 3: What changes would you make to the visual you chose in the previous question to make it more interpretable and visually appealing? In your report to business leaders, how would you describe the plot in one sentence?
2 MODELING DATA
This section assesses the ability to think critically about variables and how they can be used to predict a desired outcome. Core competencies include understanding distributions in data, making appropriate data transformations, and selecting an appropriate model. The Boston Housing Data Exploration in the Appendix should be used for this section.
Question 4: Predicting River Proximity: Using the Boston Housing data, you want to predict which tracts are adjacent to the Charles River (as denoted by the chas variable). Based on the data summary in the appendix, what data cleaning, transformations, and/or feature engineering can be used to prepare the data prior to model training?
Question 5: Predicting River Proximity: Using the Boston Housing data, you want to predict which tracts are adjacent to the Charles River (as denoted by the chas variable). Propose a model to investigate the relationship between the covariates and the chas variable. Explain how you would use the model and its output to provide evidence of the strength of, and confidence in, the relationship.
Question 6: The realtor thinks that a tract on the Charles River increases median home values by $5,000. To test the realtor’s hypothesis, you created a linear regression model with chas as a covariate (note that medv is in thousands of dollars). The coefficient associated with chas was 2.87 with a standard error of 0.86. Is there evidence to reject the realtor’s claim? Additionally, how would you defend this analysis to the realtor (who has no knowledge of linear models)?
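One possible way to set up the arithmetic behind Question 6 is sketched below. It tests the null hypothesis that the chas coefficient equals the claimed 5.0 (in thousands of dollars) rather than the usual zero. The normal approximation and the 1.96 critical value are assumptions made here for illustration; the question does not state the residual degrees of freedom, and the variable names are hypothetical.

```python
import math

# Values stated in Question 6 (medv is in thousands of dollars).
beta_hat = 2.87    # estimated coefficient on chas
std_err = 0.86     # standard error of that coefficient
beta_claim = 5.0   # realtor's claim: $5,000 is 5.0 in medv units

# Test statistic for H0: beta = beta_claim (not the default H0: beta = 0).
z = (beta_hat - beta_claim) / std_err

# Two-sided p-value under a normal approximation.
# Assumption: the sample is large enough that t and normal quantiles agree closely.
p_value = math.erfc(abs(z) / math.sqrt(2))

# Approximate 95% confidence interval for the coefficient (1.96 is the normal quantile).
ci_low = beta_hat - 1.96 * std_err
ci_high = beta_hat + 1.96 * std_err

print(f"z = {z:.2f}, two-sided p = {p_value:.4f}")
print(f"approx. 95% CI: ({ci_low:.2f}, {ci_high:.2f}) thousand dollars")
```

Under these assumptions the statistic is about -2.48 and the approximate interval runs from roughly $1,200 to $4,600, so it does not contain $5,000. For a reader with no background in linear models, the interval is usually the easier thing to explain: it is the range of river premiums the data are consistent with.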
Question 7: Predicting Home Value: A local realtor wants to use the data to accurately provide house pricing estimates for clients. Below is the code a data scientist used to fit a random forest model using the Boston Housing data. Critique the approach, point out any errors, and make recommendations for how it can be improved.
Question 8: What model performance metric was used in the model above to evaluate performance (Question 7)? Propose another function/way of measuring performance and discuss the benefits/drawbacks of each.
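As a neutral illustration of how two common regression metrics can be compared for Question 8, the sketch below computes root mean squared error (RMSE) and mean absolute error (MAE) side by side. The code from Question 7 is not reproduced here, so this does not claim to show the metric that code used; the values in `actual` and `predicted` are made-up numbers, not Boston Housing data, and the function names are hypothetical.

```python
import math

def rmse(actual, predicted):
    """Root mean squared error: penalizes large misses more heavily."""
    squared = [(p - a) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))

def mae(actual, predicted):
    """Mean absolute error: every unit of error counts equally."""
    absolute = [abs(p - a) for a, p in zip(actual, predicted)]
    return sum(absolute) / len(absolute)

# Hypothetical values in thousands of dollars, chosen only for illustration.
actual = [20.0, 22.0, 30.0, 35.0]
predicted = [21.0, 21.0, 26.0, 35.0]

rmse_value = rmse(actual, predicted)
mae_value = mae(actual, predicted)

print(f"RMSE = {rmse_value:.3f}, MAE = {mae_value:.3f}")
```

In this toy example RMSE comes out larger than MAE because one prediction misses by 4 while the others miss by at most 1. That gap is the trade-off to discuss: RMSE is more sensitive to occasional large errors, while MAE is easier to explain as a typical miss in dollars.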
Question 9: If you could augment the Boston Housing data with external datasets, what additional data would you want to include, and how would you obtain and integrate it?