# Describe the dataset in terms of rows, columns, types of data and any outliers and missing data

INSTRUCTIONS TO CANDIDATES

If you use R, please paste your code into Word in Courier font, which makes it easier to read. For each problem that asks for a response (not just a calculation), be sure to explain and interpret the results. If you aren’t sure, please ask. Start each question on a new page and clearly label the start of each problem (slightly larger font, boldface, underline, or anything else that will help me find the problem you are working on).

1. Sustainability and Energy

The Department of Energy has launched new initiatives around sustainability. The aim is to group houses in order to identify the factors that should lead to reduced energy use. However, the Director of the department is finding this difficult because they are using Excel, so you have been asked to assist.

Using the dataset thads2013n.txt, which contains a tremendous amount of information, answer the questions below. (The PDF file in the assignment section has the definitions for the dataset. YOU WILL NEED IT.)

a) Describe the dataset in terms of rows, columns, types of data and any outliers and missing data … the usual.

b) Clean the data and describe what you did to clean it.

c) Create a set number of groups of “housing” observations.

i) Determine which variables you will cluster on. Remember that we are focused mainly on energy cost (the UTILITY variable).

ii) Conduct cluster analyses using two agglomerative methods and k-means. How many clusters do you settle on with each method, and why? Provide the necessary charts.

iii) Define how you value or discern each cluster.

d) Create a new variable in your dataset that identifies which cluster each observation belongs to (k-means only), then provide measures for each cluster on three variables (UTILITY, TOTALSAL, ZINC2). Know what they are for your assessment; don’t just give me the variable names.

e) Conduct an analysis across the groups (i.e., the clusters) to determine whether there is a statistically significant difference in UTILITY, TOTALSAL, and ZINC2.
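As a rough illustration of how parts (c) through (e) fit together, here is a minimal sketch in plain Python (standard library only). It is not a solution to the assignment. The six rows are made-up numbers, not values from thads2013n.txt, and the column order (UTILITY, TOTALSAL, ZINC2) simply follows the variable names as written above. The helper names `standardize`, `kmeans`, and `one_way_anova_f` are hypothetical, and the k-means start (evenly spaced rows) is a deliberately naive, deterministic simplification. A real analysis would more likely use random restarts, for example through R's `kmeans()` and `hclust()`. The agglomerative methods and the charts are not shown.

```python
from statistics import mean, pstdev

# Made-up rows: (UTILITY, TOTALSAL, ZINC2). NOT values from thads2013n.txt.
rows = [
    (100, 20000, 25000),
    (110, 22000, 27000),
    (120, 21000, 26000),
    (300, 80000, 90000),
    (310, 82000, 95000),
    (320, 81000, 92000),
]

def standardize(rows):
    """Z-score each column so no single variable dominates the distances."""
    cols = list(zip(*rows))
    mus = [mean(c) for c in cols]
    sds = [pstdev(c) or 1.0 for c in cols]  # guard against constant columns
    return [tuple((v - m) / s for v, m, s in zip(r, mus, sds)) for r in rows]

def kmeans(points, k, max_iter=100):
    """Lloyd's k-means with a naive deterministic start (evenly spaced rows)."""
    step = len(points) // k
    centers = [points[i * step] for i in range(k)]
    labels = None
    for _ in range(max_iter):
        # Assignment step: nearest center by squared Euclidean distance.
        new_labels = [
            min(range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centers[c])))
            for p in points
        ]
        if new_labels == labels:  # no point changed cluster: converged
            break
        labels = new_labels
        # Update step: move each center to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old center if a cluster is empty
                centers[c] = tuple(mean(col) for col in zip(*members))
    return labels

def one_way_anova_f(groups):
    """F statistic of a one-way ANOVA across lists of values."""
    all_vals = [v for g in groups for v in g]
    grand = mean(all_vals)
    ss_between = sum(len(g) * (mean(g) - grand) ** 2 for g in groups)
    ss_within = sum((v - mean(g)) ** 2 for g in groups for v in g)
    df_between = len(groups) - 1
    df_within = len(all_vals) - len(groups)
    return (ss_between / df_between) / (ss_within / df_within)

# Part (d): a cluster label per observation, then a per-cluster measure.
labels = kmeans(standardize(rows), k=2)
utility = [r[0] for r in rows]
groups = [[u for u, lab in zip(utility, labels) if lab == c] for c in range(2)]
utility_by_cluster = [mean(g) for g in groups]

# Part (e): F statistic for UTILITY across clusters (no p-value computed here).
f_stat = one_way_anova_f(groups)
print(labels, utility_by_cluster, f_stat)
```

The sketch stops at the F statistic. Turning it into a p-value needs the F distribution, which a statistics package provides (for example `aov()` in R or `scipy.stats.f_oneway` in Python). The same two steps would be repeated for TOTALSAL and ZINC2.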
