Define the concept of data quality and discuss the key dimensions of data quality

INSTRUCTIONS TO CANDIDATES
ANSWER ALL QUESTIONS

Task 1 Data Quality (45 Marks)

1.1 Define the concept of data quality and discuss the key dimensions of data quality (30 marks, 1000 words)
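Task 1.1 asks for a definition and discussion rather than code, but several dimensions become more concrete once you measure them. The following is a minimal sketch that assumes three commonly cited dimensions (completeness, validity and uniqueness), uses made-up records, and treats the field names `town_id` and `rooms` and the "rooms must be positive" rule as hypothetical examples only.

```python
"""Hypothetical sketch: scoring three common data quality dimensions.

The records, field names and validity rule are illustrative assumptions,
not part of the assignment brief.
"""


def completeness(records, field):
    # Share of records where the field is present (not None).
    filled = sum(1 for r in records if r.get(field) is not None)
    return filled / len(records)


def validity(records, field, rule):
    # Share of non-missing values that satisfy a business rule.
    present = [r[field] for r in records if r.get(field) is not None]
    if not present:
        return 0.0
    return sum(1 for v in present if rule(v)) / len(present)


def uniqueness(records, key):
    # Ratio of distinct key values to total records (1.0 = no duplicates).
    keys = [r[key] for r in records]
    return len(set(keys)) / len(keys)


if __name__ == "__main__":
    # Made-up rows: one missing room count, one impossible (negative) value,
    # and one duplicated town identifier.
    rows = [
        {"town_id": 1, "rooms": 6.5},
        {"town_id": 2, "rooms": None},
        {"town_id": 3, "rooms": -1.0},
        {"town_id": 3, "rooms": 5.9},
    ]
    print(completeness(rows, "rooms"))               # 0.75
    print(validity(rows, "rooms", lambda v: v > 0))  # 2 of 3 present values are valid
    print(uniqueness(rows, "town_id"))               # 0.75
```

Other dimensions, such as accuracy or timeliness, usually need an external reference (a trusted source or a timestamp), so they do not reduce to a single-table check like these.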

1.2 Explain why data quality is so important for effective predictive analytics in an organisation, drawing on a real-world example (15 marks, 500 words)

Task 2 Exploratory Data Analysis and Linear Regression Analysis (45 Marks)

Carefully study the Data Dictionary for the Boston Housing Data Set (Table 1) and the accompanying description of each variable. It is important to understand this data set, which is used for Task 2. Each record in the housing.csv data set describes a Boston suburb or town. The data was drawn from the Boston Standard Metropolitan Statistical Area (SMSA) in 1970.

Note: You should conduct some desktop research to identify the determinants (drivers) of housing prices in order to fully understand and interpret the key findings of your exploratory data analysis (EDA) and Linear Regression Model for the housing.csv data set in Task 2.

Task 2.1) Conduct and report on an exploratory data analysis (EDA) of the housing.csv data set using the RapidMiner Studio data mining tool. Note that this will require the use of a number of RapidMiner operators.

Provide the following for Task 2.1:

(i) A screen capture of your final EDA process, with a brief description of that process.

(ii) Summarise the key results of your exploratory data analysis in Table 2.1 Results of Exploratory Data Analysis for housing.csv. Table 2.1 should include key characteristics of each variable in the housing.csv data set, such as maximum and minimum values, average, standard deviation, most frequent value (mode), missing values and invalid values.

(iii) Discuss the key results of the exploratory data analysis presented in Table 2.1 and provide a rationale for selecting the top 5 variables for predicting median house value (medv). Focus in particular on the relationships of the independent variables with each other and with the dependent variable, median house value (medv), drawing on the results of the EDA and relevant literature on the determinants of house prices.

(25 marks, 300 words)
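As an independent cross-check on the figures RapidMiner reports, the per-variable characteristics listed in (ii) can be computed in plain Python. This is a minimal sketch under stated assumptions: missing values appear as empty CSV fields, and "invalid" means a value that cannot be parsed as a number (other definitions, such as out-of-range values, would need extra rules). The function name `summarise` and the sample rows are hypothetical and are not taken from housing.csv.

```python
import csv
import io
import statistics


def summarise(csv_text):
    """Return per-column min, max, mean, stdev, mode, missing and invalid counts."""
    reader = csv.DictReader(io.StringIO(csv_text))
    names = reader.fieldnames
    values = {name: [] for name in names}
    missing = {name: 0 for name in names}
    invalid = {name: 0 for name in names}
    for row in reader:
        for name in names:
            raw = (row.get(name) or "").strip()
            if raw == "":
                missing[name] += 1  # assumption: blank field = missing value
                continue
            try:
                values[name].append(float(raw))
            except ValueError:
                invalid[name] += 1  # assumption: non-numeric text = invalid value
    summary = {}
    for name in names:
        v = values[name]
        summary[name] = {
            "min": min(v) if v else None,
            "max": max(v) if v else None,
            "mean": statistics.mean(v) if v else None,
            "stdev": statistics.stdev(v) if len(v) > 1 else None,  # sample standard deviation
            "mode": statistics.mode(v) if v else None,  # first most common value (Python 3.8+)
            "missing": missing[name],
            "invalid": invalid[name],
        }
    return summary


if __name__ == "__main__":
    # Made-up rows for illustration only; they are not taken from housing.csv.
    sample = "rm,medv\n6.0,20.0\n,25.0\n7.0,30.0\nabc,30.0\n6.0,35.0\n"
    for column, stats in summarise(sample).items():
        print(column, stats)
```

The mode is most informative for discrete variables; for continuous ones it mostly reflects rounding in the source data.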

Hint: The Statistics tab and Chart tab in RapidMiner Studio provide a lot of descriptive statistical information and the ability to create useful charts, such as bar charts, scatterplots and boxplots, for EDA. You might also like to run correlations and/or chi-square tests, as appropriate, to determine which variables contribute most to predicting median house value (medv).
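The correlation screening suggested in the hint can be sketched as ranking candidate predictors by the absolute value of their Pearson correlation with medv, plus a pairwise correlation table among the predictors to spot redundant (collinear) candidates. This is a simplifying assumption: it misses non-linear relationships and does not cover chi-square tests for categorical variables. The helper names and toy columns are hypothetical.

```python
import math


def pearson(xs, ys):
    """Pearson correlation of two equal-length numeric lists (fails if either is constant)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def rank_by_correlation(columns, target="medv", top=5):
    """Rank predictors by |r| with the target; the sign is kept for interpretation."""
    y = columns[target]
    scores = [(name, pearson(xs, y)) for name, xs in columns.items() if name != target]
    scores.sort(key=lambda item: abs(item[1]), reverse=True)
    return scores[:top]


def correlation_matrix(columns, names):
    """Pairwise r between predictors (each unordered pair once)."""
    return {(a, b): pearson(columns[a], columns[b]) for a in names for b in names if a < b}


if __name__ == "__main__":
    # Toy columns (hypothetical names and values), not housing.csv data.
    cols = {"a": [1, 2, 3, 4], "b": [4, 3, 2, 1], "c": [1, 3, 2, 4], "medv": [10, 20, 30, 40]}
    print(rank_by_correlation(cols, top=3))
    print(correlation_matrix(cols, ["a", "b", "c"]))
```

In the toy data, `a` and `b` both correlate perfectly with medv but also with each other (r = -1), so including both would add no information; this is the kind of overlap (iii) asks you to discuss.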

 

Task 2.2) Build and report on your Linear Regression model for predicting medv, using a RapidMiner data mining process, an appropriate set of data mining operators and a reduced set of variables from the housing.csv data set, as determined by your exploratory data analysis in Task 2.1.

Provide the following for Task 2.2:

(i) A screen capture of your Final Linear Regression Model process, with a brief description of that process.

(ii) Table 2.2, named Results of Final Linear Regression Model for Task 2.2, for the housing.csv data set.

(iii) Discuss the results of the Final Linear Regression Model for the housing.csv data set, drawing on key outputs (coefficients, standardised coefficients, t-statistic values, p-values, significance levels, etc.) for predicting median house value (medv) and on relevant supporting literature on the interpretation of a Linear Regression Model.

(20 marks, 200 words)
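To check your reading of the outputs named in (iii), the core quantities can be reproduced outside RapidMiner. The sketch below fits ordinary least squares with an intercept via the normal equations and reports, for each term, the coefficient b, the standardised coefficient b × sd(x) / sd(y), the standard error and the t-statistic t = b / se(b). It is a minimal illustration, not RapidMiner's implementation; figures may differ if the RapidMiner operator's own options (such as its feature selection setting) are active. Two-sided p-values would come from the t distribution with n - k degrees of freedom, for example `2 * scipy.stats.t.sf(abs(t), df)`; that step is omitted to keep the sketch dependency-free. The function names and toy data are hypothetical.

```python
import math
import statistics


def invert(matrix):
    """Invert a square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(matrix)
    aug = [row[:] + [1.0 if i == j else 0.0 for j in range(n)] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("singular matrix (check for collinear predictors)")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def ols(columns, target):
    """OLS with intercept; returns coef, standardised coef, std. error and t per term."""
    names = [name for name in columns if name != target]
    y = columns[target]
    n, k = len(y), len(names) + 1  # k counts the intercept; needs n > k
    X = [[1.0] + [columns[name][i] for name in names] for i in range(n)]
    xtx = [[sum(X[r][a] * X[r][b] for r in range(n)) for b in range(k)] for a in range(k)]
    xty = [sum(X[r][a] * y[r] for r in range(n)) for a in range(k)]
    inv = invert(xtx)
    beta = [sum(inv[a][b] * xty[b] for b in range(k)) for a in range(k)]
    residuals = [y[r] - sum(beta[a] * X[r][a] for a in range(k)) for r in range(n)]
    sigma2 = sum(e * e for e in residuals) / (n - k)  # residual variance estimate
    sd_y = statistics.stdev(y)
    rows = []
    for a, name in enumerate(["(intercept)"] + names):
        se = math.sqrt(sigma2 * inv[a][a])
        std_coef = None if a == 0 else beta[a] * statistics.stdev(columns[name]) / sd_y
        rows.append({"term": name, "coef": beta[a], "std_coef": std_coef,
                     "se": se, "t": beta[a] / se})
    return rows


if __name__ == "__main__":
    # Toy data with two hypothetical predictors; not housing.csv values.
    data = {"x1": [1, 2, 3, 4, 5, 6], "x2": [2, 1, 4, 3, 6, 5],
            "medv": [5, 6, 10, 11, 15, 17]}
    for row in ols(data, "medv"):
        print(row)
```

Standardised coefficients put predictors measured in different units on a common scale, which is why (iii) asks for them alongside the raw coefficients.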

Include all appropriate outputs, such as RapidMiner processes, graphs and tables, that support key aspects of the exploratory data analysis and linear regression model analysis of the housing.csv data set in your Report 2.
