Task 1 Data Quality (45 Marks)
1.1 Define the concept of data quality and discuss the key dimensions of data quality (30
marks 1000 words)
1.2 Explain why data quality is so important for effective predictive analytics in an
organisation, drawing on a real-world example (15 marks 500 words)
Task 2 Exploratory Data Analysis and Linear Regression Analysis (45 Marks)
Carefully study the Data Dictionary for Boston Housing Data Set (Table 1) and
accompanying description of each variable. It is important to understand this data set used
for Task 2. Each record in the housing.csv data set describes a Boston suburb or town.
The data were drawn from the Boston Standard Metropolitan Statistical Area (SMSA) in 1970.
Note: You should conduct some desktop research to identify the determinants/drivers of
housing prices in order to fully understand and interpret the key findings of your
exploratory data analysis (EDA) and Linear Regression Model for the housing.csv data set
for Task 2.
Task 2.1) Conduct and report on exploratory data analysis (EDA) of the housing.csv data
set using the RapidMiner Studio data mining tool. Note that this will require the use of a
number of RapidMiner operators.
Provide the following for Task 2.1:
(i) Provide a screen capture of your final EDA process and briefly describe your EDA process.
(ii) Summarise the key results of your exploratory data analysis in Table 2.1 Results of
Exploratory Data Analysis for housing.csv. Table 2.1 should include the key
characteristics of each variable in the housing.csv data set, such as maximum and minimum
values, average, standard deviation, most frequent values (mode), missing values
and invalid values.
(iii) Discuss the key results of the exploratory data analysis presented in Table 2.1 and provide
a rationale for selecting the top 5 variables for predicting median house value (medv),
focusing in particular on the relationships of the independent variables with each other
and with the dependent variable median house value (medv), drawing on the results of
your EDA and relevant literature on the determinants of house prices.
(25 marks 300 words)
Hint: The Statistics Tab and Chart Tab in RapidMiner Studio provide a lot of descriptive
statistical information and the ability to create useful charts such as bar charts, scatter
plots and box plots for EDA. You might also like to run correlations
and/or chi-square tests as appropriate to determine which variables contribute most to
predicting median house value (medv).
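As a cross-check on the correlation step mentioned in the hint, the ranking idea can be sketched outside RapidMiner in a few lines of Python. This is only an illustrative sketch and not part of the required RapidMiner process. The function names pearson and rank_by_correlation are hypothetical, the column names rm and lstat are assumed to match the Data Dictionary, and the numbers are made-up toy values, not records from housing.csv.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

def rank_by_correlation(columns, target="medv", top_n=5):
    """Rank predictors by absolute correlation with the target column."""
    y = columns[target]
    scores = {
        name: pearson(values, y)
        for name, values in columns.items()
        if name != target
    }
    # Sort on |r| so that strong negative relationships also rank highly.
    ranked = sorted(scores.items(), key=lambda kv: abs(kv[1]), reverse=True)
    return ranked[:top_n]

# Made-up toy values for illustration only (NOT taken from housing.csv).
toy = {
    "rm": [6.0, 6.5, 7.0, 5.5, 6.2],
    "lstat": [10.0, 6.0, 3.0, 18.0, 9.0],
    "medv": [22.0, 28.0, 35.0, 15.0, 24.0],
}

ranking = rank_by_correlation(toy)
for name, r in ranking:
    print(f"{name}: r = {r:.3f}")
```

Ranking on the absolute value of r keeps variables with a strong negative relationship to medv in consideration. Correlations between the predictors themselves can be checked with the same pearson function.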
Task 2.2) Build and report on your Linear Regression model for predicting medv using a
RapidMiner data mining process, an appropriate set of data mining operators and a reduced
set of variables from the housing.csv data set as determined by your exploratory data analysis in
Task 2.1.
Provide the following for Task 2.2:
(i) Provide a screen capture of your Final Linear Regression Model process and briefly
describe this process.
(ii) Provide Table 2.2, named Results of Final Linear Regression Model for Task 2.2 for
the housing.csv data set.
(iii) Discuss the results of the Final Linear Regression Model for the housing.csv data set,
drawing on key outputs (coefficients, standardised coefficients, t-statistic values,
p-values, significance levels etc.) for predicting median house value (medv)
and relevant supporting literature on the interpretation of a Linear Regression Model.
(20 marks 200 words)
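To make the outputs listed in (iii) concrete, the sketch below computes a coefficient, its standardised coefficient and its t-statistic for a one-predictor least-squares fit in plain Python. It is an illustration under stated assumptions, not the RapidMiner Linear Regression operator: the function name simple_ols is hypothetical, the model has a single predictor, and the numbers are made-up toy values rather than records from housing.csv.

```python
import math

def simple_ols(xs, ys):
    """Ordinary least squares for y = intercept + slope * x (one predictor)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))

    slope = sxy / sxx
    intercept = mean_y - slope * mean_x

    # Residual variance uses n - 2 degrees of freedom (two fitted parameters).
    sse = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    residual_variance = sse / (n - 2)
    se_slope = math.sqrt(residual_variance / sxx)

    return {
        "intercept": intercept,
        "slope": slope,
        # Standardised coefficient: slope rescaled by sd(x) / sd(y).
        "standardised_slope": slope * math.sqrt(sxx / syy),
        "se_slope": se_slope,
        # t-statistic for the null hypothesis that the slope is zero.
        "t_stat": slope / se_slope,
    }

# Made-up toy values for illustration only (NOT taken from housing.csv).
rooms = [6.0, 6.5, 7.0, 5.5, 6.2]
value = [22.0, 28.0, 35.0, 15.0, 24.0]

fit = simple_ols(rooms, value)
for key, val in fit.items():
    print(f"{key}: {val:.3f}")
```

A large absolute t-statistic corresponds to a small p-value, which is how the significance levels in the regression output should be read. With several predictors the same quantities come from the matrix form of least squares, which RapidMiner computes for you.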
Include all appropriate outputs such as RapidMiner Processes, Graphs and Tables that
support key aspects of exploratory data analysis and linear regression model analysis of
the housing.csv data set in your Report 2.