RapidMiner

INSTRUCTIONS TO CANDIDATES
ANSWER ALL QUESTIONS

Task 1 Predictive Analytics Case Study (40 Marks)

The goal of the Predictive Analytics Case Study is to predict whether it is likely to rain tomorrow, based on previous weather conditions recorded at 49 weather station locations in the weatherAUS.csv data set provided (see Table 1, Data dictionary for the weatherAUS.csv data set). You should review this data dictionary. The Australian Weather data set contains over 190,000 daily observations from 49 Australian weather stations, covering January 2008 through to July 2021. The daily observations are available from the Bureau of Meteorology (http://www.bom.gov.au/climate/data). Definitions for each variable are adapted from http://www.bom.gov.au/climate/dwo/IDCJDW0000.shtml.

In completing Task 1 you will apply the business understanding, data understanding, data preparation, modelling and evaluation phases of the CRISP-DM data mining process. It is important that you understand this data set in order to complete Task 1 and its four sub-tasks.

Table 1: Data dictionary for weatherAUS.csv

Variable Name | Data Type | Description
Date | Date | Date of weather observation.
Location | Text | Common name of the location of the weather station.
MinTemp | Real | Minimum temperature in degrees Celsius.
MaxTemp | Real | Maximum temperature in degrees Celsius.
Rainfall | Real | Amount of rainfall recorded for the day in mm.
Evaporation | Real | So-called Class A pan evaporation (mm) in the 24 hours to 9am.
Sunshine | Real | Number of hours of bright sunshine in the day.
WindGustDir | Polynominal | Direction of the strongest wind gust in the 24 hours to midnight.
WindGustSpeed | Integer | Speed (km/h) of the strongest wind gust in the 24 hours to midnight.
WindDir9am | Polynominal | Direction of the wind at 9am.
WindDir3pm | Polynominal | Direction of the wind at 3pm.
WindSpeed9am | Integer | Wind speed (km/h) averaged over the 10 minutes prior to 9am.
WindSpeed3pm | Integer | Wind speed (km/h) averaged over the 10 minutes prior to 3pm.
Humidity9am | Integer | Relative humidity (percent) at 9am.
Humidity3pm | Integer | Relative humidity (percent) at 3pm.
Pressure9am | Real | Atmospheric pressure (hPa) reduced to mean sea level at 9am.
Pressure3pm | Real | Atmospheric pressure (hPa) reduced to mean sea level at 3pm.
Cloud9am | Integer | Fraction of sky obscured by cloud at 9am, measured in "oktas" (eighths of the sky). A value of 0 indicates a completely clear sky, whilst 8 indicates that it is completely overcast.
Cloud3pm | Integer | Fraction of sky obscured by cloud at 3pm, in oktas. See Cloud9am for a description of the values.
Temp9am | Real | Temperature (degrees C) at 9am.
Temp3pm | Real | Temperature (degrees C) at 3pm.
RainToday | Nominal | Yes if precipitation (mm) in the 24 hours to 9am exceeds 1 mm, otherwise No.
RISK_MM | Real | Amount of rain. A kind of measure of the "risk".
RainTomorrow | Nominal | Target variable. Did it rain the following day? Yes or No.

 

1.1 Exploratory Data Analysis and Data Preparation: Conduct an exploratory data analysis and data preparation of the weatherAUS.csv data set using RapidMiner. The aim is to understand the characteristics of each variable and the relationship of each variable to the other variables. Summarise the findings in a table named Task 1.1 Results of Exploratory Data Analysis and Data Preparation. The table should describe the key characteristics of each variable in the weatherAUS.csv data set, such as maximum and minimum values, average, standard deviation, most frequent value (mode), missing values and invalid values. It should also cover relationships with other variables, transformations of existing variables and any new variables created.
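The analysis itself is to be done in RapidMiner. As an illustration of the kind of per-variable summary the Statistics tab reports, the Python sketch below computes the same statistics for one numeric column. It assumes that missing values appear in the CSV as empty strings or `NA`. The `summarise` function and its `valid_range` parameter are hypothetical helpers. `valid_range` flags implausible readings, for example 0 to 100 for Humidity3pm or 0 to 8 for Cloud9am, ranges implied by Table 1.

```python
import statistics
from typing import Dict, Optional, Sequence, Tuple

# Assumption: how missing values are encoded in weatherAUS.csv.
MISSING_TOKENS = ("", "NA")


def summarise(
    values: Sequence[str],
    valid_range: Optional[Tuple[float, float]] = None,
) -> Dict[str, Optional[float]]:
    """Descriptive statistics for one numeric column read as raw CSV strings."""
    missing = 0
    numbers = []
    for raw in values:
        text = raw.strip()
        if text in MISSING_TOKENS:
            missing += 1
        else:
            numbers.append(float(text))

    # Values outside the plausible range are counted as invalid (flagged, not removed).
    invalid = 0
    if valid_range is not None:
        low, high = valid_range
        invalid = sum(1 for x in numbers if not low <= x <= high)

    if not numbers:
        return {"count": 0, "missing": missing, "invalid": invalid,
                "min": None, "max": None, "mean": None, "stdev": None, "mode": None}

    return {
        "count": len(numbers),
        "missing": missing,
        "invalid": invalid,
        "min": min(numbers),
        "max": max(numbers),
        "mean": statistics.mean(numbers),
        # Sample standard deviation needs at least two observations.
        "stdev": statistics.stdev(numbers) if len(numbers) > 1 else 0.0,
        # On Python 3.8+ ties return the first mode encountered.
        "mode": statistics.mode(numbers),
    }
```

For example, `summarise(humidity_3pm_column, valid_range=(0, 100))` would count any relative-humidity reading above 100 percent as invalid. It would also report how many rows are missing.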

Hint: The Statistics tab and Chart tab in RapidMiner provide much of the descriptive statistical information and many of the charts (e.g., bar charts and scatterplots) required for Task 1.1. You might also run correlations for numeric variables and/or chi-square tests for categorical variables. Indicate in Table 1.1 which variables contribute most to predicting whether it is likely to rain tomorrow. You could also consider transforming some variables, creating new variables and converting the target/label variable into a binominal variable to facilitate the analysis in Tasks 1.2, 1.3 and 1.4.
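The hint mentions chi-square tests for categorical variables. The Python sketch below illustrates that test. It builds a contingency table for two categorical columns, for example RainToday against RainTomorrow, and returns Pearson's chi-square statistic with its degrees of freedom. The functions `chi_square` and `to_binominal` are hypothetical helpers for illustration and are not RapidMiner operators. The sketch also assumes that missing values appear as empty strings or `NA`. `to_binominal` only shows the Yes/No to 1/0 mapping of the label.

```python
from collections import Counter
from typing import Sequence, Tuple

# Assumption: how missing values are encoded in weatherAUS.csv.
MISSING_TOKENS = ("", "NA")


def to_binominal(label: str) -> int:
    """Map a RainTomorrow label ("Yes"/"No") to 1/0."""
    return {"Yes": 1, "No": 0}[label.strip()]


def chi_square(x: Sequence[str], y: Sequence[str]) -> Tuple[float, int]:
    """Pearson chi-square statistic and degrees of freedom for two categorical columns.

    Rows where either value is missing are dropped before building the table.
    """
    pairs = [(a, b) for a, b in zip(x, y)
             if a not in MISSING_TOKENS and b not in MISSING_TOKENS]
    n = len(pairs)
    if n == 0:
        raise ValueError("no complete rows to test")

    observed = Counter(pairs)                  # contingency table cell counts
    row_totals = Counter(a for a, _ in pairs)  # counts per category of x
    col_totals = Counter(b for _, b in pairs)  # counts per category of y

    statistic = 0.0
    for a, row_total in row_totals.items():
        for b, col_total in col_totals.items():
            expected = row_total * col_total / n  # count expected under independence
            statistic += (observed[(a, b)] - expected) ** 2 / expected

    dof = (len(row_totals) - 1) * (len(col_totals) - 1)
    return statistic, dof
```

A 2 x 2 table such as RainToday against RainTomorrow has one degree of freedom. For that table, a statistic above 3.841 is significant at the 5% level.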

Briefly discuss the key findings of your exploratory data analysis and data preparation. Justify your choice of the variables most likely to predict whether it is likely to rain tomorrow (10 marks, 500 words).

 

1.2 Decision Tree Model: Build a Decision Tree model for predicting whether it is likely to rain tomorrow, based on the weatherAUS.csv data set. Use RapidMiner and a set of data mining operators determined in part by your exploratory data analysis in Task 1.1. Provide these outputs from RapidMiner: (1) the Final Decision Tree Model process, (2) the Final Decision Tree diagram and (3) the decision tree rules. Briefly explain your Final Decision Tree Model process. Then discuss the results of the Final Decision Tree Model for predicting whether it is likely to rain tomorrow, drawing on the key outputs (decision tree diagram, decision tree rules), the key contributing variables and relevant supporting literature on the interpretation of decision trees (10 marks, 150 words).
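When interpreting the tree and its rules, it can help to see how a split criterion scores a candidate attribute. The Python sketch below implements entropy-based information gain and gain ratio for a categorical attribute such as RainToday or WindGustDir. RapidMiner's Decision Tree operator lets you choose among several split criteria. This sketch illustrates two common ones and is not RapidMiner's own implementation. Numeric attributes such as Humidity3pm are split at thresholds, which this sketch does not cover.

```python
import math
from collections import Counter, defaultdict
from typing import Dict, List, Sequence


def entropy(values: Sequence[str]) -> float:
    """Shannon entropy (bits) of a sequence of class labels or attribute values."""
    n = len(values)
    counts = Counter(values)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def information_gain(attribute: Sequence[str], labels: Sequence[str]) -> float:
    """Reduction in label entropy from splitting on each value of a categorical attribute."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for value, label in zip(attribute, labels):
        groups[value].append(label)
    n = len(labels)
    weighted = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - weighted


def gain_ratio(attribute: Sequence[str], labels: Sequence[str]) -> float:
    """Information gain divided by split information (entropy of the attribute itself).

    This penalises attributes with many distinct values, such as Location.
    """
    split_info = entropy(attribute)
    return information_gain(attribute, labels) / split_info if split_info > 0 else 0.0
```

An attribute that separates RainTomorrow = Yes from No perfectly scores an information gain equal to the label entropy. An attribute unrelated to the label scores close to zero. Attributes near the root of the final tree are the ones that scored highest under the chosen criterion.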

 

1.3 Logistic Regression Model: Build a Logistic Regression model for predicting whether it is likely to rain tomorrow. Use RapidMiner, the weatherAUS.csv data set and an appropriate set of data mining operators determined in part by your exploratory data analysis in Task 1.1. Provide these outputs from RapidMiner: (1) the Final Logistic Regression Model process and (2) the key outputs from the Logistic Regression Model. Hint: for Task 1.3 you may need to change the data types of some variables. Briefly explain your Final Logistic Regression Model process. Then discuss the results of the Final Logistic Regression Model for predicting whether it is likely to rain tomorrow, drawing on the key outputs (coefficients, standardised coefficients, odds ratios, p-values, etc.), the key contributing variables and relevant supporting literature on the interpretation of logistic regression models (10 marks, 150 words).
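The coefficient outputs can be read using three standard relationships. First, the predicted probability is the logistic function of a linear combination of the attributes. Second, each coefficient b converts to an odds ratio exp(b). Third, a Wald statistic z = b / SE gives a two-sided p-value, which is a common way of testing individual coefficients. The Python sketch below expresses these relationships. The function names are illustrative and the attribute names are whatever your final model uses. No fitted values are assumed.

```python
import math
from typing import Dict, Mapping


def logistic(z: float) -> float:
    """Numerically stable logistic (sigmoid) function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict_probability(intercept: float,
                        coefficients: Mapping[str, float],
                        example: Mapping[str, float]) -> float:
    """P(RainTomorrow = Yes) for one example, given fitted coefficients."""
    z = intercept + sum(b * example[name] for name, b in coefficients.items())
    return logistic(z)


def odds_ratios(coefficients: Mapping[str, float]) -> Dict[str, float]:
    """exp(b): multiplicative change in the odds per one-unit increase in an attribute."""
    return {name: math.exp(b) for name, b in coefficients.items()}


def wald_p_value(coefficient: float, std_error: float) -> float:
    """Two-sided p-value for H0: coefficient = 0, using z = b / SE and the normal distribution."""
    z = coefficient / std_error
    return math.erfc(abs(z) / math.sqrt(2.0))
```

Suppose the fitted coefficient for Humidity3pm were b. Then exp(b) would be the factor by which the odds of rain tomorrow change for each one-percentage-point increase in 3pm humidity, holding the other attributes constant. Standardised coefficients are typically estimated on standardised attributes. Their magnitudes are therefore easier to compare across variables measured in different units.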

1.4 Model Validation and Performance: You will need to validate your Final Decision Tree Model and Final Logistic Regression Model using the Cross-Validation operator, Apply Model operator and Performance operator in your data mining processes. Discuss and compare the performance of the Final Decision Tree Model with that of the Final Logistic Regression Model for predicting whether it is likely to rain tomorrow. Base this comparison on the key results of the confusion matrix, presented in Table 1.4 Model Performance Metrics (Decision Tree vs Logistic Regression). Table 1.4 will compare the two final models using the following performance metrics: (1) accuracy, (2) sensitivity, (3) specificity and (4) F1 score (10 marks, 200 words).
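The four metrics for Table 1.4 all come from the same confusion matrix. RapidMiner's Performance operator can report them directly; the Python sketch below shows the arithmetic behind them. It assumes RainTomorrow = Yes is the positive class. The helper names are illustrative.

```python
from typing import Dict, Sequence


def confusion_counts(actual: Sequence[str], predicted: Sequence[str],
                     positive: str = "Yes") -> Dict[str, int]:
    """Count true/false positives and negatives, treating `positive` as the positive class."""
    counts = {"tp": 0, "fp": 0, "tn": 0, "fn": 0}
    for a, p in zip(actual, predicted):
        if p == positive:
            counts["tp" if a == positive else "fp"] += 1
        else:
            counts["fn" if a == positive else "tn"] += 1
    return counts


def _ratio(numerator: int, denominator: int) -> float:
    # Return 0.0 rather than raising when a denominator is zero (e.g. no positive predictions).
    return numerator / denominator if denominator else 0.0


def performance(tp: int, fp: int, tn: int, fn: int) -> Dict[str, float]:
    """Accuracy, sensitivity, specificity, precision and F1 from confusion-matrix counts."""
    return {
        "accuracy": _ratio(tp + tn, tp + fp + tn + fn),
        "sensitivity": _ratio(tp, tp + fn),      # recall, true positive rate
        "specificity": _ratio(tn, tn + fp),      # true negative rate
        "precision": _ratio(tp, tp + fp),
        "f1": _ratio(2 * tp, 2 * tp + fp + fn),  # harmonic mean of precision and recall
    }
```

Days without rain usually outnumber rainy days. A model can therefore score high accuracy and specificity while having low sensitivity, which is why the comparison in Table 1.4 reports all four metrics rather than accuracy alone.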

 

Note: the important outputs from the data mining analyses conducted in RapidMiner for Task 1 must be included in your Assignment 3 report. They provide support for the conclusions you reach in each of 1.1, 1.2, 1.3 and 1.4. You can export important outputs from RapidMiner as JPG image files and include these screenshots in the relevant Task 1 parts of your Assignment 3 report.

 

Note: you will find the North textbook and the RapidMiner tutorials useful references for the data mining process activities in Task 1. These include exploratory data analysis and data preparation, decision tree analysis, logistic regression analysis, and evaluation of the performance of the Final Decision Tree model and the Final Logistic Regression model. These concepts are covered in the RapidMiner Practicals module, in Chapters 3, 4, 9, 10 and 13 of the North textbook, and in the tutorials contained within RapidMiner.

 

Research and critically review the study materials and other relevant literature to provide a suitable written response to each of the following Tasks 2, 3 and 4, supported with an appropriate level of in-text referencing:

 

 

 

 

Task 2 Sentiment Analysis (15 marks, 500 words)

2.1 Define the concept of sentiment analysis and explain how sentiment analysis relates to text mining (7 marks, 250 words).

2.2 Identify and describe a widely used application area of sentiment analysis, and explain why sentiment analysis is used in this application. What business problem does sentiment analysis address, and how does it add value for an organisation and its customers? Illustrate your answer with a real-world example of the application of sentiment analysis by an organisation (8 marks, 250 words).

 

Task 3 Big Data Technologies (15 marks, 500 words)

3.1 Identify and describe each of the three prominent big data technologies, using diagrams where appropriate (8 marks, 250 words).

3.2 Explain the key role(s) that these three prominent big data technologies play in managing big data in an organisation. Include how the three technologies are interrelated and integrated to achieve effective big data management (7 marks, 250 words).

Task 4 Artificial Intelligence: automation and augmentation in the workplace and ethical considerations (20 marks, 1,000 words)

4.1 First, discuss how configurations of humans and artificial intelligence will evolve in the workplace as organisations drive automation and augmentation through the adoption of artificial intelligence (10 marks, 500 words).

4.2 Second, identify and discuss the ethical implications for organisations of using artificial intelligence to drive automation and augmentation in the workplace. Address these in relation to (1) privacy, (2) transparency, (3) bias and discrimination and (4) governance and accountability (10 marks, 500 words).

 
