Ask This Question To Be Solved By Our ExpertsGet A+ Grade Solution Guaranteed

Drop Files Here Or Click to Upload

Or Get Complete Course Help

Angel MoralesLaw

(5/5)

800 Answers

Hire Me

Malachi StoneStatistics

(5/5)

502 Answers

Hire Me

Aashi NagpalOthers

(5/5)

812 Answers

Hire Me

Faith BrownComputer science

(5/5)

579 Answers

Hire Me

SPSS

(5/5)

Better World Shopping mall is a shopping center that specifically caters to the apparel needs of the urban area residents.

INSTRUCTIONS TO CANDIDATES

ANSWER ALL QUESTIONS

Question 1

Better World Shopping mall (BWSM) is a shopping center that specifically caters to the apparel needs of the urban area residents. Since last year, its revenue has been declining and many retail shops have decided to move out from the mall. The management of BWSM wants to study the amount of money that their customers would spend on shopping and divides them into groups for promotion. The management decided to use clustering to better profile their customers according to their demographics and spending power. You are now given a dataset (BWSM.csv) as shown in Table 1 to help BWSM to do this data mining project.

Table 1. Description of BWSM.csv

Attribute	Description	Labels/Values
CustomerID	Unique identifier of the customer	Unique code
Gender	Gender of the customer	“M” for Male / “F” for Female
Education	Education level of the customer	“1” for High school or below / “2” for Bachelor’s degree / “3” for Master’s degree or above
Age	Age of the customer	Integer measurement
Income	Annual Income of the customer	Dollar measurement
Household	Household type of the customer	“1” for Single / “2” for Couple / “3” for Family with children / “4” for Extended family (i.e. children and grandparents) / “5” for Others
VisitFrequency	How many times the customer visits BWSM per month	Integer measurement
AvgSpent	The average amount the customer spent in the BWSM per visit	Dollar measurement

(a) With reference to the CRISP-DM framework, discuss how you plan to carry out this data mining project. (24 marks)

(b) Analyse the data based on the summary statistics given in Table 2.

Table 2. Summary statistics of the attributes

(i) Explain if there is a need to perform data transformation. (4 marks)

(ii) Describe a scenario where z-score normalization is preferable to min-max normalisation. In your answer, differentiate these two (2) categories of data normalisation techniques. (4 marks)

(c) Assume that you have built two clustering models, Model A and Model B. The details of each model are given in Table 3. In each model, you are able to clearly describe the profile of each cluster. Based on Table 3, identify the model that you believe is better for deployment. Defend your choice by providing good reasons.

Table 3. Description of Model A and Model B

Description	Model A	Model B
Number of clusters	3	5
Number of clustering criteria	4	4
Ease of interpretation of the profile of each cluster	High	High
Average Silhouette coefficient	0.75	0.79
Size of each cluster	Cluster 1: 23% Cluster 2: 46% Cluster 3: 31%	Cluster 1: 17% Cluster 2: 25% Cluster 3: 26% Cluster 4: 20% Cluster 5: 12%

Question 2

You are a data scientist in the Quality Control Department of a wine production company. You would like to understand the factors affecting the wine quality by developing a classification tree that can predict whether the wine is of “Low quality” or “High quality”. A dataset related to a particular type of white wine produced by your company was collected. The number of instances in the white wine samples are 4898.

In the dataset (winequality-white.csv), there are 11 attributes related to physicochemical properties of the wine and 1 attribute “Quality” indicating the quality of the wine. Table 4 shows the attributes in the winequality-white.csv and the range of each attribute.

Table 4. Description of winequality-white.csv

Attribute (unit)	Range
fixed acidity (g(tartaric acid)/dm3)	3.8-14.2
volatile acidity (g(acetic acid)/dm3)	0.1-1.1
citric acid (g/dm3)	0-1.7
residual sugar (g/dm3)	0.6-65.8
chlorides (g(sodium chloride)/dm3)	0.01-0.35
free sulfur dioxide (mg/dm3)	2-289
total sulfur dioxide (mg/dm3)	9-440
density (g/cm3)	0.987-1.039
pH	2.7-3.8
sulphates (g(potassium sulphate)/dm3)	0.2-1.1
alcohol (vol.%)	8.0-14.2
quality	3-9

a) The quality of wine is initially determined by 30 wine experts in a scale that ranges from 0 (bad) to 10 (excellent) using blind wine tasting. The attribute “quality” in the dataset is the final wine quality score based on the 30 scores. Discuss whether the attribute “quality” should be the “mode”, “mean” or “median” of the scores given by the wine experts. Provide an explanation to support your answer. (6 marks)

(b) Based on the attribute “quality”, you decided to perform binning in the IBM SPSS Modeler to categorise the wine quality into two classes: Low Quality and High Quality. After the Binning node has been executed, there is a new attribute “quality_BIN” created with two bin values: “1” and “2”. Figure 1 shows the setting in the Binning Node. Discuss the purpose of binning and describe the meaning of the two bins in this context.

(5/5)

Hurry, Grab up to 30% discount on the entire course

Ask This Question To Be Solved By Our ExpertsGet A+ Grade Solution Guaranteed

Our Experts

Angel MoralesLaw

Malachi StoneStatistics

Aashi NagpalOthers

Faith BrownComputer science

SPSS

Better World Shopping mall is a shopping center that specifically caters to the apparel needs of the urban area residents.

ANSWER ALL QUESTIONS

Attachments:

Instructions Files

Related Questions

. The fundamental operations of create, read, update, and delete (CRUD) in either Python or Java

. Develop a program to emulate a purchase transaction at a retail store. This program will have two classes, a LineItem class and a Transaction class

. The following program contains five errors. Identify the errors and fix them

. Accepts the following from a user: Item Name Item Quantity Item Price Allows the user to create a file to store the sales receipt contents

. The final project will encompass developing a web service using a software stack and implementing an industry-standard interface. Regardless of whether you choose to pursue application development goals as a pure developer or as a software engineer

Other Services

Ask This Question To Be Solved By Our ExpertsGet A+ Grade Solution Guaranteed

Our Experts

Angel MoralesLaw

Malachi StoneStatistics

Aashi NagpalOthers

Faith BrownComputer science

SPSS

Better World Shopping mall is a shopping center that specifically caters to the apparel needs of the urban area residents.

ANSWER ALL QUESTIONS

Attachments:

Instructions Files

Related Questions

. The fundamental operations of create, read, update, and delete (CRUD) in either Python or Java

. Develop a program to emulate a purchase transaction at a retail store. This program will have two classes, a LineItem class and a Transaction class

. The following program contains five errors. Identify the errors and fix them

. Accepts the following from a user: Item Name Item Quantity Item Price Allows the user to create a file to store the sales receipt contents

. The final project will encompass developing a web service using a software stack and implementing an industry-standard interface. Regardless of whether you choose to pursue application development goals as a pure developer or as a software engineer