INSTRUCTIONS TO CANDIDATES
ANSWER ALL QUESTIONS

 Assignment 3

 

K-Means Cluster Analysis

Cluster analysis has many useful applications. In this assignment we are going to try it on market segmentation. Evgeniou (2015) has a very nice introduction to this topic. The scenario below and the dataset are both from his work. See the references section for his full article.

 

Scenario:

“The management team of a large shopping mall would like to understand the types of people who are, or could be, visiting their mall. They have good reasons to believe that there are a few different market segments, and they are considering designing and positioning the shopping mall services better in order to attract mainly a few profitable market segments, or to differentiate their services (e.g. invitations to events, discounts, etc.) across market segments.” (Evgeniou, 2016)

 

The dataset

The Market Research Survey Questions

V1: Shopping is fun (scale 1-7)

V2: Shopping is bad for your budget (scale 1-7)

V3: I combine shopping with eating out (scale 1-7)

V4: I try to get the best buys while shopping (scale 1-7)

V5: I don't care about shopping (scale 1-7)

V6: You can save a lot of money by comparing prices (scale 1-7)

Income: the household income of the respondent (in dollars)

Mall.Visits: how often they visit the mall (scale 1-7)

Gender: m = male, f = female

 

Procedure:

We will follow Evgeniou’s 8-step approach to analyze the data. These steps will help you better understand the procedure for conducting a cluster analysis. If you are interested in how he explains these steps, please see his article cited in the references section at the end of these assignment instructions.

 

1. Confirm the data is metric.

The dataset that we are going to use is Data – Market Segmentation 3.xlsx, which is different from Evgeniou’s. Use Read Excel to read it. 

 

Cluster analysis can only handle numeric data. All columns in the dataset except ‘gender’ are numeric. Therefore, all we need to do is determine whether the non-numeric variable can be transformed into numeric. Fortunately, ‘gender’ is one of the variables that can be transformed easily. Let’s use Generate Attributes to convert ‘gender’ into 0’s (female) and 1’s (male). A screenshot is shown below. See also the discussion in class for details.

 

 

Define the ID variable as an identifier using the Set Role operator. Select all variables except the original Gender column before moving on.
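For readers who want to see the gender conversion outside RapidMiner, here is a minimal sketch in plain Python. It is not the Generate Attributes expression from class. The sample rows and the new column name Gender_num are made up for illustration; only the 0 = female, 1 = male coding comes from the instructions above.

```python
# Hypothetical sample rows; column names follow the survey description above.
rows = [
    {"ID": 1, "V1": 6, "Income": 60000, "Gender": "m"},
    {"ID": 2, "V1": 2, "Income": 30000, "Gender": "f"},
    {"ID": 3, "V1": 4, "Income": 70000, "Gender": "M"},
]

def encode_gender(value):
    """Map 'f' to 0 and 'm' to 1, ignoring case and surrounding spaces."""
    code = {"f": 0, "m": 1}
    key = value.strip().lower()
    if key not in code:
        raise ValueError(f"unexpected gender value: {value!r}")
    return code[key]

# Add the numeric column (Gender_num is an assumed name) and drop the
# original text column, mirroring "select all variables except Gender".
encoded = []
for row in rows:
    new_row = {k: v for k, v in row.items() if k != "Gender"}
    new_row["Gender_num"] = encode_gender(row["Gender"])
    encoded.append(new_row)

print(encoded)
```

Raising an error on unexpected values is a deliberate choice here: a silent default would hide data-entry problems that later distort the clusters.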

 

2. Decide whether to scale or standardize the data.

V1 – V6 are on a 7-point Likert scale, but income and Mall.Visits are not. If you remember our discussion in class, variables with a wide range of values (e.g., income) will affect the Euclidean distance calculation a lot more than those with smaller ranges.

 

Here in this step you will normalize every variable to [0, 1]. Note: A variable specifically set with the identifier role will not be rescaled. Normalization will have no effect on it.
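The rescaling to [0, 1] can be sketched as a min-max transformation, x' = (x - min) / (max - min). The snippet below is an illustration in plain Python with made-up values, not the RapidMiner operator itself. Mapping a constant column to 0.0 is an assumption made here to avoid dividing by zero.

```python
def min_max_normalize(values):
    """Rescale a list of numbers to [0, 1]; a constant column maps to 0.0."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

income = [30000, 45000, 60000, 90000]  # hypothetical incomes in dollars
likert = [1, 4, 7, 4]                  # hypothetical answers on the 1-7 scale

income_scaled = min_max_normalize(income)
likert_scaled = min_max_normalize(likert)

print(income_scaled)
print(likert_scaled)
```

After rescaling, income and the survey answers share the same range, so neither dominates the distance calculation just because of its units.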

 

 

3. Decide which variables to use for clustering

We will use all variables except the original ‘gender’ variable. In real life, you will consider those variables that are relevant to your specific study. 

 

Note: It is still OK to include the ID variable, since we had specifically defined it with an identifier role in a previous step. A variable with this role will not be used in calculation and model building even if you include it.

 

4. Define similarity measures between observations

There are several distance/similarity measures available for K-means. Since Euclidean Distance is the most popular one, let’s just use it.
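To see why the scale of a variable matters for Euclidean distance, consider this small sketch in plain Python. The two respondents and the income range of 20000 to 120000 are invented for illustration. They disagree completely on V1 but earn almost the same income.

```python
import math

# Two hypothetical respondents: (V1 answer on 1-7, income in dollars).
a_raw = (1, 50000)
b_raw = (7, 51000)

# The same respondents after rescaling each variable to [0, 1], assuming
# V1 ranges over 1-7 and income over 20000-120000 (a made-up range).
a_scaled = ((1 - 1) / 6, (50000 - 20000) / 100000)
b_scaled = ((7 - 1) / 6, (51000 - 20000) / 100000)

# math.dist computes the Euclidean distance between two points.
d_raw = math.dist(a_raw, b_raw)
d_scaled = math.dist(a_scaled, b_scaled)

print(round(d_raw, 3))     # driven almost entirely by the income gap
print(round(d_scaled, 5))  # driven almost entirely by the V1 gap
```

On the raw data the 1000-dollar income gap swamps the 6-point difference in V1. After rescaling, the V1 difference accounts for nearly all of the distance.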

 

5. Visualize individual attributes and pair-wise distance between the observations

We could examine the histograms for this purpose, but you do not have to do this step for this assignment.

 

6. Select the clustering method and decide how many clusters to have

a. We will use K-Means for this assignment. Be sure to check the “use local random seed” parameter in the K-Means operator, and set “local random seed” to 1992.

b. Use DBI to determine the best K. Do the following:

i. Enter your data and draw the line chart. Show K = 2, … 10. 

ii. Q1: How did you select the best K? What is your best K and its DBI value? Answer this question in Assignment 3.docx. Be sure to include a screenshot of the DBI values. 

iii. Move your design to a sub-process and disable the sub-process.
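The idea behind step 6b can be sketched in plain Python: run k-means for each K, compute the Davies-Bouldin index (DBI), and pick the K with the lowest value. This is a teaching sketch under stated assumptions, not a replacement for the RapidMiner process. The toy points are made up, the k-means is a basic Lloyd's algorithm, and Python's random seed 1992 will not reproduce RapidMiner's results for the same seed.

```python
import math
import random

def kmeans(points, k, seed=1992, iterations=100):
    """Basic Lloyd's k-means; returns (centroids, labels)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # start from k distinct data points
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: nearest centroid by Euclidean distance.
        new_labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: mean of each cluster (keep old centroid if empty).
        new_centroids = []
        for c in range(k):
            members = [p for p, l in zip(points, new_labels) if l == c]
            if members:
                new_centroids.append(
                    tuple(sum(col) / len(members) for col in zip(*members))
                )
            else:
                new_centroids.append(centroids[c])
        if new_labels == labels and new_centroids == centroids:
            break  # converged
        labels, centroids = new_labels, new_centroids
    return centroids, labels

def davies_bouldin(points, centroids, labels):
    """Davies-Bouldin index over non-empty clusters; lower is better."""
    used = sorted(set(labels))
    scatter = {}
    for c in used:
        members = [p for p, l in zip(points, labels) if l == c]
        scatter[c] = sum(math.dist(p, centroids[c]) for p in members) / len(members)
    worst = [
        max(
            (scatter[i] + scatter[j]) / math.dist(centroids[i], centroids[j])
            for j in used if j != i
        )
        for i in used
    ]
    return sum(worst) / len(worst)

# Hypothetical, already normalized observations (three loose groups).
points = [
    (0.10, 0.10), (0.15, 0.20), (0.20, 0.10),
    (0.80, 0.85), (0.85, 0.90), (0.90, 0.80),
    (0.10, 0.90), (0.15, 0.80), (0.20, 0.85),
]

dbi_by_k = {}
for k in range(2, 6):
    centroids, labels = kmeans(points, k)
    dbi_by_k[k] = davies_bouldin(points, centroids, labels)

best_k = min(dbi_by_k, key=dbi_by_k.get)
for k, dbi in dbi_by_k.items():
    print(k, round(dbi, 4))
print("best K:", best_k)
```

Because k-means depends on its starting centroids, the DBI for a given K can change with the seed. That is why the assignment fixes the seed before comparing values of K.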

 

7. Profile and interpret the clusters

a. Q2: Re-design your process to perform k-means with the best K from the previous section. All the data pre-processing should be the same. The random seed should be the same as well. Replace the Optimize Parameter operator with the actual k-means analysis. (Note that your results may be a little different from Evgeniou’s because of the normalization procedure that we used.)

i. A screenshot of the cluster centroids

ii. A screenshot of the line plot of all cluster centroids. 

 

iii. Q3: A paragraph or two to describe the difference variable-by-variable between the clusters of customers who do not like shopping. 

iv. Q4: Locate the cluster that represents the customers who love shopping. Describe the patterns that you see about these customers variable-by-variable. Provide a strategy to your manager to engage these customers. 
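A cluster centroid is simply the mean of each variable over the members of that cluster, which is what the centroid table and line plot summarize. Here is a minimal sketch in plain Python. The rows, the labels, and the choice of V1 and V5 are made up for illustration.

```python
from collections import defaultdict

def cluster_centroids(rows, labels, variables):
    """Return {cluster label: {variable: mean value}}."""
    groups = defaultdict(list)
    for row, label in zip(rows, labels):
        groups[label].append(row)
    return {
        label: {v: sum(r[v] for r in members) / len(members) for v in variables}
        for label, members in sorted(groups.items())
    }

# Hypothetical normalized answers: V1 = "Shopping is fun",
# V5 = "I don't care about shopping".
rows = [
    {"V1": 1.0, "V5": 0.0},
    {"V1": 0.5, "V5": 0.0},
    {"V1": 0.0, "V5": 1.0},
    {"V1": 0.5, "V5": 0.5},
]
labels = [0, 0, 1, 1]  # hypothetical cluster assignment

centroids = cluster_centroids(rows, labels, ["V1", "V5"])
for label, means in centroids.items():
    print(label, means)
```

In this toy example cluster 0 scores high on V1 and low on V5, so it would be read as the group that enjoys shopping. The same variable-by-variable reading applies to Q3 and Q4.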

 

8. Assess the robustness of our clusters

a. Evgeniou recommends the following ideas. You do not have to do this part in the assignment, but it is a good idea to try out the ideas and see how the group membership changes.

i. using different subsets of the original data

ii. using variations of the original segmentation attributes

iii. using different distance metrics

iv. using different segmentation methods

v. using different numbers of clusters
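One way to see how the group membership changes between two runs is to count how often they agree on whether a pair of respondents belongs together. The sketch below uses the Rand index for this. That measure is our choice for illustration and is not prescribed by Evgeniou or by this assignment. The label lists are made up.

```python
from itertools import combinations

def rand_index(labels_a, labels_b):
    """Fraction of observation pairs on which two clusterings agree."""
    if len(labels_a) != len(labels_b):
        raise ValueError("both runs must label the same observations")
    agree = 0
    pairs = 0
    for i, j in combinations(range(len(labels_a)), 2):
        same_a = labels_a[i] == labels_a[j]
        same_b = labels_b[i] == labels_b[j]
        agree += same_a == same_b  # True counts as 1
        pairs += 1
    return agree / pairs

run_1 = [0, 0, 0, 1, 1, 2, 2]
run_2 = [2, 2, 2, 0, 0, 1, 1]  # same grouping, different label numbers
run_3 = [0, 0, 1, 1, 1, 2, 2]  # one respondent moved to another cluster

print(rand_index(run_1, run_2))
print(round(rand_index(run_1, run_3), 4))
```

Comparing pairs instead of raw label numbers matters because cluster numbers are arbitrary: cluster 0 in one run may be cluster 2 in the next even when the groups are identical.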

 

 

 

What to submit

1. RapidMiner’s .rmp file and Assignment 3.docx.

2. Assignment 3.docx. Make sure the questions and your answers are clearly shown. If I have a hard time locating your answer(s), I will first take off 15% before grading your assignment. It is a good idea to include section headings and questions.

3. Hope you enjoy the real life data sets used in this assignment. Your learning should not stop here. See if you can uncover some additional insights using these datasets.

 
