Problem 1. Cluster Analysis
Instructions:
• Download the data set bank_direct_marketing from Canvas, and save the input file in your data folder
• In SAS Enterprise Miner, build a cluster analysis model and answer the questions below where you see [your answer].
• To save the output for submission: in the Cluster node’s results, maximize the Output window, then select File -> Save As.
Bank_direct_marketing dataset
Demographic:
education
customer_id
age
marital
job
Other information:
Default – default? (yes/no)
Balance – checking account balance
housing – having a mortgage account (yes/no)
loan – having a loan account (yes/no)
Promotion:
Contact – the channel salesperson contacted the customer
Day – contact date
Month – contact month
Duration – number of days since the last contact
Campaign – number of campaigns
pdays – see the detail below
previous – number of previous campaigns
poutcome – the previous outcome
y – currently having an active promotion of some sort (yes/no)
pdays – “number of days that passed by after the client was last contacted from a previous campaign (numeric, -1 means client was not previously contacted).” This variable is a combination of a continuous variable and a categorical variable. He consulted with the domain expert on the team, and she suggested that the variable be binned as follows:
Bank Direct Marketing Cluster Analysis
Question: Do subgroups of customers exist in the bank direct marketing data set? If so, what are the best ways to describe them? What business recommendations can you make based on the results?
1. Create a new diagram in your project. You can open an existing project or create a new one. Name the diagram Bank Clustering.
2. Add a new library if you created a new project. If the library of your existing project already points to the data folder that contains the data set bank_direct_marketing, you don’t need to add a new library.
3. Use the bank_direct_marketing data as a data source for this clustering and profiling exercise.
4. Determine whether the model roles and measurement levels assigned to the variables are appropriate.
• Select the ellipsis next to Variables in the Properties panel.
• Sort the variables by level by clicking the Level column.
• Select all the continuous variables, except customer_id.
• Click the Explore button and inspect the distributions.
• Click the plot containing the distribution for balance to select it.
• Right-click in the same plot, select Graph Properties, and change the value of Number of X Bins to 100. Click OK.
• Repeat the previous two steps for all the other histograms. (Age is displayed as a bar chart.)
Note: The three most heavily skewed distributions are those of balance, campaign, and previous. Although not optimal, taking the log of each of these variables reduces the skewness of its distribution.
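The skewness seen in the histograms can also be expressed as a number. The following is a minimal Python sketch, outside SAS Enterprise Miner, that computes the skewness coefficient (the mean cubed z-score) of a list of values. The function name sample_skewness and the sample values are hypothetical and are not taken from bank_direct_marketing.

```python
import math


def sample_skewness(values):
    """Return the skewness (mean cubed z-score) of a list of numbers."""
    n = len(values)
    mean = sum(values) / n
    variance = sum((v - mean) ** 2 for v in values) / n
    if variance == 0:
        return 0.0  # a constant variable has no skew
    std = math.sqrt(variance)
    return sum(((v - mean) / std) ** 3 for v in values) / n


# Hypothetical values standing in for a right-skewed count such as campaign.
campaign_like = [1, 1, 1, 2, 2, 3, 4, 6, 12, 40]
# Hypothetical symmetric values for comparison.
symmetric = [1, 2, 3, 4, 5]

print(sample_skewness(campaign_like))  # clearly positive: long right tail
print(sample_skewness(symmetric))      # zero: no skew
```

A value well above zero indicates a long right tail, which is the pattern a log transformation is meant to reduce.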
5. Drag a Transform Variables node (on the Modify tab) onto the diagram and connect it to the Input Data source. Apply a log transformation to the following variables:
• balance
• previous
• campaign
Here is the how-to:
1. Select the ellipsis next to Variables.
2. Change Method to Log for balance, previous, and campaign.
3. Run the node and do not view the results.
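To illustrate what a log transformation does to a variable, here is a minimal Python sketch. It is not the code that the Transform Variables node runs; the offset rule (add 1, or shift so that the minimum maps to 1 when negative values are present) is an assumption made so that zeros and negative values stay inside the domain of the logarithm. The function name shifted_log and the sample values are hypothetical.

```python
import math


def shifted_log(values):
    """Return log(v + offset) for each value.

    Assumed rule: offset is 1 for non-negative data, so that 0 maps to 0;
    if the data contain negative values, the offset is chosen so that
    the smallest value maps to log(1) = 0.
    """
    lowest = min(values)
    offset = 1.0 - lowest if lowest < 0 else 1.0
    return [math.log(v + offset) for v in values]


# Hypothetical count data containing zeros, such as previous.
previous_like = [0, 0, 1, 2, 5, 30]
# Hypothetical monetary data with a negative value and a large outlier.
balance_like = [-50, 0, 120, 950, 20000]

print(shifted_log(previous_like))
print(shifted_log(balance_like))
```

The transformation keeps the order of the values while compressing the large ones, which pulls in the long right tail.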
6. Connect a Cluster node to the Transform Variables node.
7. Change Maximum Number of Clusters to 6.
8. Change Use for all the variables to No, except for these:
• balance
• previous