This assignment is based on the R code, RStudio notebook, and discussion in the live session for this module. It does, however, use different data, so please make note of that. We have included that data (which, again, is different from the in-class lecture), a data description sheet (which is different from the in-class lecture), and the in-class RMD file (which is the same as the in-class lecture). We recommend that you attend or watch the live session recording for this module before attempting this assignment.

Data: “city.rds” (This is not the same data we used in class.)
Data Description Sheet: “Data Description_8HE_for the assignment only.xlsx” (This is NANSE data, but it is aggregated differently from the data in module 7 and the data discussed in class.)
In-Class RMD Notebook: “In-Class Notebook Mod8HE KMeans DBSCAN_v02_s.rmd”

NANSE would like to get a better sense of how its stores in different parts of the country behave. In particular, they wonder if they could organize their cities into groups based upon the sales history of the stores in those cities.
That is, they would like to get a better understanding of the geographical sales patterns at the city level. Luckily, they have a data set that lists each city as an observation, with multiple features for each city, such as the average size of each store in that city, the proportion of weeks in which each store in that city has had above-median weekly profit, the average proportion of the top 20 products that the stores sell, the average proportion of products sold in different velocity groups, and the average proportion of products sold on promotion. Your job is to use the two clustering algorithms we have learned to cluster these cities into similar groups.

While you can reuse most of the code from class without alteration, this is a new dataset, so you do have to make a few adjustments. In particular, before you run the K-means analysis and the DBSCAN analysis, read in the new dataset (“city.rds”) and call it ‘clustering_input1’. Next, create a new dataset called ‘clustering_input2’ by removing the columns ‘city’, ‘region’, and ‘province’ from ‘clustering_input1’. Finally, use the number 77 as the seed every time you set one, that is, ‘set.seed(77)’.

Suggestion: While you could just cut and paste the code used in the high-engagement session for this module, we encourage you to write all of the code from scratch. This is a technique we often use when looking up and using new code we are borrowing from someone else.

Instructions

(2 points) How many clusters do you want to use for the K-means algorithm for this data? Play around a bit to make your decision. Justify your answer with evidence from your analysis. Specifically, discuss the following: the results from the “wss” and “silhouette” methods of finding the optimal number of clusters, the size of the clusters, and the two-dimensional cluster plot. There is no right answer here, but clearly defend your responses.
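The preparation steps and the “wss” check described above can be sketched as follows. This is only an illustration under stated assumptions, not the class notebook's code: the data are synthetic so the sketch runs without “city.rds”, the feature names (size, profit, promo) are hypothetical, and the scale() step is an assumption based on the note that the cluster centers are centered and standardized.

```r
# Synthetic stand-in for city.rds so the sketch runs without the course file.
# The feature columns (size, profit, promo) are hypothetical names.
set.seed(77)
n <- 60
clustering_input1 <- data.frame(
  city     = paste0("City", seq_len(n)),
  region   = sample(c("East", "West"), n, replace = TRUE),
  province = sample(c("ON", "QC", "BC"), n, replace = TRUE),
  size     = rnorm(n, mean = 50, sd = 10),
  profit   = runif(n),
  promo    = runif(n)
)
# With the real file, the data frame above would instead come from:
# clustering_input1 <- readRDS("city.rds")

# Remove the identifier columns so only numeric features remain.
drop_cols <- c("city", "region", "province")
clustering_input2 <- clustering_input1[, !(names(clustering_input1) %in% drop_cols)]

# Assumption: standardize so that no single feature dominates the distances.
clustering_input2 <- as.data.frame(scale(clustering_input2))

# Total within-cluster sum of squares ("wss") for k = 1..8.
wss <- sapply(1:8, function(k) {
  set.seed(77)
  kmeans(clustering_input2, centers = k, iter.max = 10, nstart = 10)$tot.withinss
})
print(round(wss, 1))
```

Plotting wss against k and looking for the point where the curve flattens is one common way to read the result. The silhouette method and the two-dimensional cluster plot are not shown here because they rely on packages beyond base R.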
(3 points) Regardless of your answer above, use the following code for the ‘kmeans()’ function to answer the rest of the questions about K-means: ‘kmeans(clustering_input2, centers=6, iter.max=10, nstart=10)’. Print the cluster centers using 'clusters$centers', name each of your clusters, and explain how you came up with each name.
As you do this, remember that the values you see are centered and standardized, so focus on their sign and size without going into more detail. For example, a name might say something like “Large; profitable; diverse products; chips and water”. Next, place those cluster names into the 'clustering_input1' dataset using the code provided in class. Which cluster groups together the cities that are the largest and most profitable? Provide the cluster number in your response.
For example, your response might be "Cluster 3". Provide one recommendation to NANSE based upon this analysis.
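As a rough illustration of the K-means steps in this question, the sketch below fits six clusters, prints the centers, and attaches a name to each city. It is not the code provided in class: the data are synthetic, the feature columns (size, profit) are hypothetical, and the cluster names are placeholders rather than real interpretations.

```r
# Synthetic stand-in data; the real inputs are built from city.rds.
set.seed(77)
n <- 60
clustering_input1 <- data.frame(
  city   = paste0("City", seq_len(n)),
  size   = rnorm(n, mean = 50, sd = 10),
  profit = runif(n)
)
# Assumption: the features are centered and standardized before clustering.
clustering_input2 <- as.data.frame(scale(clustering_input1[, c("size", "profit")]))

# Fit K-means with the settings given in the assignment.
set.seed(77)
clusters <- kmeans(clustering_input2, centers = 6, iter.max = 10, nstart = 10)

# One row per cluster, one column per standardized feature.
print(round(clusters$centers, 2))
# Number of cities assigned to each cluster.
print(clusters$size)

# Placeholder names; real ones come from reading the sign and size of each center.
cluster_names <- paste("Name for cluster", 1:6)

# Attach the cluster number and its name to the unstandardized data.
clustering_input1$cluster <- clusters$cluster
clustering_input1$cluster_name <- cluster_names[clusters$cluster]
print(head(clustering_input1))
```

Indexing cluster_names by clusters$cluster works because the cluster assignments are integers from 1 to 6, so each city picks up the name stored at its cluster's position.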
(1 point) After adding your cluster names to 'clustering_input1' above in 2a, look at the dataset and describe what you learn about the cities of Trenton and Parry Sound. We did not standardize the data in 'clustering_input1', so it is easier to interpret with the help of the data description sheet.
(1 point) Run the DBSCAN algorithm using epsilon = 5 and minPts = 4. Print the size of each cluster that this creates. How many clusters are there and how many cities are in each cluster?
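One possible way to run this step is sketched below. It assumes the 'dbscan' package, whose dbscan() function takes 'eps' and 'minPts' arguments; the class notebook may use a different package. The input is synthetic, so the printed sizes say nothing about the real cities.

```r
# Synthetic standardized features; the real input would be clustering_input2.
set.seed(77)
x <- scale(matrix(rnorm(60 * 3), ncol = 3))

# Assumption: the 'dbscan' package is available. If it is not, skip the step.
if (requireNamespace("dbscan", quietly = TRUE)) {
  db <- dbscan::dbscan(x, eps = 5, minPts = 4)
  # Cluster 0, if present, holds the points DBSCAN labels as noise (outliers).
  cluster_sizes <- table(db$cluster)
  print(cluster_sizes)
} else {
  message("Package 'dbscan' is not installed; skipping the DBSCAN step.")
}
```

Unlike K-means, DBSCAN does not force every observation into a cluster, so the table can include a noise group labeled 0 alongside the numbered clusters.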
(2 points) How does your solution using K-means compare to that using DBSCAN? Specifically, answer the following: Which is more helpful and why? Which method would you suggest that the management of NANSE use and why? How do the methods deal with outliers differently?