Introduction
In this assignment you will perform data analysis using a basic, unsupervised machine learning approach known as “k-means clustering”. k-means clustering is a way to group together objects that are similar to each other. It allows you to make inferences about the data based on locality and requires no prior knowledge of correct outcomes. For example, consider the outbreak of the COVID-19 virus. It may help to think of the initial inputs given below as the locations of 4 individuals who are known to be infected. This week we will perform clustering analysis to determine, based on proximity, from whom the 96 other patients are most likely to have contracted the virus.
Skills: lists, tuples, file I/O, control structures, string functions
K-means Clustering
Most often you will begin with a set of points of size N. The ‘k’ in k-means refers to the number of clusters into which you would like to partition the N points. Here’s an example of 2-means clustering:
The algorithm for performing k-means clustering, with k = 2, is as follows:
1. Select (usually at random) k points to serve as “centroids” (cluster centers)
2. For each point in the dataset, find the Euclidean distance to each of the k centroids and add the point to the cluster of the closest centroid
3. Compute the new centroid for each of the k clusters by calculating the mean point (hence the name “k-means”)
a. meanx = sum(x-values for all points in cluster)/number of points in cluster
b. meany = sum(y-values for all points in cluster)/number of points in cluster
c. centroidnew = (meanx, meany)
4. Re-cluster by repeating steps 2 and 3 until convergence is achieved
a. When updating the centroids by calculating the cluster means in Step 3, it is possible for points to switch clusters when we repeat Step 2 again; that’s perfectly fine
b. We iterate through the process until no more points switch clusters. When we reach this final state we say we have achieved “convergence” or “stability” and no further iteration is needed as it will not improve the accuracy of the clustering
c. Once stable, we can measure a cluster’s accuracy by calculating the mean distance of each point in a cluster to its centroid; a smaller mean distance indicates a tighter cluster
5. Repeat the entire process of Steps 1-4 multiple times, selecting a new, random set of k initial centroids each time (a pair when k = 2). We can then compare the accuracy of each run to determine which is most accurate, if desired.
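The steps above can be sketched in Python using only lists and tuples. This is a minimal illustration, not the required solution: the function names (distance, assign, update, k_means, mean_distance, best_of) are hypothetical, points are assumed to be (x, y) tuples, an empty cluster is assumed to keep its previous centroid, and the accuracy measure is simplified to a single mean distance over all points rather than one value per cluster.

```python
import math
import random


def distance(p, q):
    """Euclidean distance between two (x, y) tuples."""
    return math.sqrt((p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2)


def assign(points, centroids):
    """Step 2: for each point, the index of its closest centroid."""
    labels = []
    for p in points:
        dists = [distance(p, c) for c in centroids]
        labels.append(dists.index(min(dists)))
    return labels


def update(points, labels, centroids):
    """Step 3: mean point of each cluster.

    Assumption: a cluster with no points keeps its old centroid.
    """
    new_centroids = []
    for i, old in enumerate(centroids):
        members = [p for p, lab in zip(points, labels) if lab == i]
        if not members:
            new_centroids.append(old)
            continue
        mean_x = sum(x for x, _ in members) / len(members)
        mean_y = sum(y for _, y in members) / len(members)
        new_centroids.append((mean_x, mean_y))
    return new_centroids


def k_means(points, centroids):
    """Step 4: repeat Steps 2 and 3 until no point switches clusters."""
    labels = None
    while True:
        new_labels = assign(points, centroids)
        if new_labels == labels:  # convergence: assignments unchanged
            return centroids, labels
        labels = new_labels
        centroids = update(points, labels, centroids)


def mean_distance(points, labels, centroids):
    """Step 4c (simplified): mean point-to-centroid distance over all points."""
    total = sum(distance(p, centroids[lab]) for p, lab in zip(points, labels))
    return total / len(points)


def best_of(points, k, runs, seed=0):
    """Step 5: several runs from random initial centroids; keep the tightest."""
    rng = random.Random(seed)
    best = None
    for _ in range(runs):
        centroids, labels = k_means(points, rng.sample(points, k))
        score = mean_distance(points, labels, centroids)
        if best is None or score < best[0]:
            best = (score, centroids, labels)
    return best


if __name__ == "__main__":
    pts = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
    print(best_of(pts, k=2, runs=5))
```

Keeping assignment (Step 2) and centroid update (Step 3) in separate functions mirrors the numbered steps, so each one can be checked by hand on a few points before running the full loop.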