Problem 6: [30pts] Dynamic Programming for k-Mean Square Clustering

Let $X = (x_1, x_2, \ldots, x_n)$ be a sequence of reals.

The problem is to partition $X$ into $k$ non-overlapping contiguous subsequences (intervals) so as to minimize the total mean-square error, summed over all of the subsequences.

This is a common first step in approximating large data sequences.

As an example, the diagram below shows two ways of partitioning the same sequence of 8 items into three subsequences. The one on the right has a smaller mean-square-error cost. In fact, the partition on the right attains the minimum cost over all possible 3-partitions.

i   | 1 | 2 |  3 | 4 |  5 |  6 |  7 | 8
x_i | 6 | 0 | -6 | 0 | -6 | 12 | -6 | 6

$I_1 = (0, 3, 6, 8)$

$\bar{x}_{1,3} = 0, \quad \bar{x}_{4,6} = 2, \quad \bar{x}_{7,8} = 0$

$\mathrm{MSC}(1, 3) = 6^2 + 0^2 + 6^2 = 72$

$\mathrm{MSC}(4, 6) = 2^2 + 8^2 + 10^2 = 168$

$\mathrm{MSC}(7, 8) = 6^2 + 6^2 = 72$

$\mathrm{MSC}_3(I_1) = 72 + 168 + 72 = 312$
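The arithmetic in this example can be reproduced with a short Python sketch that computes MSC(i, j) directly from its definition (the sum of squared deviations from the interval mean) and adds it up over the intervals of an index list. The names `msc` and `partition_cost` are hypothetical helpers introduced only for illustration; indices are 1-based and inclusive to match the text.

```python
from typing import Sequence


def msc(x: Sequence[float], i: int, j: int) -> float:
    """Mean-square cost of x_i..x_j (1-indexed, inclusive), straight from
    the definition; takes O(j - i + 1) time per call."""
    segment = x[i - 1 : j]
    mean = sum(segment) / len(segment)
    return sum((v - mean) ** 2 for v in segment)


def partition_cost(x: Sequence[float], index_list: Sequence[int]) -> float:
    """k-cost of the partition given by (i_0, ..., i_k); the t-th interval
    is [i_{t-1} + 1, i_t]."""
    return sum(
        msc(x, index_list[t - 1] + 1, index_list[t])
        for t in range(1, len(index_list))
    )


X = [6, 0, -6, 0, -6, 12, -6, 6]
I1 = (0, 3, 6, 8)
print(msc(X, 1, 3), msc(X, 4, 6), msc(X, 7, 8))  # 72.0 168.0 72.0
print(partition_cost(X, I1))                     # 312.0
```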

The formal definitions are below:

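- A $k$-partition of $X$ is a partition of the sequence into $k$ contiguous subsequences, specified by an index list $I = (i_0, i_1, \ldots, i_k)$.

$I$ satisfies $i_0 = 0$, $i_k = n$, and $i_0 < i_1 < i_2 < \cdots < i_k$. The $t$th interval of the partition is $[i_{t-1} + 1, i_t]$.

$\mathcal{I}_n$ is the set of all $k$-partitioning index lists for $[1 \ldots n]$.

Given index list $I$, the $k$-cost $\mathrm{MSC}_k(I)$ of $X$ is the sum of the MSCs of its intervals, where for $1 \le i \le j \le n$:
$$\bar{x}_{i,j} = \frac{1}{j-i+1}\sum_{l=i}^{j} x_l, \qquad \mathrm{MSC}(i, j) = \sum_{l=i}^{j} \bigl(x_l - \bar{x}_{i,j}\bigr)^2, \qquad \mathrm{MSC}_k(I) = \sum_{t=1}^{k} \mathrm{MSC}(i_{t-1} + 1, i_t).$$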
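Because $\mathcal{I}_n$ is finite, these definitions can be checked on small inputs by exhaustive search. The sketch below (hypothetical names `msc` and `brute_force_opt`) enumerates every index list by choosing the $k-1$ interior cut points from $\{1, \ldots, n-1\}$, so it examines $\binom{n-1}{k-1}$ partitions. That is exponential in general, so it is meant only as a slow reference to test a dynamic-programming solution against, not as a solution itself.

```python
from itertools import combinations
from typing import Sequence, Tuple


def msc(x: Sequence[float], i: int, j: int) -> float:
    """MSC(i, j) computed straight from the definition (1-indexed, inclusive)."""
    segment = x[i - 1 : j]
    mean = sum(segment) / len(segment)
    return sum((v - mean) ** 2 for v in segment)


def brute_force_opt(x: Sequence[float], k: int) -> Tuple[float, Tuple[int, ...]]:
    """Minimize MSC_k(I) over every index list I in I_n by exhaustive search.

    The interior cut points i_1 < ... < i_{k-1} range over all (k-1)-subsets
    of {1, ..., n-1}, so C(n-1, k-1) index lists are tried.
    """
    n = len(x)
    if not 1 <= k <= n:
        raise ValueError("need 1 <= k <= n")
    best_cost, best_index_list = float("inf"), ()
    for cuts in combinations(range(1, n), k - 1):
        index_list = (0, *cuts, n)            # i_0 = 0 and i_k = n are fixed
        cost = sum(msc(x, index_list[t - 1] + 1, index_list[t])
                   for t in range(1, k + 1))  # t-th interval is [i_{t-1}+1, i_t]
        if cost < best_cost:
            best_cost, best_index_list = cost, index_list
    return best_cost, best_index_list


X = [6, 0, -6, 0, -6, 12, -6, 6]
print(brute_force_opt(X, 3))
```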

The problem is to find the partition that minimizes $\mathrm{MSC}_k(I)$ over all possible partitions. More specifically, use dynamic programming to find
$$\mathrm{OPT}_k(X) = \min_{I \in \mathcal{I}_n} \mathrm{MSC}_k(I).$$

Set $X_m = (x_1, x_2, \ldots, x_m)$.

In the example above, $X_4 = (6, 0, -6, 0)$.

For $1 \le t \le k$ and $1 \le m \le n$, define

$M(m : t) = \mathrm{OPT}_t(X_m)$.

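This is the cost of the best (minimum-cost) $t$-partition of $X_m$; it is only meaningful for $t \le m$, since each of the $t$ intervals must be nonempty. Note that $\mathrm{OPT}_k(X_n) = M(n : k)$, so filling in the $M(m : t)$ table solves the problem.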
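One standard recurrence of the requested form, given here only as a sketch, conditions on where the last interval of $X_m$ starts: if an optimal $t$-partition of $X_m$ ends with the interval $[j+1, m]$, then its first $t-1$ intervals must form an optimal $(t-1)$-partition of $X_j$, since otherwise swapping in a cheaper one would lower the total.

```latex
M(m : t) =
\begin{cases}
  \mathrm{MSC}(1, m), & t = 1,\\[4pt]
  \displaystyle\min_{t-1 \,\le\, j \,\le\, m-1}
    \bigl\{\, M(j : t-1) + \mathrm{MSC}(j+1, m) \,\bigr\}, & 2 \le t \le m.
\end{cases}
```

The bounds $t-1 \le j \le m-1$ keep all $t$ intervals nonempty. With $O(1)$ access to MSC, each entry costs $O(m)$ time, so filling all $kn$ entries takes $O(kn^2)$ time.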

- (a) First, describe how to preprocess $X$ in $O(n)$ time so that, after the preprocessing, $\mathrm{MSC}(i, j)$ can be calculated in $O(1)$ time for any $1 \le i \le j \le n$.

- (b) Write down a recurrence relation for $M(m : t)$ with $1 \le t \le k$ and $1 \le m \le n$. You should write your recurrence relation in the form

$$
M(m : t) =
\begin{cases}
\mathrm{MSC}(1, m) & \text{if } t = 1,\\
\text{??????????????} & \text{otherwise.}
\end{cases}
$$

- (c) Prove the correctness of your recurrence relation from part (b).
- (d) Give documented pseudocode for an algorithm that calculates $M(n : k)$, based on your recurrence relation from (b).

Your code may assume that you have a procedure for calculating $\mathrm{MSC}(i, j)$ in $O(1)$ time (even if you did not solve part (a)).

Analyze the running time of your algorithm. For full marks, your algorithm should run in $O(kn^2)$ time.
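A minimal Python sketch of one way to meet these requirements, assuming the recurrence that conditions on where the last interval starts. The hypothetical helper `make_msc` does the $O(n)$ prefix-sum preprocessing, relying on the identity $\sum_{l=i}^{j}(x_l - \bar{x}_{i,j})^2 = \sum_{l=i}^{j} x_l^2 - \frac{1}{j-i+1}\bigl(\sum_{l=i}^{j} x_l\bigr)^2$, and the hypothetical `min_msc_partition` fills the table $M[t][m] = M(m : t)$ row by row with three nested loops, for $O(kn^2)$ time and $O(kn)$ space. Floating-point arithmetic is assumed; switching to exact rational arithmetic would avoid rounding, at some cost in speed.

```python
from itertools import accumulate
from typing import Callable, List, Sequence, Tuple


def make_msc(x: Sequence[float]) -> Callable[[int, int], float]:
    """O(n) preprocessing: prefix sums of x_l and of x_l^2.

    The returned msc(i, j) (1-indexed, inclusive, i <= j) runs in O(1) using
    sum((x_l - mean)^2) = sum(x_l^2) - (sum(x_l))^2 / (j - i + 1).
    This form can lose floating-point precision on large values.
    """
    s1 = [0.0, *accumulate(x)]                 # s1[m] = x_1 + ... + x_m
    s2 = [0.0, *accumulate(v * v for v in x)]  # s2[m] = x_1^2 + ... + x_m^2

    def msc(i: int, j: int) -> float:
        total = s1[j] - s1[i - 1]
        return (s2[j] - s2[i - 1]) - total * total / (j - i + 1)

    return msc


def min_msc_partition(x: Sequence[float], k: int) -> Tuple[float, List[int]]:
    """Return (M(n : k), an optimal index list) in O(k n^2) time, O(k n) space."""
    n = len(x)
    if not 1 <= k <= n:
        raise ValueError("need 1 <= k <= n")
    msc = make_msc(x)
    inf = float("inf")
    # M[t][m] holds M(m : t); row t = 0 is unused and entries with m < t stay inf.
    M = [[inf] * (n + 1) for _ in range(k + 1)]
    # split[t][m] = the j attaining the minimum, i.e. the last interval is [j + 1, m].
    split = [[0] * (n + 1) for _ in range(k + 1)]

    for m in range(1, n + 1):            # base case t = 1: one interval [1, m]
        M[1][m] = msc(1, m)

    for t in range(2, k + 1):
        for m in range(t, n + 1):        # need at least t items for t intervals
            for j in range(t - 1, m):    # first j items form t - 1 intervals
                candidate = M[t - 1][j] + msc(j + 1, m)
                if candidate < M[t][m]:
                    M[t][m], split[t][m] = candidate, j

    # Walk the split table backwards to recover (i_0, i_1, ..., i_k).
    index_list = [n]
    m = n
    for t in range(k, 1, -1):
        m = split[t][m]
        index_list.append(m)
    index_list.append(0)
    index_list.reverse()
    return M[k][n], index_list


X = [6, 0, -6, 0, -6, 12, -6, 6]
print(min_msc_partition(X, 3))
```

The extra `split` table records which $j$ attained each minimum, so an optimal index list can be read back in $O(k)$ time once the table is filled; dropping it leaves the $O(kn^2)$ bound unchanged.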
