Week 2 Tutorial: Clustering Solution



Question 1 (For Assessment)

  1. Load the data set question1.RData into R.

  2. Compute the following two-cluster clusterings of the data:

    • A hierarchical clustering using single linkage

    • A hierarchical clustering using complete linkage

    • A 2 cluster k-means clustering (with nstart=30)

  3. For each clustering, make a plot of the data with each point coloured according to the cluster it belongs to.

  4. Write a short paragraph commenting on the different clusterings. It should explain why the clusterings differ and which clustering is preferable.

Instructions for submission: Submit a PDF containing the three plots produced in Step 3 and the explanation from Step 4.
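The steps in Question 1 can be sketched as follows. This is a minimal illustration, not a model answer: the name and shape of the object stored in question1.RData are not given here, so the sketch builds a synthetic stand-in (two concentric rings, called x) instead. With the real data you would first run load("question1.RData") and use ls() to find the object name. The variable names x, ring, cl_single, cl_complete and cl_kmeans are all assumptions made for this example.

```r
# Synthetic stand-in for the data in question1.RData (an assumption):
# 100 evenly spaced points on a ring of radius 1 and 100 on a ring of radius 3.
theta  <- seq(0, 2 * pi, length.out = 101)[-1]
ring   <- rep(1:2, each = 100)            # true ring of each point
radius <- rep(c(1, 3), each = 100)
x <- cbind(radius * cos(rep(theta, 2)), radius * sin(rep(theta, 2)))

d <- dist(x)  # pairwise Euclidean distances

# Hierarchical clusterings, each cut into 2 clusters
cl_single   <- cutree(hclust(d, method = "single"),   k = 2)
cl_complete <- cutree(hclust(d, method = "complete"), k = 2)

# 2-cluster k-means with 30 random starts
set.seed(1)
cl_kmeans <- kmeans(x, centers = 2, nstart = 30)$cluster

# One plot per clustering, points coloured by cluster label
par(mfrow = c(1, 3))
plot(x, col = cl_single,   pch = 19, asp = 1, xlab = "x1", ylab = "x2",
     main = "Single linkage")
plot(x, col = cl_complete, pch = 19, asp = 1, xlab = "x1", ylab = "x2",
     main = "Complete linkage")
plot(x, col = cl_kmeans,   pch = 19, asp = 1, xlab = "x1", ylab = "x2",
     main = "k-means (nstart = 30)")
```

On this synthetic example single linkage recovers the two rings, because it merges along chains of nearby points. k-means with two centers always splits the data by a straight line, so it cannot separate concentric rings. Whether the same pattern appears in the real data has to be checked from the plots.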

Question 2

Implement Lloyd’s K-Means algorithm. The skeleton should look like this:

my_kmeans <- function(data, k, n_starts) {

  done = FALSE

  n = dim(data)[1] #data is a matrix, where each row is one data point

  cluster = rep(NA,n) #this vector says which cluster each point is in

  #uniformly choose initial cluster centers

  centers = data[sample(x=1:n,size = k, replace = FALSE),]

  while (!done) {

    # Do Step 2.1

    # Do Step 2.2

    # Check if the cluster assignments changed. If they have not, set done=TRUE

  }

}
Use this algorithm to compute a 4-cluster clustering of the data set in question2.RData. Comment on the clustering.
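One possible way to fill in the skeleton is sketched below. It assumes that Step 2.1 is the assignment step (each point goes to its nearest center) and Step 2.2 is the update step (each center becomes the mean of its points), which is the usual statement of Lloyd's algorithm; the worksheet itself does not define these steps. The function name my_kmeans_sketch, the max_iter safeguard, the returned list and the synthetic data pts are all assumptions made for illustration, since the contents of question2.RData are not shown here.

```r
# Minimal sketch of Lloyd's algorithm with random restarts.
# Returns the run with the smallest within-cluster sum of squares (wss).
my_kmeans_sketch <- function(data, k, n_starts = 1, max_iter = 100) {
  data <- as.matrix(data)          # each row is one data point
  n <- nrow(data)
  best <- NULL
  for (s in seq_len(n_starts)) {
    # Uniformly choose k distinct data points as initial centers
    centers <- data[sample(n, size = k), , drop = FALSE]
    cluster <- rep(NA_integer_, n)
    for (iter in seq_len(max_iter)) {
      # Assumed Step 2.1: squared distance from every point to every center,
      # then assign each point to the closest center
      d2 <- sapply(seq_len(k),
                   function(j) rowSums(sweep(data, 2, centers[j, ])^2))
      new_cluster <- max.col(-d2, ties.method = "first")
      # Stop once no assignment has changed
      if (identical(new_cluster, cluster)) break
      cluster <- new_cluster
      # Assumed Step 2.2: move each center to the mean of its points
      # (an empty cluster keeps its previous center)
      for (j in seq_len(k)) {
        members <- cluster == j
        if (any(members)) {
          centers[j, ] <- colMeans(data[members, , drop = FALSE])
        }
      }
    }
    wss <- sum((data - centers[cluster, , drop = FALSE])^2)
    if (is.null(best) || wss < best$wss) {
      best <- list(cluster = cluster, centers = centers, wss = wss)
    }
  }
  best
}

# Usage on synthetic data (a stand-in for question2.RData): four blobs
set.seed(2)
blob_means <- rbind(c(0, 0), c(10, 0), c(0, 10), c(10, 10))
pts <- blob_means[rep(1:4, each = 25), ] + matrix(rnorm(200, sd = 0.5), ncol = 2)
fit <- my_kmeans_sketch(pts, k = 4, n_starts = 20)
plot(pts, col = fit$cluster, pch = 19, xlab = "x1", ylab = "x2")
```

A single run of Lloyd's algorithm can stop in a poor local optimum, for example when two initial centers fall in the same group. Repeating the run n_starts times and keeping the result with the smallest wss reduces that risk but does not remove it.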

Question 3 (Bookwork)

Do Question 2 from Section 10.7 of the textbook.

Question 4 (Bookwork)

Do Question 4 from Section 10.7 of the textbook.