Now I see it! K-means cluster analysis in R

Of course, a picture on a computer monitor is nothing more than a coloured plot of x and y coordinates, or pixels. Still, I was smitten by David Sparks' posts on is.r(), where he shows how easy it is to read images into R and analyse them. In two posts [1], [2] he replicates functionality of image manipulation programs such as GIMP.

I can't resist writing about this here as well. David's first post is about k-means cluster analysis, and one of the popular algorithms for k-means is Lloyd's algorithm. So, on that note, I will use a picture of the Lloyd's of London building to play around with David's code, even though the two Lloyds have nothing to do with each other. Lloyd's provides copyright-free pictures of its building on its web site; however, I will use a reduced-size version hosted on Wikimedia.
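As a refresher, Lloyd's algorithm alternates two steps until the cluster assignments stop changing: assign every point to its nearest centre, then move every centre to the mean of the points assigned to it. The sketch below is a minimal base-R illustration under stated assumptions: the function `lloyd()` is hypothetical (not from David's code), the data are a numeric matrix, the initial centres are supplied by the caller, and no cluster ever becomes empty. In practice, R's own `kmeans()` offers this variant via `algorithm = "Lloyd"`, although its default is Hartigan-Wong.

```r
# Hypothetical, minimal sketch of Lloyd's algorithm for k-means.
# Assumes x is numeric, centers holds k initial centres (one per row),
# and no cluster becomes empty during the iterations.
lloyd <- function(x, centers, max.iter = 100) {
  x <- as.matrix(x)
  centers <- as.matrix(centers)
  k <- nrow(centers)
  cluster <- integer(nrow(x))
  for (iter in seq_len(max.iter)) {
    # Assignment step: squared Euclidean distance of every point to every centre
    d <- vapply(seq_len(k),
                function(j) colSums((t(x) - centers[j, ])^2),
                numeric(nrow(x)))
    new.cluster <- max.col(-d, ties.method = "first")  # nearest centre
    if (all(new.cluster == cluster)) break  # assignments unchanged: converged
    cluster <- new.cluster
    # Update step: move each centre to the mean of its assigned points
    centers <- do.call(rbind, lapply(seq_len(k), function(j)
      colMeans(x[cluster == j, , drop = FALSE])))
  }
  list(cluster = cluster, centers = centers, iter = iter)
}

# Usage on two well-separated synthetic groups of 10 points each
set.seed(1)
x <- rbind(matrix(rnorm(20, mean = 0, sd = 0.1), ncol = 2),
           matrix(rnorm(20, mean = 5, sd = 0.1), ncol = 2))
fit <- lloyd(x, centers = x[c(1, 11), ])
table(fit$cluster)
```

On data this well separated, the sketch should agree with `kmeans(x, centers = x[c(1, 11), ], algorithm = "Lloyd")`.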

The ReadImages package by Markus Löcher [3] allows me to load a jpeg file into R. The resulting R object is an array made of three stacked matrices, holding the red, green and blue colour values for each x and y coordinate. I convert the array into a data frame, a structure that kmeans() accepts, and plot the data.
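library("ReadImages")  # provides read.jpeg()
url <- "http://upload.wikimedia.org/wikipedia/commons/6/6a/6414A_1_copy.jpg"
fn <- tempfile(fileext = ".jpg")
# mode = "wb" keeps the binary file intact on Windows
download.file(url, destfile = fn, mode = "wb")
readImage <- read.jpeg(fn)

# Dimensions: rows (height), columns (width), colour channels
dm <- dim(readImage)
# One row per pixel; arrays are stored column by column,
# so x (the column index) changes slowest
rgbImage <- data.frame(
  x = rep(1:dm[2], each = dm[1]),
  y = rep(dm[1]:1, dm[2]),  # flip so that row 1 is drawn at the top
  r.value = as.vector(readImage[, , 1]),
  g.value = as.vector(readImage[, , 2]),
  b.value = as.vector(readImage[, , 3]))

plot(y ~ x, data = rgbImage, main = "Lloyd's building",
     col = rgb(rgbImage[c("r.value", "g.value", "b.value")]),
     asp = 1, pch = ".")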
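To see why the y coordinate runs from dm[1] down to 1, it can help to apply the same conversion to a tiny synthetic image. The helper `image_to_df()` below is hypothetical; it simply wraps the data-frame code above and assumes colour values in [0, 1], as `rgb()` expects by default. Because R stores arrays column by column, `as.vector()` walks down each column of pixels in turn, so x (the column) repeats `each = dm[1]` times, while y is reversed so that the top row of the image ends up at the top of the plot.

```r
# Hypothetical helper wrapping the array-to-data-frame conversion,
# applied here to a tiny synthetic 2 x 3 image instead of the jpeg.
image_to_df <- function(img) {
  dm <- dim(img)  # rows (height), columns (width), 3 colour channels
  data.frame(
    x = rep(1:dm[2], each = dm[1]),  # column index -> x
    y = rep(dm[1]:1, dm[2]),         # row 1 (top) -> largest y
    r.value = as.vector(img[, , 1]),
    g.value = as.vector(img[, , 2]),
    b.value = as.vector(img[, , 3]))
}

img <- array(0, dim = c(2, 3, 3))  # black 2 x 3 image, values in [0, 1]
img[1, 3, 1] <- 1                  # make the top-right pixel pure red
pixels <- image_to_df(img)
pixels[pixels$r.value == 1, c("x", "y")]  # x = 3, y = 2: right column, top row
```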


Running a k-means analysis on the three colour columns of my data frame reduces the picture to k colours. For each x and y coordinate, the output tells me which colour cluster it belongs to. I then plot the picture again, replacing the original colours with the cluster colours.
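Here is a self-contained sketch of that step, under a few assumptions: because the Wikimedia image may no longer be reachable, it uses a small synthetic data frame with the same column names as rgbImage; the helper `kmeans_colours()` is hypothetical and not taken from David's code; and each pixel is recoloured with its cluster centre (the mean colour of its cluster), which is one natural choice for the "cluster colours".

```r
# Hypothetical helper: reduce an image data frame (columns r.value, g.value,
# b.value in [0, 1]) to k colours by k-means on the colour columns only.
kmeans_colours <- function(df, k, ...) {
  cols <- c("r.value", "g.value", "b.value")
  fit <- kmeans(df[cols], centers = k, ...)
  # Each pixel takes the colour of its cluster centre
  rgb(fit$centers[fit$cluster, , drop = FALSE])
}

# Synthetic stand-in for rgbImage: a 20 x 20 image, left half near-red,
# right half near-blue, with a little noise clamped to [0, 1].
set.seed(42)
clamp <- function(v) pmin(pmax(v, 0), 1)
n <- 20
toy <- data.frame(x = rep(1:n, each = n), y = rep(n:1, n))
left <- toy$x <= n / 2
toy$r.value <- clamp(ifelse(left, 0.9, 0.1) + rnorm(n * n, sd = 0.05))
toy$g.value <- clamp(0.1 + rnorm(n * n, sd = 0.05))
toy$b.value <- clamp(ifelse(left, 0.1, 0.9) + rnorm(n * n, sd = 0.05))

cluster.cols <- kmeans_colours(toy, k = 2, nstart = 5)
length(unique(cluster.cols))  # 2 distinct colours remain
plot(y ~ x, data = toy, col = cluster.cols, asp = 1, pch = 15)
# For the real picture, something like: col = kmeans_colours(rgbImage, k)
```

Using the centre colours keeps the reduced picture close to the original hues; `nstart` asks `kmeans()` for several random starts and keeps the best one.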