CS-498 Probability and Statistics for Computer Scientists

D.A. Forsyth --- 3310 Siebel Center

daf@uiuc.edu, daf@illinois.edu

MWF 1320 DCL, 09h00-09h50

TAs: Saurabh Singh (ss1@illinois.edu) and Daphne Tsatsoulis (tsatsou2@illinois.edu)

Office Hours:

DAF: Monday, 10h00-11h00, Wednesday, 10h00-11h00, Friday, 10h00-11h00 (3310 Siebel)

Saurabh: Monday 14h30-15h30, Wednesday 14h30-15h30 (Siebel Center downstairs TA office for 400 level classes)

Daphne: Monday, 13h00-15h00, Friday, 14h00-15h00 (Siebel Center downstairs TA office for 400 level classes) NOTE NEW HOURS!

Homework 9: Due 4 May 2015

This homework can be done in pairs. It is more elaborate than previous homeworks, and is intended to be somewhat open ended. A good submission will include figures like those in the text: examples of the cluster centers; examples of the histograms; examples of the signals; and a class confusion matrix. You can do this homework in relatively few lines of R (my version ran to about 30 lines), but it takes some thought to know which ones.
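A class confusion matrix is easy to build in R with table(). Here is a minimal sketch, assuming you already have a vector of true labels and a vector of predicted labels. The labels "A" and "B" and the names truth, predicted, confusion and accuracy are made up for illustration; they are not from the homework data.

```r
# Hypothetical labels, invented for illustration only.
truth     <- factor(c("A", "A", "B", "B", "A", "B"))
predicted <- factor(c("A", "B", "B", "B", "A", "A"), levels = levels(truth))

# Rows are the true classes, columns are the predicted classes.
confusion <- table(truth = truth, predicted = predicted)
print(confusion)

# Fraction of examples on the diagonal, i.e. classified correctly.
accuracy <- sum(diag(confusion)) / sum(confusion)
print(accuracy)
```

Giving predicted the same levels as truth keeps the table square even if some class is never predicted.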

Do problem 13.1 at the end of chapter 13 ("Clustering") in the 16 April edition of the book. If you think you've already done this one, you're looking at the wrong edition; I reordered the chapters.

Here's a useful fragment of R, which I'll talk about in class:

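library(class) # knn() lives in the class package
setwd('/users/daf/Current/courses/Probcourse/Clustering/RCode') # change this to your own directory
wdat<-read.csv('turkiye-student-evaluation_R_Specific.csv')
wdm<-wdat[c(1:20),c(7:9)] # choose the questions
# reshape to 5 rows of 12 numbers (each row holds 4 consecutive rows of wdm)
wdm2<-matrix(as.vector(t(as.matrix(wdm))), nrow = 5, byrow = TRUE)
foo<-kmeans(wdm2, 2) # cluster the rows into 2 groups
closest<-knn(foo$centers, wdm2, c(1:2)) # label of the nearest cluster center for each row
wdm2<-as.data.frame(wdm2)
wdm2$type<-closest # record which cluster each row belongs to
bar<-subset(wdm2, (wdm2$type==1)) # keep only the rows in cluster 1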
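
The fragment above needs the CSV file to run. If you want to try the same pattern without it, here is a rough sketch on synthetic data: cluster with kmeans, then use knn against the cluster centers to find the closest center for each row. The data and the names pts, fit, agree, pts_df and first_cluster are assumptions made for illustration, and nstart = 10 is my choice rather than anything the homework requires.

```r
library(class)  # provides knn(); kmeans() comes with base R (stats)

set.seed(1)  # kmeans starts from random centers, so fix the seed
# Synthetic stand-in for real data: two well-separated groups of 10 points in 3-d.
pts <- rbind(matrix(rnorm(30, mean = 0, sd = 0.3), ncol = 3),
             matrix(rnorm(30, mean = 5, sd = 0.3), ncol = 3))

# Cluster into 2 groups; nstart = 10 keeps the best of 10 random starts.
fit <- kmeans(pts, centers = 2, nstart = 10)

# 1-nearest-neighbour against the two centers:
# each row of pts gets the label (1 or 2) of its closest center.
closest <- knn(fit$centers, pts, cl = factor(1:2))

# For data this well separated, this matches the assignment kmeans reports.
agree <- all(as.integer(as.character(closest)) == fit$cluster)

# Attach the labels and pull out the rows of one cluster.
pts_df <- as.data.frame(pts)
pts_df$type <- closest
first_cluster <- subset(pts_df, type == 1)
```

The point of the knn step is that it also works on rows that were not used to fit the clusters: pass new data as the second argument and you get the closest center for each new row.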