CS-498 Applied Machine Learning - Homework 5

D.A. Forsyth --- 3310 Siebel Center

daf@uiuc.edu, daf@illinois.edu

15:30 - 16:45 OR 3.30 pm-4.45 pm, in old money
TuTh
1320 Digital Computer Laboratory

TAs:

Mariya Vasileva mvasile2@illinois.edu

Sili Hui silihui2@illinois.edu

Daeyun Shin dshin11@illinois.edu

Ayush Jain ajain42@illinois.edu

Office Hours:

Ayush Fri - 14h00-16h00 or 2-4 pm, location: in front of 3304

Daeyun Thu - 11h00-13h00 or 11 am-1 pm, location: 0207 Siebel

Mariya Wed - 15h00-17h00 or 3 - 5 pm, location: 0207 Siebel

Sili Thu - 12h00-14h00 or 12 - 2 pm, location: 0207 Siebel

DAF Mon - 14h00-15h00, Fri - 14h00-15h00

or swing by my office (3310 Siebel) and see if I'm busy

Evaluation is by homeworks and a take-home final.

I will shortly post a policy on collaboration and plagiarism.

Homework 5: Due 7 Mar 2016 23h59 (Mon; midnight)

You should do this homework in groups of up to three; details of how to submit have been posted on piazza.

Details and description subject to minor changes

EM Topic models: The UCI Machine Learning dataset repository hosts several datasets recording word counts for documents. You will use the NIPS dataset. For this dataset, the repository provides (a) a table of word counts per document and (b) a vocabulary list. You must implement the mixture of multinomials topic model covered in class. For this problem, you should write the clustering code yourself (i.e. not use a package for clustering).
- Cluster the documents into 30 topics, using a simple mixture of multinomials topic model, as covered in class.
- Produce a graph showing, for each topic, the probability with which the topic is selected.
- Produce a table showing, for each topic, the 10 words with the highest probability for that topic.
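
For orientation, here is a minimal sketch of one common form of the EM updates for a mixture of multinomials; the version covered in class may differ in its details. It assumes the word counts have already been loaded into a dense NumPy array `counts` with one row per document. The names `em_multinomial_mixture`, `top_words`, and the `smoothing` parameter are hypothetical choices made for this illustration, not part of the assignment.

```python
import numpy as np

def em_multinomial_mixture(counts, n_topics, n_iter=50, smoothing=1e-3, seed=0):
    """Sketch of EM for a mixture of multinomials.

    counts: (n_docs, n_words) array of word counts (assumed dense).
    Returns topic weights pi, per-topic word probabilities p,
    and per-document posterior topic probabilities w.
    """
    rng = np.random.default_rng(seed)
    n_docs, n_words = counts.shape
    # Random, strictly positive initial word distributions (one row per topic).
    p = rng.random((n_topics, n_words)) + 1.0
    p /= p.sum(axis=1, keepdims=True)
    pi = np.full(n_topics, 1.0 / n_topics)
    for _ in range(n_iter):
        # E-step: log(pi_j) + sum_k x_ik log(p_jk), up to a per-document constant.
        log_w = counts @ np.log(p).T + np.log(pi)
        # Subtract the row maximum before exponentiating to avoid underflow.
        log_w -= log_w.max(axis=1, keepdims=True)
        w = np.exp(log_w)
        w /= w.sum(axis=1, keepdims=True)
        # M-step: weighted word counts, with a small additive smoothing
        # (an assumption of this sketch) so no probability is exactly zero.
        p = w.T @ counts + smoothing
        p /= p.sum(axis=1, keepdims=True)
        pi = (w.sum(axis=0) + smoothing) / (n_docs + n_topics * smoothing)
    return pi, p, w

def top_words(p, vocab, k=10):
    """Return the k highest-probability words for each topic."""
    return [[vocab[i] for i in np.argsort(row)[::-1][:k]] for row in p]
```

With `n_topics=30`, the returned `pi` gives the per-topic selection probabilities for the graph, and `top_words(p, vocab)` gives the rows of the table.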
Image segmentation using EM: You can segment an image using a clustering method - each segment consists of the pixels that belong to one cluster center. In this exercise, you will represent an image pixel by its r, g, and b values (so use color images!). Use the EM algorithm applied to the mixture of normal distributions model covered in class to cluster image pixels, then segment the image by mapping each pixel to the cluster center with the highest value of the posterior probability for that pixel. You must implement the EM algorithm yourself (rather than using a package). We will release a set of test images shortly; until then, use any color image you care to.
- Segment each of the test images to 10, 20, and 50 segments. You should display these segmented images as images, where each pixel's color is replaced with the mean color of the closest segment.
- We will identify one special test image. You should segment this to 20 segments using five different start points, and display the result for each case. Is there much variation in the result?
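
As a rough illustration of the segmentation procedure described above, the sketch below runs EM for a mixture of normals on the pixels of an image. It makes a simplifying assumption that may not match the model covered in class: every component shares a fixed spherical covariance `sigma**2 * I`, so only the means and mixing weights are updated. The function names and the `sigma` and `seed` parameters are hypothetical, and the image is assumed to be an (H, W, 3) NumPy array with at least `n_segments` pixels.

```python
import numpy as np

def _posteriors(x, mu, pi, sigma):
    """E-step: posterior probability of each cluster for each pixel."""
    # Squared distances via |x|^2 - 2 x.mu + |mu|^2 to keep memory at (n, K).
    d2 = (x ** 2).sum(axis=1, keepdims=True) - 2.0 * x @ mu.T + (mu ** 2).sum(axis=1)
    log_w = -0.5 * d2 / sigma ** 2 + np.log(pi)
    # Subtract the row maximum before exponentiating to avoid underflow.
    log_w -= log_w.max(axis=1, keepdims=True)
    w = np.exp(log_w)
    return w / w.sum(axis=1, keepdims=True)

def em_segment(image, n_segments, n_iter=20, sigma=10.0, seed=0):
    """Sketch of EM segmentation with fixed spherical covariances (an assumption).

    Returns the segmented image (float) and an (H, W) array of segment labels.
    """
    rng = np.random.default_rng(seed)
    x = image.reshape(-1, 3).astype(float)
    n = x.shape[0]
    # Start point: means are randomly chosen pixels; the seed controls the choice.
    mu = x[rng.choice(n, size=n_segments, replace=False)]
    pi = np.full(n_segments, 1.0 / n_segments)
    for _ in range(n_iter):
        w = _posteriors(x, mu, pi, sigma)
        # M-step: weighted mean color and mixing weight of each cluster.
        nk = w.sum(axis=0) + 1e-12  # small constant guards against empty clusters
        mu = (w.T @ x) / nk[:, None]
        pi = nk / nk.sum()
    # Final E-step so the labels agree with the final parameters.
    labels = _posteriors(x, mu, pi, sigma).argmax(axis=1)
    # Replace each pixel's color with the mean color of its segment.
    segmented = mu[labels].reshape(image.shape)
    return segmented, labels.reshape(image.shape[:2])
```

Calling `em_segment(image, 20, seed=s)` for five different values of `s` is one way to obtain five different start points. The returned image holds floats, so it may need converting (for example with `.astype(np.uint8)` for 0-255 data) before display.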