CS-498 Applied Machine Learning - Homework 5
D.A. Forsyth --- 3310 Siebel Center
15:30 - 16:45 OR 3.30 pm-4.45 pm, in old money
1320 Digital Computer Laboratory
Mariya Vasileva firstname.lastname@example.org
Sili Hui email@example.com
Daeyun Shin firstname.lastname@example.org
Ayush Jain email@example.com
Ayush Fri - 14h00-16h00 or 2-4 pm, location: in front of 3304
Daeyun Thu - 11h00-13h00 or 11 am-1 pm location: 0207 Siebel
Mariya Wed - 15h00-17h00 or 3 - 5 pm location: 0207 Siebel
Sili Thu - 12h00-14h00 or 12 - 2 pm location: 0207 Siebel
DAF Mon - 14h00-15h00, Fri - 14h00-15h00
or swing by my office (3310 Siebel) and see if I'm busy
Evaluation is by homeworks and a take-home final.
I will shortly post a policy on collaboration and plagiarism
Homework 5: Due 7 Mar 2016 23h59 (Mon; midnight)
You should do this homework in groups of up to three; details of how to submit have been posted on piazza.
Details and description subject to minor changes
- EM Topic models. The UCI Machine Learning dataset repository hosts several datasets recording word counts for documents. You will use the NIPS dataset, for which the repository provides (a) a table of word counts per document and (b) a vocabulary list. You must implement the multinomial mixture of topics model lectured in class. For this problem, you should write the clustering code yourself (i.e. not use a package for clustering).
- Cluster the documents into 30 topics, using a simple mixture of multinomial topic model, as lectured in class.
- Produce a graph showing, for each topic, the probability with which the topic is selected.
- Produce a table showing, for each topic, the 10 words with the highest probability for that topic.
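One possible shape for the EM loop in the topic-model problem is sketched below in Python with NumPy. It is an illustration under assumptions, not the version lectured in class: the function names `multinomial_mixture_em` and `top_words`, the `smoothing` constant, the random Dirichlet initialisation, and the fixed iteration count are all hypothetical choices. The code expects the word counts already loaded as a dense documents-by-words array.

```python
import numpy as np


def multinomial_mixture_em(counts, n_topics, n_iter=50, smoothing=1e-3, seed=0):
    """Fit a mixture of multinomials to a (n_docs, n_words) count matrix.

    Returns (topic_probs, word_probs, resp), where resp holds the
    responsibilities computed in the last E-step.
    All parameter names and defaults are illustrative assumptions.
    """
    rng = np.random.default_rng(seed)
    counts = np.asarray(counts, dtype=float)
    n_docs, n_words = counts.shape

    # Assumed initialisation: random word distributions, uniform topic weights.
    word_probs = rng.dirichlet(np.ones(n_words), size=n_topics)  # (n_topics, n_words)
    topic_probs = np.full(n_topics, 1.0 / n_topics)

    for _ in range(n_iter):
        # E-step: log p(topic | doc) up to a constant, computed in log space
        # so that long documents do not underflow.
        log_post = counts @ np.log(word_probs).T + np.log(topic_probs)
        log_post -= log_post.max(axis=1, keepdims=True)
        resp = np.exp(log_post)
        resp /= resp.sum(axis=1, keepdims=True)  # (n_docs, n_topics)

        # M-step: re-estimate topic weights and per-topic word probabilities.
        # The smoothing term keeps every probability strictly positive.
        topic_probs = (resp.sum(axis=0) + smoothing) / (n_docs + n_topics * smoothing)
        word_probs = resp.T @ counts + smoothing
        word_probs /= word_probs.sum(axis=1, keepdims=True)

    return topic_probs, word_probs, resp


def top_words(word_probs, vocabulary, k=10):
    """List the k highest-probability words for each topic."""
    order = np.argsort(-word_probs, axis=1)[:, :k]
    return [[vocabulary[j] for j in row] for row in order]
```

In this sketch `topic_probs` is the quantity to plot for the graph, and `top_words(word_probs, vocabulary)` gives the rows of the table. The multinomial coefficient is dropped in the E-step because it is the same for every topic and cancels on normalisation.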
- Image segmentation using EM. You can segment an image using a clustering method: each segment consists of the pixels that belong to one cluster center. In this exercise, you will represent an image pixel by its r, g, and b values (so use color images!). Use the EM algorithm applied to the mixture of normal distributions model lectured in class to cluster image pixels, then segment the image by mapping each pixel to the cluster center with the highest value of the posterior probability for that pixel. You must implement the EM algorithm yourself (rather than using a package). We will release a set of test images shortly; till then, use any color image you care to.
- Segment each of the test images into 10, 20, and 50 segments. You should display these segmented images as images, where each pixel's color is replaced with the mean color of the closest segment.
- We will identify one special test image. You should segment this to 20 segments using five different start points, and display the result for each case. Is there much variation in the result?
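For the segmentation problem, a minimal sketch of EM for a mixture of normals over pixel colors might look like the following (Python with NumPy). Several things here are assumptions rather than the model lectured in class: each cluster gets its own diagonal covariance, the means start at randomly chosen pixels, and the names `gmm_em`, `segment_image`, `var_floor`, and `seed` are hypothetical. The E-step builds a pixels-by-segments-by-channels array, so a large image with 50 segments may need to be downsampled or processed in chunks.

```python
import numpy as np


def _responsibilities(pixels, weights, means, variances):
    """Posterior p(cluster | pixel) for diagonal-covariance normals, in log space."""
    diff = pixels[:, None, :] - means[None, :, :]  # (n, k, d)
    log_pdf = -0.5 * (np.log(2.0 * np.pi * variances).sum(axis=1)
                      + (diff ** 2 / variances).sum(axis=2))  # (n, k)
    log_post = log_pdf + np.log(weights)
    log_post -= log_post.max(axis=1, keepdims=True)
    resp = np.exp(log_post)
    return resp / resp.sum(axis=1, keepdims=True)


def gmm_em(pixels, n_segments, n_iter=30, var_floor=1e-4, seed=0):
    """EM for a mixture of normals on an (n_pixels, n_channels) float array."""
    rng = np.random.default_rng(seed)
    n = pixels.shape[0]
    # Assumed start point: means at random pixels, shared initial variance.
    means = pixels[rng.choice(n, size=n_segments, replace=False)].copy()
    variances = np.tile(pixels.var(axis=0) + var_floor, (n_segments, 1))
    weights = np.full(n_segments, 1.0 / n_segments)

    for _ in range(n_iter):
        resp = _responsibilities(pixels, weights, means, variances)  # E-step
        # M-step; the tiny constant guards against empty clusters.
        nk = resp.sum(axis=0) + 1e-12
        weights = nk / nk.sum()
        means = (resp.T @ pixels) / nk[:, None]
        diff = pixels[:, None, :] - means[None, :, :]
        variances = np.einsum("nk,nkd->kd", resp, diff ** 2) / nk[:, None] + var_floor

    # One final E-step so the responsibilities match the returned parameters.
    resp = _responsibilities(pixels, weights, means, variances)
    return weights, means, variances, resp


def segment_image(image, n_segments, **em_kwargs):
    """Replace each pixel by the mean color of its most probable cluster."""
    h, w, c = image.shape
    pixels = image.reshape(-1, c).astype(float)
    _, means, _, resp = gmm_em(pixels, n_segments, **em_kwargs)
    labels = resp.argmax(axis=1)
    return means[labels].reshape(h, w, c), labels.reshape(h, w)
```

Calling `segment_image(image, 20, seed=s)` for five different values of `s` is one way to obtain the five start points; the returned array holds float colors on the same scale as the input image.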