D.A. Forsyth --- 3310 Siebel Center

Looking up at the surface along a cliff face from 20m

TA's:

 

 

Homework 7: Due 9 April 2018 23h59 (Mon; midnight)

 

You may do this homework in groups of up to 3 contributors. Groups of 1 or of 2 are just fine, too. A group can consist of any mixture of any type of student (MSCS-DS/Online/Face). We do not offer coordination services for complex group interactions, and you may want to take this into account when forming your group.

Submission: Homework 7 submission details as for earlier homeworks.

No programming environment warnings for this homework.

 

  • EM Topic models The UCI Machine Learning dataset repository hosts several datasets recording word counts for documents here. You will use the NIPS dataset. You will find (a) a table of word counts per document and (b) a vocabulary list for this dataset at the link. You must implement the multinomial mixture of topics model, lectured in class. For this problem, you should write the clustering code yourself (i.e. not use a package for clustering).
  • Image segmentation using EM You can segment an image using a clustering method - each segment is the cluster center to which a pixel belongs. In this exercise, you will represent an image pixel by its r, g, and b values (so use color images!). Use the EM algorithm applied to the mixture of normal distribution model lectured in class to cluster image pixels, then segment the image by mapping each pixel to the cluster center with the highest value of the posterior probability for that pixel. You must implement the EM algorithm yourself (rather than using a package). Test images are here, and you should display results for all three of them. Till then, use any color image you care to.