D.A. Forsyth --- 3310 Siebel Center

A cranky male largemouth bass tries to chase a diver from its nest.

TAs:

MCS-DS version of the course
- Krishna Kothapalli (Lead)
- Daniel Calzada
- Taiyu Dong
- Shreya Rajpal
- Ehsan Saleh
Online version of the course
- Binglin Chen
In person version of the course
- Christopher Benson
- Ji Li
- Maghav Kumar

Homework 3: Due 19 Feb 2017 23h59 (Mon; midnight)

You may do this homework in groups of up to 3 contributors; groups of 1 or 2 are fine, too. A group can include any mixture of students from the different versions of the course (MCS-DS, online, or in person). We do not offer coordination services for complex group interactions, so you may want to take this into account when forming your group.

Submission: follow the course submission policy.

You may use any programming language that amuses you for this homework.

Problem 1

CIFAR-10 is a dataset of 32x32 color images in 10 categories, collected by Alex Krizhevsky, Vinod Nair, and Geoffrey Hinton. It is often used to evaluate machine learning algorithms. You can download this dataset from https://www.cs.toronto.edu/~kriz/cifar.html.
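A minimal loading sketch, assuming you downloaded the "CIFAR-10 python version" archive and extracted it into a directory (usually named cifar-10-batches-py). The helper names load_cifar_batch and load_cifar10 are hypothetical. The file layout it relies on (data_batch_1 to data_batch_5 plus test_batch, each a pickled dict with b"data" and b"labels" entries) is the one described on the dataset page. Pooling the training and test batches is an assumption; pass include_test=False if you only want the 50,000 training images.

```python
import os
import pickle

import numpy as np


def load_cifar_batch(path):
    """Load one pickled CIFAR-10 batch (python version) as (images, labels)."""
    with open(path, "rb") as f:
        batch = pickle.load(f, encoding="bytes")  # dict keys are bytes, e.g. b"data"
    # Each row is one image: 1024 red, then 1024 green, then 1024 blue values.
    # float32 keeps all 60,000 images at roughly 0.7 GB (float64 would double that).
    X = np.asarray(batch[b"data"], dtype=np.float32)
    y = np.asarray(batch[b"labels"], dtype=np.int64)
    return X, y


def load_cifar10(directory, include_test=True):
    """Concatenate data_batch_1..5 (and, by default, test_batch) from directory."""
    names = [f"data_batch_{i}" for i in range(1, 6)]
    if include_test:  # assumption: pool all 60,000 images
        names.append("test_batch")
    parts = [load_cifar_batch(os.path.join(directory, n)) for n in names]
    X = np.concatenate([p[0] for p in parts], axis=0)
    y = np.concatenate([p[1] for p in parts], axis=0)
    return X, y
```

For example, after X, y = load_cifar10("cifar-10-batches-py"), the array X[y == c] holds the images of category c, one 3072-element vector per row.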

For each category, compute the mean image and the first 20 principal components. Then compute, for each category, the error that results from representing that category's images using its mean and first 20 principal components, and plot this error against the category.
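A sketch of the per-category computation, assuming numpy and assuming "error" means the average squared Euclidean distance between an image and its reconstruction from the class mean plus the first 20 principal components (the assignment does not pin the definition down). The names class_pca and reconstruction_error are hypothetical. The components come from an SVD of the centered data, which yields the same directions as an eigendecomposition of the covariance matrix without forming the 3072 x 3072 matrix explicitly.

```python
import numpy as np


def class_pca(X, k=20):
    """Mean (d,) and first k principal components (k, d) of the rows of X (N, d)."""
    mean = X.mean(axis=0)
    # Rows of Vt are principal directions, sorted by decreasing singular value.
    _, _, Vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, Vt[:k]


def reconstruction_error(X, mean, components):
    """Average squared error of representing each row of X as mean + projection."""
    Xc = X - mean
    coeffs = Xc @ components.T   # coordinates in the component basis
    recon = coeffs @ components  # back to image space (still centered)
    return np.mean(np.sum((Xc - recon) ** 2, axis=1))
```

With this definition the error equals the sum of the discarded covariance eigenvalues (covariance normalized by N), which makes a handy sanity check. For CIFAR-10 images X with labels y, the values to plot could be computed as errors = [reconstruction_error(X[y == c], *class_pca(X[y == c])) for c in range(10)], then drawn with matplotlib's plt.bar(range(10), errors).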
Compute the distance between the mean images of each pair of classes. For this exercise, compute distances by treating each image as a vector (32x32x3 = 3072 values) and using the Euclidean distance. Then use principal coordinate analysis to make a 2D map of the class means.
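A sketch of the distance and mapping steps, assuming Euclidean distance between mean images and classical principal coordinate analysis (also known as classical multidimensional scaling): square the distances, double-center them, and use the top eigenvectors, scaled by the square roots of their eigenvalues, as coordinates. The names mean_distances and pcoa are hypothetical; means would be a 10 x 3072 array with one class mean per row.

```python
import numpy as np


def mean_distances(means):
    """Euclidean distance between every pair of rows of means (n, d)."""
    diff = means[:, None, :] - means[None, :, :]
    return np.sqrt(np.sum(diff ** 2, axis=-1))


def pcoa(D, r=2):
    """Classical principal coordinate analysis of an (n, n) distance matrix D."""
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n   # centering matrix
    W = -0.5 * J @ (D ** 2) @ J           # double-centered squared distances
    evals, evecs = np.linalg.eigh(W)      # eigh returns eigenvalues in ascending order
    top = np.argsort(evals)[::-1][:r]     # indices of the r largest eigenvalues
    scale = np.sqrt(np.clip(evals[top], 0.0, None))  # guard against tiny negative values
    return evecs[:, top] * scale          # (n, r) map coordinates
```

The rows of pcoa(mean_distances(means)) are the 2D positions of the class means; the map is only defined up to rotation and reflection, so label each point with its class name when plotting.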
Here is another measure of the similarity of two classes. For classes A and B, define E(A | B) to be the average error obtained by representing all the images of class A using the mean of class A and the first 20 principal components of class B. Now define the similarity between the classes to be (1/2)(E(A | B) + E(B | A)). If A and B are very similar, this measure should be small, because each class's principal components should be good at representing the other class. If they are very different, each class's principal components should represent the other class poorly, so the measure should be large (despite the name, larger values mean less similar classes). Use principal coordinate analysis to make a 2D map of the classes. Compare this map to the map in the previous exercise. Are they different? Why?
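A sketch of the E(A | B) computation, assuming numpy and assuming "error" is the average squared Euclidean distance between an image and its reconstruction. The names top_components, cross_error, and class_dissimilarity are hypothetical; classes is a list with one (N_i, 3072) array of images per category.

```python
import numpy as np


def top_components(X, k):
    """Mean (d,) and first k principal directions (k, d) of the rows of X."""
    mean = X.mean(axis=0)
    _, _, Vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, Vt[:k]


def cross_error(X_a, mean_a, comps_b):
    """E(A | B): average squared error using A's mean and B's components."""
    Xc = X_a - mean_a
    recon = (Xc @ comps_b.T) @ comps_b
    return np.mean(np.sum((Xc - recon) ** 2, axis=1))


def class_dissimilarity(classes, k=20):
    """S[a, b] = (E(A | B) + E(B | A)) / 2 for every pair of classes."""
    stats = [top_components(X, k) for X in classes]
    n = len(classes)
    E = np.empty((n, n))
    for a in range(n):
        for b in range(n):
            E[a, b] = cross_error(classes[a], stats[a][0], stats[b][1])
    return 0.5 * (E + E.T)  # symmetrize: average E(A | B) and E(B | A)
```

The resulting matrix has nonzero diagonal entries (E(A | A) is class A's own reconstruction error), while principal coordinate analysis expects zero self-distances. Setting the diagonal to zero with np.fill_diagonal(S, 0.0) before computing the map is one reasonable choice, not something the assignment specifies.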