D.A. Forsyth --- 3310 Siebel Center
TAs:
- MCS-DS version of the course
  - Krishna Kothapalli (lead)
  - Daniel Calzada
  - Taiyu Dong
  - Shreya Rajpal
  - Ehsan Saleh
- Online version of the course
- In-person version of the course
  - Christopher Benson
  - Ji Li
  - Maghav Kumar
Homework 3: Due 19 Feb 2017 23h59 (Mon; midnight)
You may do this homework in groups of up to 3 contributors; groups of 1 or 2 are
just fine, too. A group can consist of any mixture of student types (MCS-DS,
online, or in-person). We do not offer coordination services for complex group
interactions, so you may want to take this into account when forming your
group.
Submission: follow the course submission policy.
You may use any programming language that amuses you for this homework.
Problem 1
CIFAR-10 is a dataset of 32x32 images in 10 categories, collected by Alex
Krizhevsky, Vinod Nair, and Geoffrey Hinton. It is often used to evaluate
machine learning algorithms. You can download this dataset from
https://www.cs.toronto.edu/~kriz/cifar.html.
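One way to get the data into the "images as vectors" form used below is sketched here in Python/NumPy. It assumes you downloaded the "CIFAR-10 python version" from the page above and unpacked it into a directory named `cifar-10-batches-py`. In that version each batch file is a pickled dict whose `b"data"` entry holds one 3072-value row per image (1024 red, then 1024 green, then 1024 blue values) and whose `b"labels"` entry holds class indices 0 to 9. The helper names `batch_to_arrays`, `load_cifar10`, and `to_image` are hypothetical, and stacking the test batch together with the five training batches (giving 6,000 images per class) is an assumption; the assignment does not say which images to use.

```python
import os
import pickle

import numpy as np


def batch_to_arrays(batch):
    """Convert one unpickled CIFAR-10 batch dict into (X, y).

    Each row of X is one image as a 3072-vector: 1024 red values, then
    1024 green, then 1024 blue (each channel stored row by row).
    """
    X = np.asarray(batch[b"data"], dtype=np.float64)
    y = np.asarray(batch[b"labels"], dtype=np.int64)
    return X, y


def load_cifar10(root="cifar-10-batches-py"):
    """Load the five training batches and the test batch, stacked together.

    Assumption: all 60,000 images are used; drop "test_batch" to use
    only the 50,000 training images.
    """
    names = [f"data_batch_{i}" for i in range(1, 6)] + ["test_batch"]
    Xs, ys = [], []
    for name in names:
        with open(os.path.join(root, name), "rb") as f:
            # The files were pickled under Python 2; encoding="bytes"
            # makes the dict keys come back as bytes (b"data", b"labels").
            batch = pickle.load(f, encoding="bytes")
        X, y = batch_to_arrays(batch)
        Xs.append(X)
        ys.append(y)
    return np.vstack(Xs), np.concatenate(ys)


def to_image(row):
    """Reshape a 3072-vector back into a 32x32x3 (height, width, channel) array."""
    return row.reshape(3, 32, 32).transpose(1, 2, 0)
```

With the full download, `X, y = load_cifar10()` should give `X` with shape (60000, 3072) and `y` with shape (60000,); `to_image(X[0]).astype(np.uint8)` can be passed to an image viewer to check that the loading worked.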
- For each category, compute the mean image and the first 20 principal
components. Plot, against the category, the error that results from
representing the images of each category using that category's mean image and
first 20 principal components.
- Compute the distance between the mean images of each pair of classes. For
this exercise, compute distances by thinking of the images as vectors. Use
principal coordinate analysis to make a 2D map of the category means.
- Here is another measure of the similarity of two classes. For class A and
class B, define E(A | B) to be the average error obtained by representing all
the images of class A using the mean of class A and the first 20 principal
components of class B. Now define the similarity between classes A and B to be
(1/2)(E(A | B) + E(B | A)). If A and B are very similar, this quantity should
be small, because each class's principal components should be good at
representing the other class. If they are very different, each class's
principal components should represent the other class poorly, and the quantity
should be large. (So, despite its name, this measure behaves like a distance.)
Use principal coordinate analysis to make a 2D map of the classes. Compare
this map to the map from the previous exercise. Are they different? Why?
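The three parts above can be sketched together in Python/NumPy. The sketch assumes `X` is an (N, 3072) float array with one image per row and `y` holds class labels 0 to 9. Several choices are assumptions that the assignment leaves open. "Error" is taken to be the average squared Euclidean distance between an image and its reconstruction. Principal components come from an SVD of the centered class data. Principal coordinate analysis is the classical (Torgerson) version. Finally, because (1/2)(E(A | A) + E(A | A)) = E(A | A) is not zero, the diagonal of the class-similarity matrix is set to zero before mapping. All function names are hypothetical.

```python
import numpy as np


def class_pca(X, k=20):
    """Mean and top-k principal directions (orthonormal d x k columns) of the rows of X."""
    mean = X.mean(axis=0)
    # Rows of Vt are principal directions, ordered by decreasing singular value.
    _, _, Vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, Vt[:k].T


def representation_error(X, mean, U):
    """Average squared error of representing each row x by mean + U U^T (x - mean)."""
    C = X - mean
    residual = C - (C @ U) @ U.T
    return float(np.mean(np.sum(residual ** 2, axis=1)))


def pairwise_distances(M):
    """Euclidean distances between the rows of M, treating each image as a vector."""
    diff = M[:, None, :] - M[None, :, :]
    return np.sqrt(np.sum(diff ** 2, axis=2))


def principal_coordinates(D, dim=2):
    """Classical principal coordinate analysis: embed distance matrix D in `dim` dimensions."""
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n        # centering matrix
    B = -0.5 * J @ (D ** 2) @ J                 # double-centered squared distances
    evals, evecs = np.linalg.eigh(B)            # eigenvalues in ascending order
    top = np.argsort(evals)[::-1][:dim]         # indices of the `dim` largest
    # Clip small negative eigenvalues (D may not be exactly Euclidean).
    return evecs[:, top] * np.sqrt(np.maximum(evals[top], 0.0))


def analyze(X, y, k=20, n_classes=10):
    classes = [X[y == c] for c in range(n_classes)]
    models = [class_pca(Xc, k) for Xc in classes]

    # Part 1: each class represented by its own mean and components.
    own_error = np.array([representation_error(Xc, m, U)
                          for Xc, (m, U) in zip(classes, models)])

    # Part 2: 2D map of the class means.
    means = np.array([m for m, _ in models])
    mean_map = principal_coordinates(pairwise_distances(means))

    # Part 3: E[a, b] = E(a | b) uses class a's mean but class b's components.
    E = np.array([[representation_error(classes[a], models[a][0], models[b][1])
                   for b in range(n_classes)] for a in range(n_classes)])
    S = 0.5 * (E + E.T)
    np.fill_diagonal(S, 0.0)  # assumption: zero self-distance before PCoA
    pc_map = principal_coordinates(S)
    return own_error, mean_map, E, S, pc_map
```

`own_error` can then be plotted against the category, for example with matplotlib's `plt.bar(range(10), own_error)`, and the rows of `mean_map` and `pc_map` scattered and labeled with the class names. A useful sanity check for the first part is that, with this definition of error, each class's average error equals the sum of the eigenvalues of its covariance matrix (normalized by the number of images) beyond the 20th. Another is that E(A | A) is never larger than E(A | B), because a class's own principal components are the best 20-dimensional subspace for representing it.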