D.A. Forsyth --- 3310 Siebel Center

Trevor Walker --- 4207 Siebel Center

Office Hours Time: TBA, Location: TBA

[Image: DAF waves at camera with blue drysuit glove in very murky water]

TAs:

 

 

Homework 3: Due 24 Sep 2017 23h59 (Mon; midnight)

 

You must do this homework individually

Submission: Course submission policy is as described in the homework

 

  1. Problem 1

    You may use any programming language that amuses you for this homework. You may use a PCA package if you so choose, but remember that you need to understand what comes out of the package to get the homework right!

    The goal of this homework is to use PCA to smooth the noise in the provided data. At https://www.kaggle.com/t/e9337b95218e48a1be69a69e3826688a , you will find five noisy versions of the Iris dataset, and a noiseless version.

    For each of the 5 noisy datasets, you should compute the principal components in two ways. In the first, you will use the mean and covariance matrix of the noiseless dataset. In the second, you will use the mean and covariance matrix of the respective noisy dataset. Based on these components, you should compute the mean squared error between the noiseless version of the dataset and each of the PCA representations using 0 (i.e. every data item is represented by the mean), 1, 2, 3, and 4 principal components.

    You should produce:

    Number of PCs->   0N   1N   2N   3N   4N   0c   1c   2c   3c   4c
    Dataset I
    Dataset II
    Dataset III
    Dataset IV
    Dataset V
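
The computation described in Problem 1 can be sketched as follows. This is a minimal illustration under stated assumptions, not a reference solution. It uses NumPy and synthetic stand-in data rather than the Kaggle files, and the function names `pca_reconstruct` and `mse` are hypothetical. It assumes that the "N" columns use the noiseless mean and covariance while the "c" columns use each noisy dataset's own, and that the mean squared error is the squared error summed over the four features and averaged over data items. Check both assumptions against the course page before relying on them.

```python
import numpy as np

def pca_reconstruct(X, mean, cov, k):
    """Represent the rows of X using the top-k principal components of cov."""
    # eigh is intended for symmetric matrices; eigenvalues come back ascending
    eigvals, eigvecs = np.linalg.eigh(cov)
    order = np.argsort(eigvals)[::-1]      # largest eigenvalue first
    U = eigvecs[:, order[:k]]              # shape (d, k); empty when k == 0
    centered = X - mean
    # Project onto the k components, then map back to the original coordinates
    return mean + centered @ U @ U.T

def mse(A, B):
    # Assumption: sum squared error over features, then average over items
    return float(np.mean(np.sum((A - B) ** 2, axis=1)))

# Synthetic stand-in data (hypothetical, NOT the course datasets)
rng = np.random.default_rng(0)
clean = rng.normal(size=(150, 4)) @ np.diag([3.0, 2.0, 1.0, 0.5])
noisy = clean + rng.normal(scale=0.3, size=clean.shape)

# First way: mean and covariance of the noiseless data
mean_n, cov_n = clean.mean(axis=0), np.cov(clean, rowvar=False)
# Second way: mean and covariance of the noisy data itself
mean_c, cov_c = noisy.mean(axis=0), np.cov(noisy, rowvar=False)

# One table row: 0N..4N followed by 0c..4c
row = [mse(clean, pca_reconstruct(noisy, mean_n, cov_n, k)) for k in range(5)]
row += [mse(clean, pca_reconstruct(noisy, mean_c, cov_c, k)) for k in range(5)]
```

With 0 components every item collapses to the mean, and with all 4 components the reconstruction is the noisy data itself, which gives two quick sanity checks on any implementation.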

Submission details

  1. Submit to Kaggle: the reconstruction of the second noisy version, as a CSV file in the same format as the datasets. The CSV file should be named "yournetid-recon.csv", and I will frown on humorists who do not replace the "yournetid" with their netid. Each row is your reconstructed version of the data item. The first line should be a header reading:

    "Sepal.Length","Sepal.Width","Petal.Length","Petal.Width"

    Each following line is your reconstruction of a data item, in order (so first data item first, etc).

  2. Submit to Kaggle: a CSV file containing your numbers for the table. The CSV file should be named "yournetid-numbers.csv", and I will frown on humorists who do not replace the "yournetid" with their netid. The first line should read:

    "0N, 1N, 2N, 3N, 4N, 0c, 1c, 2c, 3c, 4c"

    The following lines should be the rows of the table, in order, and contain only numbers. You should provide your numbers to at least three digits.

  3. Submit to Gradescope: A two-page PDF file. The first page should contain your 50 MSE numbers filled in a table (see the course page for what it should look like), and the second page should be a one-page screenshot of your code.
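
The two CSV files described in the submission details could be written along the following lines. This is a sketch using only the Python standard library. The helper names `write_recon_csv` and `write_numbers_csv` are hypothetical, and the demo rows are placeholders, not results. Two points are assumptions: the numbers header is written without surrounding quotes (the text does not say whether the quotes are literal), and "at least three digits" is satisfied here by six decimal places.

```python
import csv
import os
import tempfile

RECON_HEADER = ["Sepal.Length", "Sepal.Width", "Petal.Length", "Petal.Width"]
NUMBERS_HEADER = "0N, 1N, 2N, 3N, 4N, 0c, 1c, 2c, 3c, 4c"

def write_recon_csv(path, rows):
    """Write reconstructed data items, one per row, under a quoted header."""
    with open(path, "w", newline="") as f:
        # QUOTE_NONNUMERIC quotes the header strings but leaves floats bare
        writer = csv.writer(f, quoting=csv.QUOTE_NONNUMERIC, lineterminator="\n")
        writer.writerow(RECON_HEADER)
        for r in rows:
            writer.writerow([float(v) for v in r])

def write_numbers_csv(path, table):
    """Write the MSE table: one row per dataset, ten numbers per row."""
    with open(path, "w", newline="") as f:
        # Assumption: header written without surrounding quotes
        f.write(NUMBERS_HEADER + "\n")
        for row in table:
            if len(row) != 10:
                raise ValueError("each row needs 10 MSE values")
            f.write(",".join(f"{float(v):.6f}" for v in row) + "\n")

# Demo with placeholder values; replace "yournetid" with your netid
out_dir = tempfile.mkdtemp()
recon_path = os.path.join(out_dir, "yournetid-recon.csv")
numbers_path = os.path.join(out_dir, "yournetid-numbers.csv")
write_recon_csv(recon_path, [[5.1, 3.5, 1.4, 0.2], [4.9, 3.0, 1.4, 0.2]])
write_numbers_csv(numbers_path, [[0.0] * 10 for _ in range(5)])
```

Rows must stay in the original order of the data items, so avoid any step that sorts or shuffles the data before writing.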