D.A. Forsyth --- 3310 Siebel Center

A view from inside a small wreck, looking out a door.

TA's:

 

 

Homework 2: Due 12 Feb 2017 23h59 (Mon; midnight)

 

You may do this homework in groups of up to 3 contributors. Groups of 1 or of 2 are just fine, too. A group can consist of any mixture of any type of student (MSCS-DS/Online/Face). We do not offer coordination services for complex group interactions, and you may want to take this into account when forming your group.

Submission: Course submission policy is here

 

  1. Problem 1

    You may use any programming language that amuses you for this homework.

    The UC Irvine machine learning data repository hosts a collection of data on adult income, donated by Ronny Kohavi and Barry Becker. You can find this data at https://archive.ics.uci.edu/ml/datasets/Adult For each record, there is a set of continuous attributes, and a class "less than 50K" or "greater than 50K". There are 48842 examples. You should use only the continuous attributes (see the description on the web page) and drop examples where there are missing values of the continuous attributes. Separate the resulting dataset randomly into 10% validation, 10% test, and 80% training examples.

    Write a program to train a support vector machine on this data using stochastic gradient descent. You should not use a package to train the classifier (that's the point), but your own code. You should ignore the id number, and use the continuous variables as a feature vector. You should scale these variables so that each has unit variance. You should search for an appropriate value of the regularization constant, trying at least the values [1e-3, 1e-2, 1e-1, 1]. Use the validation set for this search. You should use at least 50 epochs of at least 300 steps each. In each epoch, you should separate out 50 training examples at random for evaluation (call this the set held out for the epoch). You should compute the accuracy of the current classifier on the set held out for the epoch every 30 steps. You should produce: