D.A. Forsyth --- 3310 Siebel Center

Trevor Walker --- 4207 Siebel Center

Office Hours Time: TBA, Location: TBA


TAs:

 

 

Homework 2: Due 17 Sep 2017 23h59 (Mon; midnight)

 

You must do this homework individually

Submission: Course submission policy is here

 

  1. Problem 1

    You may use any programming language that amuses you for this homework.

    The UC Irvine machine learning data repository hosts a collection of data on adult income, donated by Ronny Kohavi and Barry Becker. You can find this data at https://archive.ics.uci.edu/ml/datasets/Adult. For each record, there is a set of continuous attributes and a class label, "less than 50K" or "greater than 50K". We will provide you with 43958 examples with known class labels and 4884 examples without class labels, to be found at https://www.kaggle.com/t/f3528db914934de29c24d30cf792dea3. Split the labeled examples randomly into 10% validation and 90% training data.
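    The random 90%/10% split described above could be sketched as follows. This is only an illustration in Python with NumPy, not a required approach: the function name `train_val_split`, its parameters, and the fixed seed are all hypothetical, and it assumes the labeled examples have already been loaded into arrays `X` (features) and `y` (labels).

```python
import numpy as np

def train_val_split(X, y, val_fraction=0.1, seed=0):
    """Randomly split the rows of X and y into training and validation parts.

    Hypothetical helper: the name, the default fraction, and the seed are
    illustrative assumptions, not part of the assignment.
    """
    rng = np.random.default_rng(seed)
    perm = rng.permutation(len(X))             # random ordering of row indices
    n_val = int(round(val_fraction * len(X)))  # number of validation rows
    val_idx, train_idx = perm[:n_val], perm[n_val:]
    return X[train_idx], y[train_idx], X[val_idx], y[val_idx]
```

    Shuffling the indices once and slicing keeps the two parts disjoint, so no validation example leaks into training.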

    Write a program to train a support vector machine on this data using stochastic gradient descent. You should not use a package to train the classifier (that's the point), but your own code. You should ignore the id number, and use the continuous variables as a feature vector. You should scale these variables so that each has unit variance, and you should subtract the mean so that each has zero mean. You should search for an appropriate value of the regularization constant, trying at least the values [1e-3, 1e-2, 1e-1, 1]. Use the validation set for this search. You should use at least 50 epochs of at least 300 steps each. In each epoch, you should separate out 50 training examples at random for evaluation (call this the set held out for the epoch). You should compute the accuracy of the current classifier on the set held out for the epoch every 30 steps. You should produce:
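    The training procedure described in this problem could be sketched roughly as below. It is one possible reading under stated assumptions, not a reference solution. The assumptions are: the labels are encoded as +1 and -1, the cost is the hinge loss plus (lam/2) times the squared norm of the weights, each step uses a single example, and the step length follows the schedule 1/(0.01*epoch + 50). That schedule and every function and parameter name here are hypothetical choices made for illustration.

```python
import numpy as np

def standardize(X_train, X_other):
    """Scale to zero mean and unit variance using training statistics only."""
    mean = X_train.mean(axis=0)
    std = X_train.std(axis=0)
    std[std == 0] = 1.0  # guard against constant features
    return (X_train - mean) / std, (X_other - mean) / std

def accuracy(a, b, X, y):
    """Fraction of rows whose predicted sign matches the +1/-1 label."""
    pred = np.where(X @ a + b >= 0, 1, -1)
    return float(np.mean(pred == y))

def train_svm_sgd(X, y, lam, n_epochs=50, steps_per_epoch=300,
                  eval_every=30, n_held_out=50, seed=0):
    """Linear SVM trained by single-example SGD on the regularized hinge loss."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    a, b = np.zeros(d), 0.0
    history = []  # held-out accuracy, recorded every `eval_every` steps
    for epoch in range(n_epochs):
        eta = 1.0 / (0.01 * epoch + 50.0)  # assumed step-length schedule
        perm = rng.permutation(n)
        held, rest = perm[:n_held_out], perm[n_held_out:]  # set held out for the epoch
        for step in range(1, steps_per_epoch + 1):
            i = rest[rng.integers(len(rest))]
            if y[i] * (X[i] @ a + b) >= 1:
                a -= eta * lam * a                   # only the regularizer contributes
            else:
                a -= eta * (lam * a - y[i] * X[i])   # hinge term is active
                b += eta * y[i]
            if step % eval_every == 0:
                history.append(accuracy(a, b, X[held], y[held]))
    return a, b, history

def search_lambda(X_tr, y_tr, X_val, y_val,
                  lams=(1e-3, 1e-2, 1e-1, 1.0), **train_kwargs):
    """Pick the regularization constant with the best validation accuracy."""
    scores = {}
    for lam in lams:
        a, b, _ = train_svm_sgd(X_tr, y_tr, lam, **train_kwargs)
        scores[lam] = accuracy(a, b, X_val, y_val)
    return max(scores, key=scores.get), scores
```

    With the default settings, `history` holds 10 accuracy values per epoch (one every 30 steps of 300). The validation set is used only inside `search_lambda`, which keeps it separate from the 50 examples held out in each epoch.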

Submission instructions

We will be using Gradescope (PDF submissions) and Compass (code submissions) for all your homework submissions this semester. Compass is still being set up, and it might take a couple of days before you see the course online. The homework deadline remains the same: submit the PDF on Gradescope by Monday (09/17). The HW2 code is due as soon as Compass is set up (you will be notified when the course is online on Compass).

Submission on Gradescope:

Course code: MWV5GB

Again (if you haven't already), sign up on Gradescope using ONLY your Illinois email. We will NOT grade submissions that come from other email IDs. Please use the following format when submitting your PDF on Gradescope.