CS-498 Applied Machine Learning - Homework 3
D.A. Forsyth --- 3310 Siebel Center
15:30 - 16:45 OR 3.30 pm-4.45 pm, in old money
1320 Digital Computer Laboratory
Mariya Vasileva email@example.com
Sili Hui firstname.lastname@example.org
Daeyun Shin email@example.com
Ayush Jain firstname.lastname@example.org
Ayush Tue, Thu - 17h00-18h00 or 5-6 pm in old money
Daeyun Thu - 11h00-13h00 or 11 am-1 pm in old money
Mariya Wed - 15h00-17h00 or 3 - 5 pm in old money
Sili Thu - 12h00-14h00 or 12 - 2 pm in old money
DAF Mon - 14h00-15h00, Fri - 14h00-15h00
or swing by my office (3310 Siebel) and see if I'm busy
Evaluation is by homeworks and a take-home final.
I will shortly post a policy on collaboration and plagiarism
Homework 3: Due 22 Feb 2016, 23h59 (Mon; midnight)
You should do this homework in groups of up to three; details of how to submit have been posted on piazza.
Details and description are subject to minor changes.
Submission: Homework 3 submission details TBA.
You will find a training dataset to do with matching faces here. Each vector consists of measurements of attributes from two faces. If the faces belong to the same class, the first element of the vector is zero; otherwise it is one. The rest of the vector consists of (attribute values for face 1) followed by (attribute values for face 2). You will use this dataset to train and evaluate various classifiers. Be aware that this is a large dataset (about 100 MB), so don't download it repeatedly or all at the same time. Fairly soon, we will release various test datasets on Kaggle. The homework has several components.
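As a concrete sketch of the layout described above, a loader along these lines would split the label from the two attribute blocks. The filename and the exact text format are assumptions; check what you actually downloaded:

```python
import numpy as np

def load_pairs(path):
    """Load the pairs file described above.

    Assumes one whitespace-separated row per pair:
    [label, attrs for face 1, attrs for face 2],
    where label 0 = same person, 1 = different.
    """
    data = np.loadtxt(path)            # large file (~100 MB): load once, reuse
    labels = data[:, 0].astype(int)
    n_attr = (data.shape[1] - 1) // 2  # attribute vectors have equal length
    face1 = data[:, 1:1 + n_attr]
    face2 = data[:, 1 + n_attr:]
    return labels, face1, face2
```

Keeping the two attribute blocks separate (rather than one flat feature vector) makes the matching approach in part 2 easier later.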
- Train and evaluate the following classifiers, using this data for training and the evaluation data we will release. You can start without the evaluation data by doing test-train splits on this training data.
This part will comprise approximately 1/3 of the grade. You should report: training error for each of these classifiers; test error for each on the evaluation sets we release on Kaggle.
- Linear SVM (you may use a package or your own code)
- Naive Bayes (you may use a package or your own code)
- Random Forests (I strongly advise you use a package, but...)
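Until the Kaggle evaluation sets appear, a test-train split on the training data is enough to exercise all three classifiers. A minimal sketch using scikit-learn (one reasonable package choice; defaults here are not tuned and are only a starting point):

```python
from sklearn.model_selection import train_test_split
from sklearn.svm import LinearSVC
from sklearn.naive_bayes import GaussianNB
from sklearn.ensemble import RandomForestClassifier

def evaluate_classifiers(X, y, seed=0):
    """Return {name: (train accuracy, test accuracy)} for the three
    required classifiers, using a simple test-train split."""
    Xtr, Xte, ytr, yte = train_test_split(X, y, test_size=0.2,
                                          random_state=seed)
    models = {
        "linear SVM": LinearSVC(),
        "naive Bayes": GaussianNB(),
        "random forest": RandomForestClassifier(n_estimators=100,
                                                random_state=seed),
    }
    results = {}
    for name, model in models.items():
        model.fit(Xtr, ytr)
        results[name] = (model.score(Xtr, ytr), model.score(Xte, yte))
    return results
```

Report the training numbers this produces alongside the Kaggle test numbers once the evaluation sets are released; training error alone is not enough.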
- The original data can be found here; a clear and helpful account of how the data was made can be found here. You will find each of the attribute vectors for each face here. Notice that, in principle, you could build a classifier for each pair by matching the first attribute vector to this dataset, matching the second attribute vector to the dataset, then seeing whether the two names are the same. Doing this in practice presents challenges, because there is so much data. Use an approximate nearest neighbors package (I favor FLANN or RANN) to build such a matcher. Report its accuracy on the training data and the evaluation sets we release on Kaggle. This part will comprise approximately 1/3 of the grade.
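The matching idea above can be sketched as follows. This uses scikit-learn's NearestNeighbors purely as a small-scale stand-in; at the full scale of the dataset you would swap in FLANN or RANN, as suggested above, keeping the same logic:

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors  # stand-in; use FLANN/RANN at scale

def match_pairs(gallery, names, face1, face2):
    """Classify pairs by matching each attribute vector to the gallery
    of per-face vectors and comparing the matched names.

    gallery : (N, d) attribute vectors for individual faces
    names   : length-N identities, one per gallery row
    Returns 0 where the matched names agree (same person), 1 otherwise.
    """
    nn = NearestNeighbors(n_neighbors=1).fit(gallery)
    idx1 = nn.kneighbors(face1, return_distance=False).ravel()
    idx2 = nn.kneighbors(face2, return_distance=False).ravel()
    names = np.asarray(names)
    return (names[idx1] != names[idx2]).astype(int)
```

With an approximate index the nearest match is not guaranteed exact, which is part of why you should report the matcher's measured accuracy rather than assume it.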
- Use any method you like, apart from matching as in part 2, to build a classifier that is as accurate as you can make it. Report its accuracy on the training data and on the evaluation sets we release on Kaggle. This part will comprise approximately 1/3 of the grade, weighted so that any reasonable contribution gets half the available marks; the other half is allocated based on the accuracy of your method.
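Purely as an illustration of a non-matching method for this part (not a recommendation, and the feature construction is my assumption, not part of the assignment): since a pair's label should not depend on which face comes first, one might build pair-symmetric features and feed them to boosted trees:

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import cross_val_score

def difference_features(face1, face2):
    # Pair-symmetric features: |f1 - f2| and f1 * f2 are unchanged if
    # the two faces are swapped, unlike the raw concatenated vector.
    return np.hstack([np.abs(face1 - face2), face1 * face2])

def boosted_baseline(face1, face2, labels):
    # Cross-validated accuracy of gradient-boosted trees on the
    # symmetric features; a starting point to beat, not a tuned method.
    clf = GradientBoostingClassifier(n_estimators=50, max_depth=3)
    X = difference_features(face1, face2)
    return cross_val_score(clf, X, labels, cv=3).mean()
```

Whatever you try, cross-validate on the training data before submitting to Kaggle so you are not tuning against the evaluation sets.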