CS-498 Applied Machine Learning - Homework 6
D.A. Forsyth --- 3310 Siebel Center
15:30 - 16:45 OR 3.30 pm-4.45 pm, in old money
1320 Digital Computer Laboratory
Mariya Vasileva email@example.com
Sili Hui firstname.lastname@example.org
Daeyun Shin email@example.com
Ayush Jain firstname.lastname@example.org
Ayush Fri - 14h00-16h00 or 2-4 pm, location: in front of 3304
Daeyun Thu - 11h00-13h00 or 11 am-1 pm location: 0207 Siebel
Mariya Wed - 15h00-17h00 or 3 - 5 pm location: 0207 Siebel
Sili Thu - 12h00-14h00 or 12 - 2 pm location: 0207 Siebel
DAF Mon - 14h00-15h00, Fri - 14h00-15h00
or swing by my office (3310 Siebel) and see if I'm busy
Evaluation is by: homeworks and a take-home final.
I will shortly post a policy on collaboration and plagiarism
Homework 6: Due 4 April 2016 23h59 (Mon; midnight)
You should do this homework in groups of up to three; details of how to submit have been posted on piazza. You can use any programming language you
care to, but I think you'll prefer R because it has tools for this (lm and glmnet).
Details and description subject to minor changes
- Linear regression with various regularizers The UCI Machine Learning dataset repository hosts a dataset giving features of music, and the latitude and longitude from which that
music originates here. Investigate methods to predict latitude and longitude from these features, as below. There are actually two versions of this dataset. Either one is OK by me, but I think you'll find the one with more independent variables more interesting. You should
ignore outliers (by this I mean you should ignore the whole question; do not try to deal with them). You should regard latitude and longitude as entirely independent.
- First, build a straightforward linear regression of latitude (resp. longitude) against features. What is the R-squared? Plot a graph evaluating each regression.
- Does a Box-Cox transformation improve the regressions? Notice that the dependent variable has some negative values, which Box-Cox doesn't like. You can deal with this by remembering that these are angles, so you get to choose the origin (why does that help?). For the rest of the exercise, use the transformation if it does improve things; otherwise, use the raw data.
- Use glmnet to produce:
- A regression regularized by L2 (equivalently, a ridge regression). You should estimate the regularization coefficient that produces the minimum error. Is the regularized regression better than the unregularized regression?
- A regression regularized by L1 (equivalently, a lasso regression). You should estimate the regularization coefficient that produces the minimum error. How many variables are used by this regression? Is the regularized regression better than the unregularized regression?
- A regression regularized by elastic net (equivalently, a regression regularized by a convex combination of L1 and L2). Try three values of alpha, the weight balancing the L1 and L2 terms. You should estimate the regularization coefficient that produces the minimum error. How many variables are used by this regression? Is the regularized regression better than the unregularized regression?
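To get you started, here is a minimal R sketch of the straightforward regression and the Box-Cox check for latitude. It assumes the larger version of the music dataset, read into a matrix with the last two columns being latitude and longitude; the file name and the +90 shift of the origin are my assumptions, not prescribed.

```r
library(MASS)   # for boxcox()

# Hypothetical file name; the larger UCI version has 116 features + lat/long.
music <- read.csv("default_plus_chromatic_features_1059_tracks.txt",
                  header = FALSE)
nfeat <- ncol(music) - 2
feats <- as.matrix(music[, 1:nfeat])
lat   <- music[, nfeat + 1]

# Straightforward linear regression of latitude against all features.
fit.lat <- lm(lat ~ feats)
summary(fit.lat)$r.squared                  # the R-squared
plot(fitted(fit.lat), residuals(fit.lat),
     xlab = "Fitted latitude", ylab = "Residual")  # a graph evaluating the fit

# Box-Cox wants a positive response; latitude is an angle, so shift the
# origin (here by +90, an arbitrary but convenient choice).
lat.pos <- lat + 90
bc      <- boxcox(lat.pos ~ feats)          # plots log-likelihood vs lambda
lambda  <- bc$x[which.max(bc$y)]            # best Box-Cox parameter
```

Repeat the same steps with longitude as the response, since the two are treated as entirely independent.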
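The regularized fits can be sketched with `cv.glmnet`, which picks the regularization coefficient lambda by cross-validation; the particular alpha values below are assumptions, and `feats`/`lat` are the hypothetical feature matrix and response from above.

```r
library(glmnet)

# alpha = 0 is ridge (L2), alpha = 1 is lasso (L1).
cv.ridge <- cv.glmnet(feats, lat, alpha = 0)
cv.lasso <- cv.glmnet(feats, lat, alpha = 1)

cv.ridge$lambda.min                 # lambda giving minimum CV error
min(cv.lasso$cvm)                   # minimum cross-validated MSE
# Variables used by the lasso at the best lambda (minus the intercept):
sum(coef(cv.lasso, s = "lambda.min") != 0) - 1

# Elastic net: try three values of alpha between 0 and 1.
for (a in c(0.25, 0.5, 0.75)) {
  cv.en <- cv.glmnet(feats, lat, alpha = a)
  cat("alpha =", a,
      " lambda.min =", cv.en$lambda.min,
      " CV MSE =", min(cv.en$cvm), "\n")
}
```

Compare the cross-validated errors against the unregularized `lm` fit to answer whether regularization helps.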
- Logistic regression The UCI Machine Learning dataset repository hosts a dataset giving whether a Taiwanese credit card user defaults, against a variety of features, here. Use logistic regression to predict whether the user defaults. You should ignore outliers, but you should try the various regularization schemes we have discussed.
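The same `glmnet` machinery handles the logistic case via `family = "binomial"`. A sketch, assuming the credit data has been read into a feature matrix `x` and a 0/1 default indicator `y` (hypothetical names):

```r
library(glmnet)

# Sweep the regularization schemes: ridge, an elastic net, and lasso.
for (a in c(0, 0.5, 1)) {
  cv.fit <- cv.glmnet(x, y, family = "binomial",
                      type.measure = "class",   # CV on misclassification rate
                      alpha = a)
  cat("alpha =", a,
      " best lambda =", cv.fit$lambda.min,
      " CV error rate =", min(cv.fit$cvm), "\n")
}
```

Using `type.measure = "class"` makes the cross-validation curve directly interpretable as an error rate, which is convenient for comparing the schemes.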