CS-498 Applied Machine Learning - Additional Homework

CS-498 Applied Machine Learning

D.A. Forsyth --- 3310 Siebel Center

daf@uiuc.edu, daf@illinois.edu

15:30 - 16:45 OR 3.30 pm-4.45 pm, in old money
1320 Digital Computer Laboratory


Mariya Vasileva mvasile2@illinois.edu

Sili Hui silihui2@illinois.edu

Daeyun Shin dshin11@illinois.edu

Ayush Jain ajain42@illinois.edu

Office Hours:

Ayush Fri - 14h00-16h00 or 2-4 pm, location: in front of 3304

Daeyun Thu - 11h00-13h00 or 11 am-1 pm, location: 0207 Siebel

Mariya Wed - 15h00-17h00 or 3-5 pm, location: 0207 Siebel

Sili Thu - 12h00-14h00 or 12-2 pm, location: 0207 Siebel

DAF Mon - 14h00-15h00, Fri - 14h00-15h00

or swing by my office (3310 Siebel) and see if I'm busy

Evaluation is by homeworks and a take-home final.

I will shortly post a policy on collaboration and plagiarism.





Additional Homework, due 2 May 23h59 (Mon; midnight)


This homework is for those taking the course for 4 hours credit. You should do this homework in groups of up to three; details of how to submit have been posted on Piazza. You can use any programming language you care to, but I think you'll prefer R because it has tools for this (lm and glmnet).


Details and description subject to minor changes


A bunch of wide datasets, from cancer genetics: In "Clustering Cancer Gene Expression Data: a Comparative Study", Marcilio C. P. de Souto, Ivan G. Costa, Daniel S. A. de Araujo, Teresa B. Ludermir, and Alexander Schliep clustered gene expression data for various cancers. In particular, they collected together 35 cancer gene expression datasets. You can find descriptive text here. There is a link to the datasets on that page; it links to here. Each dataset is "wide" (more predictors than examples). Each dataset contains predictors for various classes of medical issue; the number of classes ranges from 2 to 14. This exercise will involve more dataset jockeying than before; that's life.
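As a rough illustration of the "dataset jockeying" and of what "wide" means, here is a small Python sketch using only the standard library. The file layout is an assumption, not something this page specifies: I am guessing a tab-delimited table with sample names in the first row, class labels in the second row, and one gene per row after that. Check the real files before relying on it. The names load_wide_table and is_wide are made up for this example.

```python
import csv
import io


def load_wide_table(text, delimiter="\t"):
    """Parse a gene-by-sample table into (labels, X), one row of X per sample.

    ASSUMED layout (hypothetical, verify against the actual files):
      row 0: a header cell, then one sample name per column
      row 1: a header cell, then one class label per column
      rows 2+: a gene name, then one expression value per sample
    """
    rows = [r for r in csv.reader(io.StringIO(text), delimiter=delimiter) if r]
    labels = rows[1][1:]
    gene_rows = rows[2:]
    # The file stores genes in rows; transpose so each example (sample) is a row.
    X = [[float(g[j + 1]) for g in gene_rows] for j in range(len(labels))]
    return labels, X


def is_wide(X):
    """A dataset is 'wide' when it has more predictors than examples."""
    n_examples = len(X)
    n_predictors = len(X[0]) if X else 0
    return n_predictors > n_examples


# Tiny made-up table: 2 samples, 3 genes (not real data).
toy = (
    "gene\ts1\ts2\n"
    "class\ttumor\tnormal\n"
    "g1\t0.1\t0.4\n"
    "g2\t1.2\t0.9\n"
    "g3\t-0.3\t0.0\n"
)
labels, X = load_wide_table(toy)
print(len(X), len(X[0]), is_wide(X))  # prints: 2 3 True
```

With real data you would read the file contents into a string (or adapt the function to take a file handle) and then pass X and labels on to whatever fitting tool you use.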