CS-498 Applied Machine Learning
D.A. Forsyth --- 3310 Siebel Center
Office Hours Time: TBA, Location: TBA
Alternative locations may be available
TAs:
- MCS-DS version of the course
- Krishna Kothapalli (Lead)
- Daniel Calzada
- Taiyu Dong
- Shreya Rajpal
- Ehsan Saleh
- Online version of the course
- In person version of the course
- Christopher Benson
- Ji Li
- Maghav Kumar
Important; Important; Important
Announcements page - check this frequently!
On it, you'll find the homework submission policy!
A straw poll
here
Contact policy
With multiple versions of the course and nearly 400 students, I'm
quite distracted and am focusing on content preparation. Generally,
please do not bring DAF an issue you haven't already raised with a TA.
Questions I've been getting a lot
Getting into the class
In the past, we've been able to admit everyone who wanted to get into
the in-person version of the class after the first rush settled down.
Will this be true this semester? Who knows? Not me. PLEASE do not
come and tell me that you really want to get in, or that your cat died and
its last words were to take the class, or something. I'll try to
admit everyone, but I can't concentrate on doing that if everyone tries to tell
me why they want to get in.
There will not be an overflow section. This
means that if enough people drop so that there is space in the room,
you'll get in; if not, you won't. In the past, people have started on
the first homework and been safe. I can't guarantee this will be OK
this year.
The only 4-hour version is online. Sorry, too
many students and versions already.
Can I audit? The main resource limit on the
in-person class is physical seats in the room. We cannot have an
overcrowded room. If physical seats are open, sure (I'm always happy
to have an audience); but please don't take a seat that should be
occupied by someone who is registered.
What is the difference between the 3 hr and 4 hr versions?
4-hour versions (some online students; MCS-DS students) will get an extra assignment
later in the course.
Important contact advice
A really common question is: how do I do something in R? Usually, I
get the answer to this by searching; I use Google, but you may have a
preferred search engine. If you ask me or a TA this question, and we do
it successfully in front of you, you should feel a little embarrassed,
because you could have done this for yourself. Warning:
we will embarrass you in this way; it's better to do this
sort of thing for yourself.
Office Hours
You should contact a TA for your version of the course using that
version's appropriate method. There will be Zoom office hours for the
MCS-DS course. For the in-person version of the course, go to the
appropriate location.
- DAF: TBA
- MCS-DS version of the course
Contact via Zoom; there should be a button on the Coursera page
- Krishna Kothapalli (Lead): Mon, 15h00-16h00
- Daniel Calzada: Tue, 20h00-21h00
- Taiyu Dong: Wed, 14h00-15h00
- Shreya Rajpal: Sat, 10h00-11h00
- Ehsan Saleh: Fri, 15h00-16h00
- Online/City Scholars version of the course
Contact via Zoom; Binglin will circulate details
- Binglin Chen: Wed, 19h00-20h00, Sat, 19h00-20h00 (note change)
- In person version of the course
- Christopher Benson: Wed, 13h00-14h00, loc 0207 Siebel
- Ji Li: Mon, 15h00-16h00, loc 0207 Siebel
- Maghav Kumar: Wed, 13h00-15h00, loc 0207 Siebel
Evaluation is by homeworks and a take-home final.
I will shortly post a policy on collaboration and plagiarism.
Please complete the straw poll. I intend it to be anonymous (but I don't fully understand Google Forms, so...), and it will help me know what you know already. Please don't fill in lots of forms to bias the results, etc.
Homeworks
A total of 9 homeworks will appear here. There will be no final exam;
one homework will be designated a take-home final.
- Homework 1: Due 5 Feb 2018, 23h59
- Homework 2: Slipped by one day; now due 13 Feb 2018, 23h59
- Homework 3: Slipped by one week; now due 26 Feb (was 19 Feb 2018, 23h59). I slipped this because I couldn't see any reason not to, but notice this eats into time available for homework 4.
- Homework 4: Slipped by one day; now due 6 Mar 2018, 23h59 (we had some Compass problems). Notice I found the dataset; also some remarks on test-train splits.
- Homework 5: Due 12 Mar 2018, 23h59
- Homework 6: Was due 26 Mar 2018, 23h59. I have been pressured to slip this, and have succumbed to this temptation; the due date is now 2 April 2018. However, I wouldn't rely on this working every time.
- Homework 7: Due 9 April 2018, 23h59. Won't slip this date, sorry!
- Homework 8: Was due 16 April 2018, 23h59. This date has been slipped due to autograder problems; the due date is now 3 May.
- Homework 9: Due 10 May 2018, 23h59. This date cannot be slipped.
- Remission of Sins Homework: Due 10 May 2018, 23h59. This date cannot be slipped. Terms and conditions apply; see the page.
Censored rubrics
Here are the criteria we will use in grading, with some gory details omitted.
Drafts of future homeworks
These are drafts of homeworks I intend to release with attached due dates. This helps people
work ahead, BUT you need to keep in mind that I'm not guaranteeing that this wording
is what will appear in the final homework. Details might change, but changes should be small. Due
dates are approximate, and may change.
Syllabus:
I will start at the beginning of the textbook and
proceed to the end, covering approximately one chapter per week. You'll notice there are 14
substantive chapters and 15 weeks; this is to allow a little
spreading out, but in week N I expect to be in or close to chapter N.
Read the textbook. I wrote it specifically for this
course, AND it's free. I will split time in lecture between sketching
important points described in the text, and solving problems. If you
haven't read the text, this might be quite puzzling.
Important: The text will change as I
purge typos, etc. In the past, people have brought the PDF with
them on mobile devices. If you print it, you'll have to do it again.
Required Text:
Applied Machine Learning, D.A. Forsyth (approximately 9th draft)
- Version of 15 Jan 2018
- Version of 28 Mar 2018
- Version of 5 April 2018. Warning: this has the LDA chapter,
at the request of some. That chapter hasn't been purged of typos, and may not be to everybody's taste. It won't be part of
any exercise, etc.
- Version of 16 April 2018. Improved illustration of the autoencoder and GAN sections; improved boosting chapter. The same warning about the LDA chapter applies.
- Version of 19 April 2018. Small modifications of the boosting chapter. This is the one to use right now. The same warning about the LDA chapter applies.
Piazza link for this course (which is now right, I believe)
Piazza duty list
Generally, TAs are expected to spend much of their time on
Piazza, but we have a system of on-duty and off-duty days. On each day below, you'll find
much of the Piazza action coming from the named TAs (we hope!)
- Monday: Ehsan, Krishna
- Tuesday: Chris, Daniel
- Wednesday: Maghav
- Thursday: Binglin, Ji
- Friday: Taiyu, Shreya
I'm a video star! (or at least, I have been filmed)
- You can see me here, though you'll need to log in, and it may take a short while after class for the video to be ready
Here's what appeared on my screen
broken links fixed (I think)
low resolution movies of the class screen; good motivation to
remind me to adjust zoom on projector, swap screens, etc.
higher resolution movies of the class screen, for those who like more pixels.
Backup Material:
Probability and Statistics for Computer Science,
D.A. Forsyth
- I can no longer release a PDF, as this has been published. The
moire effect on the cover picture is the result of my scanner
interacting with a shiny cover.
Code fragments I showed in class:
I've cleaned some of these up a bit, but they're not intended to be production code, etc;
just to show some R tricks. Among other things, these codes contain
known errors!
- A naive Bayes classifier on the Pima Indians dataset; I averaged over 10 test-train splits, and ignored examples with NA values; mainly interesting for simple code tricks. File here.
- A naive Bayes classifier on the Pima Indians dataset; I averaged over 10 test-train splits, but now I used examples with NA values both in train and test; mainly interesting for simple code tricks. File here.
- A naive Bayes classifier on the Pima Indians dataset, using klaR and caret; mainly interesting for simple code tricks. File here.
- An SVM on the Pima Indians dataset, using klaR and caret and SVMlight; mainly interesting for simple code tricks. File here.
- A much more elaborate SVM on the Pima Indians dataset, using klaR and caret and SVMlight. File here.
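For readers who want the shape of the first fragment without opening the file, here is a minimal sketch of a Gaussian naive Bayes classifier averaged over 10 random test-train splits, dropping examples with missing values. It is not the class code: it is written in Python rather than R, it uses made-up synthetic data rather than the Pima Indians dataset, and every name in it (fit_gaussian_nb, predict, mean_accuracy, make_synthetic, the 20% test fraction) is an assumption chosen for illustration.

```python
import math
import random


def fit_gaussian_nb(rows, labels):
    """Estimate a log prior and per-feature (mean, variance) for each class."""
    model = {}
    for c in set(labels):
        members = [r for r, y in zip(rows, labels) if y == c]
        stats = []
        for j in range(len(members[0])):
            col = [r[j] for r in members]
            mean = sum(col) / len(col)
            # Small constant keeps the variance strictly positive.
            var = sum((v - mean) ** 2 for v in col) / len(col) + 1e-9
            stats.append((mean, var))
        model[c] = (math.log(len(members) / len(rows)), stats)
    return model


def predict(model, row):
    """Return the class with the largest log posterior (up to a constant)."""
    def log_posterior(c):
        log_prior, stats = model[c]
        return log_prior + sum(
            -0.5 * math.log(2 * math.pi * var) - (v - mean) ** 2 / (2 * var)
            for v, (mean, var) in zip(row, stats)
        )
    return max(model, key=log_posterior)


def mean_accuracy(rows, labels, n_splits=10, test_fraction=0.2, seed=0):
    """Average test accuracy over n_splits random test-train splits."""
    rng = random.Random(seed)
    # Ignore examples with missing (None) values, as in the first fragment.
    complete = [(r, y) for r, y in zip(rows, labels) if None not in r]
    n_test = int(len(complete) * test_fraction)
    accuracies = []
    for _ in range(n_splits):
        rng.shuffle(complete)
        test, train = complete[:n_test], complete[n_test:]
        model = fit_gaussian_nb([r for r, _ in train], [y for _, y in train])
        correct = sum(predict(model, r) == y for r, y in test)
        accuracies.append(correct / n_test)
    return sum(accuracies) / n_splits


def make_synthetic(n=200, seed=1):
    """Hypothetical two-class data with a few missing values (not Pima)."""
    rng = random.Random(seed)
    rows, labels = [], []
    for i in range(n):
        y = i % 2
        row = [rng.gauss(3.0 * y, 1.0), rng.gauss(-3.0 * y, 1.0)]
        if i % 25 == 0:
            row[0] = None  # simulate an NA entry
        rows.append(row)
        labels.append(y)
    return rows, labels


rows, labels = make_synthetic()
acc = mean_accuracy(rows, labels)
print("mean accuracy over 10 splits:", round(acc, 3))
```

Averaging over several random splits gives a steadier accuracy estimate than a single split, which is the point of the "10 test-train splits" in the fragments above. Handling NA values in both train and test, as the second fragment does, would need per-feature skipping of missing entries instead of dropping whole rows.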
R resources: