CS-498 Probability and Statistics for Computer Scientists
D.A. Forsyth --- 3310 Siebel Center
MWF 1320 DCL, 09h00-09h50
TAs: Michael Sittig (firstname.lastname@example.org) and Nik Spirin (email@example.com)
DAF: Mondays, Wednesdays, and Fridays, 10h00-11h00 (3310 Siebel)
Michael Sittig: Tuesdays, 14h00-15h00, Wednesdays, 17h00-18h00, Siebel Center downstairs TA office for 400 level classes
Nik Spirin: Thursdays, 17h00-18h00, Fridays, 11h00-12h00, Siebel Center downstairs TA office for 400 level classes
Homework 9: Due 8 Dec 2014
Do problems 13.1 and 13.2 at the end of chapter 13 ("Learning to Classify") in the 21 Nov edition of the book
Obtain the 50K dataset from the UC Irvine Machine Learning data repository (the dataset is this one).
Drop all the categorical variables, and use R's Nearest Neighbor package to build a classifier that predicts whether an adult earns more or less than $50K. Use the package's test-train split machinery to estimate the accuracy of your classifier. The worked example at the start of chapter 15 is a simple model to follow.
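The procedure in the last problem (drop categorical columns, split into train and test sets, classify with nearest neighbors, measure held-out accuracy) can be sketched as follows. This is a rough illustration in plain Python, not the R package the assignment asks for, and it runs on synthetic data rather than the UC Irvine dataset. The column layout (age, hours per week, occupation), the helper names, the cluster parameters, and the choices k = 5 and a 25% test fraction are all assumptions made for the example. It also skips feature scaling, which usually matters for nearest neighbor on real data.

```python
import random
from collections import Counter


def make_toy_rows(n, rng):
    """Synthetic stand-in for the real data (assumed layout, not the UCI file):
    each row is [age, hours_per_week, occupation], with a separate label."""
    rows, labels = [], []
    for i in range(n):
        high = i % 2 == 0  # alternate classes so both are equally common
        age = rng.gauss(50 if high else 30, 5)
        hours = rng.gauss(50 if high else 35, 5)
        occupation = rng.choice(["clerical", "sales", "technical"])  # categorical
        rows.append([age, hours, occupation])
        labels.append(">50K" if high else "<=50K")
    return rows, labels


def drop_categorical(rows):
    """Keep only the columns that are numeric in every row."""
    n_cols = len(rows[0])
    keep = [j for j in range(n_cols)
            if all(isinstance(r[j], (int, float)) for r in rows)]
    return [[float(r[j]) for j in keep] for r in rows]


def train_test_split(features, labels, test_fraction, rng):
    """Shuffle the row indices and hold out a fraction of them for testing."""
    idx = list(range(len(features)))
    rng.shuffle(idx)
    n_test = int(len(idx) * test_fraction)
    test_idx, train_idx = idx[:n_test], idx[n_test:]
    return ([features[i] for i in train_idx], [labels[i] for i in train_idx],
            [features[i] for i in test_idx], [labels[i] for i in test_idx])


def knn_predict(train_x, train_y, query, k):
    """Majority vote among the k training points closest to the query
    (squared Euclidean distance)."""
    by_distance = sorted(
        zip(train_x, train_y),
        key=lambda pair: sum((a - b) ** 2 for a, b in zip(pair[0], query)),
    )
    votes = Counter(label for _, label in by_distance[:k])
    return votes.most_common(1)[0][0]


rng = random.Random(0)  # fixed seed so the run is repeatable
rows, labels = make_toy_rows(400, rng)
features = drop_categorical(rows)
train_x, train_y, test_x, test_y = train_test_split(features, labels, 0.25, rng)

# Accuracy is estimated only on rows the classifier never saw during training.
predictions = [knn_predict(train_x, train_y, x, k=5) for x in test_x]
acc = sum(p == y for p, y in zip(predictions, test_y)) / len(test_y)
print(f"held-out accuracy: {acc:.3f}")
```

The point to carry over to the R version is the order of operations: the split happens before any prediction, and accuracy is reported on the held-out rows only.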