CS-498 Probability and Statistics for Computer Scientists

D.A. Forsyth --- 3310 Siebel Center

daf@uiuc.edu, daf@illinois.edu

10:00AM - 10:50AM
MWF
1310 Digital Computer Laboratory

TAs:

Henry Lin halin2@illinois.edu

Karthik Ramaswamy kramasw2@illinois.edu

Office Hours:

DAF: 15h00-16h00 M (3310 Siebel), 13h00-14h00 W (3310 Siebel), 13h00-14h00 F (3310 Siebel)

Henry: 16h00-17h00 W, 16h00-17h00 F, Theory Lounge

Karthik: 16h00-17h00 T, 17h00-18h00 Th, Theory Lounge

NOTE NEW HOURS!

or swing by my office (3310 Siebel) and see if I'm busy

Evaluation is by homeworks, a midterm, and a take-home final.

 

 

Homework 12: Due 11h00, 14 Dec 2015

This homework must be done individually. It is a straightforward exercise. You can do this homework in relatively few lines of R (my version ran to about 12 lines), but it takes some thought to know which ones.

 

  1. Obtain the Kittiwake dataset (this one). This gives the population of Kittiwakes (a kind of sea bird) and the area of the colony. We are interested in the relationship between area and population. Use R's lm function (sample code available in the book) to compute four linear regressions: first, predicting population from area; second, predicting log(population) from area; third, predicting log(population) from log(area); and fourth, predicting population from log(area). For each regression, prepare a figure that shows a scatter plot of the actual data points together with the predictions made by the regression (i.e. at each x-value, there should be two y-values: the true value and the regression's prediction). Which of these four models do you trust the most, and why?
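
As a rough illustration of the fit-and-plot pattern described above, here is a minimal sketch for one of the four models (log(population) from log(area)). The data frame below holds made-up placeholder numbers, not the Kittiwake data, and the column names Area and Population are assumptions; check the names in the file you actually download. The other three models would follow the same pattern with the transforms changed.

```r
# Placeholder data, NOT the Kittiwake dataset: made-up numbers for illustration.
# The column names Area and Population are assumed, not taken from the real file.
birds <- data.frame(
  Area       = c(10, 25, 40, 80, 150, 300),
  Population = c(30, 55, 70, 120, 180, 310)
)

# Fit one of the four models: log(population) predicted from log(area).
fit <- lm(log(Population) ~ log(Area), data = birds)

# Scatter plot of the actual points, in the coordinates the model was fitted in.
plot(log(birds$Area), log(birds$Population),
     xlab = "log(Area)", ylab = "log(Population)")

# Overlay the regression's prediction at each x-value, so every x has two
# y-values: the true one (circle) and the predicted one (red cross).
points(log(birds$Area), fitted(fit), pch = 4, col = "red")

# Residuals are worth a look when deciding how far to trust a model.
residuals(fit)
```

Plotting in the fitted coordinates keeps the predictions on a straight line, which makes it easier to see whether the actual points scatter evenly around it.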