CS-498 Probability and Statistics for Computer Scientists

D.A. Forsyth --- 3310 Siebel Center

daf@uiuc.edu, daf@illinois.edu

MWF 1320 DCL, 09h00-09h50

TAs: Saurabh Singh (ss1@illinois.edu) and Daphne Tsatsoulis (tsatsou2@illinois.edu)

Office Hours:

DAF: Monday, 10h00-11h00, Wednesday, 10h00-11h00, Friday, 10h00-11h00 (3310 Siebel)


Saurabh: Monday 14h30-15h30, Wednesday 14h30-15h30 (Siebel Center downstairs TA office for 400 level classes)

Daphne: Monday, 13h00-15h00, Friday, 14h00-15h00 (Siebel Center downstairs TA office for 400 level classes) NOTE NEW HOURS!



Homework 10: Due 11 May 2014

This homework may be done in pairs. It is more elaborate than previous homeworks and is intended to be somewhat open-ended. I will designate it a take-home final; this changes nothing about the assignment, but it means the homework substitutes for a final exam. A good submission will include figures like those in the text, used to investigate the regression. You can do this homework in relatively few lines of R (my version ran to about 30 lines), but it takes some thought to work out which lines you need.


  1. Obtain the Kittiwake dataset (this one). This gives the population of Kittiwakes (a kind of sea bird) and the area of the colony. We are interested in the relationship between area and population. Use R's lm function (sample code available in the book) to compute four linear regressions: first, predicting population from area; second, predicting log(population) from area; third, predicting log(population) from log(area); and fourth, predicting population from log(area). For each regression, prepare a figure that shows a scatter plot of the actual data points together with the predictions made by the regression (i.e., at each x-value there should be two y-values: the true value and the value predicted by the regression). Which of these four models do you trust the most, and why?
  2. Obtain the Yacht Hydrodynamics dataset (this one) from the UC Irvine Machine Learning Repository. This gives the residuary resistance per unit weight of displacement for a yacht hull (variable 7) in terms of a variety of properties (variables 1-6). Regress variable 7 against the others. Does a Box-Cox transformation help? How good is the regression?
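
For the first problem, the pattern of fitting one regression and plotting its predictions against the data might look roughly like the sketch below. It uses simulated stand-in data, not the Kittiwake dataset; the data frame name birds and the column names area and population are hypothetical, and only one of the four models is shown.

```r
# Simulated stand-in data: this is NOT the Kittiwake dataset.
# The names 'birds', 'area' and 'population' are hypothetical.
set.seed(1)
birds <- data.frame(area = runif(20, 50, 5000))
birds$population <- round(exp(1.5 + 0.6 * log(birds$area) + rnorm(20, sd = 0.3)))

# One of the four regressions: log(population) against log(area).
fit <- lm(log(population) ~ log(area), data = birds)

# Predictions at the observed x-values, on the log scale used for the fit.
pred <- fitted(fit)

# Scatter plot of the data, with the regression's predictions overlaid,
# so each x-value carries two y-values: the true one and the predicted one.
plot(log(birds$area), log(birds$population),
     xlab = "log(area)", ylab = "log(population)")
points(log(birds$area), pred, pch = 4, col = "red")
legend("topleft", legend = c("data", "prediction"),
       pch = c(1, 4), col = c("black", "red"))

# R-squared as one rough summary of the fit.
r2 <- summary(fit)$r.squared
```

The other three models differ only in the formula passed to lm. Note that R-squared values computed on different response scales (population versus log(population)) are not directly comparable, so plots of the residuals are likely to be a better guide to which model to trust.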
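
For the second problem, one way to check whether a Box-Cox transformation helps is sketched below. It assumes the boxcox function from the MASS package, which ships with standard R distributions, and again uses simulated stand-in data rather than the yacht dataset; the names hull, x1, x2, y and transform_y are hypothetical. Box-Cox requires a strictly positive response.

```r
library(MASS)  # provides boxcox()

# Simulated stand-in data: this is NOT the Yacht Hydrodynamics dataset.
# The names 'hull', 'x1', 'x2' and 'y' are hypothetical.
set.seed(2)
n <- 100
hull <- data.frame(x1 = runif(n), x2 = runif(n))
hull$y <- exp(1 + 2 * hull$x1 - hull$x2 + rnorm(n, sd = 0.2))  # strictly positive

# Regress the response against all the other variables.
fit <- lm(y ~ ., data = hull)

# Profile log-likelihood over a grid of Box-Cox parameters; take the maximizer.
bc <- boxcox(fit, lambda = seq(-2, 2, by = 0.05), plotit = FALSE)
lambda_hat <- bc$x[which.max(bc$y)]

# Box-Cox transform of the response (the log is the limit as lambda -> 0).
transform_y <- function(y, lambda) {
  if (abs(lambda) < 1e-8) log(y) else (y^lambda - 1) / lambda
}
hull$y_bc <- transform_y(hull$y, lambda_hat)

# Refit on the transformed response and compare the two fits.
fit_bc <- lm(y_bc ~ x1 + x2, data = hull)
r2 <- c(raw = summary(fit)$r.squared, boxcox = summary(fit_bc)$r.squared)
```

Comparing plots of residuals against fitted values for fit and fit_bc is one way to judge whether the transformation has helped; the two R-squared values are measured on different scales and should be read with that in mind.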