CS-498 Probability and Statistics for Computer Scientists
D.A. Forsyth --- 3310 Siebel Center
MWF 1320 DCL, 09h00-09h50
TAs: Saurabh Singh (email@example.com) and Daphne Tsatsoulis (firstname.lastname@example.org)
DAF: Monday, 10h00-11h00, Wednesday, 10h00-11h00, Friday, 10h00-11h00 (3310 Siebel)
Saurabh: Monday 14h30-15h30, Wednesday 14h30-15h30 (Siebel Center downstairs TA office for 400 level classes)
Daphne: Monday, 13h00-15h00, Friday, 14h00-15h00 (Siebel Center downstairs TA office for 400 level classes) NOTE NEW HOURS!
Homework 10: Due 11 May 2014
This homework can be done in pairs. It is more elaborate than previous homeworks and is intended to be somewhat open-ended. I will designate it a take-home final, which changes nothing except that it substitutes for a final exam. A good submission will include figures like those in the text, used to investigate the regression. You can do this homework in relatively few lines of R (my version ran to about 30 lines), but it takes some thought to know which ones.
- Obtain the Kittiwake dataset. This gives the population of Kittiwakes (a kind of sea bird)
and the area of the colony. We are interested in the relationship
between area and population. Use R's lm function (sample code
available in the book) to compute four linear regressions:
First, predicting population from area; second, predicting
log(population) from area; third, predicting log(population) from
log(area); and fourth, predicting population from log(area).
For each regression, prepare a figure that shows a scatter plot
of the actual data points, and also the predictions made by the
regression (i.e. at each x-value, there should be two y-values ---
the true value, and the prediction of the regression). Which of
these four models do you trust the most, and why?
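
As a rough illustration of the mechanics only, here is a minimal R sketch of one of the four fits (log(population) against log(area)), with predictions overlaid on the data. It runs on synthetic numbers generated in the script, not on the Kittiwake measurements. The data frame name `birds` and its column names are assumptions made for the example, so adapt them to the file you actually load.

```r
# Synthetic stand-in data: these are NOT the Kittiwake measurements.
# The names birds, area and population are assumptions for this sketch.
set.seed(1)
area <- runif(20, min = 50, max = 5000)
population <- exp(1 + 0.8 * log(area) + rnorm(20, sd = 0.3))
birds <- data.frame(area = area, population = population)

# Fit log(population) against log(area) with lm.
fit <- lm(log(population) ~ log(area), data = birds)

# Predictions at the observed x-values, in the transformed coordinates.
pred <- predict(fit)

# Scatter plot of the true values, with the regression's predictions overlaid,
# so each x-value carries two y-values.
plot(log(birds$area), log(birds$population),
     xlab = "log(area)", ylab = "log(population)", pch = 16)
points(log(birds$area), pred, col = "red", pch = 4)
legend("topleft", legend = c("data", "prediction"),
       col = c("black", "red"), pch = c(16, 4))

# Residuals against fitted values: visible structure here suggests the model
# is missing something.
plot(fitted(fit), resid(fit), xlab = "fitted value", ylab = "residual")
abline(h = 0, lty = 2)
```

The other three regressions change only the formula passed to `lm` and the axes of the plot. Comparing the residual plots across the four is one reasonable way to decide which model to trust.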
- Obtain the Yacht Hydrodynamics dataset from the UC Irvine Machine Learning Repository. This gives the residuary resistance per unit weight of displacement for a yacht hull (variable 7) in terms of a variety of properties (variables 1-6). Regress variable 7 against the others. Does a Box-Cox transformation help? How good is the regression?
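
One possible way to explore the Box-Cox question is sketched below, assuming the `boxcox` function from the MASS package, which ships with standard R distributions. The data are synthetic and generated in the script. They are not the yacht measurements, and the names `d`, `x1`, `x2`, `y` and the helper `bc_transform` are hypothetical.

```r
library(MASS)  # provides boxcox()

# Synthetic stand-in data with a strictly positive response:
# these are NOT the yacht measurements, and all names are assumptions.
set.seed(2)
n <- 100
x1 <- runif(n)
x2 <- runif(n)
y <- exp(0.5 + 2 * x1 - x2 + rnorm(n, sd = 0.2))
d <- data.frame(y = y, x1 = x1, x2 = x2)

# Regression on the untransformed response.
fit_raw <- lm(y ~ x1 + x2, data = d)

# Profile log-likelihood over a grid of lambda values; take the maximiser.
bc <- boxcox(fit_raw, lambda = seq(-2, 2, by = 0.05), plotit = FALSE)
lambda_hat <- bc$x[which.max(bc$y)]

# Box-Cox transform: log(y) when lambda is (numerically) zero,
# (y^lambda - 1) / lambda otherwise.
bc_transform <- function(y, lambda) {
  if (abs(lambda) < 1e-8) log(y) else (y^lambda - 1) / lambda
}
d$y_bc <- bc_transform(d$y, lambda_hat)

# Regression on the transformed response.
fit_bc <- lm(y_bc ~ x1 + x2, data = d)

# R-squared of each fit, each measured in its own coordinates, so the two
# numbers are not directly comparable.
r2 <- c(raw = summary(fit_raw)$r.squared, boxcox = summary(fit_bc)$r.squared)
print(r2)

# Residual plots before and after the transformation.
plot(fitted(fit_raw), resid(fit_raw), xlab = "fitted value", ylab = "residual")
plot(fitted(fit_bc), resid(fit_bc), xlab = "fitted value", ylab = "residual")
```

Because the two R-squared values live in different coordinates, a fairer comparison maps the transformed model's predictions back to the original scale. The residual plots are usually the more informative check.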