Assets for "Probability and Statistics for Computer Scientists"
by David Forsyth
daf@uiuc.edu, daf@illinois.edu
If you've been using this book, or want to use this book, good for you, and thanks. I worked hard
to base it around real datasets from the internet. In the nature of the internet, data moves around, URLs change,
and so on. The intention of this page is to (a) provide copies of datasets as they were when I downloaded them and (b) provide pointers to new
URLs for those datasets when I can. It's possible, though unlikely, that a dataset is unavailable because someone wanted to remove it;
if so, please tell me and I'll remove it from here too. Thanks to Rao Chaganty for sending me the first bunch of broken links.
DASL seems to have changed its structure, and most of the DASL links are broken.
Fixes for broken DASL links
- Cheese data
- was http://lib.stat.cmu.edu/DAS/Datafiles/Cheese.html
- The file I downloaded from DASL was a list of cheese tasting scores;
the backstory wasn't that informative to me (I mean, how does one score the flavor of cheese?), and I don't have it. The scores are in this file.
- Popular kids data
- was http://lib.stat.cmu.edu/DASL/Datafiles/PopularKids.html
- I don't have the backstory, but there is a citation in the book. The data is in this file.
- Oil production data
- was http://lib.stat.cmu.edu/DASL/Datafiles/Oilproduction.html
- The data, and some backstory, is on this page.
- Sodium content of hot dogs
- was http://lib.stat.cmu.edu/DASL/Datafiles/Hotdogs.html
- The data is on this page, but the mystery of the "Meat" tag remains.
- Magazine ads
- was http://lib.stat.cmu.edu/DASL/Datafiles/magadsdat.html
- This data can be found on this page. Notice that the "GRP" field, referred to in the text, is now called "GROUP".
- Portuguese student alcohol consumption
- was http://archive.ics.uci.edu/ml/datasets/STUDENT+ALCOHOL+CONSUMPTION
- You can find this here.
- Cigarettes and cancer
- was http://lib.stat.cmu.edu/DASL/Datafiles/cigcancerdat.html
- A copy of this data is here.
- CEO Salaries
- was http://lib.stat.cmu.edu/DASL/Datafiles/ceodat.html
- I found this data here.
- Pizza data
- was
- You should read the paper here, which is by Dunn and describes using this dataset.
- The links in this paper are broken too, but fortunately I kept a copy of this rather nice dataset, which is here.
- Mouse phenotype data
- This was at http://cgd.jax.org/datasets/phenotype/SvensonDO.shtml
- You can find it at http://churchill-lab.jax.org/website/SvensonDO, a link which will take you to a page with many interesting datasets. The one you're looking for is Phenotype Data, called Churchill.Mamm.Gen.2012.phenotypes.csv
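The magazine ads entry above notes that the "GRP" field is now called "GROUP". If you have code written against the book's field names, one way to cope is to rename columns as you read the file. What follows is only a minimal sketch: the function name `read_rows`, the comma delimiter, the "VALUE" column, and the sample rows are all made up for illustration, and the real file's layout may differ.

```python
import csv
import io


def read_rows(text, renames=None, delimiter=","):
    """Parse delimited text into a list of dicts, renaming columns.

    `renames` maps a column name found in the file to the name your
    code expects. Columns not listed in `renames` are left unchanged.
    """
    renames = renames or {}
    reader = csv.DictReader(io.StringIO(text), delimiter=delimiter)
    rows = []
    for row in reader:
        rows.append({renames.get(key, key): value for key, value in row.items()})
    return rows


# Made-up sample standing in for a downloaded file. Only the GROUP/GRP
# names come from the note above; everything else is hypothetical.
sample = "GROUP,VALUE\n1,10\n2,20\n"

# Map the new column name back to the one used in the text.
rows = read_rows(sample, renames={"GROUP": "GRP"})
```

To use this on a real download, read the file's contents into a string first and pass whatever delimiter the file actually uses. Values come back as strings, so convert them to numbers before doing any arithmetic.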