I taught a course several times at UIUC to undergrad computer science students. The remit of the course was to teach the bits of probability and statistics that all computer science students should know. As Molesworth used to say, "Akwire kulchur and kepe the brane kleen" (spelling from memory; thanks to Geoffrey Willans and Ronald Searle). I taught some simple data visualization methods; some discrete probability; some continuous probability; principal components analysis; some inference; some classification; some clustering; and some regression. The book is a cleaned-up version of my notes from that course. I placed a lot of weight on using real datasets and on getting broad coverage.
If you've been using this book, or want to use this book, good for you, and thanks. I worked hard to base it around real datasets from the internet. In the nature of the internet, data moves around, URLs change, and so on. The intention of this page is (a) to provide copies of datasets as they were when I downloaded them and (b) to provide pointers to new URLs for those datasets when I can. It's possible, though unlikely, that a dataset is unavailable because someone wanted to remove it; if so, please tell me and I'll remove it from here too. Thanks to Rao Chaganty for sending me the first bunch of broken links.
Fixes are here