Probability and Statistics for Computer Scientists

by David Forsyth

daf@uiuc.edu, daf@illinois.edu

What's in it

I taught a course several times at UIUC to undergraduate computer science students. The remit of the course was to teach the bits of probability and statistics that all computer science students should know. As Molesworth used to say, "Akwire kulchur and kepe the brane kleen" (spelling from memory; thanks Geoffrey Willans and Ronald Searle). I taught some simple data visualization methods; some discrete probability; some continuous probability; principal components analysis; some inference; some classification; some clustering; and some regression. The book is a cleaned-up version of my notes from that course. I placed a lot of weight on using real datasets and on getting broad coverage.

Table of Contents and Downloadable Chapters

Describing Data
- First tools for looking at data
- Looking at relationships
Probability
- Basic ideas in probability
- Random variables and expectations
- Useful probability distributions
Inference
- Samples and populations
- The significance of evidence
- Experiments
- Inferring probability models from data
Tools
- Extracting important relationships in high dimensions
- Learning to classify
- Clustering: models of high dimensional data
- Regression
- Markov chains and hidden Markov models

Fixes for broken links

If you've been using this book, or want to use this book, good for you, and thanks. I worked hard to base it around real datasets from the internet. In the nature of the internet, data moves around, URLs change, and so on. The intention of this page is (a) to provide copies of datasets as they were when I downloaded them and (b) to provide pointers to new URLs for those datasets when I can. It's possible -- though unlikely -- that a dataset is unavailable because someone wanted to remove it; if so, please tell me and I'll remove it from here too. Thanks to Rao Chaganty for sending me the first bunch of broken links.

Fixes are here