# Probability and Statistics for Computer Scientists

## by David Forsyth

daf@uiuc.edu, daf@illinois.edu

## What's in it

I taught a course several times at UIUC to undergrad computer science students. The remit of the course was to teach the bits of
probability and statistics that all computer science students should know. As Molesworth used to say, "Akwire kulchur and kepe the brane kleen" (spelling from memory; thanks Geoffrey Willans and Ronald Searle). I taught some simple data visualization methods; some discrete probability; some continuous probability; principal components analysis; some inference; some classification; some clustering; and some regression. The book is a cleaned-up version of my notes from that course. I placed a lot of weight on using real datasets and on getting broad coverage.

## Table of Contents and Downloadable Chapters

- Describing Data
  - First tools for looking at data
  - Looking at relationships

- Probability
  - Basic ideas in probability
  - Random variables and expectations
  - Useful probability distributions

- Inference
  - Samples and populations
  - The significance of evidence
  - Experiments
  - Inferring probability models from data

- Tools
  - Extracting important relationships in high dimensions
  - Learning to classify
  - Clustering: models of high dimensional data
  - Regression
  - Markov chains and hidden Markov models

## Fixes for broken links

If you've been using this book, or want to use this book, good for you, and thanks. I worked hard
to base it around real datasets from the internet. In the nature of the internet, data moves around, URLs change,
and so on. The intention of this page is (a) to provide copies of datasets as they were when I downloaded them and (b) to provide pointers to new
URLs for those datasets when I can. It's possible -- though unlikely -- that a dataset is unavailable because someone wanted to remove it;
if so, please tell me and I'll remove it from here too. Thanks to Rao Chaganty for sending me the first bunch of broken links.

Fixes are here