CS-199 Big Data

Lead Instructor: David Forsyth, 3310 Siebel

Steering Committee: Mani Golparvar-Fard, CEE
Farzad Kamalabadi, ECE
Romit Roy Choudhury, ECE
Richard Sowers, ISE
Matthew West, ME
ChengXiang Zhai, CS

daf@uiuc.edu, daf@illinois.edu

Lectures: Tue 16h00-17h15, 1131 Siebel

Office Hours: TBA, or swing by my office (3310 Siebel) and see if I'm busy

Evaluation: participation and homework

I will shortly post a policy on collaboration and plagiarism

 

Recommended Book:

Doing Data Science, Cathy O'Neil and Rachel Schutt (O'Reilly Media), 2013

 

Homework:

Class Materials:

 

Datasets:

Domain examples:

Engineering Problem: Construction Activity Analysis for Productivity Improvement
Case: Earthmoving operations, including excavators and dump trucks
Technical solution: Supervised Learning (SVM)
Data (65 GB, password protected; password: BigDataIllinois2014)
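The SVM step can be sketched as below. This is a minimal illustration, not the course's pipeline: it assumes each video window has already been reduced to a short numeric feature vector (the two features and the toy values are hypothetical), labeled +1 for an "active" machine state and -1 for "idle". It trains a linear SVM with Pegasos-style stochastic subgradient descent on the regularized hinge loss; the source does not say which SVM formulation, solver, or features were used.

```python
import random


def train_linear_svm(X, y, lam=0.01, epochs=500, seed=0):
    """Linear SVM via Pegasos-style stochastic subgradient descent.

    Minimises lam/2 * ||w||^2 + mean(max(0, 1 - y * w.x)).
    X: list of feature vectors; y: labels in {-1, +1}.
    A constant 1.0 is appended to every vector so a bias is learned
    (as a simplification, the bias is regularised too).
    """
    rng = random.Random(seed)
    data = [(list(x) + [1.0], t) for x, t in zip(X, y)]
    w = [0.0] * len(data[0][0])
    step = 0
    for _ in range(epochs):
        rng.shuffle(data)
        for x, t in data:
            step += 1
            eta = 1.0 / (lam * step)  # decreasing step size
            margin = t * sum(wi * xi for wi, xi in zip(w, x))
            # Gradient step on the L2 regulariser shrinks the weights.
            w = [(1.0 - eta * lam) * wi for wi in w]
            # The hinge-loss subgradient is non-zero only inside the margin.
            if margin < 1.0:
                w = [wi + eta * t * xi for wi, xi in zip(w, x)]
    return w


def predict(w, x):
    """Return +1 or -1 for a single feature vector x."""
    score = sum(wi * xi for wi, xi in zip(w, list(x) + [1.0]))
    return 1 if score >= 0 else -1


# Hypothetical two-number summaries of each video window.
X = [(2.0, 1.8), (2.5, 2.2), (3.0, 2.9), (2.2, 2.6),
     (0.2, 0.3), (0.5, 0.1), (0.3, 0.6), (0.1, 0.2)]
y = [1, 1, 1, 1, -1, -1, -1, -1]
w = train_linear_svm(X, y)
print([predict(w, x) for x in X])
```

With real data one would hold out test windows rather than score on the training set, and choose `lam` by validation.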

Engineering Problem: Predicting the Number of Weather Impact Days for Construction using Hydrological Data
Case: Ikenberry Dining and Residential Hall Construction Projects – 81 impact days during 2008-2010
Technical Solution: K-Means Clustering Techniques
Data (20 .txt files, to be parsed according to the accompanying description; password protected; password: BigDataIllinois2014)
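A minimal k-means (Lloyd's algorithm) sketch follows. It assumes the parsed text files have already been turned into one numeric vector per day; the two features used here (daily precipitation in mm and a river stage in m) and their values are hypothetical, as is reading the high-precipitation cluster as "likely impact days". Initialization picks farthest points deterministically so the example is reproducible.

```python
def kmeans(points, k, iters=100):
    """Lloyd's algorithm with deterministic farthest-point initialisation.

    points: list of equal-length tuples. Returns (centers, labels).
    """
    def dist2(a, b):
        return sum((ai - bi) ** 2 for ai, bi in zip(a, b))

    # Start from the first point, then repeatedly add the point farthest
    # from all centers chosen so far.
    centers = [tuple(points[0])]
    while len(centers) < k:
        centers.append(tuple(max(points, key=lambda p: min(dist2(p, c) for c in centers))))

    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: each point joins its nearest center.
        labels = [min(range(k), key=lambda j: dist2(p, centers[j])) for p in points]
        # Update step: each center moves to the mean of its members.
        new_centers = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centers.append(tuple(sum(col) / len(members) for col in zip(*members)))
            else:
                new_centers.append(centers[j])  # leave an empty cluster's center in place
        if new_centers == centers:
            break
        centers = new_centers
    return centers, labels


# Hypothetical daily records: (precipitation_mm, river_stage_m).
days = [(0.0, 1.1), (0.5, 1.0), (1.0, 1.2), (0.0, 1.3),
        (25.0, 2.8), (30.5, 3.1), (22.0, 2.6)]
centers, labels = kmeans(days, 2)
print(labels)
```

Because the features are on different scales, precipitation dominates the distance here; with real data one would usually standardize each feature first.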

Engineering Problem: Understanding activities using smartphone data
Case: Classify accelerometer data into activities, regress calories against activity labels
Technical Solution: Vector quantization, classification, regression
Data is on this website
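Vector quantization of a signal can be sketched as follows: cut the signal into fixed-length segments, replace each segment by the index of its nearest codeword, and describe the whole signal by the normalized histogram of codeword counts. That fixed-length histogram can then be fed to a classifier (for activity labels) or used in a regression. This sketch assumes a single accelerometer channel and a small hypothetical codebook; in practice the codebook would typically come from running k-means on segments drawn from training signals.

```python
def segments(signal, width):
    """Cut a 1-D signal into non-overlapping windows of `width` samples (tail dropped)."""
    return [tuple(signal[i:i + width]) for i in range(0, len(signal) - width + 1, width)]


def vq_histogram(signal, codebook, width):
    """Map each segment to its nearest codeword and return normalised codeword counts."""
    segs = segments(signal, width)
    if not segs:
        raise ValueError("signal is shorter than one segment")
    counts = [0] * len(codebook)
    for seg in segs:
        nearest = min(range(len(codebook)),
                      key=lambda j: sum((a - b) ** 2 for a, b in zip(seg, codebook[j])))
        counts[nearest] += 1
    return [c / len(segs) for c in counts]


# Hypothetical codebook of three 3-sample patterns and a short signal.
codebook = [(0.0, 0.0, 0.0), (1.0, 1.0, 1.0), (1.0, -1.0, 1.0)]
signal = [0, 0, 0, 1, 1, 1, 0.9, 1.1, 1.0, 1, -1, 1]
print(vq_histogram(signal, codebook, 3))  # [0.25, 0.5, 0.25]
```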

Engineering Problem: Identifying a phone from its accelerometer
Case: Look at accelerometer signatures and tell which phone this is
Technical Solution: Vector quantization, classification
Data is on this website
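Once each recording is summarized by a fixed-length feature vector (for example, a codeword histogram), identifying the phone becomes a classification problem with one class per phone. The sketch below uses a nearest-class-mean classifier as a simple stand-in; the source does not specify which classifier was used, and the phone labels and histogram values are hypothetical.

```python
def fit_class_means(features, labels):
    """Average the feature vectors of each class."""
    sums, counts = {}, {}
    for f, lab in zip(features, labels):
        acc = sums.setdefault(lab, [0.0] * len(f))
        for i, v in enumerate(f):
            acc[i] += v
        counts[lab] = counts.get(lab, 0) + 1
    return {lab: [v / counts[lab] for v in acc] for lab, acc in sums.items()}


def classify(means, f):
    """Return the label whose class mean is closest (squared Euclidean) to f."""
    return min(means, key=lambda lab: sum((a - b) ** 2 for a, b in zip(f, means[lab])))


# Hypothetical codeword histograms recorded on two phones.
train = [[0.6, 0.3, 0.1], [0.7, 0.2, 0.1], [0.1, 0.3, 0.6], [0.2, 0.2, 0.6]]
phones = ["phone_A", "phone_A", "phone_B", "phone_B"]
means = fit_class_means(train, phones)
print(classify(means, [0.65, 0.25, 0.1]))  # phone_A
```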

Engineering Problem: Understand and predict user behavior
Case: Yelp reviews of restaurants for a large urban area - what do reviewers do?
Technical Solution: Visualization, summaries, vector quantization, classification
Data is on this website
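Simple summaries are a natural first look at what reviewers do. The sketch assumes each review has already been reduced to a hypothetical (user_id, business_id, stars) tuple; the real Yelp files may be laid out differently. It computes, per reviewer, the number of reviews and the mean star rating, plus the overall distribution of star ratings.

```python
from collections import Counter, defaultdict
from statistics import mean


def summarize(reviews):
    """reviews: iterable of (user_id, business_id, stars) tuples (hypothetical layout).

    Returns ({user_id: (number_of_reviews, mean_stars)}, Counter of star ratings).
    """
    by_user = defaultdict(list)
    star_counts = Counter()
    for user, _business, stars in reviews:
        by_user[user].append(stars)
        star_counts[stars] += 1
    per_user = {u: (len(s), mean(s)) for u, s in by_user.items()}
    return per_user, star_counts


reviews = [("u1", "b1", 5), ("u1", "b2", 3), ("u2", "b1", 4),
           ("u3", "b3", 1), ("u3", "b1", 2)]
per_user, star_counts = summarize(reviews)
# List the most prolific reviewers first.
for user, (n, avg) in sorted(per_user.items(), key=lambda kv: -kv[1][0]):
    print(user, n, avg)
print(sorted(star_counts.items()))
```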

Engineering Problem: Understand where aerosol particles are coming from
Case: Air pollution data from Paris
Technical Solution: Clustering, vector quantization, classification
Data is on this website
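The clustering step could use k-means, but hierarchical clustering is another option when the number of sources is unclear, since the sequence of merges can be inspected at every level. The sketch below is single-link agglomerative clustering, swapped in for illustration; the three composition fractions per sample and their values are hypothetical and are not taken from the Paris dataset.

```python
def single_link(points, k):
    """Agglomerative clustering: repeatedly merge the two closest clusters until k remain.

    Cluster distance is single link: the smallest squared distance between any
    member of one cluster and any member of the other. Returns lists of indices.
    """
    def dist2(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    clusters = [[i] for i in range(len(points))]
    while len(clusters) > k:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = min(dist2(points[i], points[j]) for i in clusters[a] for j in clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        clusters[a].extend(clusters[b])
        del clusters[b]  # b > a, so index a is unaffected
    return clusters


# Hypothetical per-sample composition fractions.
samples = [(0.10, 0.70, 0.20), (0.12, 0.68, 0.20),
           (0.60, 0.10, 0.30), (0.62, 0.12, 0.26)]
clusters = single_link(samples, 2)
print(clusters)
```

This naive version recomputes all pairwise cluster distances at each merge, which is fine for a sketch but too slow for large datasets.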

Engineering Problem: Who does well in MOOCs?
Case: Completion and grade data for a UIUC MOOC
Technical Solution: Regression
Data is on this website
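A least-squares fit with a single explanatory variable shows the regression idea in its simplest form: the slope is $b = S_{xy}/S_{xx}$ and the intercept is $a = \bar{y} - b\bar{x}$. The predictor used here (the fraction of course videos a learner watched) and the grades are hypothetical; the real data may offer other or several predictors.

```python
def fit_line(x, y):
    """Ordinary least squares for y ~ a + b * x with a single predictor."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    if sxx == 0:
        raise ValueError("predictor has no variation; slope is undefined")
    b = sxy / sxx    # slope
    a = my - b * mx  # intercept
    return a, b


watched = [0.0, 0.5, 1.0]   # hypothetical fraction of videos watched
grade = [40.0, 65.0, 90.0]  # hypothetical final grades
a, b = fit_line(watched, grade)
print(a, b)  # 40.0 50.0
```

On Python 3.10 and later, `statistics.linear_regression(watched, grade)` returns the same slope and intercept.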

 

Topic modeling and text tools:

 

Slides:

Notes:

R Resources: