TA's:
You may do this homework in groups of up to 3 contributors. Groups of 1 or of 2 are just fine, too. A group can consist of any mixture of any type of student (MSCS-DS/Online/Face). We do not offer coordination services for complex group interactions, and you may want to take this into account when forming your group.
Submission: Homework 2 submission details TBA.
You may use any programming language that amuses you for this homework.
You can find a dataset dealing with European employment in 1979 at http://lib.stat.cmu.edu/DASL/Stories/EuropeanJobs.html. This dataset gives the percentage of people employed in each of a set of areas in 1979 for each of a set of European countries. Notice this dataset contains only 26 data points. That's fine; it's intended to give you some practice in visualization of clustering.
Do exercise 6.2 in the Jan 15 version of the course text.
Questions about the homework
Answer: Sure; I don't know how well they work for this, as I haven't used the package. For this one, it may be simpler to build your own than to understand the package.
Answer: You should not test on examples that you used to build the dictionary, but you can train on them. In a perfect world, I would split the volunteers into a dictionary portion (about half), then do a test/train split for the classifier on the remaining half. You can't do that, because for some signals there are very few volunteers. Instead, for each category, choose 20% of the signals (or close!) to be the test set. Then use the others both to build the dictionary and to train the classifier.
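The per-category split described in this answer can be sketched as follows. This is a minimal illustration, not a required implementation: the function name `split_per_category`, the `(category, signal_id)` representation, and the placeholder category names are all assumptions made for the example. It rounds the 20% to the nearest whole signal and holds out at least one signal from any category that has two or more.

```python
import random
from collections import defaultdict


def split_per_category(signals, test_fraction=0.2, seed=0):
    """Split (category, signal_id) pairs into train and test lists.

    Roughly `test_fraction` of the signals in each category go to test;
    everything else goes to train. Hypothetical helper for illustration.
    """
    rng = random.Random(seed)

    # Group the signals by category so each category is split on its own.
    by_category = defaultdict(list)
    for category, signal_id in signals:
        by_category[category].append(signal_id)

    train, test = [], []
    for category in sorted(by_category):
        items = list(by_category[category])
        rng.shuffle(items)
        if len(items) > 1:
            # "20% (or close!)": round to the nearest count, at least one.
            n_test = max(1, round(test_fraction * len(items)))
        else:
            # A single signal cannot be both held out and trained on.
            n_test = 0
        test.extend((category, s) for s in items[:n_test])
        train.extend((category, s) for s in items[n_test:])
    return train, test


# Placeholder data: category names and signal ids are made up.
signals = [("category_a", i) for i in range(10)]
signals += [("category_b", i) for i in range(25)]

train, test = split_per_category(signals)

# The training signals serve double duty, as the answer describes.
dictionary_signals = train  # used to build the dictionary
classifier_signals = train  # used to train the classifier
```

The test signals are never seen when building the dictionary, which is the constraint that matters here; reusing the training signals for both the dictionary and the classifier is the compromise forced by the small number of signals in some categories.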
Answer: Ignore them; they shouldn't matter (think through the logic of the method again if you're uncertain about this).