Transfer learnable features for ASL:

We have recently become interested in transfer learning, a body of procedures that allows information obtained in learning one task to be transferred to another, related task. Transfer learning is important for activity recognition and understanding, because we do not expect to have examples of every particular activity at every particular aspect on hand. We have used ASL recognition as an example domain: we build word models for American Sign Language (ASL) that transfer between different signers and different aspects. This is advantageous because one could use large amounts of labelled avatar data, in combination with a smaller amount of labelled human data, to spot a large number of words in human data. Transfer is possible because we represent blocks of video with novel intermediate discriminative features based on splits of the data. By constructing the same splits in avatar and human data and clustering appropriately, our features are both discriminative and semantically consistent: across signers, similar features imply similar words. We demonstrate transfer learning in two scenarios: from an avatar to a frontally viewed human signer, and from an avatar to a human signer in 3/4 view. This work is joint with Ali Farhadi and Ryan White, and appeared as
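The idea of split-based features can be sketched as follows. This is a minimal toy illustration, not the published implementation: each split of the shared vocabulary defines a binary task, and a block of video is encoded by which side of each split it falls on, using classifiers (here, nearest centroids) trained within its own domain. All names and the toy data are illustrative.

```python
# Minimal sketch of split-based transferable features (illustrative only).

def centroid(vectors):
    """Component-wise mean of a list of equal-length vectors."""
    n = len(vectors)
    return [sum(v[i] for v in vectors) / n for i in range(len(vectors[0]))]

def dist2(a, b):
    """Squared Euclidean distance."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def split_feature(x, splits, labelled):
    """Encode raw observation x as one bit per vocabulary split.

    splits   -- list of sets: the 'positive' words of each split
    labelled -- list of (word, raw_vector) pairs from the SAME domain as x
    """
    bits = []
    for pos_words in splits:
        pos = centroid([v for w, v in labelled if w in pos_words])
        neg = centroid([v for w, v in labelled if w not in pos_words])
        bits.append(1 if dist2(x, pos) < dist2(x, neg) else 0)
    return bits

# Shared words rendered by two 'signers' with different raw geometry.
avatar = [("A", [0.0, 0.0]), ("B", [1.0, 0.0]), ("C", [0.0, 1.0])]
human  = [("A", [10.0, 10.0]), ("B", [12.0, 10.0]), ("C", [10.0, 12.0])]
splits = [{"A", "B"}, {"A", "C"}]  # the same splits are used in both domains

# The raw vectors differ, but the split encodings of word B agree:
print(split_feature([1.0, 0.0], splits, avatar))   # avatar B -> [1, 0]
print(split_feature([12.0, 10.0], splits, human))  # human B -> [1, 0]
```

Because the splits are defined over words rather than raw pixels, the encoding of the same word agrees across signers even though the raw observations do not.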

 
Ali Farhadi, David Forsyth, Ryan White, "Transfer Learning in Sign Language," IEEE Conference on Computer Vision and Pattern Recognition, 2007


Finding:
  Our method is remarkably effective. We train the method with two types of data. First, we use examples of some words in frontal view of an avatar, in frontal view of a real signer, and in 3/4 view of a real signer (shared words); these establish comparisons between features. Then we train models of new words using only examples in frontal view of an avatar (target words). We use these models to spot words in (a) a frontal view of a real signer and (b) a 3/4 view of a real signer. Our method has seen no example of these words in any view of a real signer, but is still accurate at spotting and classifying them.
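The transfer protocol above can be sketched in a few lines. This is a hedged toy version, assuming the features are already in the shared split-based space: word models (here, simple per-word centroids, standing in for the paper's models) are fit only on avatar examples of the target words, then applied to a human signer's features, which were never seen in training.

```python
# Illustrative sketch of avatar-to-human word spotting; names and data
# are hypothetical, and centroid models stand in for the paper's models.

def fit_word_models(avatar_examples):
    """avatar_examples: list of (word, feature_vector) pairs.
    Returns one centroid model per target word."""
    grouped = {}
    for word, vec in avatar_examples:
        grouped.setdefault(word, []).append(vec)
    return {w: [sum(c) / len(vs) for c in zip(*vs)] for w, vs in grouped.items()}

def spot(models, x):
    """Classify a human-signer feature vector using avatar-trained models."""
    return min(models, key=lambda w: sum((a - b) ** 2 for a, b in zip(models[w], x)))

# Target words seen ONLY on the avatar.
avatar = [("BOOK", [1.0, 0.0]), ("BOOK", [0.9, 0.1]), ("HOUSE", [0.0, 1.0])]
models = fit_word_models(avatar)

# A human-signer feature vector, unseen at training time.
print(spot(models, [0.8, 0.2]))  # -> BOOK
```

The key point is that nothing in `fit_word_models` touches human data; transfer rests entirely on the features being comparable across domains.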
Figure: results from transfer learning.
 This confusion matrix establishes that transfer learning is accurate, especially considering the difficulty of the task (elements on the diagonal indicate correct classifications, while all others indicate errors). The word model (p(c|w)) is trained on the avatar and tested on frontal human data (left) and 3/4 view human data (right). This task is more challenging than a typical application because word spotting is done without language models (such as word frequency priors, bigrams, or trigrams).
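To make the reading of the matrix concrete, here is a toy example (the numbers are invented for illustration and are not the paper's results): overall accuracy is the diagonal mass divided by the total mass.

```python
# Toy confusion matrix: rows = true word, columns = predicted word.
# Diagonal entries count correct classifications. Numbers are invented.
conf = [
    [8, 1, 1],
    [0, 9, 1],
    [1, 0, 9],
]

correct = sum(conf[i][i] for i in range(len(conf)))   # 8 + 9 + 9 = 26
total = sum(sum(row) for row in conf)                 # 30 trials
accuracy = correct / total
print(round(accuracy, 3))  # -> 0.867
```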