CS 598 Readings
Instructors: D.A. Forsyth
Time: Tue, Thur 15h30-16h45, 3403 Siebel
Coverage
How many objects are there?
How many words does an author know?
Nonparametric coverage methods - The Chao-Lee estimator that Ian talked about
 Estimating the number of classes via sample coverage, A Chao, SM Lee - Journal of the American Statistical Association, 1992 - Taylor & Francis
 Comparison of three estimators of the number of species, J Bunge, M Fitzpatrick, J Handley - Journal of Applied Statistics, 1995
 Estimating the number of species in a stochastic abundance model, A Chao, J Bunge - Biometrics, 2004 - Wiley Online Library
 Predicting the conditional probability of discovering a new class, CX Mao - Journal of the American Statistical Association, 2004
 Estimating the number of classes, CX Mao, BG Lindsay - The Annals of Statistics, 2007
Using priors on partitions to estimate coverage
My notes on Dirichlet process and Pitman-Yor processes; in preparing these notes, I used:
 S Ghosal, "The Dirichlet process, related priors and posterior asymptotics", in N. Hjort, C Holmes, P. Mueller and S.G. Walker, eds., Bayesian Nonparametrics, Cambridge, 2010
 A. Lijoi and I Pruenster, "Models beyond the Dirichlet process", in N. Hjort, C Holmes, P. Mueller and S.G. Walker, eds., Bayesian Nonparametrics, Cambridge, 2010
(you might be able to find PDFs with a web search)
 A Bayesian interpretation of interpolated Kneser-Ney, YW Teh - 2006 - Daphne
 Hierarchical Bayesian nonparametric models with applications, YW Teh, MI Jordan - Daphne
 Bayesian nonparametric estimation of the probability of discovering new species, A Lijoi, RH Mena, I Prünster - Biometrika, 2007 - Biometrika Trust - DAF
Dirichlet process clustering
The source of all this stuff
More resources
Deep Learning
 Hinton, G. E., Osindero, S., and Teh, Y. (2006a). A fast learning algorithm for deep belief nets. Neural Computation, 18, 1527–1554. - Kevin K
 Marc Ranzato, Y-Lan Boureau, Yann LeCun. Sparse Feature Learning for Deep Belief Networks. NIPS 2007 - Amin
 Y. Bengio, P. Lamblin, D. Popovici, and H. Larochelle, "Greedy Layer-Wise Training of Deep Networks." 2007, pp. 153-160. - Kevin S
 Honglak Lee, Roger Grosse, Rajesh Ranganath, Andrew Y. Ng. Convolutional Deep Belief Networks for Scalable Unsupervised Learning of Hierarchical Representations - Juan
 Coates, Adam, Honglak Lee, and Andrew Y. Ng. "An analysis of single-layer networks in unsupervised feature learning." Ann Arbor 1001 (2010): 48109. - Juan
 Coates, Adam, and Andrew Y. Ng. "The importance of encoding versus training with sparse coding and vector quantization." International Conference on Machine Learning. Vol. 8. 2011. - Juan
 Building High-level Features Using Large Scale Unsupervised Learning, Quoc V. Le, Marc’Aurelio Ranzato, Rajat Monga, Matthieu Devin, Kai Chen, Greg S. Corrado, Jeffrey Dean, and Andrew Y. Ng, ICML 2012. - Juan
 Hinton, Geoffrey E., et al. "Improving neural networks by preventing co-adaptation of feature detectors." arXiv preprint arXiv:1207.0580 (2012). - Jason
 Krizhevsky, A., Sutskever, I. and Hinton, G. E. ImageNet Classification with Deep Convolutional Neural Networks. NIPS 2012 - Jason
 Coates, Adam, and Andrew Ng. "Learning Feature Representations with K-Means." Neural Networks: Tricks of the Trade (2012): 561-580. - Tom
 Coates, Adam, Andrej Karpathy, and Andrew Ng. "Emergence of Object-Selective Features in Unsupervised Feature Learning." Advances in Neural Information Processing Systems 25. 2012. - Tom
 A Hinton Paper - Rajhans
Deep Learning Critique
 L. Theis, S. Gerwinn, F. Sinz, M. Bethge, In All Likelihood, Deep Belief Is Not Enough, Journal of Machine Learning Research 12 (2011) 3071-3096
Representation
 High-level perception, representation, analogy: A critique ...., DJ Chalmers, R.M. French, D.R. Hofstadter, J. Theoretical Experimental AI, 1991 - Saurabh
Multitask learning
 Multitask Learning, Rich Caruana, Machine Learning, July 1997, Volume 28, Issue 1, pp 41-75
Structure Learning
Latent Dirichlet Allocation
 D. Blei, A. Ng, and M. Jordan. Latent Dirichlet allocation. Journal of Machine Learning Research, 3:993–1022, January 2003.
 M. Hoffman, D. Blei, J. Paisley, and C. Wang. Stochastic variational inference. June, 2012. [ArXiv]
 A spectral algorithm for latent Dirichlet allocation.
Anima Anandkumar, Dean P. Foster, Daniel Hsu, Sham M. Kakade, and Yi-Kai Liu.
Advances in Neural Information Processing Systems 25, 2012.
Spectral Learning
 A spectral algorithm for learning hidden Markov models.
Daniel Hsu, Sham M. Kakade, and Tong Zhang.
Twenty-Second Annual Conference on Learning Theory, 2009.
Journal of Computer and System Sciences, 78(5):1460-1480, 2012.
 Multi-label prediction via compressed sensing.
Daniel Hsu, Sham M. Kakade, John Langford, and Tong Zhang.
Advances in Neural Information Processing Systems 22, 2009.
Some Feature Selection
 Tibshirani, R. (1996). Regression shrinkage and selection via the lasso. J. Royal. Statist. Soc B., Vol. 58, No. 1, pages 267-288.
 J. H. Friedman, T. Hastie, and R. Tibshirani. Sparse inverse covariance estimation with the graphical lasso. Biostatistics, 9(3):432–441, 2007.
 The nonparanormal: Semiparametric estimation of high dimensional undirected graphs
Han Liu, John Lafferty and Larry Wasserman
Journal of Machine Learning Research, Vol. 10, pp 2295-2328, 2009.
NLP
 Dipanjan Das and Slav Petrov, Unsupervised Part-of-Speech Tagging with Bilingual Graph-Based Projections, ACL 2011
 Ben Taskar, Dan Klein, Michael Collins, Daphne Koller, and Christopher Manning, Max-Margin Parsing, EMNLP 2004
 Michael Collins, Dual Decomposition Tutorial (ICML?)
 Percy Liang, Slav Petrov, Michael I. Jordan, and Dan Klein, The Infinite PCFG using Hierarchical Dirichlet Processes, EMNLP-CoNLL 2007
 Shay B. Cohen, Karl Stratos, Michael Collins, Dean P. Foster, and Lyle Ungar, Spectral Learning of Latent-Variable PCFGs, ACL 2012
+ other spectral learning
 Jennifer Gillenwater, Alex Kulesza, and Ben Taskar, Near-Optimal MAP Inference for Determinantal Point Processes, NIPS 2012
+ other DPP background, potentially
 Percy Liang, Michael I. Jordan, and Dan Klein, Learning Dependency-Based Compositional Semantics, ACL 2011
