CS 598 Readings

Instructor: D.A. Forsyth

Time: Tue, Thur 15h30-16h45, 3403 Siebel

Coverage

How many objects are there?

Recognition-by-components: a theory of human image understanding. (material around p 127) I Biederman - Psychological review, 1987

How many words does an author know?

Estimating the number of unseen species: How many words did Shakespeare know? B Efron, R Thisted - Biometrika, 1976 - Biometrika Trust
- My notes on this fascinating paper
Did Shakespeare write a newly-discovered poem? R Thisted, B Efron - Biometrika, 1987 - Biometrika Trust
Are the Thisted-Efron authorship tests valid? RJ Valenza - Computers and the Humanities, 1991 - Springer
The state of authorship attribution studies: Some problems and solutions J Rudman - Computers and the Humanities, 1997 - Springer

Nonparametric coverage methods - The Chao-Lee estimator that Ian talked about

Estimating the number of classes via sample coverage A Chao, SM Lee - Journal of the American Statistical Association, 1992 - Taylor & Francis
Comparison of three estimators of the number of species J Bunge, M Fitzpatrick, J Handley - Journal of Applied Statistics, 1995
Estimating the number of species in a stochastic abundance model A Chao, J Bunge - Biometrics, 2004 - Wiley Online Library
Predicting the conditional probability of discovering a new class CX Mao - Journal of the American Statistical Association, 2004
Estimating the number of classes CX Mao, BG Lindsay - The Annals of Statistics, 2007

Using priors on partitions to estimate coverage

My notes on Dirichlet processes and Pitman-Yor processes; in preparing these notes, I used:

S Ghosal, "The Dirichlet process, related priors and posterior asymptotics", in N. Hjort, C Holmes, P. Mueller and S.G. Walker, eds Bayesian Nonparametrics, Cambridge, 2010
A. Lijoi and I Pruenster, "Models beyond the Dirichlet process", in N. Hjort, C Holmes, P. Mueller and S.G. Walker, eds Bayesian Nonparametrics, Cambridge, 2010

(you might be able to find PDFs with a web search)

A Bayesian interpretation of interpolated Kneser-Ney YW Teh - 2006 - Daphne
Hierarchical Bayesian nonparametric models with applications YW Teh, MI Jordan - Daphne
Bayesian nonparametric estimation of the probability of discovering new species A Lijoi, RH Mena, I Prünster - Biometrika, 2007 - Biometrika Trust - DAF

Dirichlet process clustering

A whole bunch of tutorials by Yee-Whye Teh on Bayesian nonparametrics are here
Y.-W. Teh, “Introduction to Bayesian Nonparametrics,” presented at the MLSS, 2011, pp. 1–204.

The source of all this stuff

The two-parameter Poisson-Dirichlet distribution derived from a stable subordinator, J Pitman, M Yor - The Annals of Probability, 1997

More resources

Tutorials on Bayesian Nonparametrics, a web page by Peter Orbanz

Deep Learning

Hinton, G. E., Osindero, S., and Teh, Y. (2006a). A fast learning algorithm for deep belief nets. Neural Computation, 18, 1527–1554. - Kevin K
Marc Ranzato, Y-Lan Boureau, Yann LeCun. Sparse Feature Learning for Deep Belief Networks. NIPS 2007 - Amin
Y. Bengio, P. Lamblin, D. Popovici, and H. Larochelle, "Greedy Layer-Wise Training of Deep Networks." 2007, pp. 153-160. - Kevin S
Honglak Lee, Roger Grosse, Rajesh Ranganath, Andrew Y. Ng Convolutional Deep Belief Networks for Scalable Unsupervised Learning of Hierarchical Representations - Juan
Coates, Adam, Honglak Lee, and Andrew Y. Ng. "An analysis of single-layer networks in unsupervised feature learning." AISTATS 2011. - Juan
Coates, Adam, and Andrew Y. Ng. "The importance of encoding versus training with sparse coding and vector quantization." International Conference on Machine Learning. Vol. 8. 2011. - Juan
Building High-level Features Using Large Scale Unsupervised Learning Quoc V. Le, Marc’Aurelio Ranzato, Rajat Monga, Matthieu Devin, Kai Chen, Greg S. Corrado, Jeffrey Dean, and Andrew Y. Ng, ICML 2012. - Juan
Hinton, Geoffrey E., et al. "Improving neural networks by preventing co-adaptation of feature detectors." arXiv preprint arXiv:1207.0580 (2012). - Jason
Krizhevsky, A., Sutskever, I. and Hinton, G. E. ImageNet Classification with Deep Convolutional Neural Networks. NIPS 2012 - Jason
Coates, Adam, and Andrew Ng. "Learning Feature Representations with K-Means." Neural Networks: Tricks of the Trade (2012): 561-580. - Tom
Coates, Adam, Andrej Karpathy, and Andrew Ng. "Emergence of Object-Selective Features in Unsupervised Feature Learning." Advances in Neural Information Processing Systems 25. 2012. - Tom
A Hinton Paper - Rajhans

Deep Learning Critique

L. Theis, S. Gerwinn, F. Sinz, M. Bethge In All Likelihood, Deep Belief Is Not Enough, Journal of Machine Learning Research 12 (2011) 3071-3096

Representation

High level perception, representation, analogy: A critique ...., DJ Chalmers, R.M. French, D.R. Hofstadter, J. Theoretical Experimental AI, 1991 - Saurabh

Multitask learning

Multitask Learning, Rich Caruana, Machine Learning, July 1997, Volume 28, Issue 1, pp 41-75

Structure Learning

I. Tsochantaridis, T. Joachims, T. Hofmann, and Y. Altun, Large Margin Methods for Structured and Interdependent Output Variables, Journal of Machine Learning Research (JMLR), 6(Sep):1453-1484, 2005.
Subgradient Methods for Maximum Margin Structured Learning Nathan D. Ratliff J. Andrew Bagnell Martin A. Zinkevich

Latent Dirichlet Allocation

D. Blei, A. Ng, and M. Jordan. Latent Dirichlet allocation. Journal of Machine Learning Research, 3:993–1022, January 2003.
M. Hoffman, D. Blei, J. Paisley, and C. Wang. Stochastic variational inference. June, 2012. [ArXiv]
A spectral algorithm for latent Dirichlet allocation.
Anima Anandkumar, Dean P. Foster, Daniel Hsu, Sham M. Kakade, and Yi-Kai Liu.
Advances in Neural Information Processing Systems 25, 2012.

Spectral Learning

A spectral algorithm for learning hidden Markov models.
Daniel Hsu, Sham M. Kakade, and Tong Zhang.
Twenty-Second Annual Conference on Learning Theory, 2009.
Journal of Computer and System Sciences, 78(5):1460-1480, 2012.
Multi-label prediction via compressed sensing.
Daniel Hsu, Sham M. Kakade, John Langford, and Tong Zhang.
Advances in Neural Information Processing Systems 22, 2009.

Some Feature Selection

Tibshirani, R. (1996). Regression shrinkage and selection via the lasso. J. Royal. Statist. Soc B., Vol. 58, No. 1, pages 267-288.
J. H. Friedman, T. Hastie, and R. Tibshirani. Sparse inverse covariance estimation with the graphical lasso. Biostatistics, 9(3):432–441, 2007.
The nonparanormal: Semiparametric estimation of high dimensional undirected graphs
Han Liu, John Lafferty and Larry Wasserman
Journal of Machine Learning Research, Vol. 10, pp 2295-2328, 2009.

NLP

Dipanjan Das and Slav Petrov, Unsupervised Part-of-Speech Tagging with Bilingual Graph-Based Projections, ACL 2011
Ben Taskar, Dan Klein, Michael Collins, Daphne Koller, and Christopher Manning, Max-Margin Parsing, EMNLP 2004
Michael Collins Dual-Decomposition Tutorial (ICML?)
Percy Liang, Slav Petrov, Michael I. Jordan, and Dan Klein, The Infinite PCFG using Hierarchical Dirichlet Processes, EMNLP-CoNLL 2007
Shay B. Cohen, Karl Stratos, Michael Collins, Dean P. Foster, and Lyle Ungar, Spectral Learning of Latent-Variable PCFGs, ACL 2012
+ other spectral learning
Jennifer Gillenwater, Alex Kulesza, and Ben Taskar, Near-Optimal MAP Inference for Determinantal Point Processes, NIPS 2012
+ other DPP background, potentially
Percy Liang, Michael I. Jordan, and Dan Klein, Learning Dependency-Based Compositional Semantics, ACL 2011