CS 598 Readings
Instructors: D.A. Forsyth
Time: Tue, Thur 15h30-16h45, 3403 Siebel
How many objects are there?
How many words does an author know?
Nonparametric coverage methods - The Chao-Lee estimator that Ian talked about
- Estimating the number of classes via sample coverage A Chao, SM Lee - Journal of the American Statistical Association, 1992 - Taylor & Francis
- Comparison of three estimators of the number of species J Bunge, M Fitzpatrick, J Handley - Journal of Applied Statistics, 1995
- Estimating the number of species in a stochastic abundance model A Chao, J Bunge - Biometrics, 2004 - Wiley Online Library
- Predicting the conditional probability of discovering a new class CX Mao - Journal of the American Statistical Association, 2004
- Estimating the number of classes CX Mao, BG Lindsay - The Annals of Statistics, 2007
Using priors on partitions to estimate coverage
My notes on Dirichlet process and Pitman-Yor processes; in preparing these notes, I used:
- A Bayesian interpretation of interpolated Kneser-Ney YW Teh - 2006 - Daphne
- Hierarchical Bayesian nonparametric models with applications YW Teh, MI Jordan - Daphne
- Bayesian nonparametric estimation of the probability of discovering new species A Lijoi, RH Mena, I Prünster - Biometrika, 2007 - Biometrika Trust - DAF
- S Ghosal, "The Dirichlet process, related priors and posterior asymptotics", in N. Hjort, C Holmes, P. Mueller and S.G. Walker, eds Bayesian Nonparametrics, Cambridge, 2010
- A. Lijoi and I Pruenster, "Models beyond the Dirichlet process", in N. Hjort, C Holmes, P. Mueller and S.G. Walker, eds Bayesian Nonparametrics, Cambridge, 2010
(you might be able to find PDFs with a web search)
Dirichlet process clustering
The source of all this stuff
- Hinton, G. E., Osindero, S., and Teh, Y. (2006a). A fast learning algorithm for deep belief nets. Neural Computation, 18, 1527–1554. - Kevin K
- Marc Ranzato, Y-Lan Boureau, Yann LeCun. Sparse Feature Learning for Deep Belief Networks. NIPS 2007 - Amin
- Y. Bengio, P. Lamblin, D. Popovici, and H. Larochelle, "Greedy Layer-Wise Training of Deep Networks." Advances in Neural Information Processing Systems 19, 2007, pp. 153-160. - Kevin S
- Honglak Lee, Roger Grosse, Rajesh Ranganath, Andrew Y. Ng. Convolutional Deep Belief Networks for Scalable Unsupervised Learning of Hierarchical Representations. ICML 2009 - Juan
- Coates, Adam, Honglak Lee, and Andrew Y. Ng. "An analysis of single-layer networks in unsupervised feature learning." AISTATS 2011. - Juan
- Coates, Adam, and Andrew Y. Ng. "The importance of encoding versus training with sparse coding and vector quantization." International Conference on Machine Learning. Vol. 8. 2011. - Juan
- Building High-level Features Using Large Scale Unsupervised Learning Quoc V. Le, Marc’Aurelio Ranzato, Rajat Monga, Matthieu Devin, Kai Chen, Greg S. Corrado, Jeffrey Dean, and Andrew Y. Ng, ICML 2012. - Juan
- Hinton, Geoffrey E., et al. "Improving neural networks by preventing co-adaptation of feature detectors." arXiv preprint arXiv:1207.0580 (2012). - Jason
- Krizhevsky, A., Sutskever, I. and Hinton, G. E. ImageNet Classification with Deep Convolutional Neural Networks. NIPS 2012 - Jason
- Coates, Adam, and Andrew Ng. "Learning Feature Representations with K-Means." Neural Networks: Tricks of the Trade (2012): 561-580. - Tom
- Coates, Adam, Andrej Karpathy, and Andrew Ng. "Emergence of Object-Selective Features in Unsupervised Feature Learning." Advances in Neural Information Processing Systems 25. 2012. - Tom
- A Hinton Paper - Rajhans
Deep Learning Critique
- L. Theis, S. Gerwinn, F. Sinz, M. Bethge, In All Likelihood, Deep Belief Is Not Enough, Journal of Machine Learning Research 12 (2011) 3071-3096
- High level perception, representation, analogy: A critique ...., DJ Chalmers, R.M. French, D.R. Hofstadter, J. Theoretical Experimental AI, 1991 - Saurabh
- Multitask Learning, Rich Caruana, Machine Learning, July 1997, Volume 28, Issue 1, pp 41-75
Latent Dirichlet Allocation
- D. Blei, A. Ng, and M. Jordan. Latent Dirichlet allocation. Journal of Machine Learning Research, 3:993–1022, January 2003.
- M. Hoffman, D. Blei, J. Paisley, and C. Wang. Stochastic variational inference. June, 2012. [ArXiv]
- A spectral algorithm for latent Dirichlet allocation.
Anima Anandkumar, Dean P. Foster, Daniel Hsu, Sham M. Kakade, and Yi-Kai Liu.
Advances in Neural Information Processing Systems 25, 2012.
- A spectral algorithm for learning hidden Markov models.
Daniel Hsu, Sham M. Kakade, and Tong Zhang.
Twenty-Second Annual Conference on Learning Theory, 2009.
Journal of Computer and System Sciences, 78(5):1460-1480, 2012.
- Multi-label prediction via compressed sensing.
Daniel Hsu, Sham M. Kakade, John Langford, and Tong Zhang.
Advances in Neural Information Processing Systems 22, 2009.
Some Feature Selection
- Tibshirani, R. (1996). Regression shrinkage and selection via the lasso. J. Royal Statist. Soc. B, Vol. 58, No. 1, pages 267-288.
- J. H. Friedman, T. Hastie, and R. Tibshirani. Sparse inverse covariance estimation with the graphical lasso. Biostatistics, 9(3):432–441, 2007.
- The nonparanormal: Semiparametric estimation of high dimensional undirected graphs
Han Liu, John Lafferty and Larry Wasserman
Journal of Machine Learning Research, Vol. 10, pp 2295-2328, 2009.
- Dipanjan Das and Slav Petrov, Unsupervised Part-of-Speech Tagging with Bilingual Graph-Based Projections, ACL 2011
- Ben Taskar, Dan Klein, Michael Collins, Daphne Koller, and Christopher Manning, Max-Margin Parsing, EMNLP 2004
- Michael Collins Dual-Decomposition Tutorial (ICML?)
- Percy Liang, Slav Petrov, Michael I. Jordan, and Dan Klein, The Infinite PCFG using Hierarchical Dirichlet Processes, EMNLP-CoNLL 2007
- Shay B. Cohen, Karl Stratos, Michael Collins, Dean P. Foster, and Lyle Ungar, Spectral Learning of Latent-Variable PCFGs, ACL 2012
+ other spectral learning
- Jennifer Gillenwater, Alex Kulesza, and Ben Taskar, Near-Optimal MAP Inference for Determinantal Point Processes, NIPS 2012
+ other DPP background, potentially
- Percy Liang, Michael I. Jordan, and Dan Klein, Learning Dependency-Based Compositional Semantics, ACL 2011