CS 598 Optimization methods in vision and learning
Homework:
- Homework 1, due 21 Feb 2012
Notes
- Week 1: Basic continuous optimization (descent directions, coordinate ascent, EM as coordinate ascent, Newton's method, stabilized Newton's method); My notes
- Week 2, Week 3: More basic continuous optimization (trust regions, dogleg method, subspace method); My notes: (Conjugate gradient); My notes (Quasi Newton methods for big problems, including DFP, BFGS, limited memory methods) My notes:
- Week 4, 5 and 6: Constrained optimization methods
- Lagrangians, quadratic penalty method, augmented Lagrangian method (My notes)
- Linear programs as an example (My notes)
- Inequality constraints in Lagrangians and KKT; relations between primal and dual (My notes)
- Active set methods and box constraints (my notes)
- Logistic regression as a classifier (my notes)
- The SVM as an example (My notes)
- L1 logistic regression, quadratic programming and LARS (my notes)
- Stochastic gradient descent (my notes)
- Semi-definite programs (My notes - these really do start at page 5!)
- Interior point methods (My notes)
- Quadratic programming and sequential quadratic programming (my notes)
- Week 8: Initial remarks on Combinatorial optimization (my notes); Flow and cuts (my notes)
- Week 12: alpha-expansion, alpha-beta swaps; MRFs as cuts (my notes)
- Week 13: More flow and cuts; matchings (my notes)
- Week 14: Max Cut and SDP (my notes)
- Final lectures: Submodular functions (my notes; some slides); Bilevel optimization problems (my notes)
Resources
Continuous optimization books
- Numerical Optimization (Springer Series in Operations Research and Financial Engineering) by Jorge Nocedal and Stephen Wright, 2006
- Convex Optimization by Stephen Boyd and Lieven Vandenberghe, Cambridge, 2004
- Understanding and Using Linear Programming, J. Matousek and B. Gartner, Springer, 2007
- Introduction to Optimization, P. Pedregal, Springer, 2004
- Optimization for Machine Learning, Sra, Nowozin and Wright, MIT Press, 2011
- Foundations of Optimization, Guler, Springer, 2010
- Nonlinear Programming by Dimitri P. Bertsekas, Athena, 1999
- Practical Methods of Optimization by R. Fletcher, Wiley, 2000
- Practical Optimization by Philip E. Gill, Walter Murray, Margaret H. Wright, Academic, 1982
Useful papers
- Jonathan Shewchuk, An Introduction to the Conjugate Gradient Method Without the Agonizing Pain, 1994
- F.A. Potra, S.J. Wright, "Interior point methods", Journal of Computational and Applied Mathematics, Volume 124, Issues 1-2, 1 December 2000, Pages 281-302 (find it here from a computer inside UIUC VPN)
- S. Shalev-Shwartz and Y. Singer, "Logarithmic regret algorithms for strongly convex repeated games", TR, Hebrew University, 2007
Neat MRF algorithm cheat-sheet
- Which Energy Minimization for my MRF/CRF? A Cheat-Sheet, Bogdan Alexe, Thomas Deselaers, Marcin Eichner, Vittorio Ferrari, Peter Gehler, Alain Lehmann, Stefano Pellegrini, Alessandro Prest
Submodular function applications
- Submodular function tutorial slides (Krause, Guestrin, 2008)
- Batch Mode Active Learning and Its Application to Medical Image Classification, Hoi, Jin, Zhu and Lyu, ICML 08
Bilevel Optimization notes
- An Introduction to Bilevel Programming (Fricke, ND)
- Classification model selection via bilevel programming, Kunapuli, Bennett, Hu and Pang, Optimization Methods and Software, 2008
Papers
Continuous optimization applications
Iterative scaling and the like
- The improved iterative scaling algorithm: A gentle introduction, A. Berger, Unpublished manuscript, 1997
- Iain Bancarz, M. Osborne, Improved iterative scaling can yield multiple globally optimal models with radically differing performance levels, Proceedings of the 19th international conference on Computational linguistics, 1 - 7, 2002
- Robert Malouf, A comparison of algorithms for maximum entropy parameter estimation, Proceedings of the 6th conference on Natural language learning - Volume 20, 1 - 7, 2002
- F. Sha and F. Pereira, Shallow parsing with conditional random fields, Proc HLT-NAACL, Main papers, pp 134-141, 2003
- Hanna Wallach, Efficient Training of Conditional Random Fields, University of Edinburgh, 2002.
Boosting
- Jerome H. Friedman, Greedy Function Approximation: A Gradient Boosting Machine, The Annals of Statistics, Vol. 29, No. 5 (Oct., 2001), pp. 1189-1232
Bundle adjustment
- Bill Triggs, Philip F. McLauchlan, Richard I. Hartley and Andrew W. Fitzgibbon, Bundle Adjustment -- A Modern Synthesis, Vision Algorithms: Theory and Practice: International Workshop on Vision Algorithms, Corfu, Greece, September 1999, pp 153-177
Matrix factorization
- Buchanan, A.M.; Fitzgibbon, A.W., Damped Newton algorithms for matrix factorization with missing data, Computer Vision and Pattern Recognition, 2005, 316 - 322
- Fast Maximum Margin Matrix Factorization for Collaborative Prediction, Jason D. M. Rennie, Nati Srebro, in Luc De Raedt, Stefan Wrobel (Eds.) Proceedings of the 22nd International Machine Learning Conference, ACM Press, 2005
- Scene Discovery by Matrix Factorization, N. Loeff, A. Farhadi, ECCV 2008
- T. Finley and T. Joachims, Supervised clustering with support vector machines, Proceedings of the 22nd international conference on Machine learning, 217 - 224, 2005
Registration
- A.W. Fitzgibbon, Robust registration of 2D and 3D point sets, Image and Vision Computing, Volume 21, Issues 13-14, 1 December 2003, Pages 1145-1153
SVMs
- C.J.C. Burges, "A Tutorial on Support Vector Machines for Pattern Recognition," Data Mining and Knowledge Discovery, 2, 121-167 (1998)
- Sequential minimal optimization: A fast algorithm for training support vector machines, J. Platt, Advances in Kernel Methods - Support Vector Learning, 1999
- S. S. Keerthi, S. K. Shevade, C. Bhattacharyya, K. R. K. Murthy, Improvements to Platt's SMO Algorithm for SVM Classifier Design, Neural Computation, 13:637-649, 2001
- Dynamic visual category learning, T. Yeh and T. Darrell, CVPR, 2008
- K. Zhang, I.W. Tsang, J.T. Kwok, Maximum margin clustering made practical, Proceedings of the 24th international conference on Machine learning, 1119 - 1126, 2007
- Pegasos: Primal Estimated sub-GrAdient SOlver for SVM, S. Shalev-Shwartz, Y. Singer, N. Srebro, ICML 2008
Structure learning
- I. Tsochantaridis, T. Joachims, T. Hofmann, and Y. Altun, Large Margin Methods for Structured and Interdependent Output Variables, Journal of Machine Learning Research (JMLR), 6(Sep):1453-1484, 2005.
- Subgradient Methods for Maximum Margin Structured Learning, Nathan D. Ratliff, J. Andrew Bagnell, Martin A. Zinkevich
- Learning to localize objects with structured output regression, Blaschko and Lampert, ECCV 2008
Efficient use of subgradients
- Smooth minimization of non-smooth functions (Nesterov; Math. Prog 2005).
Discrete Optimization
Resources
- M. Goemans and D.P. Williamson, Improved approximation algorithms for maximum cut and satisfiability problems using semidefinite programming, Journal of the ACM, Volume 42, Issue 6, 1115 - 1145, 1995
- L. Vandenberghe and S. Boyd, Semidefinite programming, SIAM Review, 38, 1, 49-95, 1996
Applications
Matching
- Y Rubner, C Tomasi, LJ Guibas, The Earth Mover's Distance as a Metric for Image Retrieval, International Journal of Computer Vision, 2000
- Grauman, K., Darrell, T., Fast contour matching using approximate earth mover's distance, CVPR 2004
- E Levina, P Bickel, The earth mover’s distance is the Mallows distance: Some insights from statistics, Proc. ICCV, 2001
- S Belongie, J Malik, J Puzicha, Matching shapes, Proc. of ICCV, 2001
- Maciel, J.; Costeira, J.P.; A global solution to sparse correspondence problems, Pattern Analysis and Machine Intelligence, IEEE Transactions on, Volume 25, Issue 2, Feb. 2003 Page(s):187 - 19
- A. Berg, T. Berg and J. Malik, Shape matching and object recognition using low distortion correspondence, CVPR, 2005
Markov Random Fields
- Boykov, Y. and O. Veksler, Graph cuts in vision and graphics: theories and applications, Handbook of Math. Models of Computer Vision, Paragios, Chen, Faugeras (eds)
- Boykov, Y., Veksler, O., Zabih, R., Fast approximate energy minimization via graph cuts, PAMI, 23, 11, 1222-1239, 2001
- Boykov, Y., Veksler, O., Zabih, R., Markov random fields with efficient approximations, CVPR, 1998
- P. Kohli and M. Pawan Kumar and P.H.S. Torr, P^3 and Beyond: Solving Energies with Higher Order Cliques, CVPR 2007
- C. Olsson, A.P. Eriksson and F. Kahl, Solving Large Scale Binary Quadratic Problems: Spectral Methods vs Semidefinite Programming, CVPR 2007
- Accelerated dual decomposition for MAP inference (Jojic, Gould, Koller; ICML 2010).
- MRF Energy Minimization and Beyond via Dual Decomposition (Komodakis, Paragios; PAMI 2010)
- A Comparative Study of Energy Minimization Methods for Markov Random Fields, R. Szeliski, R. Zabih, D. Scharstein, O. Veksler, V. Kolmogorov, A. Agarwala, M. Tappen, and C. Rother, ECCV 2006
- Minimizing Nonsubmodular Functions with Graph Cuts—A Review, V. Kolmogorov and C. Rother, PAMI 2007
- Convergent Tree-Reweighted Message Passing for Energy Minimization, V. Kolmogorov, PAMI 2006
Submodular functions
- Kang, Jin, Sukthankar, Correlated label propagation with application to multi-label learning, CVPR 2006