CS 598 Optimization methods in vision and learning

Notes

Week 1: Basic continuous optimization (descent directions, coordinate ascent, EM as coordinate ascent, Newton's method, stabilized Newton's method); My notes
Week 2, Week 3: More basic continuous optimization (trust regions, dogleg method, subspace method); My notes; (conjugate gradient, approximate Hessian methods, BFGS, limited memory methods); My notes
Week 4 and 5: Constrained optimization methods (Lagrangians, Lagrange duals, SVMs, quadratic penalty method, augmented Lagrangian method); My notes; (inequality constraints, boxes, interior point methods); My notes; (interior point methods); My notes; Logistic regression as a classifier and stochastic gradient descent (my notes)
Week 8: Initial remarks on Combinatorial optimization (my notes); Flow and cuts (my notes)
Week 9: More flow and cuts; matchings (my notes)
Week 10: Max Cut and SDP (my notes)
Week 15: Clean-up on Max-Cut and Sudoku (my notes); Submodular functions (my notes)

Resources

Continuous optimization books

Numerical Optimization (Springer Series in Operations Research and Financial Engineering) by Jorge Nocedal and Stephen Wright, 2006
Convex Optimization by Stephen Boyd and Lieven Vandenberghe, Cambridge, 2004
Nonlinear Programming by Dimitri P. Bertsekas, Athena, 1999
Practical Methods of Optimization by R. Fletcher, Wiley, 2000
Practical Optimization by Philip E. Gill, Walter Murray, Margaret H. Wright, Academic, 1982
Jonathan Shewchuk, An Introduction to the Conjugate Gradient Method Without the Agonizing Pain, 1994

Papers

Wed 19 Nov paper:

S. Arora and S. Kale, A combinatorial, primal-dual approach to semidefinite programs

Continuous optimization applications

Iterative scaling and the like

A. Berger, The improved iterative scaling algorithm: A gentle introduction, unpublished manuscript, 1997
Iain Bancarz, M. Osborne, Improved iterative scaling can yield multiple globally optimal models with radically differing performance levels, Proceedings of the 19th International Conference on Computational Linguistics, 1 - 7, 2002
Robert Malouf, A comparison of algorithms for maximum entropy parameter estimation, Proceedings of the 6th Conference on Natural Language Learning - Volume 20, 1 - 7, 2002
F. Sha and F. Pereira, Shallow parsing with conditional random fields, Proc HLT-NAACL, Main papers, pp 134-141, 2003
Hanna Wallach, Efficient Training of Conditional Random Fields, University of Edinburgh, 2002.

Boosting

Jerome H. Friedman, Greedy Function Approximation: A Gradient Boosting Machine, The Annals of Statistics, Vol. 29, No. 5 (Oct., 2001), pp. 1189-1232

Bundle adjustment

Bill Triggs, Philip F. McLauchlan, Richard I. Hartley and Andrew W. Fitzgibbon, Bundle Adjustment -- A Modern Synthesis, Vision Algorithms: Theory and Practice: International Workshop on Vision Algorithms, Corfu, Greece, September 1999, pp. 153-177

Matrix factorization

Buchanan, A.M.; Fitzgibbon, A.W., Damped Newton algorithms for matrix factorization with missing data, Computer Vision and Pattern Recognition, 2005, 316 - 322
Jason D. M. Rennie, Nati Srebro, Fast Maximum Margin Matrix Factorization for Collaborative Prediction, in Luc De Raedt, Stefan Wrobel (Eds.) Proceedings of the 22nd International Machine Learning Conference, ACM Press, 2005
N. Loeff, A. Farhadi, Scene Discovery by Matrix Factorization, ECCV 2008
T. Finley and T. Joachims, Supervised clustering with support vector machines, Proceedings of the 22nd International Conference on Machine Learning, 217 - 224, 2005

Registration

A.W. Fitzgibbon, Robust registration of 2D and 3D point sets, Image and Vision Computing, Volume 21, Issues 13-14, 1 December 2003, Pages 1145-1153

SVMs

C.J.C. Burges, A Tutorial on Support Vector Machines for Pattern Recognition, Data Mining and Knowledge Discovery, 2, 121-167 (1998)
J. Platt, Sequential minimal optimization: A fast algorithm for training support vector machines, Advances in Kernel Methods - Support Vector Learning, 1999
S. S. Keerthi, S. K. Shevade, C. Bhattacharyya, K. R. K. Murthy, Improvements to Platt's SMO Algorithm for SVM Classifier Design, Neural Computation, 13:637-649, 2001
T. Yeh and T. Darrell, Dynamic visual category learning, CVPR, 2008
K. Zhang, I.W. Tsang, J.T. Kwok, Maximum margin clustering made practical, Proceedings of the 24th international conference on Machine learning, 1119 - 1126, 2007
S. Shalev-Shwartz, Y. Singer, N. Srebro, Pegasos: Primal Estimated sub-GrAdient SOlver for SVM, ICML 2008

Structure learning

I. Tsochantaridis, T. Joachims, T. Hofmann, and Y. Altun, Large Margin Methods for Structured and Interdependent Output Variables, Journal of Machine Learning Research (JMLR), 6(Sep):1453-1484, 2005.
Nathan D. Ratliff, J. Andrew Bagnell, Martin A. Zinkevich, Subgradient Methods for Maximum Margin Structured Learning
Blaschko and Lampert, Learning to localize objects with structured output regression, ECCV 2008

Discrete Optimization

Resources

M. Goemans and D.P. Williamson, Improved approximation algorithms for maximum cut and satisfiability problems using semidefinite programming, Journal of the ACM, Volume 42, Issue 6, 1115 - 1145, 1995

L. Vandenberghe and S. Boyd, Semidefinite programming, SIAM Review, 38, 1, 49-95, 1996

Applications

Matching

Y. Rubner, C. Tomasi, L.J. Guibas, The Earth Mover's Distance as a Metric for Image Retrieval, International Journal of Computer Vision, 2000

Grauman, K., Darrell, T., Fast contour matching using approximate earth mover's distance, CVPR 2004

E. Levina, P. Bickel, The earth mover’s distance is the Mallows distance: Some insights from statistics, Proc. ICCV, 2001

S. Belongie, J. Malik, J. Puzicha, Matching shapes, Proc. of ICCV, 2001

Maciel, J., Costeira, J.P., A global solution to sparse correspondence problems, Pattern Analysis and Machine Intelligence, IEEE Transactions on, Volume 25, Issue 2, Feb. 2003 Page(s):187 - 19

A. Berg, T. Berg and J. Malik, Shape matching and object recognition using low distortion correspondence, CVPR, 2005

Markov Random Fields

Boykov, Y. and O. Veksler, Graph cuts in vision and graphics: theories and applications, Handbook of Math. Models of Computer Vision, Paragios, Chen, Faugeras (eds)

Boykov, Y., Veksler, O., Zabih, R., Fast approximate energy minimization via graph cuts, PAMI, 23, 11, 1222-1239, 2001

Boykov, Y., Veksler, O., Zabih, R., Markov random fields with efficient approximations, CVPR, 1998

P. Kohli, M. Pawan Kumar and P.H.S. Torr, P^3 and Beyond: Solving Energies with Higher Order Cliques, CVPR 2007

C. Olsson, A.P. Eriksson and F. Kahl, Solving Large Scale Binary Quadratic Problems: Spectral Methods vs Semidefinite Programming, CVPR 2007

Submodular functions

Kang, Jin, Sukthankar, Correlated label propagation with application to multi-label learning, CVPR 2006