Object Recognition

Object recognition is a topic of ongoing interest to me. An attraction is that we don't really know what it means to be doing object recognition, and the collective view of what the problem is has changed rather sharply on several occasions. One should use image cues to identify image regions as containing objects belonging to categories --- but what are categories? One source of information is language, and we have looked at the way people annotate images as a cue to what is depicted there.

There is a series of themes in my work here, which I break out in the list of papers. I have been fortunate to work with several talented people on this topic, including Joe Mundy, Andrew Zisserman, Charlie Rothwell and Michael Jordan; Kobus Barnard, Nando de Freitas and Erik Learned-Miller, who worked with me when they were post-docs at Berkeley; Pinar Duygulu, who worked with me as a visiting scholar at Berkeley; and Tamara Berg, Alex Berg, Jaety Edwards, Michael Maire and Ryan White, who worked with me when they were students at Berkeley.

Current students working with me on object recognition are Ian Endres, Ali Farhadi, Varsha Hedau, Nicolas Loeff, Alex Sorokin and Gang Wang.

I have done quite a lot of work on human activity recognition with NSF support, which is also described on this page.

Applications of Object Recognition

Forsyth, D.A., Malik, J. and Wilensky, R., ``Searching for digital pictures,'' Scientific American, June, 1997.

Words and pictures

The words that appear near a picture reveal a great deal about what the picture might contain. Co-occurrence cues can be used to link images with words, or image regions with particular words. A nearby picture can disambiguate the sense of a word. Finally, there are language cues that suggest some objects are present.

Image-word co-occurrence

C.O. Alm, N. Loeff and D.A. Forsyth, Challenges for annotating images for sense disambiguation, Proceedings of the Workshop on Frontiers in Linguistically Annotated Corpora 2006, pages 1–4, Sydney, July 2006
N. Loeff, C.O. Alm, D.A. Forsyth, Discriminating image senses by clustering with multimodal features, Proceedings of the COLING/ACL 2006 Main Conference Poster Sessions, pages 547–554, Sydney, July 2006
T. Berg and D.A. Forsyth, Animals on the Web, Proc CVPR 2006
Jaety Edwards, Ryan White and David Forsyth "Words and Pictures in the News", HLT-NAACL03 Workshop on Learning Word Meaning from Non-Linguistic Data, 2003
K. Barnard and D.A. Forsyth, ``Learning the semantics of words and pictures'' Proc. IEEE International Conference on Computer Vision, pp. 408-415, July, 2001
K. Barnard and D.A. Forsyth, ``Clustering Art,'' IEEE Conf. Computer Vision and Pattern Recognition, 2001

Region-word co-occurrence

Kobus Barnard, Pinar Duygulu, Raghavendra Guru, Prasad Gabbur, and David Forsyth, "The effects of segmentation and feature choice in a translation model of object recognition", in Proceedings of Computer Vision and Pattern Recognition (CVPR), Madison, Wisconsin, June 2003.
Kobus Barnard, Pinar Duygulu, Nando de Freitas, David Forsyth, David Blei, and Michael I. Jordan, "Matching Words and Pictures", Journal of Machine Learning Research, Vol 3, pp 1107-1135, 2003
P. Duygulu, N. de Freitas, K. Barnard and D.A. Forsyth, "Object Recognition as Machine Translation," Proc. European Conf. Computer Vision, 2002 Best paper in cognitive computer vision award

Word senses

Kobus Barnard, Matthew Johnson and David Forsyth, "Word Sense Disambiguation with Pictures", HLT-NAACL03 Workshop on Learning Word Meaning from Non-Linguistic Data, 2003

Language cues to depiction

Tamara Miller, Alexander C. Berg, Jaety Edwards, Michael Maire, Ryan White, Yee Whye Teh, Erik Learned-Miller, David A. Forsyth, "Faces and Names in the News", Proceedings of Computer Vision and Pattern Recognition (CVPR), Washington DC, June 2004
Tamara L. Berg, Alexander C. Berg, Jaety Edwards, David A. Forsyth, ``Who's in the Picture,'' Neural Information Processing Systems (NIPS), 2004

Object Representation

Himanshu Arora, Nicolas Loeff, David Forsyth, and Narendra Ahuja, “Unsupervised Segmentation of Objects using Efficient Learning” IEEE Conference on Computer Vision and Pattern Recognition, 2007
N. Loeff and A. Sorokin and H. Arora and D.A. Forsyth, ``Efficient Unsupervised Learning for Localization and Detection in Object Categories'', NIPS 2005
D.A. Forsyth; J. Malik; M. M. Fleck; H. Greenspan; T. K. Leung; S. Belongie; C. Carson; C. Bregler "Finding Pictures of Objects in Large Collections of Images" Proceedings of the International Workshop on Object Representation in Computer Vision II, LNCS; Vol. 1144, 335 - 360, 1996

Geometric methods for representing objects

D.A. Forsyth, ``Recognizing Algebraic Surfaces from their Outlines,'' International J. of Computer Vision, 18, 1, 21-40, 1996.
Zisserman, A., Mundy, J.L., Forsyth, D.A., Liu, J.S., Pillow, N., Rothwell, C.A. and Utcke, S., ``Class-based grouping in perspective images'', Proceedings of the 5th International Conference on Computer Vision, Boston, MA., June 20-23, 1995
Rothwell, C.A., Zisserman, A., Forsyth, D.A. and Mundy, J.L. ``Planar Object Recognition using Projective Shape Representation'', International J. of Computer Vision, Vol 16, p. 57-59, 1995.
Zisserman, A., Forsyth, D.A., Mundy, J.L., Rothwell, C.A., and Liu, J.S., ``3D Object Recognition using Invariance,'' Artificial Intelligence, 78, 239-288, 1995.
Forsyth, D.A., Mundy, J.L., Rothwell, C.A. and Zisserman, A., ``Using global consistency to recognise Euclidean objects with an uncalibrated camera'', Proc CVPR-94 502-507, 1994
Rothwell, C.A., Forsyth, D.A., Zisserman, A. and Mundy, J.L., ``Extracting projective structure from single perspective views of 3D point sets,'' International Conference on Computer Vision, Berlin, 573-582, 1993. Best paper award (Marr prize)
D.A. Forsyth, ``Recognizing Algebraic Surfaces from their Outlines,'' International Conference on Computer Vision, Berlin, 476-480, 1993.
J. Liu, J.L. Mundy, D.A. Forsyth, A.P. Zisserman and C.A. Rothwell, ``Efficient Recognition of rotationally symmetric surfaces and straight homogeneous generalized cylinders,'' IEEE conference on Computer Vision and Pattern Recognition '93, 1993.
Forsyth, D.A. and Rothwell, C.A., ``Recognising extruded surfaces from their outlines'', Proc. Allerton Conference on communication, computing and control, 1993.
Forsyth, D.A., Mundy, J.L., Zisserman, A.P. ``Transformational Invariance- a primer,'' Image and Vision Computing, 10, 1, 39-45, 1992.
Rothwell, C.A., A. Zisserman, D.A. Forsyth and J.L. Mundy, ``Canonical Frames for Planar Object Recognition,'' 2nd European Conference on Computer Vision, Springer Verlag Lecture Notes in Computer Science, 757-772, 1992.
Rothwell, C.A., A. Zisserman, J.L. Mundy and D.A. Forsyth, ``Efficient Model Library Access by Projectively Invariant Indexing Functions,'' Computer Vision and Pattern Recognition 92, 109-114, 1992.
Forsyth, D.A., Mundy, J.L., Zisserman, A.P. and Rothwell, C.A. ``Recognising rotationally symmetric surfaces from their outlines,'' 2nd European Conference on Computer Vision, 639-648, 1992.
Forsyth, D.A., Mundy, J.L., Zisserman, A.P. and Brown, C.M., ``Projectively Invariant Representations Using Implicit Algebraic Curves,'' 9, 2, 130-136, 1991.
Forsyth, D.A., Mundy, J.L., Zisserman, A.P., Coelho, C., Heller, A. and Rothwell, C.A. ``Invariant descriptors for 3D object recognition and pose,'' Special issue of I.E.E.E. Pattern Analysis and Machine Intelligence on 3D object recognition, 971-991, October, 1991.
Rothwell, C.A., Zisserman, A.P., Forsyth, D.A. and Mundy, J.L., ``Using Projective Invariants for constant time library indexing in model based vision,'' Proc. British Machine Vision Conference, 1991.
Forsyth, D.A., Mundy, J.L., Zisserman, A.P. and Brown, C.M., ``Projectively Invariant Representations Using Implicit Algebraic Curves,'' 1st European Conference on Machine Vision, 427-436, 1990. (also OU Internal report no: 1829/90).
Forsyth, D.A., Mundy, J.L., Zisserman, A.P. and Brown, C.M., ``Invariance-a new framework for vision,'' 3rd International Conference on Computer Vision, 598-605, 1990. (also OU Internal report no: 1830/90)
Forsyth, D.A., Mundy, J.L., Zisserman, A.P. and Brown, C.M., ``Transformational Invariance- a primer,'' British Machine Vision Association Conference, 1-6, 1990
Zisserman, A.P., Marinos, C.I., Forsyth, D.A., Mundy, J.L. and Rothwell, C.A. ``Relative motion and pose from invariants,'' British Machine Vision Association Conference, 7-12, 1990.