Jakob Verbeek of the LEAR team at INRIA Grenoble, France, will present an object detection system based on the powerful Fisher vector (FV) image representation in combination with spatial pyramids computed over SIFT descriptors.
To alleviate the memory requirements of the high-dimensional FV representation, we exploit a recent segmentation-based method to generate class-independent object detection hypotheses, in combination with data compression techniques. Our main contribution, however, is a method to produce tentative object segmentation masks to suppress background clutter. Re-weighting the local image features based on these masks is shown to improve object detection significantly. To further improve the detector performance, we additionally compute these representations over local color descriptors, and include a contextual feature in the form of a full-image FV descriptor.
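The abstract does not spell out how the mask-based re-weighting enters the descriptor, so the following is only a minimal sketch of one plausible reading: each local descriptor receives a weight in [0, 1] taken from the tentative object mask, and that weight scales its contribution to the Fisher vector. The function names, the restriction to the GMM mean gradients, and the normalization by the total weight are assumptions made for illustration, not the method from the paper.

```python
import numpy as np

def soft_assign(X, pi, mu, sigma):
    """Posterior of each diagonal-covariance GMM component for each descriptor.

    X: (N, D) descriptors, pi: (K,) mixing weights,
    mu, sigma: (K, D) component means and standard deviations.
    """
    diff = (X[:, None, :] - mu[None, :, :]) / sigma[None, :, :]   # (N, K, D)
    log_p = (np.log(pi)[None, :]
             - 0.5 * np.sum(diff ** 2, axis=2)
             - np.sum(np.log(sigma), axis=1)[None, :])            # (N, K)
    log_p -= log_p.max(axis=1, keepdims=True)                     # numerical stability
    gamma = np.exp(log_p)
    return gamma / gamma.sum(axis=1, keepdims=True)

def weighted_fv_means(X, w, pi, mu, sigma, eps=1e-12):
    """Mean-gradient part of a Fisher vector with per-descriptor weights.

    w: (N,) hypothetical mask weights, e.g. the mask value at each
    descriptor location. Uniform w recovers the usual unweighted form.
    """
    gamma = soft_assign(X, pi, mu, sigma)                          # (N, K)
    diff = (X[:, None, :] - mu[None, :, :]) / sigma[None, :, :]   # (N, K, D)
    weighted = w[:, None] * gamma                                  # down-weight background
    G = np.einsum('nk,nkd->kd', weighted, diff)                    # (K, D)
    G /= max(w.sum(), eps) * np.sqrt(pi)[:, None]
    return G.ravel()                                               # length K * D
```

With this formulation, a descriptor with weight zero has no influence at all, so a hard binary mask is equivalent to computing the descriptor over the foreground features only, while soft masks interpolate between that and the unmasked representation.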
In the experimental evaluation based on the VOC 2007 and 2010 datasets, we observe excellent detection performance for our method. It performs better than or comparably to many recent state-of-the-art detectors, including ones that use more sophisticated forms of inter-class contextual cues.
Additionally, including a basic form of inter-category context leads, to the best of our knowledge, to the best detection results reported to date on these datasets.
This work will be published in a forthcoming ICCV 2013 paper.