Improving the Fisher Kernel for Large-Scale Image Classification

Published by Florent Perronnin on 5 September 2010

Florent Perronnin, Jorge Sanchez, Thomas Mensink

European Conference on Computer Vision (ECCV), Heraklion, Greece, 5-11 September 2010

Abstract

The Fisher kernel (FK) is a generic framework which combines the benefits of generative and discriminative approaches. In the context of image classification the FK was shown to extend the popular bag-of-visual-words (BOV) by going beyond count statistics. However, in practice, this enriched representation has not yet shown its superiority over the BOV. In the first part we show that with several well-motivated modifications over the original framework we can boost the accuracy of the FK. On PASCAL VOC 2007 we increase the Average Precision (AP) from 47.9% to 58.3%. Similarly, we demonstrate state-of-the-art accuracy on CalTech 256. A major advantage is that these results are obtained using only SIFT descriptors and costless linear classifiers. Equipped with this representation, we can now explore image classification on a larger scale. In the second part, as an application, we compare two abundant resources of labeled images to learn classifiers: ImageNet and Flickr groups. In an evaluation involving hundreds of thousands of training images we show that classifiers learned on Flickr groups perform surprisingly well (although they were not intended for this purpose) and that they can complement classifiers learned on more carefully annotated datasets.
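To make "going beyond count statistics" concrete, the following is a minimal NumPy sketch, not the authors' implementation. It contrasts a soft BOV histogram (how many descriptors fall on each visual word) with a Fisher-vector-style encoding that also records how the descriptors deviate from each word. The abstract does not name the modifications, so the signed power normalization and L2 normalization shown here are assumptions, as are all function names, the diagonal-covariance GMM, and the restriction to gradients with respect to the means.

```python
import numpy as np

def posteriors(X, w, mu, sigma2):
    """Soft assignment of each descriptor to each Gaussian (shape T x K)."""
    diff = X[:, None, :] - mu[None, :, :]                      # (T, K, D)
    # Log-density of a diagonal Gaussian, plus the log mixture weight.
    log_p = -0.5 * np.sum(diff ** 2 / sigma2 + np.log(2 * np.pi * sigma2), axis=2)
    log_p += np.log(w)
    log_p -= log_p.max(axis=1, keepdims=True)                  # numerical stability
    p = np.exp(log_p)
    return p / p.sum(axis=1, keepdims=True)

def bov_histogram(X, w, mu, sigma2):
    """Soft bag-of-visual-words: average occupancy per visual word (K values)."""
    return posteriors(X, w, mu, sigma2).mean(axis=0)

def fisher_vector_means(X, w, mu, sigma2):
    """Gradient of the average log-likelihood w.r.t. the GMM means (K * D values)."""
    T = X.shape[0]
    gamma = posteriors(X, w, mu, sigma2)                       # (T, K)
    diff = (X[:, None, :] - mu[None, :, :]) / np.sqrt(sigma2)  # (T, K, D)
    G = (gamma[:, :, None] * diff).sum(axis=0) / (T * np.sqrt(w)[:, None])
    return G.ravel()

def normalize(fv, alpha=0.5):
    """Assumed post-processing: signed power normalization, then L2 normalization."""
    fv = np.sign(fv) * np.abs(fv) ** alpha
    norm = np.linalg.norm(fv)
    return fv / norm if norm > 0 else fv

# Toy data standing in for local descriptors (e.g. SIFT) and a pre-trained GMM.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 8))        # 200 descriptors of dimension 8
K = 4
w = np.full(K, 1.0 / K)              # mixture weights
mu = rng.normal(size=(K, 8))         # component means
sigma2 = np.ones((K, 8))             # diagonal variances

print(bov_histogram(X, w, mu, sigma2).shape)                    # K values
print(normalize(fisher_vector_means(X, w, mu, sigma2)).shape)   # K * D values
```

The histogram has one value per visual word, whereas the Fisher-style vector has one value per word and descriptor dimension, which is why a small vocabulary can still yield a rich representation suited to linear classifiers.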

This paper won the Koenderink Prize, the 10-year test-of-time award, at ECCV 2020. The Koenderink Prize recognises fundamental contributions in computer vision. It is awarded at the European Conference on Computer Vision (one of the most prestigious conferences in the field) to a paper published at that conference ten years earlier that has withstood the test of time.
