Fisher Vectors for Fine-Grained Visual Categorization

Published by NAVER LABS Europe on 7 April 2013

Florent Perronnin, Jorge Sanchez, Zeynep Akata

CVPR, Colorado Springs, June 20, 24, 25, 2011.

The bag-of-visual-words (BOW) is certainly the most popular image representation to date, and it has been shown to yield good results on various problems, including Fine-Grained Visual Categorization (FGVC) [3, 4]. Our contribution is to show that the Fisher Vector (FV), which describes an image by its deviation from an average model, is an alternative that performs much better than the BOW on the FGVC problem. In this extended abstract, we first provide a brief introduction to the FV. We then present theoretical as well as practical motivations for using the FV for FGVC. Finally, we provide experimental results on four ImageNet subsets: fungus, ungulate, vehicle and ImageNet10K.
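The abstract does not spell out how the FV is computed, so the following is only a minimal sketch of a commonly used formulation, not the authors' implementation. It assumes the "average model" is a Gaussian mixture model (GMM) with diagonal covariances and describes a set of local descriptors by the normalized gradients with respect to the component means and standard deviations. The function name `fisher_vector`, its argument layout and the toy data are hypothetical, and the gradient with respect to the mixture weights and any later normalization steps are left out.

```python
import math


def fisher_vector(descriptors, weights, means, sigmas):
    """Sketch of a Fisher Vector under a diagonal-covariance GMM (assumed model).

    descriptors: list of N local descriptors, each a list of D floats
    weights:     K mixture weights (summing to 1)
    means:       K mean vectors, each of length D
    sigmas:      K standard-deviation vectors, each of length D
    Returns a list of length 2*K*D: mean gradients, then std-dev gradients.
    """
    n, k_count, dim = len(descriptors), len(weights), len(means[0])
    grad_mu = [[0.0] * dim for _ in range(k_count)]
    grad_sigma = [[0.0] * dim for _ in range(k_count)]

    for x in descriptors:
        # Log of w_k * N(x | mu_k, sigma_k) for each component k.
        log_p = []
        for w, mu, sg in zip(weights, means, sigmas):
            ll = math.log(w)
            for xd, md, sd in zip(x, mu, sg):
                ll += (-0.5 * math.log(2 * math.pi) - math.log(sd)
                       - 0.5 * ((xd - md) / sd) ** 2)
            log_p.append(ll)

        # Soft assignment of the descriptor to each component (posterior),
        # computed with the max subtracted for numerical stability.
        top = max(log_p)
        unnorm = [math.exp(lp - top) for lp in log_p]
        total = sum(unnorm)
        gamma = [u / total for u in unnorm]

        # Accumulate first- and second-order deviations from the model.
        for k in range(k_count):
            for d in range(dim):
                z = (x[d] - means[k][d]) / sigmas[k][d]
                grad_mu[k][d] += gamma[k] * z
                grad_sigma[k][d] += gamma[k] * (z * z - 1.0)

    fv = []
    for k in range(k_count):
        fv.extend(g / (n * math.sqrt(weights[k])) for g in grad_mu[k])
    for k in range(k_count):
        fv.extend(g / (n * math.sqrt(2.0 * weights[k])) for g in grad_sigma[k])
    return fv


# Toy one-component, one-dimensional model: mean 0, standard deviation 1.
# Descriptors whose statistics match the model give a zero vector;
# shifted descriptors give non-zero deviations.
matching = fisher_vector([[-1.0], [1.0]], [1.0], [[0.0]], [[1.0]])
shifted = fisher_vector([[1.0], [3.0]], [1.0], [[0.0]], [[1.0]])
```

In this toy setting the vector vanishes when the descriptors have exactly the mean and variance of the model, which is one way to read "deviation from an average model". A BOW histogram under the same GMM would keep only the per-component soft counts, whereas this sketch keeps 2·K·D values.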
Compared to [4], which uses spatial pyramid (SP) BOW representations, we report significantly higher classification accuracies. For instance, on ImageNet10K we report 16.7% vs. 6.4% top-1 accuracy, which represents a 160% relative improvement.
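The relative improvement quoted for ImageNet10K can be checked from the two reported top-1 accuracies. The helper name `relative_improvement` is hypothetical, and the only inputs are the two figures stated above.

```python
def relative_improvement(new, baseline):
    """Relative gain of `new` over `baseline`, in percent."""
    return (new - baseline) / baseline * 100.0


# Top-1 accuracies on ImageNet10K as reported in the abstract:
# 16.7% for the FV and 6.4% for the SP BOW baseline of [4].
gain = relative_improvement(16.7, 6.4)
print(f"{gain:.1f}%")  # about 160.9%, in line with the quoted 160%
```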
