This work is concerned with visual attention prediction, specifically predicting eye-fixation maps in still images.
Visual attention has traditionally been used in computer vision as a pre-processing step that focuses subsequent processing on regions of interest in an image.
End-to-End Saliency Mapping via Probability Distribution Prediction
This stage is ever more important as vision models and datasets grow in size. We propose an end-to-end model which, given an input image, outputs a topographic saliency map. The map is formulated as a generalized Bernoulli distribution, and the model is trained by optimizing a loss function suited to measuring distances between distributions.
Most saliency estimation methods aim to explicitly model low-level conspicuity cues such as edges or blobs, and may additionally incorporate top-down cues using face or text detection. Data-driven methods for training saliency models on eye-fixation data are increasingly popular, particularly with the introduction of large-scale datasets and deep architectures. However, current methods in this latter paradigm use loss functions designed for classification or regression tasks, whereas saliency estimation is evaluated on topographic maps. In this work, we introduce a new saliency map model which formulates a map as a generalized Bernoulli distribution. We then train a deep architecture to predict such maps using novel loss functions which pair the softmax activation function with measures designed to compute distances between probability distributions. Extensive experiments on four public benchmark datasets show the effectiveness of such loss functions over standard ones, and demonstrate improved performance over state-of-the-art saliency methods.
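The core idea above — normalizing a predicted map with a softmax so it becomes a probability distribution over pixel locations, then penalizing it with a distance between distributions — can be sketched as follows. This is a minimal NumPy illustration, not the paper's implementation: the function names (`softmax_map`, `kl_loss`) and the choice of KL divergence as the distribution distance are assumptions for the example; the paper pairs softmax with several such distance measures.

```python
import numpy as np

def softmax_map(logits):
    """Normalize raw per-pixel saliency scores into a probability
    distribution over locations (a generalized Bernoulli distribution).
    `logits` is a flattened saliency map of arbitrary length."""
    z = logits - logits.max()          # shift for numerical stability
    e = np.exp(z)
    return e / e.sum()

def kl_loss(pred_logits, target_dist, eps=1e-12):
    """KL divergence from the softmax-normalized prediction to the
    ground-truth fixation distribution `target_dist` (nonnegative,
    summing to 1, e.g. a blurred and renormalized fixation map)."""
    p = softmax_map(pred_logits)
    return float(np.sum(target_dist *
                        (np.log(target_dist + eps) - np.log(p + eps))))
```

In practice the ground-truth map would be a Gaussian-blurred binary fixation map renormalized to sum to one; because the softmax couples all locations, minimizing such a loss trades saliency mass between pixels rather than regressing each pixel independently.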