
Visual Saliency Prediction



ML-Net

Abstract

Current state-of-the-art models for saliency prediction employ fully convolutional networks that perform a non-linear combination of features extracted from the last convolutional layer to predict saliency maps. We propose an architecture which, instead, combines features extracted at different levels of a Convolutional Neural Network (CNN). Our model is composed of three main blocks: a feature extraction CNN, a feature encoding network that weights low- and high-level feature maps, and a prior learning network. We compare our solution with state-of-the-art saliency models on two public benchmark datasets. Results show that our model outperforms them under all evaluation metrics on the SALICON dataset, which is currently the largest public dataset for saliency prediction, and achieves competitive results on the MIT300 benchmark.
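As a rough illustration of the idea of combining feature maps from different CNN levels and modulating the result with a prior, here is a minimal NumPy sketch. It is not the ML-Net implementation: the function names (`resize_nearest`, `combine_levels`), the nearest-neighbour resizing, the single 1x1 weighting step, and the multiplicative prior are all assumptions chosen to keep the example short.

```python
import numpy as np


def resize_nearest(fmap, out_h, out_w):
    """Nearest-neighbour resize of a (C, H, W) feature map (illustrative helper)."""
    _, h, w = fmap.shape
    rows = np.arange(out_h) * h // out_h
    cols = np.arange(out_w) * w // out_w
    return fmap[:, rows[:, None], cols[None, :]]


def combine_levels(feature_maps, channel_weights, prior):
    """Weight multi-level features into one map and modulate it with a prior.

    feature_maps    : list of (C_i, H_i, W_i) arrays from different CNN levels
    channel_weights : (sum of C_i,) array, acting like a 1x1 convolution
    prior           : coarse (h, w) array, resized to the output resolution
    """
    out_h, out_w = feature_maps[0].shape[1:]
    # Bring every level to a common spatial size and stack along channels.
    stacked = np.concatenate(
        [resize_nearest(f, out_h, out_w) for f in feature_maps], axis=0
    )
    # A 1x1 convolution is a weighted sum over the channel axis.
    saliency = np.tensordot(channel_weights, stacked, axes=([0], [0]))
    saliency = np.maximum(saliency, 0.0)  # ReLU
    # Assumption: the prior is applied as an element-wise multiplication.
    prior_full = resize_nearest(prior[None], out_h, out_w)[0]
    return saliency * prior_full


low = np.ones((2, 8, 8))   # hypothetical low-level features
mid = np.ones((3, 4, 4))   # hypothetical mid-level features
high = np.ones((4, 2, 2))  # hypothetical high-level features
saliency_map = combine_levels([low, mid, high], np.ones(9), np.ones((2, 2)))
print(saliency_map.shape)
```

In a trained model the channel weights and the prior would be learned parameters; here they are fixed arrays only to show the data flow.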


A Deep Multi-Level Network for Saliency Prediction

M.Cornia, L.Baraldi, G.Serra, R.Cucchiara

ICPR 2016

Poster and Source Code:


 

SAM

Abstract

Data-driven saliency has recently gained a lot of attention thanks to the use of Convolutional Neural Networks for predicting gaze fixations. In this paper we go beyond standard approaches to saliency prediction, in which gaze maps are computed with a feed-forward network, and we present a novel model which can predict accurate saliency maps by incorporating neural attentive mechanisms. The core of our solution is a Convolutional LSTM that focuses on the most salient regions of the input image to iteratively refine the predicted saliency map. Additionally, to tackle the center bias present in human eye fixations, our model can learn a set of prior maps generated with Gaussian functions. We show, through an extensive evaluation, that the proposed architecture outperforms the current state of the art on two public saliency prediction datasets. We further study the contribution of each key component to demonstrate its robustness in different scenarios.
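To make the notion of a Gaussian prior map for center bias concrete, the following NumPy sketch generates one such map from a mean and a per-axis standard deviation. It is only an illustration under stated assumptions: the function name `gaussian_prior`, the normalized [0, 1] coordinate grid, and the axis-aligned (diagonal covariance) form are choices made here, and in a learned model the mean and standard deviation would be trainable parameters rather than fixed values.

```python
import numpy as np


def gaussian_prior(height, width, mu, sigma):
    """Return a (height, width) Gaussian map with peak value 1 at the mean.

    mu    : (mu_y, mu_x) mean in normalized [0, 1] image coordinates
    sigma : (sigma_y, sigma_x) standard deviations in the same units
    """
    ys = np.linspace(0.0, 1.0, height)[:, None]  # column of row coordinates
    xs = np.linspace(0.0, 1.0, width)[None, :]   # row of column coordinates
    mu_y, mu_x = mu
    sigma_y, sigma_x = sigma
    exponent = (ys - mu_y) ** 2 / (2.0 * sigma_y ** 2) \
        + (xs - mu_x) ** 2 / (2.0 * sigma_x ** 2)
    return np.exp(-exponent)


# A centered prior: values decay away from the image center.
center_prior = gaussian_prior(5, 5, mu=(0.5, 0.5), sigma=(0.2, 0.2))
print(center_prior.round(2))
```

A set of such maps with different means and spreads can be stacked as extra channels and combined with the network's features, which is one plausible way to let a model account for the tendency of fixations to cluster near the image center.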


Predicting Human Eye Fixations via an LSTM-based Saliency Attentive Model

M.Cornia, L.Baraldi, G.Serra, R.Cucchiara

arXiv:1611.09571

Source Code:

  • GitHub Repository: SAM

 

Publications

1 Cornia, Marcella; Baraldi, Lorenzo; Serra, Giuseppe; Cucchiara, Rita "Visual Saliency for Image Captioning in New Multimedia Services" Proceedings of the 2017 IEEE International Conference on Multimedia and Expo Workshops, Hong Kong, July 10-14, 2017 Conference
2 Cornia, Marcella; Baraldi, Lorenzo; Serra, Giuseppe; Cucchiara, Rita "A Deep Multi-Level Network for Saliency Prediction" Proceedings of the 23rd International Conference on Pattern Recognition, Cancun, Mexico, December 4-8, 2016 Conference
3 Cornia, Marcella; Baraldi, Lorenzo; Serra, Giuseppe; Cucchiara, Rita "Multi-Level Net: a Visual Saliency Prediction Model" Computer Vision ECCV 2016 Workshops, vol. 9914, Amsterdam, The Netherlands, pp. 302-315, October 9, 2016 Conference
DOI: 10.1007/978-3-319-48881-3_21