Face Verification with Depth Images
The computer vision community has broadly addressed the face recognition problem in both the RGB and the depth domain.
Traditionally, this problem is categorized into two tasks:
- Face Identification: comparison of an unknown subject’s face with a set of faces (one-to-many)
- Face Verification: comparison of two faces in order to determine whether they belong to the same person or not (one-to-one).
The majority of existing face recognition algorithms are based on the processing of RGB images, while only a minority of methods investigate the use of other image types, such as depth maps or thermal images. Recent works employ very deep convolutional networks to embed face images in a d-dimensional hyperspace. Unfortunately, these very deep architectures used for face recognition tasks typically rely on very large-scale datasets that contain only RGB or intensity images, such as Labeled Faces in the Wild (LFW), the YouTube Faces Database (YTF) and MS-Celeb-1M.
The main goal of this work is to present a framework, namely JanusNet, that tackles the face verification task analysing depth images only.
We investigate the use of shallow deep architectures in order to obtain real-time performance and to cope with the small scale of existing depth-based face datasets. In fact, despite the recent introduction of deep-learning-oriented depth-based datasets and cheap commercial depth sensors, depth datasets are usually not large enough to train very deep neural models.
Furthermore, we aim to directly determine the identity of a person without strong a priori hypotheses, such as facial landmark or nose tip localization, whose failure could compromise the rest of the pipeline.
We want to exploit Privileged Information to boost the face verification accuracy:
- Training phase: the hybrid Siamese network is conditioned by a specific loss that forces its feature maps to mimic the mid-level feature maps of the RGB network;
- Testing phase: the RGB network is not employed, while the depth and the hybrid Siamese network are fed with the same pair of depth images and jointly predict if they belong to the same person.
The Siamese networks are meant to predict whether two images belong to the same person or not.
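The training-time mimic loss and the test-time verification rule described above can be sketched as follows. This is a minimal NumPy illustration under stated assumptions: the mean-squared-error form of the mimic loss, the Euclidean distance, the threshold value, and the function names are all hypothetical choices for exposition, not the paper's exact formulation.

```python
import numpy as np

def mimic_loss(hybrid_feats: np.ndarray, rgb_feats: np.ndarray) -> float:
    # Privileged-information loss (assumed MSE form): penalizes the
    # difference between the hybrid network's mid-level feature maps
    # and those produced by the RGB network on the paired RGB frames.
    return float(np.mean((hybrid_feats - rgb_feats) ** 2))

def verify(emb_a: np.ndarray, emb_b: np.ndarray, threshold: float = 1.0) -> bool:
    # Siamese-style verification: declare "same person" when the
    # Euclidean distance between the two face embeddings falls below
    # a (hypothetical) decision threshold.
    return float(np.linalg.norm(emb_a - emb_b)) < threshold
```

At test time only depth inputs are embedded, so the RGB branch (and hence the mimic loss) is needed only during training.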
[1] G. Borghi, S. Pini, F. Grazioli, R. Vezzani, R. Cucchiara, "Face Verification from Depth using Privileged Information", Proceedings of the 29th British Machine Vision Conference (BMVC), Newcastle, 3-6 September 2018.