
Research on Automotive

Learning to Generate Faces from RGB and Depth data



We investigate the Face Generation task, inspired by the Privileged Information approach: the main idea is to add knowledge at training time (the generated faces) in order to improve the performance of the presented systems at testing time.
Our main research questions are:

  • Is it possible to generate gray-level face images from the corresponding depth ones?
  • Is it possible to generate depth face maps from the corresponding gray-level ones?

Experimental results confirm the effectiveness of this approach in both generation directions.

Face Verification with Depth Images


The computer vision community has broadly addressed the face recognition problem in both the RGB and the depth domain.
Traditionally, this problem is categorized into two tasks:

  • Face Identification: comparison of an unknown subject’s face with a set of faces (one-to-many).
  • Face Verification: comparison of two faces in order to determine whether they belong to the same person or not (one-to-one).

The majority of existing face recognition algorithms are based on the processing of RGB images, while only a minority of methods investigate the use of other image types, such as depth maps or thermal images. Recent works employ very deep convolutional networks to embed face images in a d-dimensional hyperspace. Unfortunately, these very deep architectures typically rely on very large-scale datasets which contain only RGB or intensity images, such as Labeled Faces in the Wild (LFW), the YouTube Faces Database (YTF) and MS-Celeb-1M.
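On top of such a d-dimensional embedding, the verification decision reduces to thresholding a distance between two embedded faces. A minimal sketch, in which the deep embedding network is replaced by toy vectors (the threshold value and the vectors are illustrative assumptions, not settings from this work):

```python
import numpy as np

def verify(embedding_a: np.ndarray, embedding_b: np.ndarray,
           threshold: float = 1.0) -> bool:
    """Decide whether two d-dimensional face embeddings belong to
    the same person (one-to-one comparison)."""
    # L2-normalize so the distance is independent of embedding scale.
    a = embedding_a / np.linalg.norm(embedding_a)
    b = embedding_b / np.linalg.norm(embedding_b)
    return float(np.linalg.norm(a - b)) < threshold

# Toy embeddings standing in for the output of a deep network.
same = verify(np.array([1.0, 0.1, 0.0]), np.array([0.9, 0.12, 0.01]))  # True
diff = verify(np.array([1.0, 0.0, 0.0]), np.array([0.0, 1.0, 0.0]))    # False
```

With L2-normalized embeddings, thresholding the Euclidean distance is equivalent to thresholding cosine similarity; in practice the threshold is tuned on a validation set.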

The main goal of this work is to present JanusNet, a framework that tackles the face verification task by analysing depth images only.

Hand Monitoring and Gesture Recognition for Human-Car Interaction


Gesture-based human-computer interaction is a well-assessed field of application of computer vision algorithms. In particular, we are studying its exploitation in automotive applications. Our main goal is the development of a hand gesture-based interaction with car devices where the hands are kept on the steering wheel.

In order to detect and classify a gesture, it is first necessary to locate the hand within a given image.
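As an illustration of this localization step, here is a deliberately naive baseline that assumes only that the hand is the region closest to the depth sensor (a toy sketch, not the method used in this project):

```python
import numpy as np

def locate_hand(depth: np.ndarray, margin: float = 50.0):
    """Naive hand localization in a depth map: assume the hand is the
    region closest to the sensor, keep all pixels within `margin` of
    the minimum depth, and return the centroid (row, col)."""
    mask = depth <= depth.min() + margin
    rows, cols = np.nonzero(mask)
    return rows.mean(), cols.mean()

# Synthetic 100x100 depth map (millimeters): far background at 2000,
# a near 10x10 "hand" patch at 600 around row 30, column 70.
depth = np.full((100, 100), 2000.0)
depth[25:35, 65:75] = 600.0
center = locate_hand(depth)   # ≈ (29.5, 69.5)
```

A real system would replace this heuristic with a learned detector, but the example shows why depth data makes the localization step tractable: the hand is geometrically separable from the background.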

Landmark Localization in Depth Images



A correct and reliable localization of facial landmarks enables several applications in many fields, ranging from Human Computer Interaction to video surveillance.
For instance, it can provide a valuable input for monitoring the driver's physical state and attention level in the automotive context. In this work, we tackle the problem of facial landmark localization through a deep approach. The developed system is fast and, thanks to the use of depth input images, more reliable than state-of-the-art competitors, especially in the presence of light changes and poor illumination. We also collected and shared a new realistic dataset recorded inside a car, called MotorMark, to train and test the system. In addition, we exploited the public Eurecom Kinect Face Dataset for the evaluation phase, achieving promising results both in terms of accuracy and computational speed.

Driver Attention through Head Localization and Pose Estimation


Automatic recognition of the driver's attention level is still an open research problem.
This project investigates new non-invasive systems for the real-time monitoring of the driver's state of attention, and aims at developing a low-cost multi-sensory system that can be installed on circulating vehicles. Computer vision and machine learning techniques, as well as multi-physical technologies, will be explored.

Dr(eye)ve: a Dataset for Attention-Based Tasks with Applications to Autonomous Driving


Autonomous and assisted driving are undoubtedly hot topics in computer vision. However, the driving task is extremely complex and a deep understanding of drivers’ behavior is still lacking. Several researchers are now investigating the attention mechanism in order to define computational models for detecting salient and interesting objects in the scene.

Nevertheless, most of these models only refer to bottom-up visual saliency and are focused on still images. Instead, during the driving experience the temporal nature and peculiarity of the task influence the attention mechanisms, leading to the conclusion that real-life driving data is mandatory.


Learning to Map Vehicles into Bird's Eye View


Awareness of the road scene is an essential component for both autonomous vehicles and Advanced Driver Assistance Systems, and its relevance is growing both in academic research and in the automotive industry.
This paper presents a way to learn a semantic-aware transformation which maps detections from a dashboard camera view onto a broader bird's eye occupancy map of the scene. To this end, a huge synthetic dataset featuring pairs of frames taken from both the dashboard and the bird's eye view in driving scenarios is collected: more than 1 million examples are automatically annotated. A deep network is then trained to warp detections from the first view to the second. We demonstrate the effectiveness of our model against several baselines and observe that it is able to generalize to real-world data despite having been trained solely on synthetic data.
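The work above learns the mapping with a deep network; as a simplified geometric point of comparison, a plain homography fitted from a few road-plane correspondences already illustrates what "warping detections between the two views" means. All coordinates below are made-up assumptions:

```python
import numpy as np

def fit_homography(src: np.ndarray, dst: np.ndarray) -> np.ndarray:
    """Estimate a 3x3 homography mapping src -> dst (N >= 4 point
    pairs) with the Direct Linear Transform."""
    A = []
    for (x, y), (u, v) in zip(src, dst):
        A.append([-x, -y, -1, 0, 0, 0, u * x, u * y, u])
        A.append([0, 0, 0, -x, -y, -1, v * x, v * y, v])
    # The homography is the right null vector of A (last row of V^T).
    _, _, vt = np.linalg.svd(np.asarray(A, dtype=float))
    H = vt[-1].reshape(3, 3)
    return H / H[2, 2]

def warp_point(H: np.ndarray, p) -> np.ndarray:
    """Apply H to a 2D point in homogeneous coordinates."""
    q = H @ np.array([p[0], p[1], 1.0])
    return q[:2] / q[2]

# Hypothetical correspondences: dashboard-view pixels matched to
# bird's-eye map coordinates for four points on the road plane.
src = np.array([[100, 400], [540, 400], [260, 250], [380, 250]], float)
dst = np.array([[0, 0], [10, 0], [2, 20], [8, 20]], float)
H = fit_homography(src, dst)
bev = warp_point(H, (100, 400))   # ≈ (0, 0)
```

A fixed homography only handles points on the ground plane; the appeal of the learned, semantic-aware transformation is precisely that it can move whole detections (vehicles with height and extent) between views.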

Novelty & Anomaly Detection


Anomaly detection in videos refers to the identification of events that do not conform to expected behavior. Standard classification settings are infeasible for this goal, since the nature of abnormal events is not known a priori in real-world applications. For this reason, we tackle the problem in an unsupervised learning setting. We are interested in the formalization and assessment of novel models capable of learning the distribution of normal events and situations, deeming as anomalous those which are less explicable in a probabilistic sense.
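A minimal instance of this idea, assuming Gaussian-distributed features as a toy stand-in for a learned density model: fit the distribution on normal data only, then flag samples whose log-likelihood falls below a percentile threshold.

```python
import numpy as np

rng = np.random.default_rng(0)

# Features of "normal" events (stand-ins for learned video features).
normal = rng.normal(loc=0.0, scale=1.0, size=(500, 4))

# Fit a diagonal Gaussian to the normal data only (unsupervised).
mu = normal.mean(axis=0)
var = normal.var(axis=0) + 1e-6

def log_likelihood(x: np.ndarray) -> float:
    """Log-density of x under the fitted diagonal Gaussian."""
    return float(-0.5 * np.sum(np.log(2 * np.pi * var)
                               + (x - mu) ** 2 / var))

# Threshold chosen from the normal data itself (1st percentile):
# anything less likely than 99% of normal samples is anomalous.
scores = np.array([log_likelihood(x) for x in normal])
threshold = np.percentile(scores, 1)

def is_anomalous(x: np.ndarray) -> bool:
    return bool(log_likelihood(x) < threshold)

typical = is_anomalous(np.zeros(4))       # False: well explained
outlier = is_anomalous(np.full(4, 8.0))   # True: far from normal data
```

The models investigated in this research replace the Gaussian with far richer learned densities over video, but the decision rule is the same: score a sample by how explicable it is under the model of normality.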