Optimization and Generative Models for Face Analysis

Human analysis, with an emphasis on the head and face, has been an important subject of study in a wide range of scientific fields. Facial expressions, eye gaze, gestures and head pose provide non-verbal cues about the physical and mental state of individuals, their emotions, consciousness and attentiveness. In computer vision, these cues have been leveraged to study face images, where fields such as emotion recognition, face identification and gaze estimation have emerged as cornerstones in human modeling and understanding. Research on face analysis has enabled the development of assistance tools, for instance, to identify the fatigue and inattention of a vehicle driver, detect the level of asymmetry in patients with facial paralysis or recognize the emotions of children with autism. Such tools demand a high level of robustness in a wide range of situations, such as varying illumination, large head rotations and partial occlusion, and must also operate in real time at low computational cost. This thesis focuses on the analysis of the rigid and non-rigid motions of the head, with the aim of targeting existing challenges in assistance tools and assistive technologies. The proposed approaches are based on intensity and RGB images, and cover the estimation of the head pose and the detection and tracking of facial landmarks. Additionally, facial expression transfer for avatar animation and the generation of sign language images are explored, where the face and upper body play a crucial part in conveying information.

The main contributions of this thesis are grouped into four categories:

(i) The study of head pose estimation from a monocular system, to obtain a robust estimate of the rigid head pose in real time. To that end, multiple architectures are proposed, where the head pose is formulated as an optimization problem based on the 3D-2D correspondences of facial features.

(ii) The detection and tracking of fiducial facial landmarks, to model the non-rigid facial motion. The sparse set of facial landmarks is computed using a deep-learning-based generative pipeline, extending the detection to a wide range of facial expressions, including faces with substantial asymmetry. Tracking of 3D landmarks is also investigated, where a method to exploit the landmark connectivity and confidence score is introduced.

(iii) The development of a real-time framework for performance-driven facial animation using a monocular system. This method aims to transfer the rigid head pose and facial expression from 2D video sequences to a 3D head model with limited resources.

(iv) A rendering pipeline to extend current RGB-based sign language datasets to include multiple types of annotated data, such as segmentation masks, normal and depth maps, and 3D-2D body joints. A sign language image synthesis architecture based on generative models, conditioned on pose and appearance, is also investigated. This architecture is trained and evaluated on synthetic and real data.

The proposed methods achieve outstanding performance compared to related work on publicly available benchmarks, as well as on datasets introduced in this thesis. Furthermore, the presented approaches for head pose estimation and performance-driven facial animation run in real time on CPU, while the proposed pipelines for face alignment and sign language image synthesis achieve real-time capabilities on GPU.



Metadata
Author: Jilliam Maria Diaz Barros
URN: urn:nbn:de:hbz:386-kluedo-87519
DOI: https://doi.org/10.26204/KLUEDO/8751
Advisor: Didier Stricker
Document Type: Doctoral Thesis
Cumulative document: No
Language of publication: English
Date of Publication (online): 2025/02/25
Year of first Publication: 2025
Publishing Institution: Rheinland-Pfälzische Technische Universität Kaiserslautern-Landau
Granting Institution: Rheinland-Pfälzische Technische Universität Kaiserslautern-Landau
Acceptance Date of the Thesis: 2024/12/19
Date of the Publication (Server): 2025/02/26
Tag: Face analysis; facial animation; facial landmark detection; head pose estimation; sign language production
Page Number: XI, 178
Faculties / Organisational entities: Kaiserslautern - Fachbereich Informatik
CCS-Classification (computer science): J. Computer Applications
DDC-Classification: 0 General works, computer science, information science / 004 Computer science
Licence: Creative Commons 4.0 - Attribution, NonCommercial, NoDerivatives (CC BY-NC-ND 4.0)