Optimization and Generative Models for Face Analysis
Human analysis, with an emphasis on the head and face, has been an important subject of study in a wide range of scientific fields. Facial expressions, eye gaze, gestures and head pose provide non-verbal cues about the physical and mental state of individuals, their emotions, consciousness and attentiveness. In computer vision, these cues have been leveraged to study face images, where fields such as emotion recognition, face identification and gaze estimation have emerged as cornerstones in human modeling and understanding. Research on face analysis has enabled the development of assistance tools, for instance, to identify the fatigue and inattention of a vehicle driver, detect the level of asymmetry in patients with facial paralysis or recognize the emotions of children with autism. Such tools demand a high level of robustness in a wide range of situations, such as varying illumination, large head rotations and partial occlusion, while also operating in real time at low computational cost.

This thesis focuses on the analysis of the rigid and non-rigid motions of the head, with the aim of targeting existing challenges in assistance tools and assistive technologies. The proposed approaches are based on intensity and RGB images, and cover the estimation of the head pose and the detection and tracking of facial landmarks. Additionally, facial expression transfer for avatar animation and the generation of sign language images are explored, where the face and upper body play a crucial part in conveying information.

The main contributions of this thesis are grouped in four categories: (i) the study of head pose estimation from a monocular system, to obtain a robust estimate of the rigid head pose in real time. To that end, multiple architectures are proposed, where the head pose is formulated as an optimization problem based on the 3D-2D correspondences of facial features; (ii) the detection and tracking of fiducial facial landmarks, to model the non-rigid facial motion. The sparse set of facial landmarks is computed using a deep-learning-based generative pipeline, extending the detection to a wide range of facial expressions, including faces with substantial asymmetry. Tracking of 3D landmarks is also investigated, where a method to exploit the landmark connectivity and confidence score is introduced; (iii) the development of a real-time framework for performance-driven facial animation using a monocular system. This method aims to transfer the rigid head pose and facial expression from 2D video sequences to a 3D head model with limited resources; and (iv) a rendering pipeline to extend current RGB-based sign language datasets, to include multiple types of annotated data such as segmentation masks, normal and depth maps, and 3D-2D body joints. A sign language image synthesis architecture based on generative models is also investigated, conditioned on pose and appearance. This architecture is trained and evaluated on synthetic and real data.

The proposed methods achieve outstanding performance compared to related work on publicly available benchmarks, as well as on datasets introduced in this thesis. Furthermore, the presented approaches for head pose estimation and performance-driven facial animation perform in real time on CPU, while the proposed pipelines for face alignment and sign language image synthesis achieve real-time capabilities on GPU.
Author: | Jilliam Maria Diaz Barros |
---|---|
URN: | urn:nbn:de:hbz:386-kluedo-87519 |
DOI: | https://doi.org/10.26204/KLUEDO/8751 |
Advisor: | Didier Stricker |
Document Type: | Doctoral Thesis |
Cumulative document: | No |
Language of publication: | English |
Date of Publication (online): | 2025/02/25 |
Year of first Publication: | 2025 |
Publishing Institution: | Rheinland-Pfälzische Technische Universität Kaiserslautern-Landau |
Granting Institution: | Rheinland-Pfälzische Technische Universität Kaiserslautern-Landau |
Acceptance Date of the Thesis: | 2024/12/19 |
Date of the Publication (Server): | 2025/02/26 |
Tag: | Face analysis; facial animation; facial landmark detection; head pose estimation; sign language production |
Page Number: | XI, 178 |
Faculties / Organisational entities: | Kaiserslautern - Fachbereich Informatik |
CCS-Classification (computer science): | J. Computer Applications |
DDC-Classification: | 0 Generalities, computer science, information science / 004 Computer science |