Monocular Visual Human Pose Parameter Estimation under Partial Observations
Human pose estimation, a crucial step toward understanding body postures and movements in visual data, faces challenges in monocular camera setups due to inherent limitations: the lack of depth information, susceptibility to occlusion, and sensitivity to viewpoint changes. Despite these challenges, advances in computer vision, especially deep learning, continue to improve the robustness of pose estimation. This thesis extends human pose parameter estimation from monocular cameras, addressing challenges related to varying camera viewpoints and partial body observation. It explores pose estimation in exocentric and egocentric views, offering approaches that handle self-occlusion and partial visibility. Initially, we focus on the accurate estimation and synthesis of articulated human pose. Using a data-driven approach, we learn joint limits from 3D motion-capture datasets, which facilitates the generation of synthetic datasets for training neural network discriminators. These discriminators serve as priors for human pose estimation and motion synthesis. We further introduce a novel deep learning module for 3D joint angle regression based on the swing-twist representation. This approach, more generic than estimating 3D joint positions, predicts the twist around each body segment, improving 3D human body shape estimation. The thesis then presents DiveNet, a diving sports motion analysis framework designed for accurate localization and segmentation of complex diving motions in videos. It comprises a diving action localization network and a dive pose regression model, achieving high accuracy in trajectory and body joint angle estimation. Furthermore, we address a car-driving scenario in which a static camera monitors driver behavior, and present a driver pose estimation dataset covering common in-car activities together with pose annotations.
The dataset is generated in a driving simulator and encompasses diverse data types and annotations, contributing to improved accuracy in human pose estimation. A PointConvNet neural network with local feature aggregation is trained on the recorded dataset to demonstrate robust 3D driver pose estimation. Lastly, the thesis introduces a framework for 3D egocentric pose estimation using a head-mounted fisheye camera. We propose a triple-branch network that integrates the Waterfall Atrous Spatial Pyramid module with effective loss functions, enabling the simultaneous learning of joint rotations and confidence estimates and enhancing the accuracy of both 2D and 3D pose predictions. Overall, these contributions advance the field of human pose estimation by identifying and addressing the challenges of monocular camera setups in diverse applications, and by improving accuracy through novel approaches and datasets.
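The swing-twist representation mentioned in the abstract admits a compact numerical illustration. The sketch below (our own minimal example, not the thesis' implementation; all function names are ours) decomposes a unit quaternion into a "twist" rotation about a given body-segment axis and a residual "swing" rotation, using the standard projection of the quaternion's vector part onto the twist axis:

```python
import numpy as np

def quat_conj(q):
    """Conjugate of a quaternion (w, x, y, z)."""
    return np.array([q[0], -q[1], -q[2], -q[3]])

def quat_mul(p, q):
    """Hamilton product of two quaternions (w, x, y, z)."""
    w1, v1 = p[0], np.asarray(p[1:], dtype=float)
    w2, v2 = q[0], np.asarray(q[1:], dtype=float)
    w = w1 * w2 - np.dot(v1, v2)
    v = w1 * v2 + w2 * v1 + np.cross(v1, v2)
    return np.array([w, *v])

def swing_twist_decompose(q, axis):
    """Split unit quaternion q into (swing, twist) with q = swing * twist,
    where twist is the rotation component about `axis`."""
    v = np.asarray(q[1:], dtype=float)      # vector part of q
    a = np.asarray(axis, dtype=float)
    a = a / np.linalg.norm(a)               # ensure a unit twist axis
    proj = np.dot(v, a) * a                 # project vector part onto axis
    twist = np.array([q[0], *proj])
    norm = np.linalg.norm(twist)
    if norm < 1e-9:                         # 180-degree swing: twist undefined
        twist = np.array([1.0, 0.0, 0.0, 0.0])
    else:
        twist /= norm
    swing = quat_mul(q, quat_conj(twist))   # recover swing from q = swing * twist
    return swing, twist
```

In this decomposition, a rotation purely about the chosen axis yields an identity swing, while the swing component captures the direction of the body segment; predicting the two factors separately is what makes the representation more expressive than 3D joint positions alone.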
Author: | Pramod Narasimha Murthy |
---|---|
URN: | urn:nbn:de:hbz:386-kluedo-87453 |
DOI: | https://doi.org/10.26204/KLUEDO/8745 |
Advisor: | Didier Stricker |
Document Type: | Doctoral Thesis |
Cumulative document: | No |
Language of publication: | English |
Date of Publication (online): | 2025/02/24 |
Year of first Publication: | 2025 |
Publishing Institution: | Rheinland-Pfälzische Technische Universität Kaiserslautern-Landau |
Granting Institution: | Rheinland-Pfälzische Technische Universität Kaiserslautern-Landau |
Acceptance Date of the Thesis: | 2025/01/24 |
Date of the Publication (Server): | 2025/02/25 |
Page Number: | XVII, 261 |
Faculties / Organisational entities: | Kaiserslautern - Fachbereich Informatik |
CCS-Classification (computer science): | J. Computer Applications |
DDC-Classification: | 0 General works, computer science, information science / 004 Computer science |