Monocular Visual Human Pose Parameter Estimation under Partial Observations

Human pose estimation, a crucial step toward understanding body postures and movements in visual data, faces inherent challenges in monocular camera setups: the lack of depth information, susceptibility to occlusions, and sensitivity to viewpoint changes. Despite these challenges, advances in computer vision, especially deep learning, continue to improve the robustness of pose estimation. This thesis extends human pose parameter estimation from monocular cameras, addressing challenges posed by varying camera viewpoints and partial body observation. It explores pose estimation in both exocentric and egocentric views, offering approaches that handle self-occlusion and partial visibility.

We first focus on accurate estimation and synthesis of articulated human pose. Using a data-driven approach, joint limits are learned from 3D motion capture datasets, enabling the generation of synthetic datasets for training neural network discriminators. These discriminators serve as priors for human pose estimation and motion synthesis (a minimal sketch of this idea follows the abstract). We further introduce a novel deep learning module for 3D joint angle regression based on the swing-twist representation (also sketched below). More generic than estimating 3D joint positions, this approach predicts twists around body segments and thereby improves 3D human body shape estimation.

The thesis then introduces DiveNet, a motion analysis framework for diving sports, designed for accurate localization and segmentation of complex diving motions in videos. It comprises a diving action localization network and a dive pose regression model, achieving high accuracy in trajectory and body joint angle estimation.

Next, we address a car driving scenario in which a static camera monitors driver behavior. We collect a driver pose estimation dataset covering common in-car activities together with pose data. Recorded in a driving simulator, the dataset encompasses diverse data modalities and annotations, contributing to improved accuracy in human pose estimation. A PointConvNet neural network built on local feature aggregation is trained on the recorded dataset to demonstrate robust 3D driver pose estimation.

Lastly, the thesis presents a framework for 3D egocentric pose estimation using a head-mounted fisheye camera. We introduce a triple-branch network that integrates the Waterfall Atrous Spatial Pyramid module with effective loss functions, enabling simultaneous learning of joint rotations and confidence estimates and enhancing the accuracy of both 2D and 3D pose predictions.

Overall, these contributions advance human pose estimation by identifying and addressing the challenges of monocular camera setups across diverse applications, and by improving accuracy through novel approaches and datasets.
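
To make the discriminator-prior idea concrete, here is a minimal PyTorch sketch. All names and dimensions are illustrative assumptions rather than the thesis's actual architecture: a small MLP is trained to separate real motion-capture poses from synthetic out-of-limit poses, then frozen and reused as a prior term when fitting poses.

```python
import torch
import torch.nn as nn

POSE_DIM = 23 * 3  # assumption: a pose as a flat vector of per-joint angles

class PoseDiscriminator(nn.Module):
    """Small MLP scoring how plausible a pose vector is (logit output)."""
    def __init__(self, dim=POSE_DIM):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(dim, 256), nn.ReLU(),
            nn.Linear(256, 256), nn.ReLU(),
            nn.Linear(256, 1),
        )

    def forward(self, pose):
        return self.net(pose)

def discriminator_step(disc, opt, real_poses, fake_poses):
    """One binary-classification update: real mocap poses vs. synthetic
    poses sampled outside the learned joint limits."""
    bce = nn.BCEWithLogitsLoss()
    opt.zero_grad()
    loss = (bce(disc(real_poses), torch.ones(len(real_poses), 1)) +
            bce(disc(fake_poses), torch.zeros(len(fake_poses), 1)))
    loss.backward()
    opt.step()
    return loss.item()

def prior_loss(disc, pose):
    """With the discriminator frozen, penalize poses it deems implausible;
    gradients then flow only into `pose`, the variable being optimized."""
    return nn.functional.binary_cross_entropy_with_logits(
        disc(pose), torch.ones(len(pose), 1))
```

At estimation time, `prior_loss` would be added to a data term and minimized over the pose parameters while the discriminator's weights stay fixed.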

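The swing-twist decomposition itself is standard and easy to sketch. The following NumPy snippet (function names are ours, not from the thesis) splits a unit quaternion q into q = swing * twist, where the twist is the rotation about a given bone axis and the swing tilts that axis:

```python
import numpy as np

def quat_conj(q):
    """Conjugate of a quaternion in (w, x, y, z) order."""
    return np.array([q[0], -q[1], -q[2], -q[3]])

def quat_mul(a, b):
    """Hamilton product a * b for quaternions in (w, x, y, z) order."""
    w1, x1, y1, z1 = a
    w2, x2, y2, z2 = b
    return np.array([
        w1*w2 - x1*x2 - y1*y2 - z1*z2,
        w1*x2 + x1*w2 + y1*z2 - z1*y2,
        w1*y2 - x1*z2 + y1*w2 + z1*x2,
        w1*z2 + x1*y2 - y1*x2 + z1*w2,
    ])

def swing_twist(q, axis):
    """Split unit quaternion q into q = swing * twist, where `twist`
    rotates about `axis` (e.g. a bone direction) and `swing` tilts it."""
    axis = axis / np.linalg.norm(axis)
    proj = np.dot(q[1:], axis) * axis      # vector part projected onto the axis
    twist = np.array([q[0], *proj])
    norm = np.linalg.norm(twist)
    if norm < 1e-9:                        # 180-degree swing: twist is ill-defined
        return q.copy(), np.array([1.0, 0.0, 0.0, 0.0])
    twist /= norm
    swing = quat_mul(q, quat_conj(twist))  # recover swing = q * twist^(-1)
    return swing, twist

# A 45-degree rotation about +y, decomposed about the y axis, is pure
# twist, so the swing comes out as (approximately) the identity.
q = np.array([0.9239, 0.0, 0.3827, 0.0])
swing, twist = swing_twist(q, np.array([0.0, 1.0, 0.0]))
```

Regressing the twist angle about each bone separately is what makes this representation more informative than 3D joint positions alone: positions constrain the swing but leave the twist about the bone unobserved.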

Metadata
Author:Pramod Narasimha Murthy
URN:urn:nbn:de:hbz:386-kluedo-87453
DOI:https://doi.org/10.26204/KLUEDO/8745
Advisor:Didier Stricker
Document Type:Doctoral Thesis
Cumulative document:No
Language of publication:English
Date of Publication (online):2025/02/24
Year of first Publication:2025
Publishing Institution:Rheinland-Pfälzische Technische Universität Kaiserslautern-Landau
Granting Institution:Rheinland-Pfälzische Technische Universität Kaiserslautern-Landau
Acceptance Date of the Thesis:2025/01/24
Date of the Publication (Server):2025/02/25
Page Number:XVII, 261
Faculties / Organisational entities:Kaiserslautern - Department of Computer Science
CCS-Classification (computer science):J. Computer Applications
DDC-Classification:0 General works, computer science, information science / 004 Computer science
Licence:Creative Commons Attribution-NonCommercial 4.0 (CC BY-NC 4.0)