Towards Reliable Computer Vision Feature Extraction by Novel Autoencoder Methods

The generally unsupervised nature of autoencoder models implies that the main training metric is formulated as the error between input images and their corresponding reconstructions. Variations of the reconstruction loss and regularization of the latent space have been shown to improve model performance, depending on the task at hand, and to induce desirable properties such as disentanglement. Nevertheless, measuring success in, or enforcing properties through, the input pixel space remains challenging. In this work, we aim to make more efficient use of the available data and propose design choices to be considered when recording or generating future datasets, so that desirable properties are induced implicitly during training. To this end, we propose a new sampling technique that matches semantically important parts of the image while randomizing the other parts, leading to the extraction of salient features and the neglect of unimportant details. Furthermore, we propose to apply a previously trained autoencoder recursively; the model can then be interpreted as a dynamical system with desirable properties for generalization and uncertainty estimation. The proposed methods can be combined with any existing reconstruction loss. We give a detailed analysis of the resulting properties on various datasets and show improvements on several computer vision tasks: image and illumination normalization, invariances, synthetic-to-real generalization, uncertainty estimation, and classification accuracy obtained with simple classifiers in the latent space. These investigations are applied to the automotive task of classifying rear seat occupants in the vehicle interior. For this application, we release a synthetic dataset with several fine-grained extensions so that all the aforementioned topics can be investigated in isolation, or together, within a single application environment.
We provide quantitative evidence that machine learning methods, and deep learning methods in particular, cannot readily be used in industrial applications when only a limited amount of variation is available for training. This, however, is often the case because of constraints imposed by the application under consideration and by financial limitations.
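The two training ideas summarized in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation used in the thesis. `sample_training_pair` assumes that images sharing a scene id show the same semantic content (e.g. the same rear seat occupancy) and differ only in nuisance factors such as illumination, and it draws the reconstruction target from another variation of the same scene; this is one possible reading of "matching semantically important parts while randomizing the other parts". `iterate_autoencoder` reads "recursively apply a previously trained autoencoder" as the iteration x_{k+1} = decode(encode(x_k)). All function names, parameters and the toy encoder/decoder are hypothetical.

```python
import numpy as np


def sample_training_pair(scenes, images, index, rng):
    """Return an (input, target) pair for one training step (hypothetical helper).

    Assumption: scenes[i] identifies the semantic content of image i, while
    images sharing a scene id differ only in nuisance factors (e.g. lighting).
    The target is drawn from the *other* variations of the same scene.
    """
    candidates = np.flatnonzero(scenes == scenes[index])
    others = candidates[candidates != index]
    if others.size == 0:  # only one variation available: fall back to identity
        others = np.array([index])
    target_index = rng.choice(others)
    return images[index], images[target_index]


def iterate_autoencoder(encode, decode, x, max_steps=100, tol=1e-6):
    """Recursively apply an autoencoder: x_{k+1} = decode(encode(x_k)).

    Returns the last state and the number of steps taken; iteration stops
    once successive states differ by less than tol (Euclidean norm).
    """
    for step in range(1, max_steps + 1):
        x_next = decode(encode(x))
        if np.linalg.norm(x_next - x) < tol:
            return x_next, step
        x = x_next
    return x, max_steps


# Toy usage. The encoder/decoder are hypothetical stand-ins for a trained model:
# their composition x -> 0.5 * x + 0.5 * c is a contraction whose fixed point c
# plays the role of an attractor of the dynamical system.
rng = np.random.default_rng(0)
scenes = np.array([0, 0, 1, 1, 1])
images = np.arange(5, dtype=float).reshape(5, 1)
x_in, x_target = sample_training_pair(scenes, images, 0, rng)

c = np.array([1.0, -2.0])
x_final, steps = iterate_autoencoder(lambda x: 0.5 * x, lambda z: z + 0.5 * c, np.zeros(2))
```

Any reconstruction loss can then be computed between the model output for `x_in` and `x_target`, which matches the abstract's statement that the methods combine with existing losses. How generalization or uncertainty estimates are derived from the iterated states is not shown in this sketch.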

Metadata
Author: Steve Dias da Cruz
URN:urn:nbn:de:hbz:386-kluedo-71182
DOI:https://doi.org/10.26204/KLUEDO/7118
Supervisors: Didier Stricker, Marius Kloft
Document type: Dissertation
Language of publication: English
Date of publication (online): 21.01.2023
Year of first publication: 2023
Publishing institution: Rheinland-Pfälzische Technische Universität Kaiserslautern-Landau
Degree-granting institution: Rheinland-Pfälzische Technische Universität Kaiserslautern-Landau
Date of acceptance of the thesis: 21.12.2022
Date of publication (server): 23.01.2023
Keywords / tags: autoencoder; automotive; computer vision; deep learning
Number of pages: XXII, 230
Departments / organizational units: Kaiserslautern - Department of Computer Science
CCS classification (computer science): J. Computer Applications
DDC subject groups: 0 General works, computer science, information science / 004 Computer science
License: Creative Commons 4.0 - Attribution (CC BY 4.0)