Sequence Learning for OCR in Unsupervised Training Cases

Sequence learning describes the process of understanding the spatio-temporal relations in a sequence in order to classify it, label its elements, or generate new sequences. Because structured sequences are prevalent in nature and everyday life, it has many practical applications, including any language-related processing task. One such task that has seen recent success with sequence learning techniques is optical character recognition (OCR). State-of-the-art sequence learning solutions for OCR achieve high performance through supervised training, which requires large amounts of transcribed training data. By contrast, few solutions have been proposed for applying sequence learning in the absence of such data, a situation that is especially common for hard-to-transcribe historical documents. Rather than solving the unsupervised training problem, research has focused on creating efficient methods for collecting training data through smart annotation tools or on generating synthetic training data. These solutions come with various limitations and do not solve all of the related problems. This work first introduces the use of erroneous transcriptions for supervised sequence learning and describes how this concept can be applied in unsupervised training scenarios by collecting or generating such transcriptions. The proposed OCR pipeline reduces the need for domain-specific expertise to apply OCR, with the goal of making it more accessible. Furthermore, an approach for evaluating sequence learning OCR models in the absence of reference transcriptions is presented, and its properties are compared with those of the standard method. In a second approach, unsupervised OCR is treated as an alignment problem between the latent features of the different language modalities. The outlined solution is to extract language properties from both the text and image domains through adversarial training and to learn to align them by adding a cycle consistency constraint. The proposed approach has some strict limitations on the input data, but the results encourage future research into more widespread applications.
Metadata
Author: Martin Jenckel
URN: urn:nbn:de:hbz:386-kluedo-67470
DOI: https://doi.org/10.26204/KLUEDO/6747
Advisor: Andreas Dengel
Document type: Dissertation
Language of publication: English
Date of publication (online): 18.02.2022
Year of first publication: 2022
Publishing institution: Technische Universität Kaiserslautern
Degree-granting institution: Technische Universität Kaiserslautern
Date of thesis acceptance: 24.02.2021
Date of publication (server): 22.02.2022
GND keywords: Künstliche Intelligenz (artificial intelligence); Optische Zeichenerkennung (optical character recognition)
Number of pages: 154
Departments / organizational units: Kaiserslautern - Fachbereich Informatik (Department of Computer Science)
CCS classification (computer science): I. Computing Methodologies
DDC subject groups: 0 Generalities, computer science, information science / 004 Computer science
License (German): Creative Commons 4.0 - Attribution, NonCommercial, NoDerivatives (CC BY-NC-ND 4.0)