Sequence Learning for OCR in Unsupervised Training Cases

Sequence learning describes the process of understanding the spatio-temporal relations in a sequence in order to classify it, label its elements, or generate new sequences. Due to the prevalence of structured sequences in nature and everyday life, it has many practical applications, including any language-related processing task. One such task that has seen recent success using sequence learning techniques is optical character recognition (OCR). State-of-the-art sequence learning solutions for OCR achieve high performance through supervised training, which requires large amounts of transcribed training data. In contrast, few solutions have been proposed for applying sequence learning in the absence of such data, a situation that is especially common for hard-to-transcribe historical documents. Rather than solving the unsupervised training problem, research has focused on creating efficient methods for collecting training data through smart annotation tools or on generating synthetic training data. These solutions come with various limitations and do not solve all of the related problems. This work first introduces the use of erroneous transcriptions for supervised sequence learning and describes how this concept can be applied in unsupervised training scenarios by collecting or generating such transcriptions. The proposed OCR pipeline reduces the need for domain-specific expertise to apply OCR, with the goal of making it more accessible. Furthermore, an approach for evaluating sequence learning OCR models in the absence of reference transcriptions is presented, and the ways in which its properties differ from those of the standard method are discussed.
In a second approach, unsupervised OCR is treated as an alignment problem between the latent features of the different language modalities. The outlined solution extracts language properties from both the text and image domains through adversarial training and learns to align them by adding a cycle consistency constraint. The proposed approach places some strict limitations on the input data, but the results encourage future research into more widespread applications.
Author: Martin Jenckel
Advisor: Andreas Dengel
Document Type: Doctoral Thesis
Language of publication: English
Publication Date: 2022/02/18
Year of Publication: 2022
Publishing Institute: Technische Universität Kaiserslautern
Granting Institute: Technische Universität Kaiserslautern
Acceptance Date of the Thesis: 2021/02/24
Date of the Publication (Server): 2022/02/22
GND-Keyword: Künstliche Intelligenz (artificial intelligence); Optische Zeichenerkennung (optical character recognition)
Number of pages: 154
Faculties / Organisational entities: Kaiserslautern - Fachbereich Informatik
CCS-Classification (computer science): I. Computing Methodologies
DDC-Classification: 0 General works, computer science, information science / 004 Computer science
Licence: Creative Commons 4.0 - Attribution, NonCommercial, NoDerivatives (CC BY-NC-ND 4.0)