Structural Information Extraction from Document Images: Addressing Challenges in Layout Analysis, Table Detection, and Classification

Minouei, Mohammad

doi:10.26204/KLUEDO/9510

Paper documents remain a vital part of daily life, and automated systems that analyze them and extract valuable information are increasingly important. Recent advances in artificial intelligence have raised user expectations: the goal is no longer only to extract raw text but also to recover the structural information in document images. Document understanding systems typically comprise multiple components, including layout analysis, table detection, and document classification, each of which presents its own challenges. These include handling complex and varied layouts, coping with imbalanced datasets, and building systems that can adapt and learn over time. Layout analysis is a critical component because it organizes and structures the elements of a document, such as text, tables, and figures. Accurate table recognition is equally essential, as it enables structured data to be extracted and interpreted effectively. This research addresses current shortcomings in structural information extraction through novel datasets, model architectures, and learning strategies, improving the accuracy, robustness, and efficiency of document analysis.

The dissertation makes several contributions to the field of document understanding. First, we developed a CNN-based method for layout analysis that achieves a 3 percent improvement over baseline techniques on PubLayNet. Second, we introduced a continual learning strategy based on experience replay, which reduced catastrophic forgetting in table detection by 15 percent. Third, we presented a novel dataset and an asymmetric convolution-based neural network that improve the recognition of table ruling lines. Fourth, to mitigate class imbalance in document classification, we integrated visual and textual features with a customized loss function, resulting in a 13 percent increase in accuracy. We also studied the use of Large Language Models (LLMs) for document comprehension: we created a technique for fine-tuning LLMs on input structured as HTML, which yields results on par with state-of-the-art methods while requiring less computational power, and we empirically evaluated a three-phase prompt engineering strategy for zero-shot information extraction, with promising outcomes.
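The abstract mentions structuring LLM input as HTML but does not describe the format used. The following is only a minimal sketch of the general idea, assuming the document content is already available as rows of cell text (for example from OCR). The function names `table_to_html` and `build_prompt` and the prompt layout are hypothetical and are not taken from the thesis.

```python
from html import escape


def table_to_html(rows, header=True):
    """Serialize rows of cell text as a compact HTML table.

    Assumption for illustration: `rows` is a list of lists of strings,
    and the first row is a header row when `header` is True.
    """
    lines = ["<table>"]
    for i, row in enumerate(rows):
        # Header cells use <th>, all other cells use <td>.
        tag = "th" if header and i == 0 else "td"
        # Escape cell text so characters like & and < stay valid HTML.
        cells = "".join(f"<{tag}>{escape(str(cell))}</{tag}>" for cell in row)
        lines.append(f"<tr>{cells}</tr>")
    lines.append("</table>")
    return "\n".join(lines)


def build_prompt(rows, question):
    """Combine the HTML-structured table with a question (hypothetical layout)."""
    return f"{table_to_html(rows)}\nQuestion: {question}\nAnswer:"


if __name__ == "__main__":
    example = [["Item", "Price"], ["A & B", "3"]]
    print(build_prompt(example, "What is the price of A & B?"))
```

Keeping the markup explicit lets the model see row and cell boundaries that plain concatenated text would lose; whether the thesis uses tables, full pages, or another HTML subset is not stated in the abstract.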

Author:	Mohammad Minouei
URN:	urn:nbn:de:hbz:386-kluedo-95106
DOI:	https://doi.org/10.26204/KLUEDO/9510
Advisor:	Didier Stricker
Document Type:	Doctoral Thesis
Cumulative document:	No
Language of publication:	English
Date of Publication (online):	2026/01/23
Year of first Publication:	2026
Publishing Institution:	Rheinland-Pfälzische Technische Universität Kaiserslautern-Landau
Granting Institution:	Rheinland-Pfälzische Technische Universität Kaiserslautern-Landau
Acceptance Date of the Thesis:	2026/01/16
Date of the Publication (Server):	2026/01/26
Page Number:	IX, 132
Faculties / Organisational entities:	Kaiserslautern - Fachbereich Informatik
DDC-Classification:	0 General works, computer science, information science / 004 Computer science
Licence:	Creative Commons 4.0 - Attribution, NonCommercial, NoDerivatives (CC BY-NC-ND 4.0)
