Optical Character Recognition - A Combined ANN/HMM Approach

Optical character recognition (OCR) of machine-printed text is widely considered a solved problem. However, error-free OCR of degraded (broken and merged) and noisy text remains challenging for modern OCR systems. High-accuracy OCR of degraded text is important for many applications in business, industry and large-scale document digitization projects. This thesis presents a new OCR method for degraded text recognition by introducing a combined ANN/HMM OCR approach. The approach performs significantly better than state-of-the-art HMM-based OCR methods and existing open source OCR systems. In addition, the thesis introduces novel applications of ANNs and HMMs for document image preprocessing and recognition of low-resolution text. Furthermore, the thesis reports psychophysical experiments that determine the effect of letter permutation on visual word recognition in Latin and cursive script languages.

HMMs and ANNs are widely employed pattern recognition paradigms and have been used in numerous pattern classification problems. This work presents a simple and novel method for combining HMMs and ANNs for segmentation-free OCR of degraded text. Both are powerful pattern recognition strategies, and their combination is an attractive way to improve on the current state of the art in OCR. Most previous attempts at combining HMMs and ANNs focused on applying ANNs as approximations of the probability density function or as neural vector quantizers for HMMs. These methods either require combined NN/HMM training criteria [ECBG-MZM11] or use complex neural network architectures such as time delay or space displacement neural networks [BLNB95]. In this work, by contrast, neural networks are used as discriminative feature extractors, in combination with a novel text line scanning mechanism, to extract discriminative features from unsegmented text lines. The features are processed by HMMs to provide segmentation-free text line recognition. The ANN and HMM modules are trained separately on a common dataset using standard machine learning procedures. The proposed ANN/HMM OCR system also realizes, to some extent, several strategies based on cognitive reading during OCR. On a dataset of 1,060 degraded text lines extracted from the widely used UNLV-ISRI benchmark database [TNBC99], the presented system achieves a 30% reduction in error rate compared to Google’s Tesseract OCR system [Smi13] and a 43% reduction compared to the OCRopus OCR system [Bre08], which are the best open source OCR systems available today.

In addition, this thesis introduces new applications of HMMs and ANNs in OCR and document image preprocessing. First, an HMM-based segmentation-free OCR approach is presented for recognition of low-resolution text. OCR of low-resolution text is important because such text appears in screenshots, web images and video captions. It is challenging because of antialiased rendering and the use of very small font sizes. The characters in low-resolution text are usually joined to each other, and they may appear differently at different locations on a computer screen. This work presents the use of HMMs in optical recognition of low-resolution isolated characters and text lines. The evaluation of the proposed method shows that HMM-based OCR techniques work well and reach the performance of specialized approaches for OCR of low-resolution text. Then, this thesis presents novel applications of ANNs for automatic script recognition and orientation detection. Script recognition determines the script written on the page so that an appropriate character recognition algorithm can be applied. Orientation detection detects and corrects the deviation of the document’s orientation angle from the horizontal direction. Both script recognition and orientation detection are important preprocessing steps in developing robust OCR systems. In this work, instead of extracting handcrafted features, convolutional neural networks are used to extract relevant discriminative features for each classification task. The proposed method achieved more than 95% script recognition accuracy on various multi-script documents at the connected component level and 100% page orientation detection accuracy for Urdu documents.

Human reading is a cognitive process nearly analogous to OCR: it involves decoding printed symbols into meanings. Studying cognitive reading behavior may help in building a robust machine reading strategy. This thesis presents a behavioral study that examines how the cognitive system works in the visual recognition of words and permuted non-words. The objective of this study is to determine the impact of overall word shape on the visual word recognition process. Permutation is considered a source of shape degradation: the visual appearance of actual words can be distorted by changing the positions of the constituent letters inside the words. The study proposes the hypothesis that reading words and reading permuted non-words are two distinct mental processes, and that people use different strategies in handling permuted non-words than in handling normal words. The hypothesis is tested by conducting psychophysical experiments on visual recognition of words from orthographically different languages, namely Urdu, German and English. Experimental data is analyzed using analysis of variance (ANOVA) and distribution-free rank tests to determine significant differences in response time latencies for the two classes of data. The results support the presented hypothesis, and the findings are consistent with the dual route theories of reading.
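
The segmentation-free pipeline summarized in the abstract (scan the unsegmented text line, extract features per scanning position, decode the sequence with an HMM) can be illustrated with a minimal sketch. It is not the thesis's implementation. The function names, the window width and step, and the use of raw pixel strips in place of ANN features are all assumptions made for illustration, and the Viterbi routine is the generic textbook algorithm.

```python
import numpy as np

def sliding_windows(line_image, width, step):
    """Cut a (height, length) text-line image into overlapping vertical strips.

    Each strip is flattened into one frame; in a combined ANN/HMM system a
    trained network would map each frame to a feature vector (omitted here).
    `width` and `step` are hypothetical parameters.
    """
    height, length = line_image.shape
    starts = range(0, length - width + 1, step)
    return np.stack([line_image[:, s:s + width].ravel() for s in starts])

def viterbi(log_emit, log_trans, log_start):
    """Return the most likely HMM state path.

    log_emit:  (T, S) log emission score of each state for each frame
    log_trans: (S, S) log transition scores, log_trans[i, j] for i -> j
    log_start: (S,)   log initial state scores
    """
    T, S = log_emit.shape
    score = log_start + log_emit[0]
    back = np.zeros((T, S), dtype=int)
    for t in range(1, T):
        cand = score[:, None] + log_trans   # cand[i, j]: best path ending i -> j
        back[t] = cand.argmax(axis=0)       # best predecessor for each state j
        score = cand.max(axis=0) + log_emit[t]
    path = [int(score.argmax())]
    for t in range(T - 1, 0, -1):           # follow back-pointers to the start
        path.append(int(back[t, path[-1]]))
    return path[::-1]
```

Because the decoder assigns a state to every frame, character boundaries fall out of the state path and no explicit character segmentation step is needed before recognition.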

Author:Sheikh Faisal Rashid
Advisor:Thomas Breuel
Document Type:Doctoral Thesis
Language of publication:English
Date of Publication (online):2014/05/12
Year of first Publication:2014
Publishing Institution:Technische Universität Kaiserslautern
Granting Institution:Technische Universität Kaiserslautern
Acceptance Date of the Thesis:2014/11/07
Date of the Publication (Server):2014/12/05
GND Keyword:OCR
Page Number:154, 1-5
Faculties / Organisational entities:Kaiserslautern - Fachbereich Informatik
CCS-Classification (computer science):J. Computer Applications / J.0 GENERAL
DDC-Classification:0 Allgemeines, Informatik, Informationswissenschaft / 004 Informatik
Licence (German):Standard gemäß KLUEDO-Leitlinien vom 28.10.2014