Algorithms of the Tiger and CuneiForm Optical Character Recognition Software

O. A. Slavin (a, b, *) and V. L. Arlazarov (a, b, **)

a Federal Research Center “Computer Science and Control” of the Russian Academy of Sciences, Moscow, 119333 Russian Federation

b LLC Smart Engines Service, Moscow, 119333 Russian Federation

Correspondence to: * e-mail: OSlavin@isa.ru
Correspondence to: ** e-mail: Vladimir.Arlazarov@gmail.com

Received 25 August 2022

Abstract—This paper considers optical character recognition (OCR) software and describes in detail the algorithmic solutions of two of the world’s best OCR systems, Tiger and CuneiForm. In addition to the geometric identification of characters, it discusses algorithms for image binarization and page fragmentation, the combination of different recognition methods, syntactic control, and postrecognition. The algorithms described here were implemented by a team of mathematicians and software engineers under the supervision of V. L. Arlazarov. The paper also references the published work of the Tiger and CuneiForm developers, where these methods and algorithms are described in more detail.

Keywords: image processing, scientific school

DOI: 10.1134/S1054661823040442