Benchmark Portal

‘Indian Language Text Recognition’ is a benchmarking portal, designed by the Centre for Visual Information Technology (CVIT), IIIT Hyderabad, for recognizing Indian language text in printed documents, handwritten documents, scene images, videos, and other sources. Its aim is to bring together researchers working in this field and to give them a common platform for evaluating and comparing the performance of their methods on standard datasets.

As part of this initiative, we focus on the following aspects:

Provide standardised test datasets
Enable evaluation and comparison of different methods
Plan to run challenges, host workshops, etc.
Provide resources for basic domain knowledge

We consider five types of document images: (1) printed document images, (2) handwritten document images, (3) scene text images, (4) text in videos, and (5) miscellaneous. Each type has its own characteristics that make it difficult to analyze. Below we describe each type and the tasks defined for it in our benchmarking system.

Printed Documents

These are machine-printed documents. Because of the large volume of documents with unconstrained font types and varying image quality, processing printed documents is very difficult in digital library applications.

This portal includes printed documents containing Indian language text. The following modalities of printed documents are considered in this benchmarking system: (1) scanned documents, (2) scanned historical documents, (3) scanned newspapers, (4) mobile-scanned documents, (5) mobile-scanned historical documents, (6) mobile-scanned newspapers, (7) PDF-to-image documents, and (8) PDF-to-image books. The tasks currently included for printed documents are page segmentation or page recognition, line recognition, word recognition, and character recognition.

Page Recognition: This task recognizes all the text present in a page of a printed document image.

Line Recognition: The objective of the task is to recognize each line of text present in a page of a printed document image.

Word Recognition: It recognizes each individual word present in a document.

Character Recognition: This task recognizes each character in a document.
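The portal description does not say which metric scores these recognition tasks. As an assumption only, a common choice for page, line, and word output is character error rate (CER) and word error rate (WER), both derived from Levenshtein edit distance normalized by the reference length. The minimal Python sketch below illustrates that idea; the function names are hypothetical and not part of the portal.

```python
def edit_distance(ref, hyp) -> int:
    """Levenshtein distance between two sequences (strings or word lists).
    Insertions, deletions, and substitutions each cost 1."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            curr[j] = min(
                prev[j] + 1,             # delete r
                curr[j - 1] + 1,         # insert h
                prev[j - 1] + (r != h),  # substitute (cost 0 if equal)
            )
        prev = curr
    return prev[len(hyp)]


def character_error_rate(ref: str, hyp: str) -> float:
    """Edit distance over characters divided by reference length (hypothetical helper)."""
    if not ref:
        # Convention assumed here: empty reference is perfect only if hypothesis is empty.
        return 0.0 if not hyp else 1.0
    return edit_distance(ref, hyp) / len(ref)


def word_error_rate(ref: str, hyp: str) -> float:
    """Edit distance over whitespace-separated words divided by reference word count."""
    ref_words, hyp_words = ref.split(), hyp.split()
    if not ref_words:
        return 0.0 if not hyp_words else 1.0
    return edit_distance(ref_words, hyp_words) / len(ref_words)


print(character_error_rate("kitten", "sitting"))  # 3 edits / 6 reference characters
print(word_error_rate("the cat sat", "the cat sit"))
```

Python strings index Unicode code points, so for Indic scripts this CER counts a consonant, a virama, and a vowel sign as separate units rather than as one visual akshara; a benchmark could instead split text into grapheme clusters before computing the distance.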

Handwritten Documents

Analysis of handwritten documents is very difficult due to variations in handwriting style, letter forms, ink, etc. It is important for applications such as digitizing historical handwritten documents and verifying authorship.

This portal considers handwritten documents containing Indian language text in the following modalities: (1) scanned handwritten documents (offline), (2) mobile-scanned handwritten documents (offline), (3) scanned handwritten historical documents (offline), (4) mobile-scanned handwritten historical documents (offline), and (5) online handwritten documents. The tasks currently included are page segmentation or page recognition, line recognition, word recognition, and character recognition.

Page Recognition: This task recognizes all the text present in a page of a handwritten document image.

Line Recognition: The objective of this task is to recognize each line of text present in a page of a handwritten document image.

Word Recognition: It recognizes each individual word present in a document.

Character Recognition: This task recognizes each character in a document.

Scene Text

Text in natural scene images usually carries rich semantic information, such as product names, shop names, location names, and traffic signs. Recognizing this text is an important step towards understanding such images. It supports many content-based image and video applications, including web image search, video information retrieval, and mobile text analysis and recognition. Compared with printed documents, text in natural scene images is harder to recognize because of large variations in background, texture, font, and illumination, and because of occlusion.

Here we mainly consider natural scene images containing Indian language text. Two modalities are taken into consideration: (1) natural scene images captured by a camera and (2) natural scene images captured by a mobile phone. Three tasks are currently considered: text localization, word recognition, and end-to-end word recognition.

Text Localization: The objective of this task is to detect text locations in the image in the form of bounding boxes that correspond to words.
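The evaluation protocol for localization is not described here. A widely used convention, assumed in this sketch, is to match each ground-truth word box to at most one detected box whose intersection-over-union (IoU) is at least 0.5, then report precision, recall, and F-score. Boxes are assumed to be axis-aligned `(x_min, y_min, x_max, y_max)` tuples; scene-text annotations are often rotated quadrilaterals, which this sketch does not handle. All function names are hypothetical.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max), axis-aligned


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def match_boxes(gt: List[Box], det: List[Box], threshold: float = 0.5) -> List[Tuple[int, int]]:
    """Greedy one-to-one matching: take (gt, det) pairs in order of decreasing IoU."""
    pairs = sorted(
        ((iou(g, d), gi, di) for gi, g in enumerate(gt) for di, d in enumerate(det)),
        reverse=True,
    )
    used_gt, used_det, matches = set(), set(), []
    for score, gi, di in pairs:
        if score < threshold:
            break  # remaining pairs all have lower IoU
        if gi in used_gt or di in used_det:
            continue
        used_gt.add(gi)
        used_det.add(di)
        matches.append((gi, di))
    return matches


def localization_scores(gt: List[Box], det: List[Box], threshold: float = 0.5):
    """Return (precision, recall, F-score) for one image."""
    tp = len(match_boxes(gt, det, threshold))
    precision = tp / len(det) if det else 0.0
    recall = tp / len(gt) if gt else 0.0
    f_score = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f_score
```

Greedy matching by descending IoU is a simple approximation; an optimal assignment (for example, the Hungarian algorithm) can differ when boxes overlap heavily.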

Word Recognition: The locations (bounding boxes) of words in the image are assumed to be known.
Word recognition consists of identifying the characters and recognizing these characters as a word from a cropped image patch.
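Since word boxes are given, cropped-word recognition is commonly scored by exact-match word accuracy; the portal does not state its metric, so this is an assumption. The Python sketch below (hypothetical names) applies Unicode NFC normalization before comparing, because the same Indic word can be encoded as different but canonically equivalent code-point sequences.

```python
import unicodedata
from typing import Sequence


def normalize(word: str) -> str:
    """Strip surrounding whitespace and apply Unicode NFC normalization."""
    return unicodedata.normalize("NFC", word.strip())


def word_accuracy(gt_words: Sequence[str], predicted: Sequence[str]) -> float:
    """Fraction of cropped word images whose prediction exactly matches the ground truth."""
    if len(gt_words) != len(predicted):
        raise ValueError("expected one prediction per cropped word image")
    if not gt_words:
        return 0.0
    correct = sum(normalize(g) == normalize(p) for g, p in zip(gt_words, predicted))
    return correct / len(gt_words)


# U+0958 (DEVANAGARI LETTER QA) and U+0915 U+093C (KA + NUKTA) are canonically
# equivalent; NFC maps both to the two-code-point form, so they compare equal.
print(word_accuracy(["\u0958"], ["\u0915\u093c"]))
```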

End-to-End Word Recognition: Its objective is to localize and recognize all the words in the image in a single step.
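A typical end-to-end criterion, assumed here because the portal does not define one, counts a detection as correct only if it both overlaps an unmatched ground-truth word with IoU of at least 0.5 and carries the same transcription. This Python sketch matches detections greedily in input order; the data layout and names are hypothetical.

```python
import unicodedata
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max), axis-aligned
Word = Tuple[Box, str]                   # (bounding box, transcription)


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def end_to_end_scores(gt: List[Word], det: List[Word], threshold: float = 0.5):
    """A detection is a true positive only if its NFC-normalized text equals that of an
    unmatched ground-truth word and their boxes overlap with IoU >= threshold."""
    used = set()
    tp = 0
    for d_box, d_text in det:
        d_norm = unicodedata.normalize("NFC", d_text)
        best_score, best_idx = 0.0, None
        for i, (g_box, g_text) in enumerate(gt):
            if i in used or unicodedata.normalize("NFC", g_text) != d_norm:
                continue
            score = iou(g_box, d_box)
            if score > best_score:
                best_score, best_idx = score, i
        if best_idx is not None and best_score >= threshold:
            used.add(best_idx)
            tp += 1
    precision = tp / len(det) if det else 0.0
    recall = tp / len(gt) if gt else 0.0
    f_score = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f_score
```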

Text in Video

The amount of available digital video content is growing rapidly, which creates a need for fast and effective algorithms for retrieving information from videos. Textual information in videos is a very rich source of high-level semantics for retrieval and indexing, so detecting text in videos is an important task. Text detection and recognition in video is also a challenging computer vision problem. It has numerous real-world applications, including video indexing, assistive technology for the visually impaired, automatic localization for businesses, and robotic navigation. Motion, large variations in background, color, font, texture, and illumination, and occlusion make it very difficult to localize and recognize text in video sequences.

Here we consider video sequences containing Indian language text. Two modalities are taken into consideration: (1) captured video and (2) newsfeed video. Two tasks are currently considered: text localization and end-to-end word recognition.

Text Localization: The objective of this task is to localize all the words in the form of bounding boxes and track all the words over the video sequence.

End-to-End Word Recognition: Its objective is to localize and recognize all the words in the video sequence.
This task requires that correctly recognized words are also correctly localized in every frame and tracked correctly over the video sequence.
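How localization and tracking are scored jointly is not stated; published video text benchmarks often rely on multiple-object-tracking metrics such as CLEAR-MOT, which are beyond a short sketch. As a deliberately simplified assumption, the Python below counts a ground-truth word track as correctly recognized only if some predicted track has the same transcription and overlaps it with IoU of at least 0.5 in every frame where the word appears, mirroring the requirement stated above. The track layout and names are hypothetical.

```python
from typing import Dict, Tuple

Box = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max), axis-aligned
Track = Tuple[str, Dict[int, Box]]       # (transcription, frame index -> box)


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def track_is_correct(gt: Track, pred: Track, threshold: float = 0.5) -> bool:
    """True if pred has the same text and overlaps the ground-truth box with
    IoU >= threshold in every frame where the ground-truth word is annotated."""
    gt_text, gt_boxes = gt
    pred_text, pred_boxes = pred
    if gt_text != pred_text:
        return False
    return all(
        frame in pred_boxes and iou(box, pred_boxes[frame]) >= threshold
        for frame, box in gt_boxes.items()
    )


def count_correct_tracks(
    gt_tracks: Dict[int, Track], pred_tracks: Dict[int, Track], threshold: float = 0.5
) -> int:
    """Greedily assign each ground-truth track to at most one unused predicted track."""
    used = set()
    correct = 0
    for gt in gt_tracks.values():
        for pid, pred in pred_tracks.items():
            if pid not in used and track_is_correct(gt, pred, threshold):
                used.add(pid)
                correct += 1
                break
    return correct
```

This sketch does not penalize predicted boxes in frames where the word is not annotated, nor identity switches within a track; a full tracking metric would account for both.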