This project investigates a set of sub-problems related to the recognition and retrieval of degraded and challenging
document images in Indian languages. Traditionally, the recognition problem is called optical character recognition (OCR). However, OCR systems are
reliable only when the document is printed and reasonably clean. Many practically important documents in the Indian context (such as the massive collections of manuscripts available in courts, historical newspaper articles, and handwritten notes of freedom fighters) vary in print style and are affected by ageing-related noise and varying scan settings.
We focus on content-aware image processing algorithms for robust and efficient recognition and retrieval from Indian language document images. Our image processing algorithms aim to improve the quality of document images by removing noise and low-resolution artifacts through content-aware operations. We also develop recognizers for handwritten Indian language text using state-of-the-art machine learning techniques such as deep learning. In this project, we specifically work on
Some of the results and publications for this project have been added here.