Benchmark Portal

Hindi to English Transfer Based Machine Translation System

Akanksha Gehlot, Vaishali Sharma, Shashi Pal Singh, Ajai Kumar

In large societies like India, there is a huge demand for translating text from one human language into another, and much work has been done in this area. Many transfer-based machine translation systems (MTS) have been developed for translating English into other languages, such as MANTRA (CDAC Pune), MATRA (CDAC Pune), and SHAKTI (IISc Bangalore and IIIT Hyderabad). However, little work has been done on translating Hindi into other languages, and we are currently working on this problem. In this paper, we focus on designing a system that translates documents from Hindi to English using a transfer-based approach. The system takes input text and checks its structure through parsing, and reordering rules are then used to generate the text in the target language. This approach is better than corpus-based MTS because corpus-based MTS requires a large amount of word-aligned data, which is not available for many languages, whereas transfer-based MTS requires only knowledge of both languages (the source language and the target language) to write the transfer rules. We obtain correct translations for simple assertive sentences and almost correct translations for complex and compound sentences.
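
The reordering step can be illustrated with a minimal sketch. It assumes the parser has already labelled each chunk with a role (SUBJ, OBJ, VERB) and applies one reordering rule: Hindi is typically subject-object-verb, English subject-verb-object. The lexicon, the role labels and the `transfer` function are hypothetical illustrations, not the authors' rule set.

```python
# Hypothetical toy bilingual lexicon; a real system would use a full dictionary
# plus morphological generation on the English side.
LEXICON = {
    "राम": "Ram",
    "सेब": "an apple",
    "खाता है": "eats",
}

# Reordering rule: Hindi chunks arrive as SUBJ-OBJ-VERB, English needs SUBJ-VERB-OBJ.
ENGLISH_ORDER = ("SUBJ", "VERB", "OBJ")


def transfer(chunks):
    """chunks: list of (hindi_text, role) pairs produced by a (hypothetical) parser."""
    by_role = {role: text for text, role in chunks}
    reordered = [by_role[role] for role in ENGLISH_ORDER if role in by_role]
    # Lexical transfer: look each chunk up, leaving unknown chunks untranslated.
    return " ".join(LEXICON.get(chunk, chunk) for chunk in reordered)


print(transfer([("राम", "SUBJ"), ("सेब", "OBJ"), ("खाता है", "VERB")]))  # Ram eats an apple
```

Keeping the rule as a target-order tuple separates the structural transfer from the lexical lookup, which is the property that lets a transfer-based system work without word-aligned corpora.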

Classifier-Based Text Simplification for Improved Machine Translation

Shruti Tyagi, Deepti Chopra, Iti Mathur, Nisheeth Joshi

Machine translation (MT) is one of the research fields of computational linguistics. Many MT researchers aim to develop an MT system that produces high-quality, accurate translations and covers as many language pairs as possible. As the internet and globalization grow day by day, we need ways to improve translation quality. For this reason, we have developed a classifier-based text simplification model for English-Hindi machine translation systems. We used support vector machines and a Naïve Bayes classifier to develop this model, and we also evaluated the performance of these classifiers.
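
As an illustration of the classifier side, the sketch below implements a multinomial Naïve Bayes classifier (the SVM half is not shown). It assumes the classifier decides whether a sentence should be simplified before translation; the "complex"/"simple" labels, the whitespace tokenizer and the four training sentences are hypothetical toy choices, not the authors' features or data.

```python
import math
from collections import Counter


def tokenize(sentence):
    return sentence.lower().split()


class NaiveBayes:
    """Multinomial Naive Bayes over word counts with add-one (Laplace) smoothing."""

    def fit(self, sentences, labels):
        self.classes = sorted(set(labels))
        self.priors = {c: labels.count(c) / len(labels) for c in self.classes}
        self.counts = {c: Counter() for c in self.classes}
        for sentence, label in zip(sentences, labels):
            self.counts[label].update(tokenize(sentence))
        self.vocab = set().union(*self.counts.values())
        self.totals = {c: sum(self.counts[c].values()) for c in self.classes}
        return self

    def log_score(self, sentence, c):
        # log P(c) + sum of log P(word | c), smoothed over the training vocabulary
        denom = self.totals[c] + len(self.vocab)
        return math.log(self.priors[c]) + sum(
            math.log((self.counts[c][w] + 1) / denom) for w in tokenize(sentence))

    def predict(self, sentence):
        return max(self.classes, key=lambda c: self.log_score(sentence, c))


# Hypothetical toy data: "complex" marks sentences that should be simplified.
train = [
    ("the man who lives here is my uncle", "complex"),
    ("we left because it rained and the road was wet", "complex"),
    ("the man is my uncle", "simple"),
    ("it rained", "simple"),
]
clf = NaiveBayes().fit([s for s, _ in train], [c for _, c in train])
print(clf.predict("the road was wet because it rained"))  # complex
print(clf.predict("it rained"))                            # simple
```

Sentences predicted as "complex" would then be passed to a simplification step before the MT system sees them.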

Handwritten Malayalam Character Recognition using Curvelet Transform and ANN

Manju Manuel, R SaidasS.

Malayalam, the official language of Kerala, a southern state of India, has been accorded the honour of a language of eminence. Hence, research on recognition and related work in the Malayalam language is gaining prominence. This paper proposes the use of the curvelet transform and a neural network for the recognition of handwritten Malayalam characters: the curvelet transform is used in the feature extraction stage and the neural network for classification. The curvelet transform provides a compact representation of curved singularities and is well suited to the Malayalam script. Two different backpropagation algorithms were employed, and their performance is compared across varying network architectures. A promising feature of this work is the successful classification of 53 characters, an improvement over existing work. Applications of character recognition include sorting bank cheques and postal letters, reading aids for the blind, and data compression. In addition, an automated tool with a graphical user interface has been developed in MATLAB for Malayalam character recognition.
General Terms: Pattern Recognition, Artificial Neural Network (ANN), Curvelet Transform, Optical Character Recognition (OCR)
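
The classification stage can be sketched as a one-hidden-layer network trained by backpropagation. This is a minimal NumPy sketch under stated assumptions: the curvelet features are taken as already extracted (no curvelet code is shown), the 2-D feature vectors and three classes below are hypothetical stand-ins for real character features, and only plain batch gradient descent is shown, not the two specific backpropagation variants the paper compares.

```python
import numpy as np


def train_mlp(X, y, n_hidden=8, lr=0.1, epochs=5000, seed=0):
    """One-hidden-layer network trained with plain batch gradient-descent backpropagation."""
    rng = np.random.default_rng(seed)
    n_out = int(y.max()) + 1
    W1 = rng.normal(0.0, 0.5, (X.shape[1], n_hidden))
    b1 = np.zeros(n_hidden)
    W2 = rng.normal(0.0, 0.5, (n_hidden, n_out))
    b2 = np.zeros(n_out)
    Y = np.eye(n_out)[y]  # one-hot targets
    for _ in range(epochs):
        # Forward pass: tanh hidden layer, softmax output.
        H = np.tanh(X @ W1 + b1)
        logits = H @ W2 + b2
        P = np.exp(logits - logits.max(axis=1, keepdims=True))
        P /= P.sum(axis=1, keepdims=True)
        # Backward pass: gradient of the mean cross-entropy, propagated layer by layer.
        d_logits = (P - Y) / len(X)
        d_hidden = (d_logits @ W2.T) * (1.0 - H ** 2)  # tanh'(a) = 1 - tanh(a)^2
        W2 -= lr * (H.T @ d_logits)
        b2 -= lr * d_logits.sum(axis=0)
        W1 -= lr * (X.T @ d_hidden)
        b1 -= lr * d_hidden.sum(axis=0)
    return W1, b1, W2, b2


def predict(params, X):
    W1, b1, W2, b2 = params
    return np.argmax(np.tanh(X @ W1 + b1) @ W2 + b2, axis=1)


# Hypothetical 2-D "feature vectors" for three character classes.
X = np.array([[0.0, 0.0], [0.2, 0.1], [3.0, 3.0], [2.8, 3.1], [0.0, 3.0], [0.1, 2.9]])
y = np.array([0, 0, 1, 1, 2, 2])
params = train_mlp(X, y)
print(predict(params, X))
```

Varying `n_hidden` is the simplest way to reproduce the kind of architecture comparison the abstract describes.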

Assamese-English Bilingual Machine Translation

Kalyanee Kanchan Baruah, Pranjal Das, Abdul Hannan, Shikhar Kr. Sarma

Machine translation is the process of translating text from one language to another. In this paper, statistical machine translation (SMT) is performed between Assamese and English using a parallel corpus of the two languages. The statistical phrase-based translation toolkit Moses is used, together with two other tools, IRSTLM and GIZA, to build the language model and to align the words, respectively. The BLEU score is used to evaluate how well the translation system performs. The BLEU scores differ between translating from Assamese to English and vice versa: since Indian languages are morphologically very rich, translation from English to Assamese is relatively harder, resulting in a lower BLEU score. A statistical transliteration system is also integrated with the translation system to handle proper nouns and out-of-vocabulary (OOV) words, i.e., words not present in the corpus.
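
For readers unfamiliar with the metric, the sketch below computes sentence-level BLEU for one candidate against one reference: clipped n-gram precisions up to `max_n`, combined by a geometric mean and multiplied by a brevity penalty. It is a simplified illustration (single reference, uniform weights, no smoothing), not the scorer shipped with Moses that the paper presumably used.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def bleu(candidate, reference, max_n=4):
    """Sentence-level BLEU with one reference, uniform weights, no smoothing."""
    cand, ref = candidate.split(), reference.split()
    if not cand:
        return 0.0
    log_p = 0.0
    for n in range(1, max_n + 1):
        cand_ngrams, ref_ngrams = ngrams(cand, n), ngrams(ref, n)
        total = sum(cand_ngrams.values())
        # Clipping: each candidate n-gram is credited at most as often as it occurs in the reference.
        matches = sum(min(count, ref_ngrams[g]) for g, count in cand_ngrams.items())
        if total == 0 or matches == 0:
            return 0.0
        log_p += math.log(matches / total) / max_n
    # Brevity penalty punishes candidates shorter than the reference.
    bp = 1.0 if len(cand) > len(ref) else math.exp(1 - len(ref) / len(cand))
    return bp * math.exp(log_p)


print(bleu("the cat sat on the mat", "the cat is on the mat", max_n=2))  # about 0.707
```

Because every n-gram order enters the geometric mean, a single missing 4-gram match drives unsmoothed sentence BLEU to zero, which is one reason BLEU is normally reported at corpus level.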

Bangla Text Recognition from Video Sequence: A New Focus

Souvik Bhowmick, Purnendu Banerjee

Extraction and recognition of Bangla text from video frame images is challenging due to complex color backgrounds, low resolution, etc. In this paper, we propose an algorithm for extracting and recognizing Bangla text from such video frames with complex backgrounds, using a two-step approach. In the first step, the text line is segmented into words using information derived from the line contours, with the first-order gradient values of the text blocks used to find the word gaps; a local binarization technique is then applied to each word, and the text line is reconstructed from those words. In the second step, the binarized text block is sent to an OCR engine for recognition.
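
The first step can be sketched as follows. This minimal NumPy sketch assumes a single grayscale text-line image with text brighter than the background; the per-column gradient-energy profile, the `min_gap` parameter and the use of each word's mean intensity as its local threshold are hypothetical illustrative choices, not the paper's exact rules, and the line-contour analysis is not modelled.

```python
import numpy as np


def segment_words(line_img, min_gap=3):
    """Split a text-line image into word column ranges using first-order gradients."""
    gy, gx = np.gradient(line_img.astype(float))     # first derivatives along rows and columns
    profile = (np.abs(gx) + np.abs(gy)).sum(axis=0)  # gradient energy per column
    active = profile > 1e-9
    words, start, end, gap = [], None, 0, 0
    for col, has_ink in enumerate(active):
        if has_ink:
            if start is None:
                start = col
            end, gap = col + 1, 0
        elif start is not None:
            gap += 1
            if gap >= min_gap:  # a run of flat columns this wide counts as a word gap
                words.append((start, end))
                start, gap = None, 0
    if start is not None:
        words.append((start, end))
    return words


def binarize_words(line_img, words):
    """Binarize each word with its own threshold (the word's mean intensity)."""
    out = np.zeros(line_img.shape, dtype=np.uint8)
    for start, end in words:
        patch = line_img[:, start:end].astype(float)
        out[:, start:end] = patch > patch.mean()
    return out


# Hypothetical 5x20 line image: bright text (1) on a dark background (0).
img = np.zeros((5, 20))
img[1:4, 2:5] = 1
img[1:4, 12:16] = 1
words = segment_words(img)
print(words)  # [(1, 6), (11, 17)]
binary_line = binarize_words(img, words)  # reconstructed binarized line for the OCR stage
```

Thresholding each word separately, rather than the whole line, lets the threshold follow local changes in background brightness, which is the motivation for local binarization on video frames.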

Supervised Learning Methods for Bangla Web Document Categorization

Ashis Kumar Mandal, Rikta Sen

This paper explores the use of machine learning approaches, more specifically four supervised learning methods, namely Decision Tree (C4.5), K-Nearest Neighbour (KNN), Naïve Bayes (NB), and Support Vector Machine (SVM), for categorizing Bangla web documents. This is the task of automatically sorting a set of documents into categories from a predefined set. Whereas a wide range of methods have been applied to English text categorization, relatively few studies have been conducted on Bangla text categorization. Hence, we analyze the efficiency of these four methods for categorizing Bangla documents. For validation, a Bangla corpus was built from various websites and used in the experiments. The empirical results show that all four methods perform satisfactorily on Bangla, with SVM achieving good results on the high-dimensional and relatively noisy document feature vectors.
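
Of the four methods, KNN is the easiest to show in a few lines. The sketch below represents each document as raw term counts, compares documents by cosine similarity and takes a majority vote among the `k` nearest training documents. The term-count representation, `k=3` and the tiny Bangla corpus with "sports"/"politics" labels are hypothetical choices; the abstract does not specify the paper's features.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two term-count vectors stored as Counters."""
    dot = sum(count * b[term] for term, count in a.items())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def knn_classify(doc, train_docs, train_labels, k=3):
    """Label a document by majority vote among its k most similar training documents."""
    query = Counter(doc.split())
    scored = sorted(
        ((cosine(query, Counter(d.split())), label) for d, label in zip(train_docs, train_labels)),
        reverse=True,
    )
    votes = Counter(label for _, label in scored[:k])
    return votes.most_common(1)[0][0]


# Hypothetical toy corpus: a few Bangla words per category.
docs = ["ক্রিকেট খেলা দল", "ফুটবল খেলা", "সরকার নির্বাচন মন্ত্রী", "নির্বাচন সরকার"]
labels = ["sports", "sports", "politics", "politics"]
print(knn_classify("ক্রিকেট খেলা", docs, labels))      # sports
print(knn_classify("মন্ত্রী নির্বাচন", docs, labels))  # politics
```

Cosine similarity ignores document length, which matters for web pages whose sizes vary widely; a real system would also weight terms (for example with TF-IDF) and remove stop words.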

Polarity Detection of Movie Reviews in Hindi Language

Richa Sharma, Shweta Nigam, Rekha Jain

Nowadays, people actively post comments and reviews on social networking websites and other websites such as shopping and news sites. A large number of people share their opinions on the web every day, and as a result a large amount of user data is collected. Users find it tedious to read all the reviews before reaching a decision, so it would help if these reviews were classified into categories that make them easier to read. Opinion mining, or sentiment analysis, is a natural language processing task that mines information from various text forms such as reviews, news, and blogs and classifies them by polarity as positive, negative, or neutral. Over the last few years, user content in the Hindi language has also been increasing rapidly on the Web, so it is important to perform opinion mining in Hindi as well. In this paper, a Hindi-language opinion mining system is proposed. The system classifies Hindi reviews as positive, negative, or neutral, and it also handles negation. Experimental results on movie reviews show the effectiveness of the system.
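
A common way to build such a system is a polarity lexicon with a negation rule; the abstract does not say which method the authors used, so the sketch below is only an assumed, lexicon-based illustration. The six-word lexicon, the negation set and the two-token window are hypothetical. The window looks after the sentiment word because Hindi usually places नहीं after a predicate adjective and before the verb, as in "अच्छी नहीं थी" ("was not good").

```python
# Hypothetical toy polarity lexicon: +1 positive, -1 negative.
LEXICON = {"अच्छा": 1, "अच्छी": 1, "शानदार": 1, "बुरा": -1, "बुरी": -1, "बेकार": -1}
NEGATIONS = {"नहीं", "न"}


def polarity(review, window=2):
    """Lexicon-based polarity; a negation word shortly after a sentiment word flips it."""
    tokens = review.replace("।", " ").split()  # drop the danda (sentence-final mark)
    score = 0
    for i, token in enumerate(tokens):
        value = LEXICON.get(token, 0)
        if value and any(t in NEGATIONS for t in tokens[i + 1:i + 1 + window]):
            value = -value  # e.g. "अच्छी नहीं" ("not good")
        score += value
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


print(polarity("फिल्म अच्छी थी।"))       # positive
print(polarity("फिल्म अच्छी नहीं थी।"))  # negative
print(polarity("फिल्म कल आई।"))          # neutral
```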

Opinion Mining In Hindi Language: A Survey

Richa Sharma, Shweta Nigam, Rekha Jain

Opinions are very important in human life; they help people make decisions. As the impact of the Web grows day by day, Web documents can be seen as a new source of opinions. The Web contains a huge amount of user-generated information from blogs, forum entries, social networking websites, and so on. To analyze this large amount of information, a method is needed that automatically classifies the information available on the Web. This domain is called sentiment analysis or opinion mining. Opinion mining, or sentiment analysis, is a natural language processing task that mines information from various text forms such as reviews, news, and blogs and classifies them by polarity as positive, negative, or neutral. Over the last few years, the amount of Hindi-language content on the Web has increased enormously. Research in opinion mining has mostly been carried out in English, but it is also important to perform opinion mining in Hindi, since a large amount of information in Hindi is available on the Web. This paper gives an overview of the work that has been done on the Hindi language.

Auto Spell Suggestion for High Quality Speech Synthesis in Hindi

Shikha Kabra, Ritika Agarwal

The goal of text-to-speech (TTS) synthesis in a particular language is to convert arbitrary input text into intelligible and natural-sounding speech. However, for a language like Hindi, which is highly confusable because many words have very similar spellings, it is not easy to identify errors in the input text, and incorrect text degrades the quality of the output speech. This paper therefore contributes to the development of high-quality speech synthesis by incorporating a spellchecker that automatically generates spelling suggestions for misspelled words. Incorporating a spellchecker increases the efficiency of speech synthesis by providing spelling suggestions for incorrect input text. Furthermore, we provide a comparative study that evaluates the resulting effect on the phonetic text of adding a spellchecker to the input text.
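
One standard way to generate spelling suggestions is to rank dictionary words by Levenshtein edit distance; the abstract does not name the authors' technique, so this is an assumed illustration. The three-word dictionary and the `max_distance=2` cutoff are hypothetical. Note that distances are counted over Unicode code points, so confusing the vowel signs ि and ी costs one substitution.

```python
def levenshtein(a, b):
    """Edit distance (insertions, deletions, substitutions) between two strings."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # delete ca
                           cur[j - 1] + 1,             # insert cb
                           prev[j - 1] + (ca != cb)))  # substitute (free if equal)
        prev = cur
    return prev[-1]


def suggest(word, dictionary, max_distance=2):
    """Return dictionary words within max_distance edits, closest first."""
    if word in dictionary:
        return []  # correctly spelled: nothing to suggest
    ranked = sorted((levenshtein(word, w), w) for w in dictionary)
    return [w for d, w in ranked if d <= max_distance]


# Hypothetical mini-dictionary; "कीताब" misspells "किताब" (book) with the long-i vowel sign.
DICTIONARY = ["किताब", "कितना", "कमल"]
print(suggest("कीताब", DICTIONARY))  # ['किताब']
```

In a TTS pipeline the top suggestion would replace the misspelled word before grapheme-to-phoneme conversion, so the phonetic text is generated from a valid spelling.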

Some Rigorous Results Relating Nonequilibrium, Equilibrium, Calorimetrically Measured and Residual Entropies during Cooling

P. D. Gujrati

We use rigorous nonequilibrium thermodynamic arguments to establish that (i) the nonequilibrium entropy S(T_{0}) of any system is bounded below by the experimentally (calorimetrically) determined entropy S_{expt}(T_{0}), (ii) S_{expt}(T_{0}) is bounded below by the equilibrium or stationary state (such as the supercooled liquid) entropy S_{SCL}(T_{0}) and consequently (iii) S(T_{0}) cannot drop below S_{SCL}(T_{0}). It then follows that the residual entropy S_{R} is bounded below by the extrapolated S_{expt}(0)>S_{SCL}(0) at absolute zero. These results are very general and applicable to all nonequilibrium systems regardless of how far they are from their stationary states.
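
The chain of bounds in (i) to (iii) can be written compactly. The first relation holds at any temperature T_{0} along the cooling path; the second reads the same bounds at absolute zero, where the residual entropy S_{R} is the value reached by the nonequilibrium entropy.

$$
S(T_{0}) \;\ge\; S_{expt}(T_{0}) \;\ge\; S_{SCL}(T_{0}),
\qquad
S_{R} \;\ge\; S_{expt}(0) \;>\; S_{SCL}(0).
$$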