We consider an isolated system in an arbitrary state and, starting from first principles, give a general formulation of an additive, non-negative statistical quantity that is shown to reproduce the equilibrium thermodynamic entropy of the isolated system. We further show that this statistical quantity represents the nonequilibrium thermodynamic entropy when the latter is a state function of nonequilibrium state variables; see text. We consider an isolated 1-d ideal gas and determine its nonequilibrium statistical entropy as a function of the box size as the gas expands freely and isoenergetically, and compare it with the equilibrium thermodynamic entropy S_{0eq}. We find that the statistical entropy is less than S_{0eq}, in accordance with the second law. To understand how the statistical entropy differs from the thermodynamic entropy of classical continuum models, which is known to become negative under certain conditions, we calculate it for a 1-d lattice model and find that it can be related to the thermodynamic entropy of the continuum 1-d Tonks gas by letting the lattice spacing {\delta} go to zero, but only if the latter is state-independent. We discuss the semi-classical approximation of our entropy and show that the standard quantity S_{f}(t) in Boltzmann's H-theorem does not directly correspond to the statistical entropy.
Machine translation for Indian languages is an emerging research area. Transliteration is one module that must be designed when building a translation system. Transliteration means mapping source-language text into the target language. A simple mapping decreases the efficiency of the overall translation system. We propose the use of stemming and part-of-speech tagging for transliteration; the effectiveness of translation can be improved when transliteration is assisted by part-of-speech tagging and stemming. We show that much of the content in Gujarati gets transliterated while being processed for translation into Hindi.
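As an illustration of the "simple mapping" baseline mentioned above, the following minimal sketch shifts each Gujarati code point to the corresponding position in the Devanagari Unicode block. It relies on the fact that the two blocks share most of their relative layout; the function name `naive_transliterate` is hypothetical, and the stemming and part-of-speech assistance proposed in the abstract is not modeled here.

```python
# Naive character-level Gujarati -> Devanagari mapping (illustrative baseline only).
# Assumption: a fixed code point offset between the two Unicode blocks is good
# enough for common letters and vowel signs; a few positions have no counterpart.

GUJARATI_START, GUJARATI_END = 0x0A80, 0x0AFF
DEVANAGARI_START = 0x0900
OFFSET = GUJARATI_START - DEVANAGARI_START


def naive_transliterate(text):
    """Shift every Gujarati character into the Devanagari block; keep the rest."""
    out = []
    for ch in text:
        cp = ord(ch)
        if GUJARATI_START <= cp <= GUJARATI_END:
            out.append(chr(cp - OFFSET))
        else:
            out.append(ch)  # spaces, digits, punctuation, Latin text, etc.
    return "".join(out)
```

A mapping of this kind ignores word structure entirely, which is the limitation that the stemming and part-of-speech tagging proposed above are meant to address.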
The Braille system is used by the visually impaired for reading and writing. Because Braille textbooks are in limited supply, the available books must be used efficiently. This paper proposes a method to convert a scanned Braille document to text, which can then be read out to many listeners through a computer. The Braille documents are preprocessed to enhance the dots and reduce noise. The Braille cells are segmented, and the dots from each cell are extracted and converted into a number sequence. These sequences are mapped to the appropriate letters of the language. The converted text is spoken through a speech synthesizer. The paper also provides a mechanism for typing Braille characters through the number pad of the keyboard; each typed Braille character is mapped to its letter and spoken. The Braille cell has a standard representation, but the mapping differs for each language. In this paper, mappings for English, Hindi and Tamil are considered.
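The step from extracted dots to a number sequence to a letter can be sketched as follows. This is a minimal illustration, not the paper's implementation: it assumes each segmented cell has already been reduced to a 3x2 grid of 0/1 values, uses the conventional dot numbering (1 to 3 down the left column, 4 to 6 down the right), and covers only the 26 basic English letters. The names `cell_to_dots` and `decode_cells` are hypothetical.

```python
# Standard English Braille letters keyed by their dot-number sequence.
BRAILLE_TO_LETTER = {
    "1": "a", "12": "b", "14": "c", "145": "d", "15": "e",
    "124": "f", "1245": "g", "125": "h", "24": "i", "245": "j",
    "13": "k", "123": "l", "134": "m", "1345": "n", "135": "o",
    "1234": "p", "12345": "q", "1235": "r", "234": "s", "2345": "t",
    "136": "u", "1236": "v", "2456": "w", "1346": "x", "13456": "y",
    "1356": "z",
}


def cell_to_dots(cell):
    """Convert a 3x2 grid of 0/1 values into a dot-number string such as '145'."""
    dots = []
    for col in range(2):          # left column holds dots 1-3, right column dots 4-6
        for row in range(3):
            if cell[row][col]:
                dots.append(str(col * 3 + row + 1))
    return "".join(dots)


def decode_cells(cells):
    """Map a sequence of cells to text; unknown patterns become '?'."""
    return "".join(BRAILLE_TO_LETTER.get(cell_to_dots(c), "?") for c in cells)


# Example: three cells spelling "bad".
cells = [
    [[1, 0], [1, 0], [0, 0]],  # dots 1,2   -> b
    [[1, 0], [0, 0], [0, 0]],  # dot 1      -> a
    [[1, 1], [0, 1], [0, 0]],  # dots 1,4,5 -> d
]
print(decode_cells(cells))
```

Supporting another language would, under this sketch, amount to swapping in a different lookup table while keeping the dot extraction unchanged.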
In this paper, we use statistical texture features to classify text as handwritten or printed. We primarily aim at word-level classification in South Indian scripts. Words are first extracted from the scanned document. For each extracted word, statistical texture features are computed, such as mean, standard deviation, smoothness, moment, uniformity, entropy and local range, including local entropy. These feature vectors are then used to classify words with a k-NN classifier. We have validated the approach on several different datasets. The scripts primarily used are Kannada, Telugu, Malayalam and Hindi (i.e., Devanagari), for which an average classification rate of 99.26% is achieved. In addition, to demonstrate the extensibility of the approach, we address Roman script using a publicly available dataset and report interesting results.
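The feature computation and classification described above can be sketched as follows. This is an illustration under stated assumptions rather than the authors' code: the features use the common gray-level histogram definitions (smoothness as 1 - 1/(1 + normalized variance), "moment" taken to be the third central moment), the local range and local entropy features are omitted, and the function names are hypothetical.

```python
import math
from collections import Counter


def texture_features(pixels, levels=256):
    """Histogram-based texture features of a flat list of gray levels."""
    n = len(pixels)
    probs = {z: c / n for z, c in Counter(pixels).items()}
    mean = sum(z * p for z, p in probs.items())
    var = sum((z - mean) ** 2 * p for z, p in probs.items())
    std = math.sqrt(var)
    # Smoothness uses the variance normalized to [0, 1] by (levels - 1)^2.
    smoothness = 1 - 1 / (1 + var / (levels - 1) ** 2)
    third_moment = sum((z - mean) ** 3 * p for z, p in probs.items())
    uniformity = sum(p * p for p in probs.values())
    entropy = -sum(p * math.log2(p) for p in probs.values())
    return [mean, std, smoothness, third_moment, uniformity, entropy]


def knn_predict(train, query, k=3):
    """Majority vote among the k nearest training vectors (Euclidean distance).

    train is a list of (feature_vector, label) pairs.
    """
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]
```

In practice the features would need to be scaled to comparable ranges before computing distances, since the mean and third moment are orders of magnitude larger than the uniformity or smoothness.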
The work presented here involves the design of a Multi Layer Perceptron (MLP) based classifier for recognition of the handwritten Bangla alphabet using a 76-element feature set. Bangla is the second most popular script and language in the Indian subcontinent and the fifth most popular language in the world. The feature set developed for representing handwritten characters of the Bangla alphabet includes 24 shadow features, 16 centroid features and 36 longest-run features. The recognition performance of the MLP designed to work with this feature set is experimentally observed to be 86.46% on the training samples and 75.05% on the test samples. The work has useful application in the development of a complete OCR system for handwritten Bangla text.
Universal Networking Language (UNL) is a declarative formal language used to represent semantic data extracted from natural language texts. This paper presents a novel approach to converting Bangla natural language text into UNL using a method known as the Predicate Preserving Parser (PPP) technique. PPP performs morphological, syntactic, semantic and lexical analysis of the text synchronously. This analysis produces a semantic-net-like structure represented in UNL. We demonstrate how Bangla texts are analyzed with the PPP technique to produce UNL documents, which can then be translated into any other suitable natural language, opening the way to a universal language translation method via UNL.
The status of heat and work in nonequilibrium thermodynamics is at present confusing and non-unique: despite the long history of the first law expressed in terms of exchange heat and work, conflicting interpretations persist and the matter is far from settled. Moreover, the exchange quantities lack a certain symmetry. Generalizing the traditional concepts to include their time-dependent irreversible components allows us to express the first law in a symmetric form dE(t)= dQ(t)-dW(t), in which heat dQ(t) and work dW(t) appear on an equal footing and possess this symmetry. We prove that irreversible work turns into irreversible heat. Statistical analysis in terms of microstate probabilities p_{i}(t) uniquely identifies dW(t) as the isentropic and dQ(t) as the isometric (see text) change in dE(t); such a clear separation does not occur for exchange quantities. Hence, our new formulation of the first law offers considerable advantages and results in a useful formulation of nonequilibrium thermodynamics, as we have shown recently. We prove that an adiabatic process does not alter p_{i}. All these results remain valid no matter how far the system is out of equilibrium. When the system is in internal equilibrium, dQ(t)\equiv T(t)dS(t) in terms of the instantaneous temperature T(t) of the system, which is reminiscent of equilibrium. We demonstrate that p_{i}(t) has a form very different from that in equilibrium. The first and second laws are no longer independent, so that we need only one law, which is again reminiscent of equilibrium. Traditional formulas like the Clausius inequality {\oint}d_{e}Q(t)/T_{0}<0, etc., become equalities {\oint}dQ(t)/T(t)\equiv0, etc., a remarkable and unexpected result in view of irreversibility. We determine the irreversible components in two simple cases to show the usefulness of our approach; here, the traditional formulation is of no use.
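One way to read the isentropic/isometric identification above is sketched below. It assumes that the energy is the average E(t)=\sum_{i}p_{i}(t)E_{i}(t) over microstates with energies E_{i}(t), and that the entropy depends on the p_{i} alone; both the symbol E_{i} and this route to the result are assumptions for illustration, not a quotation of the paper's derivation.

```latex
\begin{align}
E(t)  &= \sum_{i} p_{i}(t)\,E_{i}(t), \\
dE(t) &= \underbrace{\sum_{i} E_{i}\,dp_{i}}_{dQ(t)}
       + \underbrace{\sum_{i} p_{i}\,dE_{i}}_{-dW(t)}
       = dQ(t) - dW(t).
\end{align}
```

Under these assumptions the work term changes the E_{i} at fixed p_{i}, and hence at fixed entropy (isentropic), while the heat term changes the p_{i} at fixed E_{i}, that is, at fixed work parameters (isometric).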
Written communication on computers requires knowing how to enter text in the desired language. Most people do not use any language other than English, which creates a barrier. To resolve this issue, we have developed a scheme for entering Hindi text using phonetic mapping. Using this scheme, we generate intermediate code strings and match them with the pronunciations of the input text. Our system shows significant success over other available input systems.
India is a multilingual, multi-script country. In every state of India two languages are in use: the local state language and English. For example, in Andhra Pradesh, a state in India, a document may contain text words in English and Telugu script. For Optical Character Recognition (OCR) of such a bilingual document, it is necessary to identify the script before feeding the text words to the OCRs of the individual scripts. In this paper, we introduce a simple and efficient technique for script identification of Kannada, English and Hindi text words in a printed document. The proposed approach uses the horizontal and vertical projection profiles to discriminate among the three scripts. Feature extraction is based on the horizontal projection profile of each text word, from which the appropriate features are extracted. We analysed 700 different words of Kannada, English and Hindi in order to extract the discriminating features and to develop the knowledge base. The proposed system is tested on 100 different document images containing more than 1000 text words of each script, and classification rates of 98.25%, 99.25% and 98.87% are achieved for Kannada, English and Hindi respectively.
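The projection profiles used above can be sketched as follows, assuming each text word is available as a binary image (a list of rows, with 1 for ink and 0 for background). The `has_headline` check is a hypothetical example of a profile-based feature, motivated by the horizontal headline that joins the characters of most Devanagari words; it is not taken from the paper, and the `min_fill` threshold is an arbitrary illustrative value.

```python
def horizontal_projection(image):
    """Number of ink pixels in each row of a binary word image."""
    return [sum(row) for row in image]


def vertical_projection(image):
    """Number of ink pixels in each column of a binary word image."""
    return [sum(col) for col in zip(*image)]


def has_headline(image, min_fill=0.8):
    """Hypothetical feature: a near-full-width ink row in the upper half of the word."""
    profile = horizontal_projection(image)
    width = len(image[0])
    peak = max(range(len(profile)), key=lambda r: profile[r])
    return peak < len(image) / 2 and profile[peak] >= min_fill * width


# Toy word image with a full-width stroke in its second row.
word = [
    [0, 0, 0, 0, 0, 0],
    [1, 1, 1, 1, 1, 1],
    [0, 1, 0, 0, 1, 0],
    [0, 1, 0, 0, 1, 0],
    [0, 1, 1, 0, 1, 0],
]
print(horizontal_projection(word))  # [0, 6, 2, 2, 3]
print(vertical_projection(word))    # [1, 4, 2, 1, 4, 1]
print(has_headline(word))           # True
```

A full classifier along these lines would combine several such profile features with rules or thresholds derived from training words, in the spirit of the knowledge base described above.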
To increase system simplicity and flexibility, an audio-evoked system was developed by integrating a simplified headphone with a user-friendly software design. This paper describes a Hindi Speech Actuated Computer Interface for Web search (HSACIWS), which accepts spoken queries in Hindi and displays the search results on the screen. The system recognizes spoken queries by large vocabulary continuous speech recognition (LVCSR), retrieves relevant documents by text retrieval, and provides the search results on the Web by integrating the Web and voice systems. The LVCSR in this system showed adequate performance for speech, with acoustic and language models derived from a query corpus with target contents.