Indian Language Benchmark Portal

Sort:

Please Login/Register to submit the new Resources

A CRF Based POS Tagger for Code-mixed Indian Social Media Text
Kamal Sarkar

In this work, we describe a conditional random fields (CRF) based system for Part-Of- Speech (POS) tagging of code-mixed Indian social media text as part of our participation in the tool contest on POS tagging for codemixed Indian social media text, held in conjunction with the 2016 International Conference on Natural Language Processing, IIT(BHU), India. We participated only in constrained mode contest for all three language pairs, Bengali-English, Hindi-English and Telegu-English. Our system achieves the overall average F1 score of 79.99, which is the highest overall average F1 score among all 16 systems participated in constrained mode contest.

A POS Tagger for Code Mixed Indian Social Media Text - ICON-2016 NLP Tools Contest Entry from Surukam
Sree Harsha RameshRaveena R Kumar

Building Part-of-Speech (POS) taggers for code-mixed Indian languages is a particularly challenging problem in computational linguistics due to a dearth of accurately annotated training corpora. ICON, as part of its NLP tools contest has organized this challenge as a shared task for the second consecutive year to improve the state-of-the-art. This paper describes the POS tagger built at Surukam to predict the coarse-grained and fine-grained POS tags for three language pairs - Bengali-English, Telugu-English and Hindi-English, with the text spanning three popular social media platforms - Facebook, WhatsApp and Twitter. We employed Conditional Random Fields as the sequence tagging algorithm and used a library called sklearn-crfsuite - a thin wrapper around CRFsuite for training our model. Among the features we used include - character n-grams, language information and patterns for emoji, number, punctuation and web-address. Our submissions in the constrained environment,i.e., without making any use of monolingual POS taggers or the like, obtained an overall average F1-score of 76.45%, which is comparable to the 2015 winning score of 76.79%.

Filter by Author
P. D. Gujrati (8)
Manish Shrivastava (7)
Partha Pratim Roy (5)
Umapada Pal (5)
Ayan Kumar Bhunia (4)
Iti Mathur (4)
More