e-ISSN : 0975-3397
Print ISSN : 2229-5631
ABSTRACT

Title : A Novel Approach for English to South Dravidian Language Statistical Machine Translation System
Authors : Unnikrishnan P, Antony P J, Dr. Soman K P
Keywords : SMT; Dravidian languages; parsing; morphology; inflections
Issue Date : November 2010
Abstract :
Developing a full-fledged bilingual machine translation (MT) system for any two natural languages with limited electronic resources and tools is a challenging and demanding task. This paper presents the development of a statistical machine translation (SMT) system for English to South Dravidian languages such as Malayalam and Kannada that incorporates syntactic and morphological information. SMT is a data-oriented statistical framework for translating text from one natural language to another based on knowledge extracted from a bilingual corpus. Although there have been efforts towards building an English to South Dravidian translation system, no efficient translation system exists so far. The first and most important step in SMT is creating a well-aligned parallel corpus for training the system. Experimental research shows that the existing methodology for creating a bilingual parallel corpus is not efficient for English to South Dravidian SMT. To improve the performance of the translation system, we introduce a new approach to creating the parallel corpus. The main ideas that we have implemented and found very effective for English to South Dravidian SMT are: (i) reordering the English source sentence according to Dravidian syntax, (ii) applying root-suffix separation to both English and Dravidian words, and (iii) using morphological information, which substantially reduces the corpus size required for training the system. Because full-fledged parsing and morphological tools are unavailable for Malayalam and Kannada, sentence synthesis was done both manually and with an existing morphological analyzer created by Amrita University. In our experiments, the systems performed well and achieved very competitive accuracy with small bilingual corpora. The proposed ideas can be applied directly to other South Dravidian languages such as Tamil and Telugu with minor changes.
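As a rough illustration of ideas (i) and (ii), the following toy sketch moves the verb of an English clause to the end, matching the subject-object-verb order typical of Dravidian languages, and then splits each word into a root and a suffix. It is not the authors' pipeline: the function names, the tiny suffix list, and the assumption that the sentence arrives already divided into subject, verb, and object are all hypothetical, standing in for the parser and morphological analyzer a real system would need.

```python
# Toy illustration only: hypothetical rules, not the paper's actual pipeline.

# Hypothetical suffix list for naive English root-suffix separation.
SUFFIXES = ("ing", "ed", "s")


def split_root_suffix(word):
    """Split a word into [root, +suffix] using the naive suffix list."""
    for suffix in SUFFIXES:
        # Require a root of at least three characters to avoid over-splitting.
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return [word[:-len(suffix)], "+" + suffix]
    return [word]


def reorder_svo_to_sov(subject, verb, obj):
    """Place the verb last, giving subject-object-verb order."""
    return subject + obj + verb


def preprocess(subject, verb, obj):
    """Reorder a pre-chunked clause, then separate roots from suffixes."""
    tokens = []
    for word in reorder_svo_to_sov(subject, verb, obj):
        tokens.extend(split_root_suffix(word))
    return tokens


# Assumes the clause has already been chunked into subject, verb, and object.
print(preprocess(["she"], ["reads"], ["books"]))
# prints ['she', 'book', '+s', 'read', '+s']
```

Splitting roots from suffixes means inflected forms of a word share one root token in the training data, which is the intuition behind the claim that morphological information reduces the corpus size needed.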
Page(s) : 2749-2759
ISSN : 0975–3397
Source : Vol. 2, Issue 8

All Rights Reserved © 2009-2024 Engg Journals Publications