ThamizhiMorph

ThamizhiMorph is a morphological analyser cum generator developed using the Finite-State Transducer approach. The tool accepts text, either a single word or a sentence, and returns its morphological analysis. It can also generate surface forms when given a lemma and its inflectional features.
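
To illustrate why a single model can serve as both analyser and generator, here is a minimal Python sketch that treats morphology as a relation between analysis strings and surface forms and queries it in both directions. This is not ThamizhiMorph's implementation: the English toy lexicon, the `+Noun+Pl` style tags, and the function names `analyse` and `generate` are all assumptions made for illustration, and a real FST encodes the relation compactly instead of enumerating pairs.

```python
# Toy illustration only: hypothetical lexicon and tag names, not ThamizhiMorph's.
LEXICON = {"cat": "Noun", "walk": "Verb"}

# Hypothetical paradigms: surface suffix -> feature tag.
PARADIGMS = {
    "Noun": {"": "+Sg", "s": "+Pl"},
    "Verb": {"": "+Inf", "ed": "+Past", "ing": "+Prog"},
}


def build_relation():
    """Enumerate (analysis, surface) pairs from the lexicon and paradigms."""
    pairs = []
    for lemma, pos in LEXICON.items():
        for suffix, tag in PARADIGMS[pos].items():
            pairs.append((f"{lemma}+{pos}{tag}", lemma + suffix))
    return pairs


RELATION = build_relation()


def analyse(surface):
    """Surface form -> all matching analyses (the 'analysis' direction)."""
    return sorted(a for a, s in RELATION if s == surface)


def generate(analysis):
    """Lemma plus tags -> all matching surface forms (the inverse direction)."""
    return sorted(s for a, s in RELATION if a == analysis)


if __name__ == "__main__":
    print(analyse("cats"))
    print(generate("walk+Verb+Past"))
```

Both functions read the same set of pairs, which is the property a transducer provides: generation is simply analysis run in the opposite direction.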

Abstract

This paper outlines an open and extendable inflectional Morphological Analyser cum Generator (MAG) for Tamil called ThamizhiMorph. Tamil is still a low-resource language in terms of the number of processing tools and applications available. Moreover, most of the available tools are neither open nor extendable. ThamizhiMorph is the only openly available morphological analyser for Tamil, and it can be extended easily. A morphological analyser is a key resource for eliciting morphophonological and morphosyntactic information, especially for morphologically rich languages, and is useful for developing applications such as machine translation systems. This paper describes how ThamizhiMorph is modelled as a Finite-State Transducer (FST) and implemented using Foma. We also discuss our design decisions, which are based on the peculiarities of Tamil's structure and its nominal and verbal paradigms, along with a high-level meta-language for specifying the inflectional morphology of morphologically regular languages. We have evaluated ThamizhiMorph using the test data set available in the Tamil Universal Dependencies treebank (version 2.5). We have published the tools, FST models, lexicons, meta-morphological rules, and a list of 1 million verbs generated using ThamizhiMorph for others to use and extend.

Cite