About Śata-Anuva̅dak Translation Resources
The Śata-Anuva̅dak translation resources consist of trained statistical machine translation (SMT) models for translation among 10 Indian languages and English. The translation models were trained on the Indian Language Corpora Initiative (ILCI) corpus, which contains sentences from the tourism and health domains. The resources can be used to translate sentences with the Moses decoder. The systems have been tested with Moses 1.0. The format of moses.ini changed in Moses 2.0, so you may want to use Moses 1.0 for translation.
The resources and their creation are described in detail in the following paper:
Anoop Kunchukuttan, Abhijit Mishra, Rajen Chatterjee, Ritesh Shah, Pushpak Bhattacharyya. 2014. Śata-Anuva̅dak: Tackling Multiway Translation of Indian Languages. Language Resources and Evaluation Conference (LREC 2014).
What can you do with these resources?
Instructions for Use
- For each language, a 5-gram language model in ARPA format can be downloaded.
- For each language pair, the phrase table, reordering model and moses.ini file for the tuned model can be downloaded.
- The systems were trained using Moses 1.0. The format of moses.ini changed in Moses 2.0, so you may want to use Moses 1.0 for translation. Alternatively, Moses provides a script to convert v1 format files to v2 (see: mosesdecoder/scripts/training/convert-moses-ini-to-v2.perl).
- In the moses.ini file, edit the paths to the language model, phrase table and reordering model before using these models in Moses:
  - Language Model: under section [lmodel-file], field number 4
  - Phrase Table: under section [ttable-file], field number 5
  - Reordering Model: under section [distortion-file], field number 4
- Known Issues:
  - Some phrase tables contain the characters '[' and ']'. The latest Moses decoder fails when decoding with such phrase tables. The issue can be resolved either by escaping these characters or by deleting them from the phrase tables.
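The moses.ini edits and the bracket workaround described above can be scripted. The following is a minimal Python sketch, not part of the released resources. It assumes the Moses v1 moses.ini layout, where each model line is a whitespace-separated record and the path sits at the field number listed above, and it assumes paths contain no spaces. The function names rewrite_model_paths and escape_brackets are hypothetical. The replacements &#91; and &#93; follow the escaping convention commonly used in Moses preprocessing; whether that convention matches how your input text is tokenized is an assumption you should check.

```python
import re

# 1-based position of the path within each Moses v1 section line,
# as listed in the instructions above.
PATH_FIELD = {"lmodel-file": 4, "ttable-file": 5, "distortion-file": 4}

SECTION_RE = re.compile(r"^\[(.+)\]\s*$")


def rewrite_model_paths(ini_text, new_paths):
    """Return ini_text with model paths replaced.

    new_paths maps a section name (e.g. "ttable-file") to the local path
    that should replace the one in the downloaded moses.ini. Every entry
    line in a listed section receives the same path, so this sketch only
    suits files with one model per section.
    """
    section = None
    out = []
    for line in ini_text.splitlines():
        stripped = line.strip()
        match = SECTION_RE.match(stripped)
        if match:
            section = match.group(1)
        elif (stripped and not stripped.startswith("#")
              and section in new_paths and section in PATH_FIELD):
            fields = stripped.split()
            position = PATH_FIELD[section]
            if len(fields) >= position:
                fields[position - 1] = new_paths[section]
                line = " ".join(fields)
        out.append(line)
    return "\n".join(out) + "\n"


def escape_brackets(phrase_table_line):
    """Escape '[' and ']' in the source and target phrases of one line.

    Assumes the usual ' ||| ' field separator; score and alignment
    fields are left untouched.
    """
    fields = phrase_table_line.split(" ||| ")
    for i in range(min(2, len(fields))):
        fields[i] = fields[i].replace("[", "&#91;").replace("]", "&#93;")
    return " ||| ".join(fields)
```

To use it, read the downloaded moses.ini, pass its text to rewrite_model_paths with your local paths, and write the result to a new file. For a compressed phrase table, the lines can be streamed through escape_brackets with Python's gzip module.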
For queries or further information, please contact Prof. Pushpak Bhattacharyya (firstname.lastname@example.org) or Anoop Kunchukuttan (email@example.com).
This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.
For Research Use only
Source Reordered, Phrase Based Translation Models
English to Indian language models
Phrase Based Translation Models
Row: source language, Column: target language