About Śata-Anuva̅dak Translation Resources

The Śata-Anuva̅dak translation resources consist of trained statistical machine translation (SMT) models for translation among 10 Indian languages and English. The translation models were trained on the Indian Language Corpora Initiative (ILCI) corpus, which contains sentences from the tourism and health domains. The resources can be used to translate sentences with the Moses decoder. The systems have been tested with Moses 1.0. The format of moses.ini changed in Moses 2.0, so you may want to use Moses 1.0 for translation. The resources and their creation are described in detail in the following paper:

Anoop Kunchukuttan, Abhijit Mishra, Rajen Chatterjee, Ritesh Shah, Pushpak Bhattacharyya. 2014. Śata-Anuva̅dak: Tackling Multiway Translation of Indian Languages. Language Resources and Evaluation Conference (LREC 2014).
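
As a rough illustration of the decoding step, the sketch below wraps a Moses run in Python. It is not part of the distributed resources. It assumes a `moses` binary is on your PATH and that a downloaded model has been unpacked so that its moses.ini is reachable; the path `models/hi-en/moses.ini` is hypothetical. Input is passed one sentence per line and is assumed to be already preprocessed in whatever way the model expects.

```python
import subprocess
from pathlib import Path


def build_moses_command(ini_path, moses_bin="moses"):
    """Return the argument list for a Moses run that reads stdin and writes stdout."""
    # "-f" names the Moses configuration file (moses.ini).
    return [moses_bin, "-f", str(ini_path)]


def translate_lines(lines, ini_path, moses_bin="moses"):
    """Send one sentence per line to Moses and return the output lines."""
    result = subprocess.run(
        build_moses_command(ini_path, moses_bin),
        input="\n".join(lines) + "\n",
        capture_output=True,
        text=True,
        encoding="utf-8",
        check=True,  # raise CalledProcessError if Moses exits with an error
    )
    return result.stdout.splitlines()


if __name__ == "__main__":
    # Hypothetical location of an unpacked model; adjust to your download.
    ini = Path("models/hi-en/moses.ini")
    if ini.exists():
        with open("input.hi", encoding="utf-8") as handle:  # hypothetical input file
            sentences = [line.rstrip("\n") for line in handle]
        for translation in translate_lines(sentences, ini):
            print(translation)
    else:
        print(f"No model found at {ini}; download and unpack one first.")
```

Paths inside a downloaded moses.ini may need to be edited to match where the model files sit on your machine.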

What can you do with these resources?

Instructions for Use

Contact

For any queries or further information, please contact Prof. Pushpak Bhattacharyya (pb@cse.iitb.ac.in) or Anoop Kunchukuttan (anoopk@cse.iitb.ac.in).

License

This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.
For Research Use only

Download

Language Models:

Bengali English Gujarati Hindi Konkani Malayalam Marathi Punjabi Tamil Telugu Urdu

Source Reordered, Phrase Based Translation Models

English to Indian language models
Source     Bengali    Gujarati   Hindi      Konkani    Malayalam  Marathi    Punjabi    Tamil      Telugu     Urdu
English    en-bn      en-gu      en-hi      en-kK      en-ml      en-mr      en-pa      en-ta      en-te      en-ur

Phrase Based Translation Models

Row: source language, Column: target language
Source     Bengali    English    Gujarati   Hindi      Konkani    Malayalam  Marathi    Punjabi    Tamil      Telugu     Urdu
Bengali    -          bn-en      bn-gu      bn-hi      bn-kK      bn-ml      bn-mr      bn-pa      bn-ta      bn-te      bn-ur
English    en-bn      -          en-gu      en-hi      en-kK      en-ml      en-mr      en-pa      en-ta      en-te      en-ur
Gujarati   gu-bn      gu-en      -          gu-hi      gu-kK      gu-ml      gu-mr      gu-pa      gu-ta      gu-te      gu-ur
Hindi      hi-bn      hi-en      hi-gu      -          hi-kK      hi-ml      hi-mr      hi-pa      hi-ta      hi-te      hi-ur
Konkani    kK-bn      kK-en      kK-gu      kK-hi      -          kK-ml      kK-mr      kK-pa      kK-ta      kK-te      kK-ur
Malayalam  ml-bn      ml-en      ml-gu      ml-hi      ml-kK      -          ml-mr      ml-pa      ml-ta      ml-te      ml-ur
Marathi    mr-bn      mr-en      mr-gu      mr-hi      mr-kK      mr-ml      -          mr-pa      mr-ta      mr-te      mr-ur
Punjabi    pa-bn      pa-en      pa-gu      pa-hi      pa-kK      pa-ml      pa-mr      -          pa-ta      pa-te      pa-ur
Tamil      ta-bn      ta-en      ta-gu      ta-hi      ta-kK      ta-ml      ta-mr      ta-pa      -          ta-te      ta-ur
Telugu     te-bn      te-en      te-gu      te-hi      te-kK      te-ml      te-mr      te-pa      te-ta      -          te-ur
Urdu       ur-bn      ur-en      ur-gu      ur-hi      ur-kK      ur-ml      ur-mr      ur-pa      ur-ta      ur-te      -
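
The model names in the table follow a source-target pattern built from short language codes. The sketch below, which is only an illustration and not a script shipped with the resources, enumerates every directed pair and maps a name back to its languages. The function names `model_ids` and `describe` are made up for this example; the codes are copied from the table, including the page's own `kK` for Konkani.

```python
from itertools import permutations

# Language codes as they appear in the model names on this page.
LANGUAGES = {
    "bn": "Bengali",
    "en": "English",
    "gu": "Gujarati",
    "hi": "Hindi",
    "kK": "Konkani",
    "ml": "Malayalam",
    "mr": "Marathi",
    "pa": "Punjabi",
    "ta": "Tamil",
    "te": "Telugu",
    "ur": "Urdu",
}


def model_ids(languages=LANGUAGES):
    """List every directed source-target pair, skipping same-language pairs."""
    return [f"{src}-{tgt}" for src, tgt in permutations(languages, 2)]


def describe(model_id, languages=LANGUAGES):
    """Map a name such as 'hi-en' to its (source, target) language names."""
    src, tgt = model_id.split("-")
    return languages[src], languages[tgt]


if __name__ == "__main__":
    ids = model_ids()
    print(len(ids))           # 11 languages x 10 targets each = 110
    print(describe("hi-en"))  # ('Hindi', 'English')
```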