Show simple item record

dc.contributor.author Mohamed, MZ
dc.contributor.author Ihalapathirana, A
dc.contributor.author Hameed, RA
dc.contributor.author Pathirennehelage, N
dc.contributor.author Ranathunga, S
dc.contributor.author Jayasena, S
dc.contributor.author Dias, G
dc.date.accessioned 2018-07-31T18:48:24Z
dc.date.available 2018-07-31T18:48:24Z
dc.identifier.uri http://dl.lib.mrt.ac.lk/handle/123/13337
dc.description.abstract A parallel corpus aligned at both the sentence and word levels is an important prerequisite for statistical machine translation. However, manual creation of such a corpus is time-consuming and requires experts fluent in both languages. This paper presents the first-ever empirical evaluation carried out to identify the best unsupervised word alignment technique for Sinhala and Tamil. It also presents a novel approach that combines the output of individual aligners and outperforms any of these aligners used on its own. Sentence-aligned parallel text from annual reports and letters of Sri Lankan Government institutions, and order papers from the Parliament of Sri Lanka, were used in the evaluation. en_US
dc.subject word alignment; parallel corpus; Sinhala; Tamil en_US
dc.title Automatic creation of a word aligned Sinhala-Tamil parallel corpus en_US
dc.type Conference-Abstract en_US
dc.identifier.faculty Engineering en_US
dc.identifier.department Department of Computer Science and Engineering en_US
dc.identifier.year 2017 en_US
dc.identifier.conference Moratuwa Engineering Research Conference - MERCon 2017 en_US
dc.identifier.place Moratuwa, Sri Lanka en_US
dc.identifier.email maryamzi.12@cse.mrt.ac.lk en_US
dc.identifier.email anusha.12@cse.mrt.ac.lk en_US
dc.identifier.email riyafa.12@cse.mrt.ac.lk en_US
dc.identifier.email pnadeeshani.12@cse.mrt.ac.lk en_US
dc.identifier.email surangika@cse.mrt.ac.lk en_US
dc.identifier.email sanath@cse.mrt.ac.lk en_US
dc.identifier.email gihan@cse.mrt.ac.lk en_US

