Cross-Lingual Word Sense Disambiguation for Low-Resource Hybrid Machine Translation

Rudnick, Alexander James

dc.creator	Rudnick, Alexander James
dc.date	2019-01-22T17:50:32Z
dc.date	2019-01
dc.date.accessioned	2023-02-21T11:21:29Z
dc.date.available	2023-02-21T11:21:29Z
dc.identifier	http://hdl.handle.net/2022/22672
dc.identifier.uri	http://localhost:8080/xmlui/handle/CUHPOERS/253155
dc.description	Thesis (Ph.D.) - Indiana University, School of Informatics, Computing, and Engineering, 2019
dc.description	This thesis argues that cross-lingual word sense disambiguation (CL-WSD) can be used to improve lexical selection for machine translation when translating from a resource-rich language into an under-resourced one, especially when relatively little bitext is available. In CL-WSD, we perform word sense disambiguation, considering the senses of a word to be its possible translations into some target language, rather than using a sense inventory developed manually by lexicographers. Using explicitly trained classifiers that make use of source-language context and of resources for the source language can help machine translation systems make better decisions when selecting target-language words. This is especially the case when the alternative is hand-written lexical selection rules developed by researchers with linguistic knowledge of the source and target languages, but also true when lexical selection would be performed by a statistical machine translation system, when there is a relatively small amount of available target-language text for training language models. In this work, I present the Chipa system for CL-WSD and apply it to the task of translating from Spanish to Guarani and Quechua, two indigenous languages of South America. I demonstrate several extensions to the basic Chipa system, including techniques that allow us to benefit from the wealth of available unannotated Spanish text and existing text analysis tools for Spanish, as well as approaches for learning from bitext resources that pair Spanish with languages unrelated to our intended target languages. Finally, I provide proof-of-concept integrations of Chipa with existing machine translation systems, of two completely different architectures.
dc.language	en
dc.publisher	[Bloomington, Ind.] : Indiana University
dc.rights	Creative Commons Attribution 4.0 International
dc.rights	https://creativecommons.org/licenses/by/4.0/
dc.subject	machine translation
dc.subject	artificial intelligence
dc.subject	computational linguistics
dc.title	Cross-Lingual Word Sense Disambiguation for Low-Resource Hybrid Machine Translation
dc.type	Doctoral Dissertation

Files in this item

Files	Size	Format	View
dissertation.pdf	1.011Mb	application/pdf	View/Open

This item appears in the following Collection(s)

IUScholarWorks [635]
Indiana University Bloomington
