Electronic Thesis and Dissertation Repository

Enhancing the Performance of NMT Models Using the Data-Based Domain Adaptation Technique for Patent Translation

Maimoonah Ahmed, Western University

Abstract

In today’s age of unparalleled connectivity, language and data have become powerful tools for enabling effective communication and cross-cultural collaboration. Neural machine translation (NMT) models are especially capable of leveraging linguistic knowledge and parallel corpora to increase global connectivity and act as a tool for the transmission of knowledge. In this thesis, we apply a data-based domain adaptation technique to fine-tune three pre-existing NMT transformer models with attention mechanisms for the task of patent translation from English to Japanese. Language, especially in the context of patents, can be very nuanced. A clear understanding of the intended meaning requires comprehensive domain knowledge and expert linguistic ability, which can be expensive and time-consuming to obtain. Automating translation is helpful; however, commercially available NMT models perform poorly on this task because they are not trained on highly technical words whose meaning may depend on the domain in which they are used. Our aim is to enhance the performance of translation models on highly technical inputs using a range of essential steps, with a focus on data-based domain adaptation. Together, these steps yield a 41.22% increase over the baseline BLEU score.