Electronic Thesis and Dissertation Repository

Thesis Format



Master of Science


Computer Science


Mercer, Robert E.


Machine Learning in Natural Language Processing (NLP) deals directly with distributed representations of words and sentences. Words are transformed into vectors of real values, called embeddings, and used as the inputs to machine learning models. These architectures are then used to solve NLP tasks such as Sentiment Analysis and Natural Language Inference. While solving these tasks, many models produce word embeddings and sentence embeddings as outputs. We are interested in how we can transform and analyze these output embeddings and modify our models, both to improve task results and to give us an understanding of the spaces. To this end we introduce the notions of explicit features, the actual values of the embeddings, and implicit features, information encoded into the space of vectors by solving the task, and we hypothesize an idealized space, in which the implicit features directly create the explicit features by means of basic linear algebra and set theory. To test whether our output spaces are similar to our ideal space, we vary the model and, motivated by Transformer architectures, introduce the notion of Self-Enriching layers. We also create idealized spaces and run task experiments to see if the patterns of results can give us insight into the output spaces, and we run transfer learning experiments to see what kinds of information are being represented by our models. Finally, we run direct analyses of the vectors of the word and sentence outputs for comparison.
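The transformation described above, from words to real-valued vectors that can then be compared directly, can be sketched as follows. This is a toy illustration only: the vocabulary is hypothetical, and random vectors stand in for embeddings that a trained model would actually learn.

```python
import math
import random

# Hypothetical toy vocabulary and embedding dimension (illustrative only).
vocab = ["movie", "film", "banana"]
dim = 4

random.seed(0)
# An embedding table maps each word to a vector of real values
# (the "explicit features"); a real model would learn these values.
embeddings = {word: [random.gauss(0.0, 1.0) for _ in range(dim)]
              for word in vocab}

def cosine_similarity(u, v):
    """A direct linear-algebraic comparison of two embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# Compare two word vectors; the result lies in [-1, 1].
sim = cosine_similarity(embeddings["movie"], embeddings["film"])
```

In a trained space, such pairwise comparisons are one way the implicit structure of the space (which words are related) is read off from the explicit vector values.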

Summary for Lay Audience

Machine Learning in Natural Language Processing uses algorithms to solve problems of language, such as: is this movie review positive; does this argument follow from its premises; are these sentences paraphrases? To achieve this, words and sentences are changed into vectors, lists of numbers, and fed into said algorithms. Often, when running the algorithms, we get output vectors for the words and sentences, and these vectors have their own internal spaces. We are interested in how we can change our algorithms and modify the spaces so that the tasks are more successful and the structures are easier for humans to interpret. We are interested in both the explicit information, the actual vectors, and the implicit information, the structure and relations of the vectors. To solve this problem, we hypothesize about an ideal space, and run experiments and analyses to see if our outputs match experiments run on an ideal space. We also modify our existing algorithms, borrowing techniques from the successful Transformer algorithms, as we hypothesize they will both improve task performance and improve the relationship between our explicit and implicit information. Lastly, we examine the explicit spaces directly using linear-algebraic methods, to see if a comparison of these direct metrics can give us an understanding of how the spaces relate.

Creative Commons License

Creative Commons Attribution-Noncommercial 4.0 License
This work is licensed under a Creative Commons Attribution-Noncommercial 4.0 License