Electronic Thesis and Dissertation Repository

Thesis Format

Integrated Article

Degree

Doctor of Philosophy

Program

Computer Science

Collaborative Specialization

Artificial Intelligence

Supervisor

Ling, Charles X.

Abstract

This thesis focuses on the problem of increasing reading motivation with machine learning (ML). The act of reading is central to modern human life, and there is much to be gained by improving the reading experience. For example, students' internal reading motivation, especially their interest and enjoyment in reading, is an important factor in their academic success.

There are many topics in natural language processing (NLP) which can be applied to improving the reading experience in terms of readability, comprehension, reading speed, motivation, etc. Such topics include personalized recommendation, headline optimization, text simplification, and many others. However, to the best of our knowledge, this is the first work to explicitly address the broad and meaningful impact that NLP and ML can have on the reading experience.

In particular, the aim of this thesis is to explore new approaches to supporting internal reading motivation, which is influenced by readability, situational interest, and personal interest. This is performed by identifying new or existing NLP tasks which can address reader motivation, designing novel machine learning approaches to perform these tasks, and evaluating and examining these approaches to determine what they can teach us about the factors of reader motivation.

In executing this research, we make use of concepts from NLP such as textual coherence, interestingness, and summarization. We additionally use techniques from ML including supervised and self-supervised learning, deep neural networks, and sentence embeddings.

This thesis, presented in an integrated-article format, contains three core contributions, one in each of its three articles. In the first article, we propose a flexible and insightful approach to coherence estimation. This approach uses a new sentence embedding which reflects predicted position distributions. Second, we introduce the new task of pull quote selection, examining a spectrum of approaches in depth. This article identifies several concrete heuristics for finding interesting sentences, both expected and unexpected. Third, we introduce a new interactive summarization task called HARE (Hone as You Read), which is especially suitable for mobile devices. Quantitative and qualitative analyses support the practicality and potential usefulness of this new type of summarization.

Summary for Lay Audience

Reading is an increasingly important human skill. The interest and enjoyment students have in reading, for example, is an important factor in their academic success. This thesis is concerned with how to apply techniques from machine learning (ML) and natural language processing (NLP) to make reading material more readable, attention-grabbing, or personally relevant, especially in a digital setting. ML allows us to automatically identify patterns and trends in large datasets, and NLP is concerned with the application of computer science to naturally occurring language, such as news articles.

In this thesis, we consider three NLP problems which are related to reader enjoyment and interest, and we propose new solutions to those problems. The first problem we consider is determining the readability of a text based on how well its concepts are organized (a property known as coherence). The solution we propose works by learning to look at each sentence out of context and predict where in the text it belongs. Second, we propose a new problem called pull quote (PQ) selection. PQs are often found in newspapers or online news articles, and are sentences or quotations from the article placed in an eye-catching graphic. They are intended to grab the reader’s attention and make them interested in reading more of the article. We propose several methods for learning to choose good PQs from a text, and learn about unexpected properties of PQs in the process. Third, we introduce a new type of reading assistance tool suitable for mobile devices. This tool is based on the NLP problem of interactive personalized summarization, and is intended to use low-effort feedback during reading to understand readers' preferences and provide readers with personalized summaries. We propose several approaches capable of predicting which parts of an article a reader will be interested in reading, and we demonstrate the practicality of this type of tool.

Aside from topics in NLP, research completed during the course of this PhD (but not included in this thesis) touched on abstract visual reasoning problems and lifelong machine learning (learning many tasks in sequence, especially without forgetting earlier tasks).

Creative Commons License

This work is licensed under a Creative Commons Attribution 4.0 License.
