Electronic Thesis and Dissertation Repository

Thesis Format

Degree
Doctor of Philosophy

Program
Electrical and Computer Engineering

Collaborative Specialization
Artificial Intelligence

Supervisor
Capretz, Miriam

Abstract

Reinforcement Learning (RL) has seen dramatic performance improvements over the past decade, achieving super-human performance across many domains. Deep Reinforcement Learning (DRL), which combines RL methods with deep neural networks (DNNs) as function approximators, has unlocked much of this progress. The path to generalized artificial intelligence (GAI) will likely depend on deep learning (DL) and RL. However, much work is required before the technology reaches anything resembling GAI. This thesis therefore focuses on three areas within RL that require additional research to advance the field: sample efficiency, planning, and task transfer.

The first area, sample efficiency, refers to the amount of data an algorithm requires before converging to stable performance. RL models typically require immense numbers of samples from the environment, a far cry from other sub-areas such as supervised learning, which often require an order of magnitude fewer samples. This research proposes a method that learns to reuse previously seen data instead of discarding it, improving sample efficiency by a factor of approximately 2 while training 30% faster than state-of-the-art methods.

The second area is planning, in which an agent uses a model of the environment to predict how possible actions will affect its performance. Improved planning leads to higher performance because the agent gains context about the consequences of its next action. This thesis proposes a model that learns how to act optimally in an environment through a dynamic planning mechanism that adapts on the fly. This ability gives the resulting RL model considerable flexibility, as it can adapt to the demands of particular states in the environment, and it outperforms related methods by 30-45%.

The final area, task transfer, concerns how readily a model trained on one task can transfer its knowledge to another related task within the same environment. Conventionally, RL models must be fully retrained on the new task even if the structure of the environment does not change. Here, we introduce two contributions that improve an existing transfer framework known as Successor Features (SF). The first introduces a more flexible reward model with stronger performance and transfer abilities than baseline models, achieving nearly twice the reward on highly demanding tasks. The second rephrases the SF framework as a simple pair of supervised tasks that can dynamically induce policies, drastically simplifying the learning problem without sacrificing performance.

Summary for Lay Audience

In Artificial Intelligence (AI), different algorithms are used to solve problems automatically. Some algorithms are designed to excel at particular kinds of tasks, such as visual tasks, translating words, or controlling other systems. The field of Reinforcement Learning (RL) deals with algorithms that learn to control other systems; they learn to do this automatically, without any external direction besides a system of rewards and punishments, similar to how humans learn. Researchers discovered that the performance of RL algorithms could be improved significantly by another technology known as deep learning. The resulting Deep Reinforcement Learning (DRL) algorithms have improved so much that they can beat human experts. A well-known example is AlphaGo, which defeated top professional players at the game of Go, learning much of its skill on its own by playing games against itself.

However, these algorithms are far from efficient: they take a very long time to learn, consume a lot of energy, and cannot learn from small amounts of data as well as a human can.

This thesis contributes new DRL algorithms that make these methods more efficient. We identified the areas where we felt improvements could have the most significant impact, namely sample efficiency, planning, and task transfer, and introduced solutions for each.

By improving sample efficiency, we reduce the amount of information an algorithm needs before it performs well; our proposed algorithm does this by reusing information that would otherwise be thrown away. In planning, where the algorithm can think ahead before picking an action, this thesis proposes making another part of the algorithm learnable by the computer, which we show improves its performance. Finally, in task transfer, where an algorithm that has finished learning is moved to another related problem, we propose two contributions: one makes the algorithm more expressive, while the other breaks the learning problem into two simpler problems that are easier to solve.

Available for download on Sunday, December 31, 2023