
Thesis Format
Monograph
Degree
Doctor of Philosophy
Program
Electrical and Computer Engineering
Supervisor
Capretz, Miriam A.M.
Abstract
This study contributes to safer decision-making in complex environments, such as those involving robotic systems, by developing three reinforcement learning approaches that address different aspects of rational decision-making. These approaches are motivated by the fact that the standard reinforcement learning objective does not account for safety, a gap that demands attention in complex environments. The first contribution focuses on constrained Markov decision processes, introducing an indicated constraint method to modify the Soft Actor-Critic algorithm. Under a soft-constraint approach, this method mitigates the sampling-distribution problem in the replay buffer by using explicit cost-defined labels to create clearer boundaries between ``safe'' and ``unsafe'' states in dynamic environments. The second contribution examines risk sensitivity through a Prospect Theory-shaped utility function called PTanh, emphasizing how marginal utility affects agent decision-making and revealing critical insights about diminishing returns in the risk-aversion parameters. The third contribution implements Cumulative Prospect Theory principles directly within an actor-critic reinforcement learning architecture, modifying the Twin Delayed Actor-Critic algorithm to include a risk-sensitive critic that models nonlinear probability weighting and asymmetric evaluation of gains and losses.
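For context, the asymmetric valuation and nonlinear probability weighting that the second and third contributions build on can be illustrated with the standard Tversky-Kahneman (1992) forms. The Python sketch below uses the commonly cited 1992 parameter estimates, which are not the values calibrated in this thesis, and it does not reproduce the thesis's PTanh utility or its modified critic.

```python
import numpy as np

# Standard Cumulative Prospect Theory components (Tversky & Kahneman, 1992), shown
# for context only. The parameter values are the commonly cited 1992 estimates and
# are NOT the values calibrated in this thesis; PTanh itself is not reproduced here.

ALPHA, BETA = 0.88, 0.88   # diminishing sensitivity over gains / losses
LAMBDA = 2.25              # loss aversion: losses loom ~2.25x larger than gains
GAMMA = 0.61               # curvature of the probability-weighting function

def value(x):
    """Asymmetric value function: concave over gains, convex and steeper over losses."""
    x = np.asarray(x, dtype=float)
    mag = np.abs(x)
    return np.where(x >= 0, np.power(mag, ALPHA), -LAMBDA * np.power(mag, BETA))

def weight(p, gamma=GAMMA):
    """Inverse-S probability weighting: overweights rare events, underweights likely ones."""
    p = np.asarray(p, dtype=float)
    num = np.power(p, gamma)
    return num / np.power(num + np.power(1.0 - p, gamma), 1.0 / gamma)

print(value([10.0, -10.0]))   # an equal-sized loss hurts more than the gain helps
print(weight([0.01, 0.99]))   # ~0.055 and ~0.91: rare events receive extra weight
```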
The findings from these contributions demonstrate that each method reduces ``unsafe'' state visitations through a different mechanism. In the first contribution, this reduction is achieved through the clearer boundary obtained from the indicated constraint method. The Prospect Theory-shaped utility function PTanh likewise reduces ``unsafe'' state visitations when the margins are calibrated to induce risk sensitivity in the agent, offering an additional perspective for practitioners who use similarly shaped utilities. The evaluations also indicate that, even with small stochasticity in the environment, risk-seeking strategies maintained throughout training are not favoured. The third contribution, which embeds Cumulative Prospect Theory in the actor-critic architecture and is demonstrated in environments with deterministic transitions, yields safer decision-making in terms of mean rewards compared with risk-neutral algorithms, and the empirical evaluations demonstrate faster asymptotic stabilization. Calibrating the probability-weighting parameters achieves a balanced risk assessment that prevents excessive emphasis on early failures, which would otherwise lead to overly conservative behavior. Both algorithmic variants of the third contribution, the mean-based and max-based implementations, demonstrate competitive performance, with theoretical analysis establishing convergence guarantees through contraction mapping properties.
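For reference, the contraction argument behind such convergence guarantees follows the standard Banach fixed-point pattern; the specific operator and modulus established in the thesis are not restated here, so the statement below is generic. If the risk-sensitive critic update operator $\mathcal{T}$ satisfies

\[
\lVert \mathcal{T}Q_1 - \mathcal{T}Q_2 \rVert_\infty \le \kappa \, \lVert Q_1 - Q_2 \rVert_\infty, \qquad \kappa \in [0,1),
\]

then repeated application of $\mathcal{T}$ converges to its unique fixed point.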
Summary for Lay Audience
This research explores how to make artificially intelligent agents safer when operating in unpredictable environments. These agents may control robots or other systems where safety is critical. Traditional safety approaches often struggle when facing changing conditions or uncertainty. The study presents three contributions that provide insights into these safety challenges.
The first contribution improves how agents learn to avoid dangerous situations by creating clearer boundaries between what action to take in a “safe” versus an “unsafe” situation. By modifying a learning algorithm to include a binary term that helps the agent discern such states, agents can better remember and avoid risky behaviors even in changing environments.
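As an illustration only, the sketch below shows one way a binary safety label could travel with each stored experience and feed a soft penalty. The buffer layout, penalty form, and weight value are assumptions made for this example and are not the thesis's indicated-constraint modification of Soft Actor-Critic.

```python
import random
from collections import deque

# Illustrative sketch: each stored transition carries a binary "unsafe" label, and a
# soft-constraint penalty discounts rewards in proportion to that label. The buffer
# layout, penalty form, and weight below are assumptions for this example, not the
# thesis's indicated-constraint modification of Soft Actor-Critic.

class LabeledReplayBuffer:
    def __init__(self, capacity=100_000):
        self.buffer = deque(maxlen=capacity)

    def add(self, state, action, reward, next_state, done, unsafe):
        # `unsafe` is the binary indicator: 1 for a cost-incurring state, 0 otherwise.
        self.buffer.append((state, action, reward, next_state, done, float(unsafe)))

    def sample(self, batch_size):
        # Uniform sampling; the explicit label keeps "safe" and "unsafe" experience
        # distinguishable even when unsafe states are rare in the buffer.
        return random.sample(self.buffer, min(batch_size, len(self.buffer)))

def soft_constrained_reward(reward, unsafe, penalty_weight=10.0):
    # Soft-constraint shaping: the labeled cost is subtracted from the reward target.
    return reward - penalty_weight * unsafe

buf = LabeledReplayBuffer()
buf.add(state=[0.0], action=[1.0], reward=1.0, next_state=[0.1], done=False, unsafe=1)
print(soft_constrained_reward(1.0, unsafe=1))   # 1.0 - 10.0 = -9.0
```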
The second contribution examines how agents evaluate risk, similar to how humans make decisions. Using a concept from behavioral economics called Prospect Theory, the research shows that agents need properly calibrated “risk sensitivity” to avoid dangerous situations effectively. Importantly, the study reveals that even striving to make the agent overly cautious can be ineffective if particular parameters are not calibrated well enough for the sensitivity in the margins to take effect. The parameters calibrating the risk sensitivity in the margins matter more than those used to impose increased penalties on undesired realized outcomes.
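To make that distinction concrete, the hypothetical tanh-shaped utility below separates a steepness (margin) parameter from a loss multiplier. Both the functional form and the values are illustrative assumptions, not the thesis's PTanh definition or its calibrated parameters.

```python
import numpy as np

# Hypothetical tanh-shaped utility, used only to illustrate the two kinds of parameters:
# `eta` controls how quickly marginal utility diminishes (the margin sensitivity), while
# `lam` simply scales up penalties on realized losses. Neither the form nor the values
# correspond to the thesis's PTanh definition or its calibrated parameters.

def tanh_utility(x, eta, lam):
    x = np.asarray(x, dtype=float)
    shaped = np.tanh(eta * x)
    return np.where(x >= 0, shaped, lam * shaped)   # losses scaled by lam

# With a tiny eta the utility is almost linear, so there is little diminishing marginal
# utility and little risk aversion beyond the fixed loss multiplier.
print(tanh_utility([1.0, 5.0, -1.0], eta=0.05, lam=2.0))   # ~[0.05, 0.24, -0.10]

# With a larger eta, gains saturate: a payoff of 5 is valued barely above a payoff of 1,
# so occasional large gains can no longer offset losses, inducing more cautious behavior.
print(tanh_utility([1.0, 5.0, -1.0], eta=2.0, lam=2.0))    # ~[0.96, 1.00, -1.93]
```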
Finally, the third contribution incorporates human-like decision-making patterns directly into the agent's learning process using Cumulative Prospect Theory. This approach not only weighs potential losses more heavily than gains but also applies nonlinear probability weighting to rare events, similar to how humans tend to overestimate unlikely catastrophic outcomes. Tested in various simulated environments, this probability-weighted approach showed faster learning and more stable performance than traditional methods.
Each method reduces unsafe behaviors through a different mechanism: the first creates clearer safety boundaries, the second fine-tunes how risk is perceived, and the third builds safety directly into decision-making. These approaches offer new ways to enhance agent safety in complex environments, making them applicable to industrial control systems, autonomous vehicles, and other applications where traditional safety programming can fall short.
Recommended Citation
Adjei, Patrick, "Safety Considerations In Complex Environments With Reinforcement Learning" (2025). Electronic Thesis and Dissertation Repository. 10852.
https://ir.lib.uwo.ca/etd/10852
Included in
Artificial Intelligence and Robotics Commons, Risk Analysis Commons, Software Engineering Commons