Thesis Format
Integrated Article
Degree
Master of Science
Program
Computer Science
Collaborative Specialization
Artificial Intelligence
Supervisor
Boyu Wang
Co-Supervisor
Charles X. Ling
Abstract
Continual learning aims to learn a series of tasks sequentially without forgetting the knowledge acquired from previous ones. In this work, we propose the Hessian-Aware Low-Rank Perturbation (HALRP) algorithm for continual learning. By modeling the parameter transitions across sequential tasks as weight matrix transformations, we apply a low-rank approximation to the task-adaptive parameters in each layer of the neural network. Specifically, we theoretically demonstrate the quantitative relationship between the Hessian information and the proposed low-rank approximation. The approximation ranks are then determined globally according to the marginal change of the empirical loss, estimated from the layer-specific gradients and the low-rank approximation error. Furthermore, we control the model capacity by pruning less important parameters to limit parameter growth. We conduct extensive experiments on various benchmarks, including a dataset with large-scale tasks, and compare our method against recent state-of-the-art methods to demonstrate its effectiveness and scalability. Empirical results show that our method performs better across benchmarks, especially in achieving task-order robustness and mitigating forgetting. The source code is available at https://github.com/lijiaqi/HALRP.
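The abstract describes keeping task-adaptive parameters as low-rank perturbations of shared weights. The following is a minimal PyTorch sketch of that general idea, assuming an additive residual of the form U Vᵀ on a linear layer; the class name, initialization, and exact perturbation form are illustrative assumptions rather than the thesis's implementation (see the linked repository for the actual code).

```python
import torch
import torch.nn.functional as F

class LowRankPerturbedLinear(torch.nn.Module):
    """Shared base weight plus a task-specific rank-r residual U @ V.T (illustrative)."""
    def __init__(self, d_in, d_out, rank):
        super().__init__()
        # Base weight, shared across tasks.
        self.base = torch.nn.Parameter(torch.randn(d_out, d_in) * 0.02)
        # Task-adaptive low-rank factors; only these (and the bias) are stored per task.
        self.U = torch.nn.Parameter(torch.zeros(d_out, rank))
        self.V = torch.nn.Parameter(torch.zeros(d_in, rank))
        self.bias = torch.nn.Parameter(torch.zeros(d_out))

    def forward(self, x):
        # Effective weight for the current task: base + U V^T.
        weight = self.base + self.U @ self.V.t()
        return F.linear(x, weight, self.bias)

# Usage: a rank-4 perturbation of a 128 -> 64 layer stores 4 * (128 + 64) extra
# weights instead of a full 64 * 128 copy of the layer.
layer = LowRankPerturbedLinear(d_in=128, d_out=64, rank=4)
out = layer(torch.randn(8, 128))
```

The point of the sketch is the storage trade-off: per-task memory scales with the chosen rank rather than with the full layer size, which is what makes the rank-selection step described next matter.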
Summary for Lay Audience
Unlike the conventional machine learning paradigm, which assumes there is only a single task and that all training data are available at once, continual learning focuses on practical scenarios where data are collected from different tasks and learned sequentially. Naturally, the fundamental objective in continual learning is to achieve good performance on each new task while retaining the ability gained from previous ones, a trade-off usually termed the stability-plasticity dilemma.
To overcome catastrophic forgetting of previous tasks, task-specific parameters are usually introduced for each new task to isolate the learned knowledge among tasks. However, this parameter isolation strategy typically leads to a significant increase in model size as more tasks are learned, so a trade-off between overall performance and control of model growth must be considered. Furthermore, previous studies have shown that when the same set of tasks is presented in different orders, a learner may fail to deliver consistent performance on each individual task, raising concerns about task-order robustness in continual learning.
In this work, we propose Hessian-Aware Low-Rank Perturbation (HALRP) for continual learning. Specifically, we model the parameter transition between tasks as a residual matrix transformation and then apply a low-rank approximation to the task-adaptive parameters in each layer of the neural network. With theoretical support, we show the relationship between the Hessian matrix and the low-rank approximation error. We then determine the approximation rank in each layer according to the marginal change of the empirical loss, so that a better trade-off between overall performance and model size growth can be achieved. Furthermore, by adopting the residual form for the weight transformation, our method is more robust to different task orders. Experiments on common datasets and network architectures demonstrate the effectiveness of our method.
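The rank-selection step described above (choosing each layer's rank from the estimated marginal change of the empirical loss) could look roughly like the following. This is a hypothetical sketch, not the thesis's algorithm: the loss increase from truncating a layer's residual to rank r is approximated by a layer-specific gradient scale times the truncation error, and ranks are chosen greedily under a global budget. The function names, the greedy scheme, and the `budget` parameter are assumptions made for illustration.

```python
import torch

def estimate_loss_increase(residual, grad_scale, rank):
    # Singular values of the full residual; the tail after `rank` is discarded.
    s = torch.linalg.svdvals(residual)
    truncation_error = s[rank:].pow(2).sum().sqrt()
    # First-order-style estimate: layer-specific gradient scale times truncation error.
    return grad_scale * truncation_error

def select_ranks(residuals, grad_scales, budget):
    """Greedy global rank selection: start from rank 0 everywhere and repeatedly
    add one rank to the layer whose next rank reduces the estimated loss the most,
    until the total estimated loss increase falls below `budget`."""
    ranks = [0] * len(residuals)
    while True:
        losses = [estimate_loss_increase(R, g, r)
                  for R, g, r in zip(residuals, grad_scales, ranks)]
        if sum(losses) <= budget:
            return ranks
        gains = []
        for i, (R, g, r) in enumerate(zip(residuals, grad_scales, ranks)):
            if r >= min(R.shape):
                gains.append(-1.0)  # layer already at full rank
            else:
                gains.append(losses[i] - estimate_loss_increase(R, g, r + 1))
        best = max(range(len(gains)), key=lambda i: gains[i])
        if gains[best] <= 0:
            return ranks
        ranks[best] += 1
```

Under these assumptions, layers whose residuals matter more for the loss (larger gradient scale or larger discarded singular values) receive higher ranks, while the remaining layers stay cheap, which is the trade-off between performance and model growth that the summary describes.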
Recommended Citation
Li, Jiaqi, "Continual Learning via Hessian-Aware Low-Rank Perturbation" (2024). Electronic Thesis and Dissertation Repository. 10445.
https://ir.lib.uwo.ca/etd/10445
Creative Commons License
This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 License.