Thesis Format
Integrated Article
Degree
Master of Science
Program
Computer Science
Collaborative Specialization
Artificial Intelligence
Supervisor
Boyu Wang
Co-Supervisor
Charles X. Ling
Abstract
Continual learning aims to learn a series of tasks sequentially without forgetting the knowledge acquired from previous ones. In this work, we propose the Hessian-Aware Low-Rank Perturbation (HALRP) algorithm for continual learning. By modeling the parameter transitions across sequential tasks as weight matrix transformations, we apply a low-rank approximation to the task-adaptive parameters in each layer of the neural network. Specifically, we theoretically demonstrate the quantitative relationship between the Hessian information and the proposed low-rank approximation. The approximation ranks are then determined globally according to the marginal change of the empirical loss, estimated from the layer-specific gradients and the low-rank approximation error. Furthermore, we control the model capacity by pruning less important parameters to limit parameter growth. We conduct extensive experiments on various benchmarks, including a dataset with large-scale tasks, and compare our method against recent state-of-the-art methods to demonstrate its effectiveness and scalability. Empirical results show that our method performs better across benchmarks, especially in achieving task-order robustness and mitigating forgetting. The source code is available at https://github.com/lijiaqi/HALRP.
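The abstract describes keeping task-adaptive parameters as low-rank perturbations of shared weights. The following is a minimal PyTorch sketch of that general idea, assuming an additive residual of the form U Vᵀ on a linear layer; the class name, initialization, and exact perturbation form are illustrative assumptions rather than the thesis's implementation (see the linked repository for the actual code).

```python
import torch
import torch.nn.functional as F

class LowRankPerturbedLinear(torch.nn.Module):
    """Shared base weight plus a task-specific rank-r residual U @ V.T (illustrative)."""
    def __init__(self, d_in, d_out, rank):
        super().__init__()
        # Base weight, shared across tasks.
        self.base = torch.nn.Parameter(torch.randn(d_out, d_in) * 0.02)
        # Task-adaptive low-rank factors; only these (and the bias) are stored per task.
        self.U = torch.nn.Parameter(torch.zeros(d_out, rank))
        self.V = torch.nn.Parameter(torch.zeros(d_in, rank))
        self.bias = torch.nn.Parameter(torch.zeros(d_out))

    def forward(self, x):
        # Effective weight for the current task: base + U V^T.
        weight = self.base + self.U @ self.V.t()
        return F.linear(x, weight, self.bias)

# Usage: a rank-4 perturbation of a 128 -> 64 layer stores 4 * (128 + 64) extra
# weights instead of a full 64 * 128 copy of the layer.
layer = LowRankPerturbedLinear(d_in=128, d_out=64, rank=4)
out = layer(torch.randn(8, 128))
```

The point of the sketch is the storage trade-off: per-task memory scales with the chosen rank rather than with the full layer size, which is what makes the rank-selection step described next matter.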
Summary for Lay Audience
Unlike the conventional machine learning paradigm, which assumes there is only a single task and that all training data are available at once, continual learning focuses on practical scenarios where data are collected from different tasks and learned sequentially. Naturally, the fundamental objective in continual learning is to achieve good performance on each new task while retaining the ability gained from previous ones, a trade-off usually termed the stability-plasticity dilemma.
To overcome catastrophic forgetting of previous tasks, task-specific parameters are usually introduced for each new task to isolate the learned knowledge among tasks. However, this parameter isolation strategy typically leads to a significant increase in model size as more tasks are learned, so a trade-off between overall performance and control of model growth must be considered. Furthermore, previous studies have shown that when the same set of tasks is presented in different orders, a learner may fail to deliver consistent performance on each individual task, raising concerns about task-order robustness in continual learning.
In this work, we propose Hessian-Aware Low-Rank Perturbation (HALRP) for continual learning. Specifically, we model the parameter transition between tasks as a residual matrix transformation and then apply a low-rank approximation to the task-adaptive parameters in each layer of the neural network. With theoretical support, we show the relationship between the Hessian matrix and the low-rank approximation error. We then determine the approximation rank in each layer according to the marginal change of the empirical loss, so that a better trade-off between overall performance and model size growth can be achieved. Furthermore, by adopting the residual form for the weight transformation, our method is more robust to different task orders. Experiments on common datasets and network architectures demonstrate the effectiveness of our method.
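The rank-selection step described above (choosing each layer's rank from the estimated marginal change of the empirical loss) could look roughly like the following. This is a hypothetical sketch, not the thesis's algorithm: the loss increase from truncating a layer's residual to rank r is approximated by a layer-specific gradient scale times the truncation error, and ranks are chosen greedily under a global budget. The function names, the greedy scheme, and the `budget` parameter are assumptions made for illustration.

```python
import torch

def estimate_loss_increase(residual, grad_scale, rank):
    # Singular values of the full residual; the tail after `rank` is discarded.
    s = torch.linalg.svdvals(residual)
    truncation_error = s[rank:].pow(2).sum().sqrt()
    # First-order-style estimate: layer-specific gradient scale times truncation error.
    return grad_scale * truncation_error

def select_ranks(residuals, grad_scales, budget):
    """Greedy global rank selection: start from rank 0 everywhere and repeatedly
    add one rank to the layer whose next rank reduces the estimated loss the most,
    until the total estimated loss increase falls below `budget`."""
    ranks = [0] * len(residuals)
    while True:
        losses = [estimate_loss_increase(R, g, r)
                  for R, g, r in zip(residuals, grad_scales, ranks)]
        if sum(losses) <= budget:
            return ranks
        gains = []
        for i, (R, g, r) in enumerate(zip(residuals, grad_scales, ranks)):
            if r >= min(R.shape):
                gains.append(-1.0)  # layer already at full rank
            else:
                gains.append(losses[i] - estimate_loss_increase(R, g, r + 1))
        best = max(range(len(gains)), key=lambda i: gains[i])
        if gains[best] <= 0:
            return ranks
        ranks[best] += 1
```

Under these assumptions, layers whose residuals matter more for the loss (larger gradient scale or larger discarded singular values) receive higher ranks, while the remaining layers stay cheap, which is the trade-off between performance and model growth that the summary describes.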
Recommended Citation
Li, Jiaqi, "Continual Learning via Hessian-Aware Low-Rank Perturbation" (2024). Electronic Thesis and Dissertation Repository. 10445.
https://ir.lib.uwo.ca/etd/10445
Creative Commons License
This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 License.