
Potential of Vision Transformers for Advanced Driver-Assistance Systems: An Evaluative Approach
Abstract
In this thesis, we examine the performance of Vision Transformers in the context of current Advanced Driver-Assistance Systems (ADAS). We explore the Vision Transformer model and its variants on computer vision problems relevant to vehicles. Vision Transformers achieve performance competitive with convolutional neural networks (CNNs) but require substantially more training data. They are also more robust to permutations of image patches than CNNs. Additionally, Vision Transformers have a lower pre-training compute cost but overfit more easily on smaller datasets than CNNs. We apply this knowledge to fine-tune Vision Transformers on ADAS image datasets covering general traffic objects, vehicles, traffic lights, and traffic signs. We compare the performance of Vision Transformers on these tasks to existing convolutional neural network approaches to assess the viability of Vision Transformers for ADAS.