"Comparative Analysis of YOLOv8 and RT-DETR for Real-Time Object Detection" by Aryan Parekh
Electronic Thesis and Dissertation Repository

Thesis Format

Monograph

Degree

Master of Science

Program

Computer Science

Supervisor

Bauer, Michael

Abstract

Object detection is critical for Advanced Driver Assistance Systems (ADAS), enhancing vehicle navigation and safety by identifying and responding to various traffic objects. This thesis evaluates the performance of YOLOv8 and the Real-Time Detection Transformer (RT-DETR), which represent leading CNN- and transformer-based architectures for real-time object detection, in the domain of ADAS. Although RT-DETR outperforms YOLOv8 on the COCO dataset, this study examines whether that advantage carries over to ADAS applications. A comprehensive analysis was conducted using seven models (five YOLOv8 and two RT-DETR variants) across five datasets: BDD100k and four Roadlab datasets. The evaluation focused on accuracy (mean Average Precision, or mAP), inference speed (latency), and F1 scores, and found that YOLOv8 outperformed RT-DETR in both accuracy and speed, making it more suitable for ADAS tasks. The study also demonstrated YOLOv8 models' faster learning dynamics and better handling of class imbalance, ensuring balanced detection of critical objects such as pedestrians and cyclists. The findings underscore YOLOv8's current advantage in real-time detection for ADAS, while identifying areas for future research to optimize transformer-based models. This work contributes to the development of safer and more reliable autonomous vehicles.
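As context for the metrics named above, the sketch below shows how precision, recall, and the F1 score are derived from per-class detection counts (true positives, false positives, false negatives). The counts are purely illustrative and do not come from the thesis; the thesis's own mAP/F1 numbers are computed by its evaluation pipeline, not by this snippet.

```python
def precision_recall_f1(tp: int, fp: int, fn: int):
    """Return (precision, recall, F1) for one object class.

    precision = TP / (TP + FP): fraction of detections that were correct.
    recall    = TP / (TP + FN): fraction of ground-truth objects found.
    F1 is their harmonic mean, balancing the two.
    """
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (
        2 * precision * recall / (precision + recall)
        if (precision + recall)
        else 0.0
    )
    return precision, recall, f1


# Hypothetical example: a detector finds 80 real pedestrians,
# raises 10 false alarms, and misses 20 pedestrians.
p, r, f1 = precision_recall_f1(tp=80, fp=10, fn=20)
```

mAP extends this idea by averaging precision over all recall levels (per class, then over classes), so a single F1 number is a coarser but easily interpreted summary of the same trade-off.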

Summary for Lay Audience

In this thesis, we explored how advanced technology helps cars "see" and recognize objects on the road, which is crucial for making driving safer. This technology is part of Advanced Driver Assistance Systems (ADAS), which are designed to help drivers avoid accidents and navigate better by detecting things like other vehicles, pedestrians, traffic signs, and signals.

We compared two state-of-the-art computer models, YOLOv8 and Real-Time Detection Transformer (RT-DETR), which are like highly trained eyes for the car. These models use different methods to process visual information from cameras mounted on vehicles, aiming to identify and classify objects quickly and accurately. YOLOv8 is based on convolutional neural networks (CNNs), while RT-DETR uses a newer technology called transformers, which has been very successful in other areas, like language processing.

To evaluate these models, we used several datasets containing various road scenes, including different weather conditions, times of day, and types of traffic. One key dataset, BDD100k, provided a broad view of driving conditions, while the Roadlab dataset offered more specific categories, such as different types of vehicles and traffic signs.

The research found that YOLOv8 generally performed better than RT-DETR in terms of speed and accuracy. This means YOLOv8 was quicker at identifying objects and did so more accurately, which is essential for preventing accidents. Additionally, YOLOv8 handled the common issue of class imbalance—where there are many more examples of certain objects, like cars, than others, like pedestrians—more effectively. This balance is crucial because it ensures that less common but important objects are also detected reliably.
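How each model copes with class imbalance is internal to its training recipe, but as a generic illustration (not the thesis's method), one common remedy is to weight the training loss inversely to class frequency, so rare classes like cyclists count for more per example. The label counts below are made up for the sketch:

```python
from collections import Counter

# Illustrative label counts: cars vastly outnumber pedestrians and cyclists,
# mirroring the imbalance typical of driving datasets.
labels = ["car"] * 900 + ["pedestrian"] * 80 + ["cyclist"] * 20

counts = Counter(labels)
total = sum(counts.values())

# Inverse-frequency weights, normalized so a perfectly balanced
# dataset would give every class a weight of 1.0.
weights = {cls: total / (len(counts) * n) for cls, n in counts.items()}
```

With these counts, the cyclist class receives a weight dozens of times larger than the car class, nudging the model to take its rare mistakes on cyclists as seriously as its frequent ones on cars.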

The study also showed that YOLOv8 models learned and adapted more quickly during the training phase, reaching a high level of performance faster than RT-DETR. This rapid learning curve is important for real-world applications where the models need to be updated and improved continuously.

Overall, this research helps us understand which technologies are best suited for making cars smarter and safer, paving the way for more advanced driver assistance systems and, eventually, fully autonomous vehicles. It provides valuable insights into how these systems can be optimized to ensure that they are both fast and accurate, which is crucial for real-time decision-making on the road.
