Electronic Thesis and Dissertation Repository

Thesis Format

Monograph

Degree

Master of Science

Program

Computer Science

Supervisor

Dr. Mostafa Milani

Abstract

The rise of large language models (LLMs) has significantly influenced various fields, including natural language processing (NLP) and image generation, by making complex computational tasks more accessible. Despite their remarkable generative capabilities, a fundamental question remains regarding their level of understanding, particularly in structured domains such as SQL, where precise logic and syntactic accuracy are essential. This work evaluates the extent to which LLMs comprehend SQL by assessing their performance on key tasks, including syntax error detection, missing token identification, query performance prediction, query equivalence checking, and query explanation. These tasks collectively examine the models' ability to recognize patterns, maintain context awareness, interpret semantics, and ensure logical coherence; these capabilities are critical for genuine SQL understanding.
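
To make the first two tasks concrete, the short SQL sketch below shows the kind of input a model receives; the table and column names are illustrative assumptions and are not drawn from the thesis workloads.

    -- Hypothetical schema, for illustration only.
    -- Syntax error detection: the trailing comma before FROM makes this query invalid.
    SELECT name, salary, FROM employees WHERE department = 'Sales';

    -- Missing token identification: the IN list is never closed; the model must point
    -- to the missing closing parenthesis.
    SELECT name FROM employees WHERE department IN ('Sales', 'HR';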

To enable a rigorous evaluation, we construct labeled datasets from well-established SQL workloads and conduct extensive experiments on state-of-the-art LLMs. Our analysis specifically investigates how query complexity and distinct syntactic features impact model performance. The results indicate that while models such as GPT-4 excel in tasks that rely on pattern recognition and contextual awareness, they exhibit persistent difficulties in deeper semantic understanding and logical consistency. These challenges are particularly evident in tasks such as accurately predicting query performance and verifying query equivalence.
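
As an illustration of the equivalence-checking task, the two hypothetical queries below return the same rows even though their surface forms differ; judging this reliably requires reasoning about semantics rather than matching patterns. The schema (employees, departments) is assumed for illustration and is not taken from the benchmark workloads.

    -- Query A: restrict employees to the Sales department via an explicit join.
    SELECT e.name
    FROM employees e
    JOIN departments d ON e.dept_id = d.id
    WHERE d.name = 'Sales';

    -- Query B: the same rows expressed with a subquery instead of a join
    -- (equivalent here because departments.id is a unique key).
    SELECT e.name
    FROM employees e
    WHERE e.dept_id IN (SELECT id FROM departments WHERE name = 'Sales');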

This gap suggests that current LLMs, despite their syntactic and structural proficiency, lack the ability to integrate the deeper semantic reasoning required for comprehensive SQL comprehension. Our findings underscore the need for future advancements in LLMs to focus on improving their reasoning abilities and their capacity to incorporate domain-specific knowledge. Enhancing these aspects would enable a transition from syntactic fluency to a more logic-driven understanding, thereby unlocking the full potential of SQL in various computational applications.

Summary for Lay Audience

In recent years, artificial intelligence has made significant progress, especially through the development of large language models (LLMs) like ChatGPT. These models can generate human-like text and have been used in areas such as writing, translation, and even creating images. However, while they are good at mimicking language, it’s not clear whether they truly "understand" what they are doing—especially in areas that require strict logic, like programming languages.

This research focuses on one such language: SQL, which is used to interact with databases. Writing correct SQL queries requires not just good grammar, but also deep logical thinking. To test how well LLMs understand SQL, this study examines their performance on five key tasks: identifying errors, spotting missing pieces, predicting how long a query will take to run, checking whether two queries are equivalent, and explaining the logic behind queries.

To evaluate these abilities, the study builds labeled test sets from real SQL workloads and runs experiments with leading LLMs. The results show that while models like GPT-4 can detect patterns and understand context well, they struggle with tasks that require deeper reasoning, such as accurately predicting performance or judging whether two queries do the same thing.

This finding suggests that even the most advanced LLMs are still limited in their understanding of structured, logic-based languages like SQL. They may appear intelligent but often lack true comprehension when deeper logic is required. The research concludes that if we want LLMs to be more reliable in professional or technical settings, especially those involving databases, future models must be trained not only on language patterns but also on reasoning skills and domain-specific knowledge.
