Electronic Thesis and Dissertation Repository

Thesis Format

Monograph

Degree

Master of Science

Program

Neuroscience

Collaborative Specialization

Machine Learning in Health and Biomedical Sciences

Supervisor

Daley, Mark J.

Abstract

Causal cognition, how beings perceive and reason about cause and effect, is crucial not only for survival and adaptation in biological entities but also for the development of causal artificial intelligence. Large language models (LLMs) have recently taken center stage due to their remarkable capabilities, demonstrating human-like reasoning in their generative responses. This thesis explores how LLMs perform on causal reasoning questions and how modifying the information in the prompt affects their reasoning. Using 1392 causal inference questions from the CLADDER dataset, LLM responses were assessed for accuracy. With simple prompting, LLMs performed more accurately on intervention queries than on association or counterfactual queries. Chain-of-Thought (CoT) prompting was also explored, with formal reasoning steps included in the prompts. Contrary to expectations, LLMs achieved higher accuracy with simple prompts than with CoT-enhanced prompts, suggesting that the framework for accurate causal cognition in LLMs differs from that of human cognition.
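
To make the evaluation setup concrete, the sketch below shows one way the accuracy comparison described above could be run: each CLADDER question is posed once with a simple prompt and once with a CoT-style prompt, and accuracy is tallied by query type. The field names ("question", "rung", "answer"), the prompt templates, and the query_llm helper are illustrative assumptions, not the thesis's actual code or the dataset's exact schema.

import json
from collections import defaultdict

SIMPLE_TEMPLATE = "{question}\nAnswer 'yes' or 'no'."
COT_TEMPLATE = (
    "{question}\n"
    "Reason step by step: identify the causal graph, the query type, and the "
    "required estimand, then answer 'yes' or 'no'."
)

def query_llm(prompt: str) -> str:
    # Placeholder: replace with a real chat-completion call to the model under test.
    return "yes"

def evaluate(path: str, template: str) -> dict:
    # Tally accuracy per rung (association, intervention, counterfactual).
    correct, total = defaultdict(int), defaultdict(int)
    with open(path) as f:
        items = json.load(f)
    for item in items:
        prompt = template.format(question=item["question"])
        prediction = query_llm(prompt).strip().lower()
        total[item["rung"]] += 1
        if prediction.startswith(item["answer"].strip().lower()):
            correct[item["rung"]] += 1
    return {rung: correct[rung] / total[rung] for rung in total}

# Example: compare simple vs. CoT prompting on the same question set.
# simple_acc = evaluate("cladder.json", SIMPLE_TEMPLATE)
# cot_acc = evaluate("cladder.json", COT_TEMPLATE)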

Summary for Lay Audience

Have you ever wondered how the latest, highly popular generative artificial intelligence (AI) models like ChatGPT manage to understand and respond to your questions? At the heart of both human and artificial “thinking” is the ability to understand causes and their effects. This ability is becoming increasingly important as models such as ChatGPT continue to improve and evolve. This study explores how well these models, known as large language models, grasp questions that require them to think about causes and effects, and whether changing how we pose these questions can influence their responses.

We studied these models using 1392 questions designed to test their ability to think about various cause-and-effect scenarios. These questions covered scenarios that required thinking about connections between different events, taking action to change events, and imagining hypothetical events. We found that these models are better at answering questions about direct actions and their immediate results than questions about connections or hypothetical scenarios.

Interestingly, when we tried to help the models by breaking down the questions into step-by-step instructions, a method we thought would improve their answers, it did not help as much as we expected. In fact, simpler, more direct questions without extra guidance led to better answers. This discovery suggests that the way these generative models “think” about causes and effects might be quite different from how humans do, offering insights into both the potential and limitations of artificial intelligence in understanding complex reasoning.

Creative Commons License

Creative Commons Attribution 4.0 License
This work is licensed under a Creative Commons Attribution 4.0 License.
