Faculty
Computer Science
Supervisor Name
Apurva Narayan
Keywords
Artificial Intelligence, Machine Learning, Large Language Models, Natural Language Processing, Retrieval Augmented Generation Systems, MultiHop-RAG, ChatGPT, GPT-4, Mistral AI, Llama
Description
In recent years, the popularization of large language model (LLM) applications such as ChatGPT has made it easy for anyone to access new knowledge and solve problems. However, these applications come with a caveat: the LLMs powering them can produce misleading or entirely incorrect answers, referred to as hallucinations. Hallucinations can occur for many reasons, one of which is shortcomings in the dataset used to train the LLM. To combat such failures, researchers have devised a new method of response generation known as Retrieval Augmented Generation (RAG). However, response quality degrades when a RAG system handles complex multi-hop queries, which require retrieving and reasoning over multiple pieces of supporting evidence. In this paper, we implement and benchmark a novel RAG system called MultiHop-RAG, designed specifically to handle multi-hop queries. We provide an instructive procedure for building the MultiHop-RAG system and demonstrate its utility by deriving benchmarks and comparing them against those of existing RAG systems.
Acknowledgements
Thank you to Dr. Apurva Narayan, Rishabh Agrawal, the Western USRI program, and the Faculty of Computer Science for all their support.
Creative Commons License
This work is licensed under a Creative Commons Attribution-Noncommercial 4.0 License
Document Type
Paper
Included in
Artificial Intelligence and Robotics Commons, Databases and Information Systems Commons, Other Computer Sciences Commons, Software Engineering Commons, Systems Architecture Commons, Theory and Algorithms Commons
MultiHop-RAG: A Longitudinal Study on its Implementation and Benchmarks
In recent years, the popularization of large language model (LLM) applications such as ChatGPT has made it easy for anyone to access new knowledge and solve problems. However, these applications come with a caveat: the LLMs powering them can produce misleading or entirely incorrect answers, referred to as hallucinations. Hallucinations can occur for many reasons, one of which is shortcomings in the dataset used to train the LLM. To combat such failures, researchers have devised a new method of response generation known as Retrieval Augmented Generation (RAG). However, response quality degrades when a RAG system handles complex multi-hop queries, which require retrieving and reasoning over multiple pieces of supporting evidence. In this paper, we implement and benchmark a novel RAG system called MultiHop-RAG, designed specifically to handle multi-hop queries. We provide an instructive procedure for building the MultiHop-RAG system and demonstrate its utility by deriving benchmarks and comparing them against those of existing RAG systems.
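To illustrate the difference between single-hop and multi-hop queries, the following is a minimal, self-contained sketch of multi-hop retrieval: the answer requires chaining evidence across several retrieval "hops," where each hop's query is expanded with terms from the previously retrieved document. The corpus, the keyword-overlap scoring, and the hop loop are purely illustrative assumptions for exposition, not the MultiHop-RAG implementation described in this paper, which uses learned embeddings and an LLM for reasoning.

```python
# Toy multi-hop retrieval sketch (illustrative only, not MultiHop-RAG itself).
# A real RAG system would use embedding similarity and an LLM; here we use
# simple keyword overlap so the example runs with no dependencies.

CORPUS = {
    "doc1": "Acme Corp was founded by Jane Smith.",
    "doc2": "Jane Smith was born in Toronto.",
    "doc3": "Toronto is the capital of Ontario.",
}

def _terms(text):
    """Lowercase a string and split it into a set of bare word tokens."""
    return set(text.lower().replace(".", "").replace("?", "").split())

def score(query_terms, text):
    """Score a document by how many query terms it contains."""
    return len(query_terms & _terms(text))

def multi_hop_retrieve(query, hops=2):
    """Retrieve one new document per hop, expanding the query with the
    terms of each retrieved document so later hops can follow the chain
    of evidence (e.g. query -> founder's name -> birthplace)."""
    terms = _terms(query)
    evidence, used = [], set()
    for _ in range(hops):
        best = max(
            (doc for doc in CORPUS if doc not in used),
            key=lambda doc: score(terms, CORPUS[doc]),
        )
        used.add(best)
        evidence.append(CORPUS[best])
        terms |= _terms(CORPUS[best])  # expand the query for the next hop
    return evidence
```

Note that the question "Where was the founder of Acme Corp born?" cannot be answered from any single document: the first hop surfaces the founder's name, and only then can the second hop retrieve her birthplace. This is exactly the retrieve-and-reason chain that single-hop RAG systems struggle with.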