Faculty

Computer Science

Supervisor Name

Apurva Narayan

Keywords

Artificial Intelligence, Machine Learning, Large Language Models, Natural Language Processing, Retrieval Augmented Generation Systems, MultiHop-RAG, ChatGPT, GPT-4, Mistral AI, Llama

Description

In recent years, the popularization of large language model (LLM) applications such as ChatGPT has made it easy for anyone to access new knowledge and solve problems. However, these applications come with a caveat: the LLMs powering them can produce misleading or entirely incorrect answers, referred to as hallucinations. Hallucinations can occur for many reasons, one of which is shortcomings in the dataset used to train the LLM. To combat such failures, researchers have devised a new method of response generation known as Retrieval Augmented Generation (RAG). However, response quality degrades when a RAG system handles complex multi-hop queries, which require retrieving and reasoning over multiple pieces of supporting evidence. In this paper, we implement and benchmark a novel RAG system called MultiHop-RAG, designed specifically to handle multi-hop queries. We provide an instructive procedure for building the MultiHop-RAG system and demonstrate its utility by deriving benchmarks and comparing them against existing RAG systems.
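The multi-hop queries described above can be illustrated with a minimal retrieval sketch. This is a hypothetical toy example (the corpus, scoring function, and function names are illustrative, not taken from the MultiHop-RAG implementation), using simple keyword overlap in place of the dense embeddings a real RAG retriever would use:

```python
def score(query, doc):
    """Keyword-overlap relevance score between a query and a document."""
    return len(set(query.lower().split()) & set(doc.lower().split()))

def retrieve(query, corpus, k=1):
    """Return the top-k documents most relevant to the query."""
    return sorted(corpus, key=lambda doc: score(query, doc), reverse=True)[:k]

def multi_hop_retrieve(query, corpus, hops=2):
    """Retrieve evidence iteratively: each hop folds the evidence found
    so far into the query, so later hops can reach documents that share
    no terms with the original question."""
    evidence = []
    current = query
    for _ in range(hops):
        # Exclude already-retrieved documents so each hop finds new evidence.
        remaining = [d for d in corpus if d not in evidence]
        hit = retrieve(current, remaining, k=1)[0]
        evidence.append(hit)
        current = query + " " + hit  # expand the query for the next hop
    return evidence

# Toy corpus: answering "Where did the founder of Acme Corp study?" needs
# two hops -- first find the founder, then find where that person studied.
corpus = [
    "Acme Corp was founded by Jane Doe",
    "Jane Doe studied at Western University",
    "The weather in Toronto is mild",
]

evidence = multi_hop_retrieve("Where did the founder of Acme Corp study", corpus)
```

A single retrieval pass over this corpus would match only the founding fact, since the question shares no keywords with the document naming the university; the second hop succeeds because the first hop's evidence introduces "Jane Doe" into the query. This gap between single-hop and multi-hop retrieval is the failure mode the benchmarks in this paper are designed to measure.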

Acknowledgements

Thank you to Dr. Apurva Narayan, Rishabh Agrawal, the Western USRI program, and the Faculty of Computer Science for all their support.

Creative Commons License

Creative Commons Attribution-Noncommercial 4.0 License
This work is licensed under a Creative Commons Attribution-Noncommercial 4.0 License

Document Type

Paper


MultiHop-RAG: A Longitudinal Study on its Implementation and Benchmarks
