Understanding RAG Part I: Why It’s Needed
Natural language processing (NLP) is an area of artificial intelligence (AI) aimed at teaching computers to understand written and spoken human language and to interact with humans using that language. While traditional NLP methods have been studied for decades, the recent emergence of large language models (LLMs) has come to dominate developments in the field. By combining sophisticated deep learning architectures with a self-attention mechanism capable of analyzing complex patterns and interdependencies in language, LLMs have revolutionized NLP and AI as a whole, thanks to the wide range of language generation and understanding tasks they can address and their breadth of applications: conversational chatbots, in-depth document analysis, translation, and more.
LLM Capabilities and Limitations
The largest general-purpose LLMs launched by major AI firms, such as OpenAI’s ChatGPT models, mainly specialize in language generation: given a prompt (a query, question, or request formulated by a user in human language), the LLM must produce a natural language response to that prompt, generating it word by word. To make this seemingly arduous task possible, LLMs are trained on extremely vast datasets consisting of millions to billions of text documents covering virtually any topic you can imagine. This way, LLMs comprehensively learn the nuances of human language, mimicking how we communicate and using that learned knowledge to produce “human-like” language of their own, enabling fluent human-machine communication at unprecedented levels.
There is no doubt that LLMs represent a big step forward for AI, yet they are not exempt from limitations. Concretely, if a user asks an LLM for a precise answer in a certain context (for instance, the latest news), the model may not be able to provide a specific and accurate response on its own. The reason: an LLM’s knowledge of the world is limited to the data it was exposed to, particularly during its training stage. An LLM would normally not be aware of the latest news unless it is retrained frequently (which, we are not going to lie, is an extremely expensive process).
What is worse, when an LLM lacks grounding information to provide a precise, relevant, or truthful answer, there is a significant risk it will still generate a convincing-looking response, even if that means building it on completely invented information. This common problem in LLMs is known as hallucination: generating inaccurate, unfounded text that misleads the user.
Why RAG Emerged
Even the largest LLMs on the market have suffered from data obsolescence, costly retraining, and hallucination problems to some degree, and tech giants are well aware of the risks these issues pose when such models are used by millions of people across the globe. The prevalence of hallucinations in earlier ChatGPT models, for instance, was estimated at around 15%, with profound implications for the reputation of organizations using them, compromising the reliability of and trust in AI systems as a whole.
This is why RAG (retrieval-augmented generation) came onto the scene. RAG has unquestionably been one of the major NLP breakthroughs following the emergence of LLMs, due to its effective approach to addressing the LLM limitations described above. The key idea behind RAG is to combine the accuracy and search capabilities of the information retrieval techniques typically used by search engines with the in-depth language understanding and generation capabilities of LLMs.
In broad terms, RAG systems enhance LLMs by incorporating up-to-date and truthful contextual information into user queries or prompts. This context is obtained through a retrieval phase that runs before the language understanding and response generation process carried out by the LLM.
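As a rough sketch of this retrieve-then-generate flow, the snippet below uses a toy keyword-overlap retriever over a hypothetical in-memory document list. A production RAG system would typically rely on dense embeddings and a vector database instead, but the overall structure (retrieve relevant context, then prepend it to the prompt passed to the LLM) is the same.

```python
# Hypothetical document store; in practice this would be a corpus of
# up-to-date documents indexed in a vector database.
documents = [
    "The city council approved the new transit budget on Tuesday.",
    "Photosynthesis converts light energy into chemical energy in plants.",
    "The latest product release adds offline mode and faster sync.",
]

def retrieve(query, docs, k=1):
    """Rank documents by naive word overlap with the query and return the top k."""
    query_words = set(query.lower().split())
    scored = sorted(
        docs,
        key=lambda d: len(query_words & set(d.lower().split())),
        reverse=True,
    )
    return scored[:k]

def build_prompt(query, context_docs):
    """Prepend the retrieved context to the user query before calling the LLM."""
    context = "\n".join(context_docs)
    return (
        "Answer using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {query}\nAnswer:"
    )

query = "What does the latest product release include?"
prompt = build_prompt(query, retrieve(query, documents, k=1))
print(prompt)  # This augmented prompt is what would be sent to the LLM for generation.
```

The augmented prompt is what the LLM ultimately sees, which is why its answer can reflect information that was never part of its training data.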
Here’s how RAG helps address the aforementioned problems traditionally found in LLMs:
- Data obsolescence: RAG helps overcome data obsolescence by retrieving and integrating up-to-date information from external sources, so that responses reflect the latest knowledge available
- Retraining costs: by dynamically retrieving relevant information, RAG reduces the need for frequent and costly retraining, allowing LLMs to stay current without being fully retrained
- Hallucinations: RAG helps mitigate hallucinations by grounding responses in factual information retrieved from real documents, minimizing the generation of false or fabricated responses
At this point, we hope you have gained an initial understanding of what RAG is and why it arose to improve upon existing LLM solutions. The next article in this series will dive deeper into the general approach RAG processes follow.