Large language models have become extremely powerful; they can help answer some of our hardest questions. But they can also lead us astray: they tend to hallucinate, meaning they give answers that seem right but aren't.
Here, we'll look at three methods to reduce LLM hallucinations: retrieval-augmented generation (RAG), reasoning, and iterative querying.
With RAG, a user's query is converted into an embedding and used to search a vector database, which returns the documents most relevant to the question.
Once the relevant documents have been retrieved, the query, along with these documents, is used by the LLM to summarize a response for the user. This way, the model doesn’t have to rely solely on its internal knowledge but can access whatever data you provide it at the right time. In a sense, it provides the LLM with “long-term memory” that it doesn’t possess on its own. The model can provide more accurate and contextually appropriate responses by including proprietary data stored in the vector database.
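To make the flow concrete, here is a minimal sketch in Python. The `embed` and `call_llm` functions are placeholders I've made up for illustration, not part of any particular library; in practice you would swap in your embedding model, vector database client, and LLM provider.

```python
from typing import List, Tuple

def embed(text: str) -> List[float]:
    """Placeholder embedding -- swap in a real embedding model."""
    return [float(ord(c) % 7) for c in text[:32]]

def call_llm(prompt: str) -> str:
    """Placeholder LLM call -- swap in your model provider's API."""
    return f"(model response to: {prompt[:60]}...)"

def cosine(a: List[float], b: List[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = sum(x * x for x in a) ** 0.5 or 1.0
    nb = sum(y * y for y in b) ** 0.5 or 1.0
    return dot / (na * nb)

def retrieve(query: str, documents: List[str], top_k: int = 3) -> List[str]:
    """Rank stored documents by similarity to the query embedding."""
    q = embed(query)
    scored: List[Tuple[float, str]] = [(cosine(q, embed(d)), d) for d in documents]
    return [d for _, d in sorted(scored, key=lambda s: s[0], reverse=True)[:top_k]]

def rag_answer(query: str, documents: List[str]) -> str:
    """Retrieve relevant documents and let the LLM answer from that context."""
    context = "\n".join(retrieve(query, documents))
    prompt = (
        "Answer the question using only the context below.\n"
        f"Context:\n{context}\n\nQuestion: {query}"
    )
    return call_llm(prompt)
```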
An alternate RAG approach incorporates fact checking. The LLM is prompted for an answer, which is then checked against data in the vector database: an answer to the query is produced from the vector database, and the LLM in turn uses that grounded answer to judge whether its own response is consistent with the facts.
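A rough sketch of that fact-checking loop, reusing the placeholder `call_llm` and `rag_answer` helpers from the sketch above (the prompt wording is illustrative, not a prescribed format):

```python
def fact_checked_answer(query: str, documents: List[str]) -> str:
    draft = call_llm(query)                    # the LLM answers from its own knowledge
    grounded = rag_answer(query, documents)    # an answer assembled from the vector database
    verdict = call_llm(
        "Does the draft answer agree with the grounded answer? "
        "Reply 'consistent' or 'inconsistent'.\n"
        f"Draft: {draft}\nGrounded answer: {grounded}"
    )
    # Fall back to the grounded answer whenever the draft doesn't check out.
    return grounded if "inconsistent" in verdict.lower() else draft
```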
LLMs are very good at a lot of things. They can predict the next word in a sentence, thanks to advances in “transformers,” which transform how machines understand human language by paying varying degrees of attention to different parts of the input data. LLMs are also good at boiling down a lot of information into a very concise answer, and finding and extracting something you’re looking for from a large amount of text. Surprisingly, LLMs can also plan – they can literally gather data and plan a trip for you.
And maybe even more surprisingly, LLMs can use reasoning to produce an answer, in an almost human-like fashion. Because people can reason, they don’t need tons of data to make a prediction or decision. Reasoning also helps LLMs to avoid hallucinations. An example of this is “chain-of-thought prompting.”
This method helps models break multi-step problems into intermediate steps. With chain-of-thought prompting, LLMs can solve complex reasoning problems that standard prompting methods can’t (for an in-depth look, check out Google Research’s blog post on chain-of-thought prompting).
If you give an LLM a complicated math problem, it might get it wrong. But if you provide the LLM with the problem as well as the method of solving it, it can produce an accurate answer – and share the reasoning behind that answer. A vector database is a key part of this method: it provides examples of questions similar to the one being asked, and those worked examples are used to populate the prompt.
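Here is a hypothetical sketch of how such a prompt might be assembled. The worked example is made up for illustration; in a real system, similar solved questions would come from the vector database (for instance via the `retrieve` helper above).

```python
def chain_of_thought_prompt(question: str, examples: List[str]) -> str:
    """Build a prompt that shows worked, step-by-step examples before the new question."""
    worked = "\n\n".join(examples)
    return (
        "Solve the problem step by step, showing your reasoning.\n\n"
        f"{worked}\n\n"
        f"Q: {question}\nA: Let's think step by step."
    )

example = (
    "Q: A ticket costs $12 and a group buys 5 tickets with a $10 coupon. What do they pay?\n"
    "A: 5 tickets cost 5 x 12 = 60 dollars. With the coupon, 60 - 10 = 50. The answer is $50."
)
answer = call_llm(chain_of_thought_prompt("A book costs $8. What do 7 books cost?", [example]))
```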
Even better, once you have the question and answer, you can store the pair back in the vector database to further improve the accuracy and usefulness of your generative AI applications.
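In its simplest form, that could look like the snippet below (purely illustrative; a real system would upsert the pair, along with its embedding, into the vector database):

```python
def store_qa(question: str, answer: str, documents: List[str]) -> None:
    """Add a solved question/answer pair back to the document set so future
    retrievals can surface it as a worked example."""
    documents.append(f"Q: {question}\nA: {answer}")
```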
There are a host of other reasoning advancements you can learn about as well.
The third method to help reduce LLM hallucinations is iterative querying. In this case, an AI agent mediates calls that pass back and forth between an LLM and a vector database. This can happen multiple times, iteratively, in order to arrive at the best answer. An example of this is forward-looking active retrieval augmented generation, also known as FLARE.
You take a question and query your knowledge base for similar questions, which returns a series of related questions. Then you query the vector database with all of those questions, summarize the answer, and check whether the answer looks good and reasonable. If it doesn’t, you repeat the steps until it does.
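A rough sketch of that loop, again reusing the placeholder helpers from the RAG example above; the question-expansion step and the “looks reasonable” check are stand-ins for whatever agent logic you actually use:

```python
def iterative_answer(question: str, documents: List[str], max_rounds: int = 3) -> str:
    """Expand the question, query the vector store, summarize, and repeat until
    the answer passes a sanity check (or the round limit is hit)."""
    questions = [question]
    answer = ""
    for _ in range(max_rounds):
        # Ask for questions similar to the original to broaden retrieval.
        questions.append(call_llm(f"List questions similar to: {question}"))
        # Query the vector store with every question gathered so far.
        passages = [p for q in questions for p in retrieve(q, documents)]
        answer = call_llm(
            f"Summarize an answer to '{question}' using:\n" + "\n".join(passages)
        )
        # Check whether the answer looks good and reasonable; stop if it does.
        verdict = call_llm(f"Is this answer complete and reasonable? Reply yes or no.\n{answer}")
        if verdict.strip().lower().startswith("yes"):
            break
    return answer
```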
There are other advanced iterative querying methods as well.
There are many tools that can help you with agent orchestration.
One company putting these techniques to work is SkyPoint AI. The company pulls from a wide variety of structured and unstructured data to provide AI-generated answers to prompts like “How many residents are currently on Medicare?” SkyPoint CEO Tisson Mathew told me recently. This helps care providers make informed decisions quickly, based on accurate data, he said.
Getting to that point, however, was a process, Mathew said. His team started by taking a standard LLM and fine-tuning it with SkyPoint data. “It came up with disastrous results – random words, even,” he said. Understanding and creating prompts was something SkyPoint could handle, but it needed an AI technology stack to handle generating accurate answers at scale.
SkyPoint ended up building a system that ingests structured data from operators and providers, including electronic health record and payroll data, for example. This is stored in a columnar database; RAG is used to query it. Unstructured data, such as policies, procedures, and state regulations, is stored in a vector database: Astra DB.
Mathew posed a question as an example: What if a resident becomes abusive? Astra DB provides an answer that is assembled from state regulations, the user’s context, and a variety of different documents.
“These are specific answers that have to be right,” Mathew said. “This is information an organization relies on to make informed decisions for their community and for their business.”
SkyPoint AI illustrates the importance of mitigating the risk of AI hallucinations; the consequences could be dire without methods and tools to ensure accurate answers.
With RAG, reasoning, and iterative querying approaches such as FLARE, generative AI – particularly when fueled by proprietary data – is becoming an increasingly powerful tool to help enterprises serve their customers efficiently and effectively.
By Alan Ho, DataStax