A large corpus of text documents containing billions of text tokens is used to train large language models (LLMs). It has been demonstrated that performance at tasks like closed book QA improves accuracy as the number of model parameters increases, and larger models can produce more accurate factual statements. Even the largest models, which appear relatively seldom in the training corpus, can fail, particularly on less well-known torso and tail distribution facts. When the model is flawed, they produce an alternative answer that generally appears realistic.
Beyond only predicting words to come, the most recent wave of language modeling research has concentrated on how well they can reason. Encouragement of language models to first construct internal thoughts or reasoning chains before replying and changing their original response through self-critique can lead to improved performance on reasoning challenges.
Researchers from Meta AI & ETH Zurich investigate how and when language-model-based reasoning can be applied to lessen hallucinations in the work presented here. They create a method known as Chain-of-Verification (CoVe), in which, given an initial draft response, they first plan verification questions to assess its effectiveness and then methodically respond to those questions to ultimately generate a better-amended response. The study shows that facts provided by independent verification questions typically are more accurate than those in the initial long-form response, increasing the entire response’s accuracy.
The team explores variations on this formula for various activities, including list-based queries, closed-book QA, and the creation of long-form content. As an alternative to the baseline language model, they first provide a combined method for creating the full verification chain from left to right, which enhances performance and reduces hallucinations. On the other hand, models who pay attention to current hallucinations in the context of their generations frequently repeat the hallucinations.
The researchers introduce factored variations to optimize the verification chain stages according to the situation. The results demonstrate how these factored variations improve performance further on the three tasks under consideration.
The team also showed that preventing the model from attending to its prior answers while responding to the verification questions (factored CoVe) reduces the likelihood of repeating the same hallucinations. Overall, this approach offers significant performance improvements over the response from the original language model simply by asking the same model to think about (check) its response. Equipping CoVe with the ability to apply tools, such as retrieval augmentation in the verification execution step, is a logical extension of this research that would undoubtedly result in more advantages.
Check out the Paper. All Credit For This Research Goes To the Researchers on This Project. Also, don’t forget to join our 30k+ ML SubReddit, 40k+ Facebook Community, Discord Channel, and Email Newsletter, where we share the latest AI research news, cool AI projects, and more.
If you like our work, you will love our newsletter..
Dhanshree Shenwai is a Computer Science Engineer and has a good experience in FinTech companies covering Financial, Cards & Payments and Banking domain with keen interest in applications of AI. She is enthusiastic about exploring new technologies and advancements in today’s evolving world making everyone’s life easy.