Recently, I was assigned to build a RAG system to mitigate LLM hallucination on proprietary documents. As project resources, I had pre-trained BERT/RoBERTa models and an API to a gte model that serves as an encoding service, so I didn't need to build the system from scratch. Here is what I learned from this project: the obstacles I hit and the decisions I made.
RAG, a bird’s-eye view.
Let’s quickly review RAG for those who aren’t acquainted with it yet. If you already know it, feel free to skip this section.
RAG, Retrieval-Augmented Generation, is a practical approach to alleviating the hallucination problem of LLMs. Hallucination refers to the situation where an LLM cannot respond to a user’s input correctly, and may even concoct misinformation in a confident tone. This usually happens when the LLM meets a question that is not covered by its training data. RAG addresses this by furnishing the LLM with relevant information, retrieved from a document store and embedded into the user’s prompt as context. The approach was first formulated by Lewis et al. (2020), a team whose members are from Facebook, UCL, and NYU. The name indicates that the system comprises three components (or processes): Retrieval, Augmentation, and Generation. Get each part working and you have an LLM that can speak the jargon of your proprietary documents.
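To make the three stages concrete, here is a minimal sketch in Python. The `encode` and `generate` functions are hypothetical placeholders (not real library calls) standing in for an embedding service, such as the gte API mentioned above, and an LLM call; the retrieval step is plain cosine similarity over document embeddings.

```python
import numpy as np

# Hypothetical stand-ins: wire these to your own encoding service and LLM.
def encode(text: str) -> np.ndarray:
    """Placeholder: return an embedding vector for `text`."""
    raise NotImplementedError("connect to your encoding service")

def generate(prompt: str) -> str:
    """Placeholder: return the LLM's completion for `prompt`."""
    raise NotImplementedError("connect to your LLM")

def rag_answer(question: str, docs: list[str], top_k: int = 3) -> str:
    # 1. Retrieval: rank documents by cosine similarity to the question.
    q = encode(question)
    scored = []
    for doc in docs:
        d = encode(doc)
        score = float(np.dot(q, d) / (np.linalg.norm(q) * np.linalg.norm(d)))
        scored.append((score, doc))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    context = "\n\n".join(doc for _, doc in scored[:top_k])

    # 2. Augmentation: embed the retrieved passages into the prompt.
    prompt = (
        "Answer using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )

    # 3. Generation: let the LLM answer with the augmented prompt.
    return generate(prompt)
```

In a production system the per-query encoding loop would of course be replaced by a pre-built vector index, but the three stages stay the same.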