RAG with Spring AI

Dear readers, I hope you are as curious as I am and will join me on this learning journey. So, get your curiosity and development environment ready and let's get started. 🙂 To begin with, I will guide you through some hopefully not-so-boring terminology.

What is RAG?

- R = Retrieval
- A = Augmented
- G = Generation

Flow without RAG

1. The original prompt, usually user-specific input.
2. The original prompt is sent directly to the LLM.
3. The LLM responds based on the large amount of generic data it was trained on, which is likely to be out of date.

Flow with RAG

1. The original prompt, usually user-specific input.
2. Retrieval of additional context and information specific to the domain, for example company-specific data.
3. The original prompt is combined with the extra context retrieved in step 2, and both are sent to the LLM.
4. The LLM responds based on the up-to-date information it has been given.

What needs to be done to build RAG?

1. Collect and create the context-specific data. This can be achieved, for example, with an embedding model, which converts textual data into a numerical representation so it can be stored in a vector database (see the ingestion sketch at the end of this section). Keep in mind that this is also a substantial transformation step: you need to have the data and, ideally, store it in an LLM-friendly format, i.e. a format the LLM understands. That way, retrieving the data later and handing it to the LLM is smoother and more time-efficient.
2. Retrieve the relevant context information. In this step, the original prompt, which is plain text, is transformed into a vector representation and matched against the vector database (see the retrieval sketch below).
3. Augment the prompt for the LLM. In this step, RAG enriches the original prompt with the retrieved domain-specific information (see the augmentation sketch below).

In addition, you need to keep your extra knowledge source up to date as well.

Technology Decisions and Use Case

I am going to demonstrate how to build RAG using Spring AI with text input. ...
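To make step 1 concrete, here is a minimal ingestion sketch using Spring AI's document reader, splitter, and VectorStore abstractions. It assumes a Spring Boot application with an EmbeddingModel and a VectorStore already configured; the class name DocumentIngestion is illustrative, not part of Spring AI.

```java
import java.util.List;

import org.springframework.ai.document.Document;
import org.springframework.ai.reader.TextReader;
import org.springframework.ai.transformer.splitter.TokenTextSplitter;
import org.springframework.ai.vectorstore.VectorStore;
import org.springframework.core.io.Resource;
import org.springframework.stereotype.Component;

@Component
public class DocumentIngestion {

    private final VectorStore vectorStore;

    public DocumentIngestion(VectorStore vectorStore) {
        this.vectorStore = vectorStore;
    }

    public void ingest(Resource textFile) {
        // 1. Read the raw text into Spring AI Document objects.
        List<Document> documents = new TextReader(textFile).get();

        // 2. Split long documents into smaller, embedding-friendly chunks.
        List<Document> chunks = new TokenTextSplitter().apply(documents);

        // 3. Embed each chunk and store the vectors; the VectorStore
        //    delegates the embedding to the configured EmbeddingModel.
        vectorStore.add(chunks);
    }
}
```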
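For step 2, the VectorStore can embed the user's prompt and run a similarity search against the stored chunks. The builder-style SearchRequest shown here matches recent Spring AI releases (older milestones used a slightly different SearchRequest API), so treat this as a sketch rather than version-exact code; ContextRetriever is again an illustrative name.

```java
import java.util.List;

import org.springframework.ai.document.Document;
import org.springframework.ai.vectorstore.SearchRequest;
import org.springframework.ai.vectorstore.VectorStore;
import org.springframework.stereotype.Component;

@Component
public class ContextRetriever {

    private final VectorStore vectorStore;

    public ContextRetriever(VectorStore vectorStore) {
        this.vectorStore = vectorStore;
    }

    public List<Document> retrieve(String userPrompt) {
        // The VectorStore embeds the prompt text with the configured
        // EmbeddingModel and returns the closest stored chunks.
        return vectorStore.similaritySearch(
                SearchRequest.builder()
                        .query(userPrompt)
                        .topK(4) // how many chunks to pull into the prompt
                        .build());
    }
}
```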
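Finally, for step 3, a sketch of augmenting the original prompt with the retrieved context before calling the model. It reuses the hypothetical ContextRetriever from the previous sketch; the fluent ChatClient and the Document#getText accessor are from Spring AI 1.x (earlier versions exposed getContent() instead).

```java
import java.util.stream.Collectors;

import org.springframework.ai.chat.client.ChatClient;
import org.springframework.ai.document.Document;
import org.springframework.stereotype.Component;

@Component
public class RagChatService {

    private final ChatClient chatClient;
    private final ContextRetriever retriever; // hypothetical helper from the previous sketch

    public RagChatService(ChatClient.Builder builder, ContextRetriever retriever) {
        this.chatClient = builder.build();
        this.retriever = retriever;
    }

    public String answer(String userPrompt) {
        // Join the retrieved chunks into a single context string.
        String context = retriever.retrieve(userPrompt).stream()
                .map(Document::getText) // getContent() in pre-1.0 releases
                .collect(Collectors.joining("\n---\n"));

        // Augment the original prompt with the retrieved context.
        String augmented = """
                Use only the following context to answer the question.

                Context:
                %s

                Question:
                %s
                """.formatted(context, userPrompt);

        return chatClient.prompt().user(augmented).call().content();
    }
}
```

Worth noting: Spring AI also ships a QuestionAnswerAdvisor that can attach this retrieve-and-augment step to a ChatClient for you, so in practice you may not need to hand-roll the prompt stuffing shown above.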

May 25, 2025