RAG (Retrieval-Augmented Generation)

A technique that improves LLM responses by first retrieving relevant documents from a knowledge base and then including them in the prompt. RAG gives a model access to private data, current information, or domain-specific knowledge without fine-tuning. Running RAG locally takes hardware for two components: the embedding model, which is lightweight, and the LLM, which is the real bottleneck. The main hardware considerations are VRAM for the LLM and storage for the vector database.
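The retrieve-then-prompt flow described above can be sketched in a few lines. This is a minimal illustration, not a production setup: `embed` is a toy bag-of-words stand-in for a real embedding model, the in-memory `DOCS` list stands in for a vector database, and all names (`embed`, `cosine`, `retrieve`, `build_prompt`, `DOCS`) are hypothetical. The resulting prompt would then be passed to whatever LLM you run locally.

```python
import math
from collections import Counter


def embed(text):
    """Toy stand-in for an embedding model: a bag-of-words count vector."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query, documents, k=1):
    """Return the k documents most similar to the query."""
    q = embed(query)
    ranked = sorted(documents, key=lambda d: cosine(q, embed(d)), reverse=True)
    return ranked[:k]


def build_prompt(query, documents, k=1):
    """Put the retrieved documents into the prompt ahead of the question."""
    context = "\n".join(retrieve(query, documents, k))
    return (
        "Use the context to answer.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {query}"
    )


# Hypothetical knowledge base; a real setup would store embeddings
# of these chunks in a vector database.
DOCS = [
    "the warranty covers battery replacement for two years",
    "shipping takes five business days within the country",
    "returns are accepted within thirty days of purchase",
]

if __name__ == "__main__":
    print(build_prompt("how long is the warranty on the battery", DOCS))
```

Only the retrieved text reaches the model, which is why the embedding and search side stays cheap while the LLM that reads the prompt dominates the hardware budget.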
