RAG (Retrieval-Augmented Generation)

A technique that improves LLM responses by first retrieving relevant documents from a knowledge base and including them in the prompt. RAG gives a model access to private data, current information, or domain-specific knowledge without fine-tuning. Running RAG locally takes hardware for both the embedding model, which is lightweight, and the LLM, which is the real bottleneck. In practice, the main hardware considerations are VRAM for the LLM and storage for the vector database.
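
The retrieve-then-prompt flow can be sketched in a few lines of Python. This is a toy illustration, not a production pipeline: the `embed` function is a bag-of-words stand-in for a real embedding model, the in-memory list stands in for a vector database, and the names `KNOWLEDGE_BASE`, `retrieve`, and `build_prompt` are hypothetical. The resulting prompt would then be passed to whichever LLM you run.

```python
import math
import re
from collections import Counter

def embed(text: str) -> Counter:
    """Toy stand-in for an embedding model: lowercase bag-of-words counts."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def retrieve(query: str, docs: list[str], k: int = 2) -> list[str]:
    """Return the k documents most similar to the query."""
    q = embed(query)
    ranked = sorted(docs, key=lambda d: cosine(q, embed(d)), reverse=True)
    return ranked[:k]

def build_prompt(query: str, docs: list[str], k: int = 2) -> str:
    """Place the retrieved documents in the prompt ahead of the question."""
    context = "\n".join(f"- {d}" for d in retrieve(query, docs, k))
    return (
        "Answer using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {query}"
    )

# Hypothetical private knowledge base (a real one would live in a vector database).
KNOWLEDGE_BASE = [
    "The office VPN requires the WireGuard client.",
    "Expense reports are due on the fifth of each month.",
    "The build server runs nightly at 02:00 UTC.",
]

prompt = build_prompt("When are expense reports due?", KNOWLEDGE_BASE, k=1)
print(prompt)  # this string is what would be sent to the LLM
```

In a real setup, `embed` would call an embedding model and the similarity search would run inside the vector database. The LLM only ever sees the final prompt, which is why it remains the component that dominates VRAM requirements.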
