
RAG Implementation Patterns

Retrieval-Augmented Generation with chunking, embeddings, and hybrid search.

Claude Code · Cursor · GitHub Copilot · Windsurf · Cline · Codex / OpenAI · Gemini CLI
Updated 2026-04-05
CLAUDE.md
# RAG Implementation Patterns

You are an expert in RAG systems, vector databases, and information retrieval.

Document Processing:
- Chunk documents by semantic meaning, not fixed character count
- Overlap chunks by 10-20% for context continuity
- Preserve document metadata (source, section, page) with each chunk
- Clean and normalize text before embedding
- Handle tables, lists, and structured content separately
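The chunking guidance above can be sketched in a few lines. This is a minimal illustration, not a production splitter: `chunk_document`, its parameters, and the word-count heuristic are all hypothetical names chosen for this example, and a real pipeline would split on semantic boundaries rather than raw word counts.

```python
# Hypothetical sketch: sentence-based chunking with a ~15% word overlap,
# carrying source metadata with every chunk.
import re

def chunk_document(text, source, max_words=100, overlap_ratio=0.15):
    """Split on sentence boundaries; carry the tail of each chunk
    into the next one for context continuity."""
    sentences = re.split(r'(?<=[.!?])\s+', text.strip())
    chunks, current = [], []
    for sent in sentences:
        current.append(sent)
        if sum(len(s.split()) for s in current) >= max_words:
            chunk_text = ' '.join(current)
            chunks.append({'text': chunk_text, 'source': source,
                           'chunk_id': len(chunks)})
            # carry the last ~15% of words forward as overlap
            tail = chunk_text.split()[-int(max_words * overlap_ratio):]
            current = [' '.join(tail)]
    if current:
        chunks.append({'text': ' '.join(current), 'source': source,
                       'chunk_id': len(chunks)})
    return chunks
```

A library splitter (or an embedding-based semantic chunker) would replace the regex, but the metadata-per-chunk and overlap ideas carry over unchanged.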

Embeddings:
- Use an appropriate embedding model for your use case
- Normalize embeddings for cosine similarity search
- Batch embedding generation for efficiency
- Cache embeddings; re-embed only on content changes
- Test embedding quality with known query-document pairs
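Normalization and caching can be combined in one small wrapper. A hedged sketch: `embed_with_cache` and `fake_embed`-style callables are illustrative names, and a real system would persist the cache rather than hold it in a module-level dict.

```python
# Hypothetical sketch: L2-normalize embeddings and cache them by a
# content hash, so unchanged chunks are never re-embedded.
import hashlib
import math

def normalize(vec):
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0
    return [x / norm for x in vec]

_cache = {}  # in-memory stand-in for a persistent embedding cache

def embed_with_cache(text, embed_fn):
    key = hashlib.sha256(text.encode()).hexdigest()
    if key not in _cache:
        _cache[key] = normalize(embed_fn(text))  # only call model on miss
    return _cache[key]
```

With unit-length vectors, cosine similarity reduces to a dot product, which is what most vector indexes optimize for.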

Retrieval:
- Use hybrid search (keyword + vector) for better recall
- Rerank retrieved documents before passing them to the LLM
- Filter by metadata before vector search for efficiency
- Retrieve more candidates than needed, then rerank
- Track which documents were cited in responses
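One common way to fuse keyword and vector results is reciprocal rank fusion (RRF). The sketch below assumes you already have two ranked ID lists from separate searches; `rrf_merge` and its parameters are illustrative, and k=60 is just a conventional smoothing constant.

```python
# Hypothetical sketch: merge keyword and vector result lists with
# reciprocal rank fusion, then keep the top_k candidates for reranking.
def rrf_merge(keyword_ids, vector_ids, k=60, top_k=5):
    scores = {}
    for ranked in (keyword_ids, vector_ids):
        for rank, doc_id in enumerate(ranked):
            # documents appearing high in either list accumulate score
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank + 1)
    return sorted(scores, key=scores.get, reverse=True)[:top_k]
```

This also matches the "retrieve more than needed, then rerank" bullet: fuse a wide candidate set, then let a cross-encoder reranker pick the final few.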

Generation:
- Include source attribution in prompts
- Instruct LLM to cite sources and say "I don't know" when uncertain
- Validate that generated answers are grounded in retrieved context
- Implement feedback loops for retrieval quality improvement
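The prompting bullets above can be made concrete with a small prompt builder. This is a sketch under stated assumptions: `build_rag_prompt` is a hypothetical helper, chunks are dicts shaped like the chunking example's output, and the exact instruction wording should be tuned for your model.

```python
# Hypothetical sketch: number the retrieved sources, demand citations,
# and give the model explicit permission to say "I don't know".
def build_rag_prompt(question, chunks):
    context = '\n\n'.join(
        f"[{i + 1}] (source: {c['source']}) {c['text']}"
        for i, c in enumerate(chunks))
    return (
        "Answer using ONLY the sources below. Cite them as [1], [2], ... "
        "If the sources do not contain the answer, say \"I don't know.\"\n\n"
        f"Sources:\n{context}\n\nQuestion: {question}\nAnswer:")
```

The numbered markers also make grounding checks easier: you can verify each cited `[n]` actually exists and log which sources the answer drew on.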

Infrastructure:
- Use vector databases: Pinecone, Weaviate, Qdrant, pgvector
- Implement proper indexing strategies (HNSW, IVF)
- Monitor retrieval latency and quality metrics
- Version your document indices alongside code

Add this to a CLAUDE.md file in your project root, or append it to an existing one.

Tags

rag · retrieval · embeddings · vector-database · chunking · search