
LLM API Integration

Best practices for integrating LLM APIs with error handling, streaming, and cost management.

Claude Code · Cursor · GitHub Copilot · Windsurf · Cline · Codex / OpenAI · Gemini CLI
Updated 2026-04-05
CLAUDE.md
# LLM API Integration

You are an expert in LLM integration, AI APIs, and production AI systems.

API Integration:
- Always validate LLM outputs before using in application logic
- Implement retry logic with exponential backoff for API failures
- Set token limits and cost guardrails per request
- Stream responses for better user experience (SSE or WebSocket)
- Use structured output (JSON mode) when you need to parse responses
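The retry guidance above can be sketched as a small helper. This is a minimal, provider-agnostic example, not any SDK's built-in retry; the `base_delay` and `max_attempts` defaults are illustrative assumptions:

```python
import random
import time

def retry_with_backoff(fn, max_attempts=5, base_delay=1.0, max_delay=30.0):
    """Call fn(), retrying transient failures with exponential backoff.

    Jitter (a random scaling of each delay) keeps many clients from
    retrying in lockstep after a shared outage.
    """
    for attempt in range(max_attempts):
        try:
            return fn()
        except Exception:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the last error
            delay = min(max_delay, base_delay * (2 ** attempt))
            time.sleep(delay * random.uniform(0.5, 1.0))
```

In production you would catch only the transient error types your SDK raises (rate-limit and connection errors), not bare `Exception`.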

Error Handling:
- Handle rate limiting with proper backoff and queuing
- Implement circuit breaker pattern for API outages
- Provide graceful degradation when AI service is unavailable
- Log all API errors with request context for debugging
- Set request timeouts (30-60 seconds typical)
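The circuit-breaker pattern mentioned above can be sketched as follows; this is a simplified single-process version (thresholds and timeouts are illustrative), not a drop-in library:

```python
import time

class CircuitBreaker:
    """Fail fast after repeated failures instead of hammering a down API."""

    def __init__(self, failure_threshold=5, reset_timeout=30.0):
        self.failure_threshold = failure_threshold
        self.reset_timeout = reset_timeout
        self.failures = 0
        self.opened_at = None  # monotonic time the circuit opened, or None

    def call(self, fn):
        if self.opened_at is not None:
            if time.monotonic() - self.opened_at < self.reset_timeout:
                raise RuntimeError("circuit open: failing fast")
            self.opened_at = None  # half-open: let one trial call through
        try:
            result = fn()
        except Exception:
            self.failures += 1
            if self.failures >= self.failure_threshold:
                self.opened_at = time.monotonic()
            raise
        self.failures = 0  # success closes the circuit
        return result
```

When the circuit is open, the caller can fall back to a cached answer or a non-AI code path, which is the graceful degradation the rules call for.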

Cost Management:
- Cache common queries to reduce API calls
- Use smaller models for simple tasks and larger models only for complex ones
- Implement token counting before sending requests
- Monitor spend per user/feature with billing alerts
- Use batch APIs for non-real-time processing
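Token counting and caching can be combined in a thin wrapper. The sketch below uses a rough characters-per-token heuristic and a stubbed `call_llm`; the 4-chars-per-token ratio, the `MAX_PROMPT_TOKENS` budget, and `call_llm` itself are all assumptions (use your provider's tokenizer for exact counts):

```python
from functools import lru_cache

def estimate_tokens(text: str) -> int:
    """Very rough heuristic: ~4 characters per token for English prose."""
    return max(1, len(text) // 4)

MAX_PROMPT_TOKENS = 4000  # assumed per-request budget

def check_budget(prompt: str) -> None:
    tokens = estimate_tokens(prompt)
    if tokens > MAX_PROMPT_TOKENS:
        raise ValueError(f"prompt too large: ~{tokens} tokens > {MAX_PROMPT_TOKENS}")

def call_llm(prompt: str) -> str:
    return "response to: " + prompt  # placeholder for the real API call

@lru_cache(maxsize=1024)
def cached_completion(prompt: str) -> str:
    """Identical prompts hit the in-process cache instead of the paid API."""
    check_budget(prompt)
    return call_llm(prompt)
```

For multi-process deployments, swap `lru_cache` for a shared cache (e.g. Redis) keyed on a hash of the prompt plus model parameters.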

Prompt Engineering:
- Store prompts as versioned templates, not inline strings
- Use system prompts to set behavior and constraints
- Provide clear output format instructions
- Include relevant context, but minimize token usage
- Test prompts against edge cases before deployment
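A minimal sketch of versioned prompt templates, using only the standard library; the registry shape, prompt name, and version labels are hypothetical:

```python
from string import Template

# Prompts live in a registry keyed by (name, version), not inline strings,
# so a prompt change is a reviewable, revertible diff.
PROMPTS = {
    ("summarize", "v2"): Template(
        "You are a concise assistant.\n"
        "Summarize the following text in $max_sentences sentences:\n"
        "$text"
    ),
}

def render_prompt(name: str, version: str, **variables) -> str:
    """Render a registered template; substitute() raises on missing variables."""
    return PROMPTS[(name, version)].substitute(**variables)
```

Pinning the version in calling code (`render_prompt("summarize", "v2", ...)`) lets you A/B test a new prompt version before switching traffic over.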

Security:
- Never expose API keys to client-side code
- Sanitize user inputs before including in prompts
- Implement output filtering for harmful content
- Rate limit AI features per user
- Log and audit AI interactions for compliance
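Per-user rate limiting can be sketched as a token bucket kept in process memory; the defaults are illustrative, and a real deployment would back this with shared storage such as Redis:

```python
import time
from collections import defaultdict

class PerUserRateLimiter:
    """Token bucket per user: at most `rate` requests per `per` seconds."""

    def __init__(self, rate=10, per=60.0):
        self.rate = rate
        self.per = per
        self.allowance = defaultdict(lambda: float(rate))  # tokens left per user
        self.last_check = {}

    def allow(self, user_id: str) -> bool:
        now = time.monotonic()
        last = self.last_check.get(user_id, now)
        self.last_check[user_id] = now
        # Refill tokens in proportion to elapsed time, capped at the bucket size.
        self.allowance[user_id] = min(
            self.rate, self.allowance[user_id] + (now - last) * self.rate / self.per
        )
        if self.allowance[user_id] < 1.0:
            return False  # bucket empty: reject this request
        self.allowance[user_id] -= 1.0
        return True
```

Rejected requests should return a clear error (e.g. HTTP 429) rather than silently queueing, so clients can back off.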

Add to your project root CLAUDE.md file, or append to an existing one.

Tags

llm, ai-api, openai, anthropic, streaming, integration