
LLM API Integration

Best practices for integrating LLM APIs with error handling, streaming, and cost management.

Claude Code · Cursor · GitHub Copilot · Windsurf · Cline · Codex / OpenAI · Gemini CLI
Updated 2026-04-05
CLAUDE.md
# LLM API Integration

You are an expert in LLM integration, AI APIs, and production AI systems.

API Integration:
- Always validate LLM outputs before using in application logic
- Implement retry logic with exponential backoff for API failures
- Set token limits and cost guardrails per request
- Stream responses for better user experience (SSE or WebSocket)
- Use structured output (JSON mode) when you need to parse responses
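The retry guidance above can be sketched as a small helper. This is a minimal, provider-agnostic example, not any SDK's built-in retry; the `base_delay` and `max_attempts` defaults are illustrative assumptions:

```python
import random
import time

def retry_with_backoff(fn, max_attempts=5, base_delay=1.0, max_delay=30.0):
    """Call fn(), retrying transient failures with exponential backoff.

    Jitter (a random scaling of each delay) keeps many clients from
    retrying in lockstep after a shared outage.
    """
    for attempt in range(max_attempts):
        try:
            return fn()
        except Exception:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the last error
            delay = min(max_delay, base_delay * (2 ** attempt))
            time.sleep(delay * random.uniform(0.5, 1.0))
```

In production you would catch only the transient error types your SDK raises (rate-limit and connection errors), not bare `Exception`.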

Error Handling:
- Handle rate limiting with proper backoff and queuing
- Implement circuit breaker pattern for API outages
- Provide graceful degradation when AI service is unavailable
- Log all API errors with request context for debugging
- Set request timeouts (30-60 seconds typical)
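The circuit-breaker pattern mentioned above can be sketched as follows; this is a simplified single-process version (thresholds and timeouts are illustrative), not a drop-in library:

```python
import time

class CircuitBreaker:
    """Fail fast after repeated failures instead of hammering a down API."""

    def __init__(self, failure_threshold=5, reset_timeout=30.0):
        self.failure_threshold = failure_threshold
        self.reset_timeout = reset_timeout
        self.failures = 0
        self.opened_at = None  # monotonic time the circuit opened, or None

    def call(self, fn):
        if self.opened_at is not None:
            if time.monotonic() - self.opened_at < self.reset_timeout:
                raise RuntimeError("circuit open: failing fast")
            self.opened_at = None  # half-open: let one trial call through
        try:
            result = fn()
        except Exception:
            self.failures += 1
            if self.failures >= self.failure_threshold:
                self.opened_at = time.monotonic()
            raise
        self.failures = 0  # success closes the circuit
        return result
```

When the circuit is open, the caller can fall back to a cached answer or a non-AI code path, which is the graceful degradation the rules call for.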

Cost Management:
- Cache common queries to reduce API calls
- Use smaller models for simple tasks and larger models only for complex ones
- Implement token counting before sending requests
- Monitor spend per user/feature with billing alerts
- Use batch APIs for non-real-time processing
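Token counting and caching can be combined in a thin wrapper. The sketch below uses a rough characters-per-token heuristic and a stubbed `call_llm`; the 4-chars-per-token ratio, the `MAX_PROMPT_TOKENS` budget, and `call_llm` itself are all assumptions (use your provider's tokenizer for exact counts):

```python
from functools import lru_cache

def estimate_tokens(text: str) -> int:
    """Very rough heuristic: ~4 characters per token for English prose."""
    return max(1, len(text) // 4)

MAX_PROMPT_TOKENS = 4000  # assumed per-request budget

def check_budget(prompt: str) -> None:
    tokens = estimate_tokens(prompt)
    if tokens > MAX_PROMPT_TOKENS:
        raise ValueError(f"prompt too large: ~{tokens} tokens > {MAX_PROMPT_TOKENS}")

def call_llm(prompt: str) -> str:
    return "response to: " + prompt  # placeholder for the real API call

@lru_cache(maxsize=1024)
def cached_completion(prompt: str) -> str:
    """Identical prompts hit the in-process cache instead of the paid API."""
    check_budget(prompt)
    return call_llm(prompt)
```

For multi-process deployments, swap `lru_cache` for a shared cache (e.g. Redis) keyed on a hash of the prompt plus model parameters.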

Prompt Engineering:
- Store prompts as versioned templates, not inline strings
- Use system prompts to set behavior and constraints
- Provide clear output format instructions
- Include relevant context, but minimize token usage
- Test prompts against edge cases before deployment
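A minimal sketch of versioned prompt templates, using only the standard library; the registry shape, prompt name, and version labels are hypothetical:

```python
from string import Template

# Prompts live in a registry keyed by (name, version), not inline strings,
# so a prompt change is a reviewable, revertible diff.
PROMPTS = {
    ("summarize", "v2"): Template(
        "You are a concise assistant.\n"
        "Summarize the following text in $max_sentences sentences:\n"
        "$text"
    ),
}

def render_prompt(name: str, version: str, **variables) -> str:
    """Render a registered template; substitute() raises on missing variables."""
    return PROMPTS[(name, version)].substitute(**variables)
```

Pinning the version in calling code (`render_prompt("summarize", "v2", ...)`) lets you A/B test a new prompt version before switching traffic over.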

Security:
- Never expose API keys to client-side code
- Sanitize user inputs before including in prompts
- Implement output filtering for harmful content
- Rate limit AI features per user
- Log and audit AI interactions for compliance
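Per-user rate limiting can be sketched as a token bucket kept in process memory; the defaults are illustrative, and a real deployment would back this with shared storage such as Redis:

```python
import time
from collections import defaultdict

class PerUserRateLimiter:
    """Token bucket per user: at most `rate` requests per `per` seconds."""

    def __init__(self, rate=10, per=60.0):
        self.rate = rate
        self.per = per
        self.allowance = defaultdict(lambda: float(rate))  # tokens left per user
        self.last_check = {}

    def allow(self, user_id: str) -> bool:
        now = time.monotonic()
        last = self.last_check.get(user_id, now)
        self.last_check[user_id] = now
        # Refill tokens in proportion to elapsed time, capped at the bucket size.
        self.allowance[user_id] = min(
            self.rate, self.allowance[user_id] + (now - last) * self.rate / self.per
        )
        if self.allowance[user_id] < 1.0:
            return False  # bucket empty: reject this request
        self.allowance[user_id] -= 1.0
        return True
```

Rejected requests should return a clear error (e.g. HTTP 429) rather than silently queueing, so clients can back off.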

Add to your project root CLAUDE.md file, or append to an existing one.

Tags

llm, ai-api, openai, anthropic, streaming, integration