OpenAI API Best Practices

Patterns for the OpenAI API: chat completions, function calling, embeddings, and fine-tuning.

Claude Code, Cursor, GitHub Copilot, Windsurf, Cline, Codex / OpenAI, Gemini CLI

Updated 2026-04-05

CLAUDE.md

# OpenAI API Best Practices

You are an expert in the OpenAI API, GPT models, and production AI system design.

Chat Completions:
- Use the Chat Completions API: openai.chat.completions.create()
- Structure messages array with system, user, and assistant roles
- Use gpt-4o for complex tasks, gpt-4o-mini for cost-sensitive workloads
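- Set response_format: { type: 'json_object' } for JSON mode (the messages must mention 'JSON'); use { type: 'json_schema', json_schema: { name, schema, strict: true } } when the output must match a schema
- Use the seed parameter for more reproducible outputs (determinism is best-effort; compare system_fingerprint across responses)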
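
A minimal sketch of how these rules shape a request body, written in TypeScript to match the SDK call named above. The `ChatMessage` and `ChatRequest` types are local stand-ins for a subset of the request JSON (an assumption, not the SDK's exported types), and `buildChatRequest` is a hypothetical helper. Nothing here imports the `openai` package, so the sketch runs offline.

```typescript
// Local types mirroring a subset of the Chat Completions request JSON (assumption).
type Role = "system" | "user" | "assistant";

interface ChatMessage {
  role: Role;
  content: string;
}

interface ChatRequest {
  model: string;
  messages: ChatMessage[];
  response_format?: { type: "json_object" };
  seed?: number;
}

// Hypothetical helper: choose a model tier and assemble the messages array
// as system prompt, then prior turns, then the new user message.
function buildChatRequest(
  question: string,
  history: ChatMessage[],
  opts: { complex: boolean; json: boolean; seed?: number },
): ChatRequest {
  const system: ChatMessage = {
    role: "system",
    // JSON mode requires the word "JSON" to appear somewhere in the messages.
    content: opts.json
      ? "You are a concise assistant. Reply with a JSON object."
      : "You are a concise assistant.",
  };
  const req: ChatRequest = {
    model: opts.complex ? "gpt-4o" : "gpt-4o-mini",
    messages: [system, ...history, { role: "user", content: question }],
  };
  if (opts.json) req.response_format = { type: "json_object" };
  if (opts.seed !== undefined) req.seed = opts.seed;
  return req;
}

const req = buildChatRequest(
  "And the next one?",
  [
    { role: "user", content: "What is the first prime number?" },
    { role: "assistant", content: '{"prime": 2}' },
  ],
  { complex: false, json: true, seed: 42 },
);
console.log(JSON.stringify(req, null, 2));
// With the official `openai` package you would send it with:
//   const completion = await client.chat.completions.create(req);
```

Keeping request assembly in a pure function makes the model choice and message ordering easy to unit-test without network calls.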

Function Calling:
- Define functions with JSON Schema in the tools parameter
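- Set tool_choice: 'auto' (default when tools are given), 'none', 'required', or a specific function via { type: 'function', function: { name } }
- Handle tool_calls on the assistant message: extract each call's id, function name, and arguments
- Parse arguments (they arrive as a JSON string) and validate them before execution; return each result as a { role: 'tool', tool_call_id, content } message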
- Support parallel function calls (multiple tool_calls in one response)
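
Below is a sketch of the tool-calling rules that needs no network access. `get_weather`, `getWeather`, and `handleToolCalls` are hypothetical names. The `ToolCall` and `ToolMessage` types are local approximations of the JSON the API returns, not the SDK's exported types.

```typescript
// Local types mirroring the tool-call JSON on an assistant message (assumption).
interface ToolCall {
  id: string;
  type: "function";
  function: { name: string; arguments: string }; // arguments is a JSON string
}

interface ToolMessage {
  role: "tool";
  tool_call_id: string;
  content: string;
}

// Value for the `tools` request parameter; JSON Schema describes the arguments.
const tools = [
  {
    type: "function" as const,
    function: {
      name: "get_weather", // hypothetical example tool
      description: "Get the current temperature for a city",
      parameters: {
        type: "object",
        properties: { city: { type: "string" } },
        required: ["city"],
        additionalProperties: false,
      },
    },
  },
];

// Hypothetical local implementation; returns a stubbed value.
function getWeather(city: string): { city: string; tempC: number } {
  return { city, tempC: 21 };
}

// Run every requested call (parallel calls arrive as several entries) and
// produce one `tool` message per call, matched by tool_call_id.
function handleToolCalls(calls: ToolCall[]): ToolMessage[] {
  return calls.map((call): ToolMessage => {
    let result: unknown;
    try {
      if (call.function.name !== "get_weather") {
        throw new Error(`unknown tool: ${call.function.name}`);
      }
      // JSON.parse throws on malformed arguments; the catch below reports it.
      const city = (JSON.parse(call.function.arguments) as { city?: unknown } | null)?.city;
      // Validate before executing: the model can emit incomplete arguments.
      if (typeof city !== "string") {
        throw new Error("invalid arguments: expected { city: string }");
      }
      result = getWeather(city);
    } catch (err) {
      // Report the failure to the model instead of crashing the request loop.
      result = { error: err instanceof Error ? err.message : String(err) };
    }
    return { role: "tool", tool_call_id: call.id, content: JSON.stringify(result) };
  });
}

// An assistant message that requested two calls in one response.
const toolMessages = handleToolCalls([
  { id: "call_1", type: "function", function: { name: "get_weather", arguments: '{"city":"Paris"}' } },
  { id: "call_2", type: "function", function: { name: "get_weather", arguments: "{not json" } },
]);
console.log(toolMessages);
// Next: append the assistant message, then these tool messages, and call the API again.
```

The sketch returns errors as tool results instead of throwing them. This lets the model see what went wrong and either retry or explain the problem. Whether that behavior suits your application is a design choice.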

Streaming:
- Use stream: true for real-time responses
- Process chunks: for await (const chunk of stream) { ... }
- Each chunk carries choices[0].delta with content or tool_calls fragments; concatenate tool-call arguments by index
- Use the openai SDK stream helpers: runner.on('message', ...) pattern
- Support cancellation with an AbortController, passing its signal with the request
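
This sketch consumes a stream by folding each chunk's delta into a running state. It assumes the chunk shape below, which is a local approximation and not the SDK's types. `applyChunk`, `collect`, and the simulated chunks are illustrative. A real stream would come from a request made with `stream: true`.

```typescript
// Local types mirroring a subset of a streamed chunk's JSON (assumption).
interface ToolCallDelta {
  index: number;
  id?: string;
  function?: { name?: string; arguments?: string };
}

interface ChatChunk {
  choices: {
    delta: { content?: string | null; tool_calls?: ToolCallDelta[] };
    finish_reason?: string | null;
  }[];
}

interface StreamState {
  text: string;
  toolCalls: { id: string; name: string; arguments: string }[];
  finishReason: string | null;
}

// Fold one chunk into the running state (mutates and returns it). Text is
// appended as it arrives; tool-call fragments are joined by their `index`.
function applyChunk(state: StreamState, chunk: ChatChunk): StreamState {
  const choice = chunk.choices[0];
  if (!choice) return state;
  if (choice.delta.content) state.text += choice.delta.content;
  for (const tc of choice.delta.tool_calls ?? []) {
    let slot = state.toolCalls[tc.index];
    if (!slot) {
      slot = { id: "", name: "", arguments: "" };
      state.toolCalls[tc.index] = slot;
    }
    if (tc.id) slot.id = tc.id;
    if (tc.function?.name) slot.name += tc.function.name;
    if (tc.function?.arguments) slot.arguments += tc.function.arguments;
  }
  if (choice.finish_reason) state.finishReason = choice.finish_reason;
  return state;
}

// Consume any async iterable of chunks. Checking `signal.aborted` only stops
// reading; to cancel the HTTP request itself, pass the signal to the client too.
async function collect(stream: AsyncIterable<ChatChunk>, signal?: AbortSignal): Promise<StreamState> {
  const state: StreamState = { text: "", toolCalls: [], finishReason: null };
  for await (const chunk of stream) {
    if (signal?.aborted) break;
    applyChunk(state, chunk);
  }
  return state;
}

// Simulated chunks: text first, then one tool call whose arguments are split.
const demoChunks: ChatChunk[] = [
  { choices: [{ delta: { content: "Checking " } }] },
  { choices: [{ delta: { content: "the weather." } }] },
  { choices: [{ delta: { tool_calls: [{ index: 0, id: "call_1", function: { name: "get_weather", arguments: '{"ci' } }] } }] },
  { choices: [{ delta: { tool_calls: [{ index: 0, function: { arguments: 'ty":"Oslo"}' } }] }, finish_reason: "tool_calls" }] },
];

async function* fakeStream(): AsyncGenerator<ChatChunk> {
  for (const c of demoChunks) yield c;
}

const controller = new AbortController();
const done = collect(fakeStream(), controller.signal);
done.then((s) => console.log(s.text, s.toolCalls, s.finishReason));
```

Because `applyChunk` is a pure fold over chunks, it can be unit-tested with plain arrays. The same function also works with SDK helpers that emit chunks.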

Embeddings:
- Use text-embedding-3-small for cost-effective embeddings (1536 dimensions)
- Use text-embedding-3-large with dimensions parameter for quality/cost trade-off
- Batch embed up to 2048 texts per request
- Skip manual normalization: embeddings are returned at unit length, so cosine similarity equals the dot product (re-normalize only after truncating vectors yourself)
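
The batching and similarity rules can be sketched as follows, assuming the 2048-inputs-per-request limit stated above. `toBatches`, `dot`, and `normalize` are hypothetical helpers. The commented SDK call shows where each batch would go.

```typescript
// Per-request input limit taken from the guideline above.
const MAX_INPUTS_PER_REQUEST = 2048;

// Split texts into batches; each batch becomes one embeddings request's `input`.
function toBatches<T>(items: T[], size = MAX_INPUTS_PER_REQUEST): T[][] {
  const batches: T[][] = [];
  for (let i = 0; i < items.length; i += size) {
    batches.push(items.slice(i, i + size));
  }
  return batches;
}

// For unit-length vectors (as the API returns), this is the cosine similarity.
function dot(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error("dimension mismatch");
  let sum = 0;
  for (let i = 0; i < a.length; i++) sum += a[i] * b[i];
  return sum;
}

// Rescale to unit length; only needed after you truncate a vector yourself.
function normalize(v: number[]): number[] {
  const norm = Math.sqrt(dot(v, v));
  return norm === 0 ? v.slice() : v.map((x) => x / norm);
}

const texts = Array.from({ length: 5000 }, (_, i) => `document ${i}`);
console.log(toBatches(texts).map((b) => b.length)); // [ 2048, 2048, 904 ]
// Each batch would be sent as, e.g.:
//   client.embeddings.create({ model: "text-embedding-3-small", input: batch })

// Truncating a unit vector breaks unit length; re-normalize before comparing.
const full = normalize([1, 2, 2]); // [1/3, 2/3, 2/3]
const truncated = normalize(full.slice(0, 2));
console.log(dot(truncated, truncated)); // approximately 1
```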

Cost Management:
- Count tokens before sending with the tiktoken library
- Use max_completion_tokens to cap response length (max_tokens is deprecated and not supported by o-series models)
- Implement per-user spending limits with token tracking
- Use Batch API for 50% discount on non-time-sensitive requests
- Cache responses for identical prompts; put static content (system prompt, tools) first so automatic prompt caching can reuse repeated prefixes of 1024+ tokens
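
One way to implement per-user spending limits is sketched below with a hypothetical `SpendTracker` class. It assumes token counts come from the response's `usage` field after the call and from a pre-send count (e.g. via tiktoken) before it. The per-million-token prices in the example are placeholders, not OpenAI's actual rates.

```typescript
// Shape of the `usage` field on a Chat Completions response.
interface Usage {
  prompt_tokens: number;
  completion_tokens: number;
}

// Prices in USD per 1M tokens. Values used below are placeholders,
// not real OpenAI pricing; load current rates from your own config.
interface Price {
  inputPerM: number;
  outputPerM: number;
}

// Hypothetical per-user budget tracker (in-memory; persist it in production).
class SpendTracker {
  private readonly spent = new Map<string, number>();
  private readonly limitUsd: number;
  private readonly price: Price;

  constructor(limitUsd: number, price: Price) {
    this.limitUsd = limitUsd;
    this.price = price;
  }

  // Cost of one call; `batch` applies the Batch API's 50% discount.
  cost(u: Usage, batch = false): number {
    const full =
      (u.prompt_tokens * this.price.inputPerM + u.completion_tokens * this.price.outputPerM) / 1_000_000;
    return batch ? full / 2 : full;
  }

  // Worst case before sending: counted prompt tokens plus the full output cap.
  worstCase(promptTokens: number, maxOutputTokens: number): number {
    return this.cost({ prompt_tokens: promptTokens, completion_tokens: maxOutputTokens });
  }

  canSpend(userId: string, estimateUsd: number): boolean {
    return (this.spent.get(userId) ?? 0) + estimateUsd <= this.limitUsd;
  }

  // Record actual usage once the response arrives; returns the running total.
  record(userId: string, u: Usage, batch = false): number {
    const total = (this.spent.get(userId) ?? 0) + this.cost(u, batch);
    this.spent.set(userId, total);
    return total;
  }
}

const tracker = new SpendTracker(0.005, { inputPerM: 1, outputPerM: 4 }); // placeholder prices
const estimate = tracker.worstCase(1000, 500); // 0.003 with these placeholder prices
if (tracker.canSpend("user-1", estimate)) {
  tracker.record("user-1", { prompt_tokens: 1000, completion_tokens: 320 });
}
console.log(tracker.canSpend("user-1", estimate));
```

The output cap turns an unknown response length into a known upper bound. That bound makes the pre-send budget check meaningful.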

Error Handling:
- Retry on 429 (rate limit) with exponential backoff respecting Retry-After header
- Handle 500/503 with retries (transient server errors)
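- Check finish_reason on every response: 'content_filter' means output was filtered for policy reasons, 'length' means it hit the token cap
- Set explicit request timeouts (e.g. 60s for typical calls, longer for complex tasks); the official SDKs default to 10 minutes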
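
The retry, timeout, and finish-reason rules are sketched below for a hand-rolled HTTP layer. The official SDKs already retry some failures and expose `maxRetries` and `timeout` client options, so check those first. `HttpError`, `backoffMs`, `withRetries`, and `classifyFinish` are hypothetical names. The simulated endpoint stands in for a real call that honours the passed `AbortSignal`.

```typescript
// Hypothetical error type: status code plus the raw Retry-After header, if any.
class HttpError extends Error {
  readonly status: number;
  readonly retryAfter?: string;
  constructor(status: number, retryAfter?: string) {
    super(`HTTP ${status}`);
    this.status = status;
    this.retryAfter = retryAfter;
  }
}

// Statuses the guidelines above treat as retryable.
const RETRYABLE = new Set([429, 500, 503]);

// Delay before the next attempt: honour Retry-After when it is a number of
// seconds; otherwise use capped exponential backoff with full jitter.
// (Retry-After may also be an HTTP date; this sketch falls back to backoff then.)
function backoffMs(attempt: number, retryAfter?: string, baseMs = 500, capMs = 30_000, rand = Math.random): number {
  const seconds = retryAfter ? Number(retryAfter) : NaN;
  if (Number.isFinite(seconds) && seconds >= 0) return seconds * 1000;
  return Math.floor(rand() * Math.min(capMs, baseMs * 2 ** attempt));
}

// Run `call` with a per-attempt timeout (via AbortSignal), retrying retryable statuses.
async function withRetries<T>(
  call: (signal: AbortSignal) => Promise<T>,
  {
    maxRetries = 3,
    timeoutMs = 60_000,
    sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms)),
  } = {},
): Promise<T> {
  for (let attempt = 0; ; attempt++) {
    const controller = new AbortController();
    const timer = setTimeout(() => controller.abort(), timeoutMs);
    let delay = 0;
    try {
      return await call(controller.signal);
    } catch (err) {
      if (!(err instanceof HttpError) || !RETRYABLE.has(err.status) || attempt >= maxRetries) throw err;
      delay = backoffMs(attempt, err.retryAfter);
    } finally {
      clearTimeout(timer); // stop this attempt's timeout before waiting
    }
    await sleep(delay);
  }
}

// finish_reason is data, not an exception, so inspect it on every response.
function classifyFinish(reason: string | null): "ok" | "truncated" | "filtered" {
  if (reason === "content_filter") return "filtered";
  if (reason === "length") return "truncated";
  return "ok";
}

// Simulated endpoint: two rate-limit responses with Retry-After: 1, then success.
let attempts = 0;
const delays: number[] = [];
const result = withRetries(
  async () => {
    attempts++;
    if (attempts < 3) throw new HttpError(429, "1");
    return "ok";
  },
  { sleep: async (ms) => { delays.push(ms); } },
);
result.then((v) => console.log(v, attempts, delays));
```

Injecting `sleep` keeps the backoff path testable without real waiting. Full jitter spreads out retries from many clients that hit the same rate limit at once.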

Add to your project root CLAUDE.md file, or append to an existing one.
