Rate Limiting Implementation

Rate limiting patterns with sliding windows, token buckets, distributed limiting, and response handling.


Updated 2026-04-05

CLAUDE.md

# Rate Limiting Implementation

You are an expert in rate limiting, API protection, and traffic management.

Algorithms:
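- Fixed window: simple counter per time window (easy to implement; a client can send up to twice the limit in a burst spanning a window boundary)
- Sliding window log: track timestamps of each request (precise, memory-intensive)
- Sliding window counter: current fixed-window count plus the previous window's count weighted by how much of it still overlaps the sliding window (approximate, constant memory per key)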
- Token bucket: tokens refill at a steady rate, consumed per request (allows bursts)
- Leaky bucket: requests queued and processed at a fixed rate (smooth output)
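The sliding window counter is the least obvious of these, so here is a minimal single-process sketch in Python. It assumes one process (no shared store) and takes a `clock` argument so it can be tested; the class and parameter names are illustrative, not from any library. The estimate is `current_count + previous_count * (1 - elapsed_fraction_of_current_window)`.

```python
import time
from typing import Callable, Dict, Tuple


class SlidingWindowCounter:
    """Approximate sliding-window limiter: the current fixed-window count plus
    the previous window's count weighted by how much of it still overlaps."""

    def __init__(self, limit: int, window_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.limit = limit
        self.window = window_seconds
        self.clock = clock
        # key -> (current window index, count in current window, count in previous window)
        self._state: Dict[str, Tuple[int, int, int]] = {}

    def allow(self, key: str) -> bool:
        now = self.clock()
        index = int(now // self.window)
        cur_index, cur, prev = self._state.get(key, (index, 0, 0))
        if index == cur_index + 1:
            # Moved into the next window: the current count becomes the previous one.
            prev, cur = cur, 0
        elif index > cur_index + 1:
            # Skipped at least one whole window: nothing overlaps any more.
            prev, cur = 0, 0
        # Fraction of the current window that has elapsed.
        elapsed = (now % self.window) / self.window
        estimate = cur + prev * (1.0 - elapsed)
        if estimate >= self.limit:
            self._state[key] = (index, cur, prev)
            return False
        self._state[key] = (index, cur + 1, prev)
        return True
```

State per key is three integers, versus one timestamp per request for a sliding window log; the trade-off is that the estimate assumes requests in the previous window were spread evenly.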

Implementation:
- Check rate limit BEFORE processing the request (fail fast)
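- Use atomic operations: run INCR and EXPIRE together in a Lua script (or MULTI/EXEC), because two separate calls can leave a counter with no TTL if the client fails between them; use Lua scripts for any multi-step check
- Include rate limit info in response headers: X-RateLimit-Limit, X-RateLimit-Remaining, X-RateLimit-Reset (a widespread convention rather than a standard; document whether Reset is a Unix timestamp or seconds until reset)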
- Return 429 Too Many Requests with Retry-After header when exceeded
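- Identify clients by an address you can trust: behind Cloudflare, use CF-Connecting-IP and accept it only from Cloudflare's IP ranges; never trust a client-supplied X-Forwarded-For as-is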
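A sketch of the response side of these rules, assuming the limiter has already produced a `LimitResult` (a hypothetical type) and that header names arrive with exact casing. `trusted_proxies` is supplied by the caller; for Cloudflare that would be its published IP ranges, which are not reproduced here.

```python
import ipaddress
import math
from dataclasses import dataclass
from typing import Dict, Iterable, Optional, Tuple, Union

Network = Union[ipaddress.IPv4Network, ipaddress.IPv6Network]


@dataclass
class LimitResult:
    allowed: bool
    limit: int
    remaining: int
    reset_at: float  # Unix time (seconds) at which the current window resets


def rate_limit_response(result: LimitResult, now: float) -> Tuple[Optional[int], Dict[str, str]]:
    """Return (status, headers). status is None when the request may proceed;
    otherwise it is 429 and the handler should not run."""
    headers = {
        "X-RateLimit-Limit": str(result.limit),
        "X-RateLimit-Remaining": str(max(result.remaining, 0)),
        # Sent here as a Unix timestamp; some APIs send seconds-until-reset instead.
        "X-RateLimit-Reset": str(math.ceil(result.reset_at)),
    }
    if result.allowed:
        return None, headers
    # Retry-After as delta-seconds, rounded up so clients do not retry too early.
    headers["Retry-After"] = str(max(1, math.ceil(result.reset_at - now)))
    return 429, headers


def client_ip(peer_addr: str, headers: Dict[str, str],
              trusted_proxies: Iterable[Network]) -> str:
    """Honor CF-Connecting-IP only when the TCP peer is a trusted proxy;
    otherwise fall back to the socket address and ignore forwarded headers."""
    peer = ipaddress.ip_address(peer_addr)
    if "CF-Connecting-IP" in headers and any(peer in net for net in trusted_proxies):
        return headers["CF-Connecting-IP"]
    return peer_addr
```

Checking the peer address first matters: without it, any client can set CF-Connecting-IP itself and rotate through fake identities to dodge per-IP limits.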

Distributed Rate Limiting:
- Redis: INCR + EXPIRE for simple counters, sorted sets for sliding window
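- Use Redis Lua scripts for atomic multi-key operations; in Redis Cluster, every key a script touches must map to the same hash slot (use hash tags such as {user:42})
- Accept eventual consistency: local counters + periodic sync for high throughput
- Shard rate limit state across Redis nodes by key (Redis Cluster hash slots, or client-side consistent hashing)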
- Implement fallback: if Redis is down, use local in-memory limits (degraded mode)
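A sketch combining an atomic INCR + EXPIRE Lua script with the degraded-mode fallback. It assumes a redis-py style client whose `eval(script, numkeys, *keys_and_args)` method runs the script; `DegradableLimiter`, `local_limit`, and the `rl:` key prefix are hypothetical. Pass the client library's connection errors as `backend_errors` (for redis-py, for example, `redis.exceptions.ConnectionError` and `redis.exceptions.TimeoutError`).

```python
import time
from typing import Any, Callable, Dict, Tuple, Type

# INCR and EXPIRE run inside one script, so a client failure can never leave
# a counter without a TTL.
FIXED_WINDOW_LUA = """
local count = redis.call('INCR', KEYS[1])
if count == 1 then
  redis.call('EXPIRE', KEYS[1], ARGV[1])
end
return count
"""


class DegradableLimiter:
    def __init__(self, client: Any, limit: int, window_seconds: int,
                 local_limit: int, backend_errors: Tuple[Type[BaseException], ...],
                 clock: Callable[[], float] = time.time) -> None:
        self.client = client              # e.g. a redis.Redis instance (assumption)
        self.limit = limit
        self.window = window_seconds
        self.local_limit = local_limit    # per-process budget used while degraded
        self.backend_errors = backend_errors
        self.clock = clock                # wall clock so windows align across hosts
        self._local: Dict[str, Tuple[int, int]] = {}  # key -> (window index, count)
        self.degraded = False

    def allow(self, key: str) -> bool:
        index = int(self.clock() // self.window)
        redis_key = f"rl:{key}:{index}"  # one Redis key per fixed window
        try:
            count = self.client.eval(FIXED_WINDOW_LUA, 1, redis_key, self.window)
            self.degraded = False
            return int(count) <= self.limit
        except self.backend_errors:
            # Redis unreachable: fall back to a local fixed window (degraded mode).
            self.degraded = True
            win, count = self._local.get(key, (index, 0))
            if win != index:
                win, count = index, 0
            if count >= self.local_limit:
                return False
            self._local[key] = (win, count + 1)
            return True
```

The script touches a single key, so it also works in Redis Cluster. `local_limit` applies per process, so setting it to roughly the global limit divided by the number of instances keeps degraded mode close to normal behavior.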

Tiered Limits:
- Anonymous users: strictest limits (e.g., 60 requests/minute)
- Authenticated users: moderate limits (e.g., 600 requests/minute)
- Premium/API key users: generous limits (e.g., 6000 requests/minute)
- Internal services: highest limits or exempt with service tokens
- Apply per-endpoint limits: stricter on writes/auth, lenient on reads
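One way to encode the tiers is a lookup table plus per-endpoint multipliers. The tier limits are the example figures above; the multipliers, the `Tier` enum, and `limit_for` are hypothetical choices for illustration.

```python
from enum import Enum
from typing import Dict, Optional


class Tier(Enum):
    ANONYMOUS = "anonymous"
    AUTHENTICATED = "authenticated"
    PREMIUM = "premium"
    INTERNAL = "internal"


# Requests per minute, using the example figures above; None means exempt.
TIER_LIMITS: Dict[Tier, Optional[int]] = {
    Tier.ANONYMOUS: 60,
    Tier.AUTHENTICATED: 600,
    Tier.PREMIUM: 6000,
    Tier.INTERNAL: None,
}

# Hypothetical multipliers: writes and auth are stricter, reads use the full tier limit.
ENDPOINT_FACTOR: Dict[str, float] = {"read": 1.0, "write": 0.2, "auth": 0.05}


def limit_for(tier: Tier, endpoint_kind: str) -> Optional[int]:
    """Per-minute limit for a tier/endpoint pair, or None if the caller is exempt."""
    base = TIER_LIMITS[tier]
    if base is None:
        return None  # e.g. internal service presenting a service token
    # Unknown endpoint kinds get the strictest factor rather than the most lenient.
    factor = ENDPOINT_FACTOR.get(endpoint_kind, min(ENDPOINT_FACTOR.values()))
    return max(1, round(base * factor))
```

Keeping the numbers in data rather than in middleware code makes it straightforward to tune them or load them from configuration.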

Advanced Patterns:
- Adaptive rate limiting: tighten limits when the backend is under load
- Cost-based limiting: weight endpoints by resource usage (AI calls cost 10, reads cost 1)
- Backpressure: return 503 Service Unavailable (optionally with Retry-After) when the system is overloaded
- Circuit breaker: stop forwarding requests to failing downstream services
- Rate limit by composite key: user + endpoint + action for granular control
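Cost-based limiting fits naturally onto a token bucket: each request spends `cost` tokens instead of one. This single-process sketch assumes the example costs above (AI call 10, read 1) and a string key, which can be a per-user budget or a composite user + endpoint + action key; all names are illustrative.

```python
import time
from typing import Callable, Dict, Tuple

# Example costs from the text above (hypothetical endpoint names).
ENDPOINT_COST: Dict[str, int] = {"ai": 10, "read": 1}


def composite_key(user: str, endpoint: str, action: str) -> str:
    """Build a granular key such as '42:/v1/chat:POST'."""
    return f"{user}:{endpoint}:{action}"


class CostTokenBucket:
    """Token bucket where each request consumes `cost` tokens instead of one."""

    def __init__(self, capacity: float, refill_per_second: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.capacity = capacity
        self.rate = refill_per_second
        self.clock = clock
        self._buckets: Dict[str, Tuple[float, float]] = {}  # key -> (tokens, last update)

    def try_consume(self, key: str, cost: float) -> bool:
        now = self.clock()
        tokens, last = self._buckets.get(key, (self.capacity, now))
        # Refill for the time since the last update, never above capacity.
        tokens = min(self.capacity, tokens + (now - last) * self.rate)
        allowed = tokens >= cost
        if allowed:
            tokens -= cost
        self._buckets[key] = (tokens, now)
        return allowed
```

With `capacity=20` and `refill_per_second=1`, a user can make two AI calls back to back and then about one read per second until the bucket refills. A request costing more than `capacity` can never succeed, so capacity should be at least the cost of the most expensive endpoint.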

Add to your project root CLAUDE.md file, or append to an existing one.
