Rate limiting looks simple until the requirements stop being simple. The system needs to feel fair, resist bursts when necessary, and stay cheap enough to enforce on every request. That means the right design depends on what you are optimizing for.
Clarify the policy first
Before touching data structures, define:
- Is the limit per user, API key, IP, or tenant?
- Do you want to allow short bursts?
- Is occasional approximation acceptable?
- What happens on rejection: hard fail, queue, or degrade?
These answers shape the algorithm much more than people expect.
Option 1: Token bucket
Token bucket is often the most practical default for APIs.
How it works
- Tokens are added at a steady refill rate.
- Each request consumes a token.
- If no token remains, reject the request.
Why teams like it
- Easy burst support
- Predictable steady-state behavior
- Compact state per key
Where it can hurt
- Distributed refill logic needs care
- Exact fairness across replicas is harder without centralized coordination
Option 2: Leaky bucket
Leaky bucket is better when you want smoother output.
The bucket drains at a fixed rate regardless of burstiness. That makes it useful when downstream systems need a steadier request shape, but it is less flexible than token bucket for user-facing API bursts.
Option 3: Sliding-window counter
This is a nice compromise when you want stronger fairness than fixed windows without storing every timestamp.
Instead of keeping all request times, you combine:
- the current bucket count
- the previous bucket count
- the fraction of overlap with the previous window
That gives a weighted estimate that reduces boundary spikes.
Storage choices
In-memory
Good for:
- single-node deployments
- edge-local enforcement
- low coordination needs
Weakness:
- state disappears on restart
- global consistency is weak
Redis
Good for:
- centralized enforcement
- atomic increments
- TTL-based cleanup
Weakness:
- network hop on the critical path
- hot keys under extreme concentration
Interview framing
When asked to design a rate limiter, organize the answer like this:
- Define the policy and fairness model.
- Choose the algorithm that matches burst behavior.
- Decide where state lives.
- Explain scaling, fault tolerance, and observability.
Metrics worth exposing
You want visibility into:
- accepted requests
- rejected requests
- top limited keys
- clock skew issues
- storage latency if using Redis or another shared store
A practical recommendation
If the problem is a general public API, token bucket with Redis-backed state is a strong baseline. It is intuitive, production-proven, and easy to explain in an interview. If strict smoothness matters more than burst tolerance, move toward leaky bucket.
Closing thought
Rate limiting is not about memorizing three named algorithms. It is about deciding what kind of unfairness you are willing to tolerate and what operational cost you are willing to pay for tighter control.