Rate Limiter Design: Picking the Right Tradeoff

A grounded walkthrough of token bucket, leaky bucket, and sliding-window counters, with guidance on when each approach wins.

Rate limiting looks simple until the requirements stop being simple. The system needs to feel fair, resist bursts when necessary, and stay cheap enough to enforce on every request. That means the right design depends on what you are optimizing for.

Clarify the policy first

Before touching data structures, define:

Is the limit per user, API key, IP, or tenant?
Do you want to allow short bursts?
Is occasional approximation acceptable?
What happens on rejection: hard fail, queue, or degrade?

These answers shape the algorithm much more than people expect.

Option 1: Token bucket

Token bucket is often the most practical default for APIs.

How it works

Tokens are added at a steady refill rate.
Each request consumes a token.
If no token remains, reject the request.

Why teams like it

Easy burst support
Predictable steady-state behavior
Compact state per key

Where it can hurt

Distributed refill logic needs care
Exact fairness across replicas is harder without centralized coordination

Option 2: Leaky bucket

Leaky bucket is better when you want smoother output.

The bucket drains at a fixed rate regardless of burstiness. That makes it useful when downstream systems need a steadier request shape, but it is less flexible than token bucket for user-facing API bursts.

Option 3: Sliding-window counter

This is a nice compromise when you want stronger fairness than fixed windows without storing every timestamp.

Instead of keeping all request times, you combine:

the current bucket count
the previous bucket count
the fraction of overlap with the previous window

That gives a weighted estimate that reduces boundary spikes.

Storage choices

In-memory

Good for:

single-node deployments
edge-local enforcement
low coordination needs

Weakness:

state disappears on restart
global consistency is weak

Redis

Good for:

centralized enforcement
atomic increments
TTL-based cleanup

Weakness:

network hop on the critical path
hot keys under extreme concentration

Interview framing

When asked to design a rate limiter, organize the answer like this:

Define the policy and fairness model.
Choose the algorithm that matches burst behavior.
Decide where state lives.
Explain scaling, fault tolerance, and observability.

Metrics worth exposing

You want visibility into:

accepted requests
rejected requests
top limited keys
clock skew issues
storage latency if using Redis or another shared store

A practical recommendation

If the problem is a general public API, token bucket with Redis-backed state is a strong baseline. It is intuitive, production-proven, and easy to explain in an interview. If strict smoothness matters more than burst tolerance, move toward leaky bucket.

Closing thought

Rate limiting is not about memorizing three named algorithms. It is about deciding what kind of unfairness you are willing to tolerate and what operational cost you are willing to pay for tighter control.