System Design · 21 Jun 2026 · 3 min read · Intermediate

Rate Limiter Design: Picking the Right Tradeoff

A grounded walkthrough of token bucket, leaky bucket, and sliding-window counters, with guidance on when each approach wins.

rate-limiting · backend · scalability

Rate limiting looks simple until the requirements stop being simple. The system needs to feel fair, absorb or reject bursts as the policy demands, and stay cheap enough to enforce on every request. That means the right design depends on what you are optimizing for.

Clarify the policy first

Before touching data structures, define:

  • Is the limit per user, API key, IP, or tenant?
  • Do you want to allow short bursts?
  • Is occasional approximation acceptable?
  • What happens on rejection: hard fail, queue, or degrade?

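These answers shape the algorithm much more than people expect. Allowing bursts points toward token bucket, for example, while tolerating approximation makes a sliding-window counter viable.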

Option 1: Token bucket

Token bucket is often the most practical default for APIs.

How it works

  • Tokens are added at a steady refill rate, up to a fixed bucket capacity.
  • Each request consumes a token.
  • If no token remains, reject the request.
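
The steps above can be sketched in a few lines of Python. This is a minimal single-process illustration, not a production implementation: the class name `TokenBucket` and the parameters `capacity` and `refill_rate` are assumptions chosen for this example, and tokens are refilled lazily on each call rather than by a background timer.

```python
import time


class TokenBucket:
    """Single-key token bucket with lazy refill (illustrative sketch)."""

    def __init__(self, capacity, refill_rate, now=None):
        self.capacity = float(capacity)        # largest burst the bucket allows
        self.refill_rate = float(refill_rate)  # tokens added per second
        self.tokens = float(capacity)          # start with a full bucket
        self.last_refill = time.monotonic() if now is None else now

    def allow(self, now=None):
        """Return True and consume one token if the request may proceed."""
        now = time.monotonic() if now is None else now
        # Credit the tokens earned since the last call, capped at capacity.
        elapsed = max(0.0, now - self.last_refill)
        self.tokens = min(self.capacity, self.tokens + elapsed * self.refill_rate)
        self.last_refill = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False
```

State per key is just two numbers (token count and last refill time), which is why the approach stays compact. The optional `now` argument exists only to make the behavior easy to test with fixed timestamps.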

Why teams like it

  • Easy burst support
  • Predictable steady-state behavior
  • Compact state per key

Where it can hurt

  • Distributed refill logic needs care
  • Exact fairness across replicas is harder without centralized coordination

Option 2: Leaky bucket

Leaky bucket is better when you want smoother output.

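Incoming requests fill the bucket, and the bucket drains at a fixed rate no matter how bursty the arrivals are. Requests that arrive while the bucket is full are rejected. That makes it useful when downstream systems need a steadier request shape, but it is less flexible than token bucket for user-facing API bursts.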
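
One way to sketch the fixed drain rate is to assign each accepted request a departure time, one slot per drain interval. The names `LeakyBucket`, `capacity`, `leak_rate`, and `schedule` are assumptions for this example; it models the queue-style variant in which requests wait for their slot, and it leaves the actual waiting or forwarding to the caller.

```python
class LeakyBucket:
    """Queue-style leaky bucket that hands out evenly spaced departure slots."""

    def __init__(self, capacity, leak_rate):
        self.capacity = capacity            # max slots a request may wait behind
        self.interval = 1.0 / leak_rate     # seconds between departures
        self.next_free = float("-inf")      # time the next departure slot opens

    def schedule(self, now):
        """Return the time this request may be forwarded, or None if rejected."""
        start = max(now, self.next_free)
        # Number of drain intervals the request would have to wait.
        queued = (start - now) / self.interval
        if queued >= self.capacity:
            return None  # bucket is full: overflow is dropped
        self.next_free = start + self.interval
        return start
```

With `capacity=2` and `leak_rate=1.0`, three requests arriving at the same instant get departure times `now` and `now + 1`, and the third is rejected. Output never exceeds one request per interval, which is the smoothing property the algorithm is chosen for.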

Option 3: Sliding-window counter

This is a nice compromise when you want stronger fairness than fixed windows, which can let a client send up to twice the limit across a window boundary, without storing every timestamp.

Instead of keeping all request times, you combine:

  • the current bucket count
  • the previous bucket count
  • the fraction of overlap with the previous window

That gives a weighted estimate (the current count plus the previous count multiplied by the overlap fraction) that reduces boundary spikes.
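
The weighted estimate can be sketched as follows. This is an illustrative single-key version under stated assumptions: the names `SlidingWindowCounter`, `limit`, and `window_seconds` are invented for the example, timestamps are assumed to be non-decreasing, and a request is rejected when admitting it would push the estimate above the limit.

```python
import math


class SlidingWindowCounter:
    """Approximate sliding window built from two fixed-window counts."""

    def __init__(self, limit, window_seconds):
        self.limit = limit
        self.window = window_seconds
        self.current_index = None   # index of the fixed window being counted
        self.current_count = 0
        self.previous_count = 0

    def allow(self, now):
        index = math.floor(now / self.window)
        if index != self.current_index:
            # Roll the counts forward; anything older than one window is dropped.
            if self.current_index is not None and index == self.current_index + 1:
                self.previous_count = self.current_count
            else:
                self.previous_count = 0
            self.current_count = 0
            self.current_index = index
        # Fraction of the sliding window that still overlaps the previous bucket.
        elapsed_fraction = (now - index * self.window) / self.window
        overlap = 1.0 - elapsed_fraction
        estimate = self.current_count + self.previous_count * overlap
        if estimate + 1 > self.limit:
            return False
        self.current_count += 1
        return True
```

The estimate assumes requests in the previous bucket were spread evenly, so it is an approximation. That is the price of keeping two counters per key instead of a full timestamp log.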

Storage choices

In-memory

Good for:

  • single-node deployments
  • edge-local enforcement
  • low coordination needs

Weaknesses:

  • state disappears on restart
  • global consistency is weak

Redis

Good for:

  • centralized enforcement
  • atomic increments
  • TTL-based cleanup

Weaknesses:

  • network hop on the critical path
  • hot keys under extreme concentration
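
To make the atomic-increment and TTL points concrete, here is a hedged sketch of a fixed-window counter on top of a Redis client. It assumes `client` is a redis-py style object exposing `incr(name)` and `expire(name, seconds)`. The function name `allow_request` and the `ratelimit:` key prefix are hypothetical, and `window_seconds` is assumed to be an integer.

```python
def allow_request(client, key, limit, window_seconds, now):
    """Fixed-window check using one atomic increment per request (sketch).

    `client` is assumed to behave like a redis-py client; `now` is a Unix
    timestamp in seconds supplied by the caller.
    """
    window_index = int(now // window_seconds)
    # Hypothetical key layout: one counter per limited key per window.
    redis_key = f"ratelimit:{key}:{window_index}"
    count = client.incr(redis_key)  # INCR is atomic on the Redis server
    if count == 1:
        # First hit in this window: let Redis delete the counter later.
        client.expire(redis_key, window_seconds)
    return count <= limit
```

Note that INCR and EXPIRE are two separate commands here, so a crash between them can leave a counter without a TTL. Wrapping both in a transaction or a server-side script closes that gap, at the cost of a slightly more involved call.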

Interview framing

When asked to design a rate limiter, organize the answer like this:

  1. Define the policy and fairness model.
  2. Choose the algorithm that matches burst behavior.
  3. Decide where state lives.
  4. Explain scaling, fault tolerance, and observability.

Metrics worth exposing

You want visibility into:

  • accepted requests
  • rejected requests
  • top limited keys
  • clock skew issues
  • storage latency if using Redis or another shared store
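
As a rough illustration of the first three items, the sketch below keeps in-process counters. The class `LimiterMetrics` and its method names are assumptions for this example; a real deployment would more likely export these values through whatever metrics library it already uses, and would cap the number of distinct keys it tracks.

```python
from collections import Counter


class LimiterMetrics:
    """In-process counters for limiter decisions (illustrative only)."""

    def __init__(self):
        self.accepted = 0
        self.rejected = 0
        self.rejected_by_key = Counter()  # unbounded here; cap it in practice

    def record(self, key, allowed):
        """Record one limiter decision for the given key."""
        if allowed:
            self.accepted += 1
        else:
            self.rejected += 1
            self.rejected_by_key[key] += 1

    def top_limited(self, n=3):
        """Return the n keys with the most rejections as (key, count) pairs."""
        return self.rejected_by_key.most_common(n)
```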

A practical recommendation

For a general public API, token bucket with Redis-backed state is a strong baseline. It is intuitive, production-proven, and easy to explain in an interview. If strict smoothness matters more than burst tolerance, move toward leaky bucket.

Closing thought

Rate limiting is not about memorizing three named algorithms. It is about deciding what kind of unfairness you are willing to tolerate and what operational cost you are willing to pay for tighter control.