Distributed Throttle and Shared Sessions Across Instances
Oleksandr Zhuravlov
You set a throttle on a stitch — say two requests a second to a metered upstream — and on your laptop it behaves exactly as declared. Then you deploy three instances behind a load balancer, or your handler runs as a serverless function that scales to a dozen concurrent isolates, and the upstream starts handing you 429s you can't explain. The math is the problem: three processes each enforcing two-per-second is six-per-second at the wall. Your real upstream rate is N times your cap, where N is however many copies of your code happen to be running right now.
The throttle isn't broken. It's doing precisely what it was told — for one process. The default state lives in memory, and memory isn't shared across instances.
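To make the arithmetic concrete, here is a minimal simulation. It assumes a simple fixed-window limiter (`LocalLimiter` is a hypothetical stand-in, not StitchAPI's limiter) and counts how many requests get through when every process keeps its own counter.

```typescript
// A deliberately simple fixed-window limiter whose counter lives in local memory.
// Hypothetical stand-in for illustration; not StitchAPI's actual limiter.
class LocalLimiter {
  private limit: number;
  private windowMs: number;
  private windowStart = 0;
  private count = 0;

  constructor(limit: number, windowMs: number) {
    this.limit = limit;
    this.windowMs = windowMs;
  }

  // Returns true if a request at time `nowMs` fits under the cap.
  tryAcquire(nowMs: number): boolean {
    const start = Math.floor(nowMs / this.windowMs) * this.windowMs;
    if (start !== this.windowStart) {
      this.windowStart = start;
      this.count = 0;
    }
    if (this.count >= this.limit) return false;
    this.count += 1;
    return true;
  }
}

// Simulate `instances` processes, each with its own limiter, all receiving
// `attemptsPerInstance` requests inside the same one-second window.
function admittedInOneSecond(instances: number, limit: number, attemptsPerInstance: number): number {
  let admitted = 0;
  for (let i = 0; i < instances; i++) {
    const limiter = new LocalLimiter(limit, 1000);
    for (let a = 0; a < attemptsPerInstance; a++) {
      if (limiter.tryAcquire(100 + a)) admitted += 1;
    }
  }
  return admitted;
}

const singleProcess = admittedInOneSecond(1, 2, 10); // 2 requests admitted
const threeProcesses = admittedInOneSecond(3, 2, 10); // 6 requests admitted
```

Each limiter is individually correct. The upstream just sees the sum of them.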
What actually lives in the store
A stitch keeps a small amount of state between calls. By default it lives in an in-process store — a plain object in the heap of the one process that made the call. Two pieces of that state are the ones that go wrong when you scale out:
- Throttle counters. The rate limiter counts requests inside a time window and the number in flight, then makes the next caller wait if it would breach the cap. Those counters are the whole mechanism. If every process keeps its own, every process limits itself in isolation and the aggregate sails past the cap.
- Auth sessions. A cookieSession logs in once, captures the cookie, replays it, and logs in again when the wall returns. That captured cookie, along with any token an oauth2 strategy caches, also lives in the per-process store. Across instances each process runs its own login: three instances, three logins against an upstream that may itself rate-limit logins, and a re-auth on one worker doesn't help the other two.
In-process is the right default: for a single long-lived process it's correct, zero-latency, and needs nothing to operate. It only becomes wrong the moment a second copy of your code shares the same upstream budget or account — and the fix isn't a different limiter, it's a different place to keep the counters.
The store is a seam
StitchAPI puts that per-process state behind one seam: a StitchStore, a small interface a seam reads from instead of the heap. Core ships no backend of its own — the default is in-process — but because the throttle and the session machinery already talk to the store through that interface, moving them off-box is a swap, not a rewrite. Attach a driver to the seam's store field and the call sites don't change at all.
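As a rough picture of what such a seam looks like, here is a sketch. The `SketchStore` shape below is an assumption built from the three operations this article mentions (get, set, and incr); the real StitchStore interface may differ, and a networked driver would almost certainly be asynchronous and carry TTLs.

```typescript
// Hypothetical store shape for illustration; the real StitchStore may differ.
interface SketchStore {
  get(key: string): string | undefined;
  set(key: string, value: string): void;
  // Adds 1 to the counter at `key` and returns the new value.
  incr(key: string): number;
}

// In-process backend: state lives in plain objects on this process's heap.
class MemoryStore implements SketchStore {
  private values: Record<string, string> = {};
  private counters: Record<string, number> = {};

  get(key: string): string | undefined {
    return this.values[key];
  }

  set(key: string, value: string): void {
    this.values[key] = value;
  }

  incr(key: string): number {
    const next = (this.counters[key] ?? 0) + 1;
    this.counters[key] = next;
    return next;
  }
}

// Code written against the interface doesn't care which backend it is handed.
function recordCall(store: SketchStore, host: string): number {
  return store.incr("throttle:" + host);
}

// Two processes with their own stores cannot see each other's counts.
const processA = new MemoryStore();
const processB = new MemoryStore();
recordCall(processA, "api.example.com");
const isolatedCount = recordCall(processB, "api.example.com"); // 1

// One store handed to both callers (standing in for a remote backend) sees both.
const sharedBackend = new MemoryStore();
recordCall(sharedBackend, "api.example.com");
const combinedCount = recordCall(sharedBackend, "api.example.com"); // 2
```

The call site (`recordCall`) is identical in both cases; only the object passed in changes, which is the point of putting the state behind an interface.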
The three first-party drivers are thin peer-dependency packages:
- @stitchapi/redis: for servers (and the edge via Upstash's HTTP Redis)
- @stitchapi/cloudflare-kv: for Cloudflare Workers and Pages
- @stitchapi/deno-kv: for Deno Deploy, Node, or Bun
```typescript
import { fromIoredis, redisStore } from '@stitchapi/redis';
import Redis from 'ioredis';
import { seam } from 'stitchapi';

const api = seam({
  baseUrl: 'https://api.example.com',
  throttle: { rate: '2/s', concurrency: 4, scope: 'host' },
  // The one line that makes it fleet-wide:
  store: redisStore(fromIoredis(new Redis(process.env.REDIS_URL))),
});
```

The throttle declaration is unchanged from the single-process version. This article assumes you already know what rate and concurrency mean (if not, the throttle guide covers the semantics). The only new thing is store. With it attached, the counters increment in Redis instead of the local heap, so every instance pointed at the same Redis shares one budget. Two-per-second stays two-per-second whether one process is running or twelve.
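To see why one shared counter restores the cap, here is a minimal in-memory sketch. It assumes a fixed-window scheme keyed by window index; `SharedCounterStore` and `tryAcquire` are hypothetical names, and the actual driver's pacing algorithm may well differ.

```typescript
// Stand-in for a shared backend with an atomic increment (an assumption for
// this sketch; a real driver would talk to Redis or Deno KV over the network).
class SharedCounterStore {
  private counters: Record<string, number> = {};

  incr(key: string): number {
    const next = (this.counters[key] ?? 0) + 1;
    this.counters[key] = next;
    return next;
  }
}

// Fixed-window check: the counter key includes the window index, so every
// instance that shares the store increments the same counter.
function tryAcquire(
  store: SharedCounterStore,
  host: string,
  limit: number,
  windowMs: number,
  nowMs: number
): boolean {
  const windowIndex = Math.floor(nowMs / windowMs);
  const count = store.incr("throttle:" + host + ":" + windowIndex);
  return count <= limit;
}

// Count how many requests are admitted when `instances` processes each try
// `attemptsPerInstance` times inside the window that contains `nowMs`.
function admittedAcrossFleet(
  store: SharedCounterStore,
  instances: number,
  attemptsPerInstance: number,
  nowMs: number
): number {
  let admittedCount = 0;
  for (let i = 0; i < instances; i++) {
    for (let a = 0; a < attemptsPerInstance; a++) {
      if (tryAcquire(store, "api.example.com", 2, 1000, nowMs)) admittedCount += 1;
    }
  }
  return admittedCount;
}

const sharedStore = new SharedCounterStore();
const withThreeInstances = admittedAcrossFleet(sharedStore, 3, 10, 100); // 2
const withTwelveInstances = admittedAcrossFleet(sharedStore, 12, 10, 1100); // 2, next window
```

A real backend would also expire each window's key once the window has passed. This sketch leaves old keys in place for brevity.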
Sessions ride along for free. Give two stitches the same key plus a shared store and they share one login — capture the cookie once, on whichever worker logs in first, and every other worker replays it instead of re-authenticating:
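```typescript
import { fromIoredis, redisStore } from '@stitchapi/redis';
import Redis from 'ioredis';
import { cookieSession, env, stitch } from 'stitchapi';

// One Redis client per process; every instance points at the same REDIS_URL.
const client = new Redis(process.env.REDIS_URL);

// `login` is your login stitch, assumed to be defined elsewhere.
const me = stitch({
  baseUrl: 'https://api.example.com',
  path: '/me',
  auth: cookieSession({
    login,
    cookie: '*',
    key: 'acme-session', // shared identity for the session
    loginInput: () => ({
      body: { user: env('USER')(), pass: env('PASS')() },
    }),
  }),
  store: redisStore(fromIoredis(client)), // same store across instances
});
```

Which driver fits which deployment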
The drivers aren't interchangeable, and the line that separates them is one operation: atomic increment. A distributed throttle needs the store to increment a counter exactly once under concurrency — if two workers both read 5, both write 6, you've undercounted and the cap leaks. Shared cache and sessions don't need that; they only need get and set.
That single requirement decides the fit:
| Driver | Best for | Atomic incr → distributed throttle |
|---|---|---|
| @stitchapi/redis | Node servers; edge via Upstash HTTP | Yes: a single Lua INCR + PEXPIRE |
| @stitchapi/deno-kv | Deno Deploy, Node, or Bun | Yes: an atomic compare-and-set loop |
| @stitchapi/cloudflare-kv | Cloudflare Workers / Pages | No: last-write-wins; back it with a Durable Object |
For a fleet of servers, reach for Redis: it has a real atomic INCR, so the counters increment exactly once and the limiter paces evenly across the fleet. Deno KV gives the same guarantee through a compare-and-set loop, and on Deno Deploy the same handle is replicated globally — a natural fit at the edge.
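The compare-and-set idea can be sketched in a few lines. This is an in-memory illustration under stated assumptions, not the Deno KV driver's code: `VersionedKv` is a hypothetical store that tags every value with a version and refuses a write whose expected version is stale.

```typescript
interface Versioned {
  value: number;
  version: number;
}

// Hypothetical versioned key-value store for illustration.
class VersionedKv {
  private entries: Record<string, Versioned> = {};

  get(key: string): Versioned {
    return this.entries[key] ?? { value: 0, version: 0 };
  }

  // Writes only if nobody has written since `expectedVersion` was read.
  compareAndSet(key: string, expectedVersion: number, value: number): boolean {
    if (this.get(key).version !== expectedVersion) return false;
    this.entries[key] = { value: value, version: expectedVersion + 1 };
    return true;
  }
}

// Retry until our write lands on top of the exact version we read.
function incrWithCas(kv: VersionedKv, key: string): number {
  for (;;) {
    const current = kv.get(key);
    const next = current.value + 1;
    if (kv.compareAndSet(key, current.version, next)) return next;
  }
}

// Two workers read the same version, then both try to commit.
const casKv = new VersionedKv();
const seenByA = casKv.get("hits");
const seenByB = casKv.get("hits");
const aCommitted = casKv.compareAndSet("hits", seenByA.version, seenByA.value + 1); // true
const bCommitted = casKv.compareAndSet("hits", seenByB.version, seenByB.value + 1); // false: stale read
const afterRetry = incrWithCas(casKv, "hits"); // 2: the retry re-reads and lands on top of A's write
```

The losing writer is told it lost, so it can re-read and try again. That rejection is what keeps the count exact.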
Workers KV is the honest exception. It's last-write-wins get/put with no atomic increment, so a throttle counter on it would undercount under concurrency and silently break the cap. Rather than pretend, cloudflareKvStore(...).incr(...) throws a documented error. KV stays the right backend for the read-heavy halves — cache and shared sessions — which is usually exactly what an edge handler needs. For a distributed throttle at the edge, pair KV with a Cloudflare Durable Object (the single-writer counter an atomic increment requires) or run the Redis driver over Upstash's HTTP API.
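The undercount is easy to reproduce by hand. The sketch below uses a hypothetical `LastWriteWinsKv` with only get and put, and replays the interleaving in which two workers both read 5 before either one writes.

```typescript
// Hypothetical last-write-wins store: get and put only, no atomic increment.
class LastWriteWinsKv {
  private data: Record<string, number> = {};

  get(key: string): number {
    return this.data[key] ?? 0;
  }

  put(key: string, value: number): void {
    this.data[key] = value;
  }
}

const lww = new LastWriteWinsKv();
lww.put("hits", 5);

// Both workers read before either writes.
const readByWorkerA = lww.get("hits"); // 5
const readByWorkerB = lww.get("hits"); // 5

// Each writes "what I read, plus one"; the second write overwrites the first.
lww.put("hits", readByWorkerA + 1);
lww.put("hits", readByWorkerB + 1);

const countedHits = lww.get("hits"); // 6, although two requests went out
```

Two requests reached the upstream and the counter moved by one, so a limiter reading this counter would admit more traffic than the cap allows.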
The deciding question isn't "server or edge" — it's "do I need a shared
throttle, or just shared cache and sessions?" Cache and sessions work on
any of the three; a distributed throttle needs atomic incr, which is what
splits Workers KV off from the other two.
When this isn't worth it
A shared store is the right answer to a real problem, but it has a price, and a single process never has to pay it:
- It adds a network hop and a dependency to operate. Every throttle check and session read now crosses the wire to Redis or KV instead of touching local memory — latency on the hot path, and one more thing that can be down, misconfigured, or saturated. On a single process you'd be paying all of that to fix a problem you don't have; keep the in-process default.
- The caps become approximate under contention, not exact. Even with atomic incr, a distributed limiter coordinates across the network, and brief overshoot is possible when many workers race the same window. Treat a shared throttle as "close to N/s across the fleet," not a hard guarantee, and keep a little headroom under the upstream's real ceiling.
- Eventually-consistent edge KV is looser still. Workers KV reads can lag writes by design, and its TTLs are coarse (seconds, 60-second floor). Fine for caching and sessions that tolerate a slightly stale cookie; a poor foundation for second-by-second accuracy, which is exactly why incr throws there rather than quietly lying.
- One backend, many environments. Staging and production must not collide on the same keys; namespace with keyPrefix so they share one store without stepping on each other's counters and sessions.
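For the last point, here is a small sketch of what key namespacing buys you. `withKeyPrefix` is a hypothetical helper written for this illustration; it assumes the drivers' keyPrefix option has the usual effect of prepending a string to every key, which you should confirm in the driver docs.

```typescript
interface CounterStore {
  incr(key: string): number;
}

// One shared backend, standing in for a single Redis or KV namespace.
class ObjectCounterStore implements CounterStore {
  private counters: Record<string, number> = {};

  incr(key: string): number {
    const next = (this.counters[key] ?? 0) + 1;
    this.counters[key] = next;
    return next;
  }
}

// Hypothetical helper: a view of `store` in which every key gets `keyPrefix` prepended.
function withKeyPrefix(store: CounterStore, keyPrefix: string): CounterStore {
  return {
    incr: (key: string): number => store.incr(keyPrefix + key),
  };
}

const backend = new ObjectCounterStore();
const staging = withKeyPrefix(backend, "staging:");
const production = withKeyPrefix(backend, "prod:");

// Staging traffic does not consume production's budget.
staging.incr("throttle:api.example.com");
staging.incr("throttle:api.example.com");
const productionCount = production.incr("throttle:api.example.com"); // 1
const stagingCount = staging.incr("throttle:api.example.com"); // 3
```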
Stay in-process until you actually run more than one instance against the same upstream budget or account. The moment you do, the store is the one line that turns N independent limiters back into one — and N independent logins into one shared session.
Try it
```shell
npm i stitchapi
```

Then attach a driver and your throttle and sessions go fleet-wide without touching a call site:
- Distributed stores: the Redis, Workers KV, and Deno KV drivers, with the atomic-incr trade-off in full.
- Pluggable store: the StitchStore interface the drivers implement.
- Distributed throttle and Shared sessions: the exact pacing and session-sharing semantics.
Point every instance at the same store, and the cap you wrote is the cap the upstream sees.