The usual way to handle a rate limit is to wait for the provider to enforce it. You fire requests as fast as your code produces them, and when one comes back 429 Too Many Requests, you retry, back off, and honor Retry-After. It works, and you should keep it, but notice what it costs. By the time the 429 arrives, you've already spent the request: a trip out, a trip back, and a rejection for your trouble. Do that in a loop and the rejections stack up, each one adding latency and, on a metered endpoint, sometimes a charge for the privilege of being told no.

Reacting is a backstop, not a plan. The plan is to not cross the line in the first place.

A stitch lets you declare a proactive throttle — a ceiling on how fast and how concurrently it calls an upstream — so requests are paced before they leave your process. Instead of discovering the limit by tripping it, you tell the stitch what the limit is and let it hold you under it.

Two different controls

The throttle is two independent caps, and conflating them is the most common mistake. They answer different questions.

Rate cap (rate) — how fast you may call. In StitchAPI this is a minimum spacing between successive calls, written as a string like '5/s'. It's about pace over time, not a burst allowance: five per second means roughly one every 200ms, not five at once followed by silence.
Concurrency cap (concurrency) — how many at once. This bounds in-flight requests regardless of how quickly they complete. It's the control that matters when an upstream tolerates a steady stream but falls over if you open fifty sockets to it simultaneously.

They're orthogonal. A slow-but-parallel API wants a concurrency cap and a loose rate. A fast-but-serial one wants a tight rate and concurrency: 1. Most real limits are a combination, and you can set either on its own or both together.

import { stitch } from 'stitchapi';

const search = stitch({
    baseUrl: 'https://api.example.com',
    path: '/search',
    // Pace to five calls/sec, never more than two in flight at once.
    throttle: { rate: '5/s', concurrency: 2 },
});

When a call has to wait its turn, it isn't silently stalled — the stitch emits a progress event with phase throttled on its event stream, so a queued call is visible to a trace rather than looking like a hang. You can watch the backpressure happen instead of inferring it from latency graphs.
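
How the two caps interact can be sketched with a small simulation. This is not StitchAPI's implementation: `minGapMs` and `simulateStarts` are hypothetical helpers, the rate parser only understands the `'N/s'` form, and every call is assumed to take the same fixed time. It computes when each queued call would be allowed to start.

```typescript
// Hypothetical helper (not a StitchAPI function): '5/s' -> 200 ms minimum gap.
function minGapMs(rate: string): number {
    const match = /^(\d+)\/s$/.exec(rate);
    if (!match || Number(match[1]) === 0) {
        throw new Error(`unsupported rate: ${rate}`);
    }
    return 1000 / Number(match[1]);
}

// Start times (ms) for `calls` requests that are all queued at t = 0.
// Assumes every request takes exactly `durationMs` to complete.
function simulateStarts(
    calls: number,
    rate: string,
    concurrency: number,
    durationMs: number,
): number[] {
    if (concurrency < 1) throw new Error('concurrency must be at least 1');
    const gap = minGapMs(rate);
    const starts: number[] = [];
    const finishes: number[] = []; // finish times of occupied slots, oldest first
    let lastStart = -gap; // lets the first call start at t = 0
    for (let i = 0; i < calls; i++) {
        // Rate cap: keep the minimum spacing from the previous start.
        const byRate = lastStart + gap;
        // Concurrency cap: if every slot is taken, wait for the oldest to finish.
        const bySlot = finishes.length < concurrency ? 0 : (finishes.shift() as number);
        const start = Math.max(byRate, bySlot);
        starts.push(start);
        finishes.push(start + durationMs);
        lastStart = start;
    }
    return starts;
}

// Slow upstream (1 s per call): the concurrency cap dominates.
console.log(simulateStarts(5, '5/s', 2, 1000)); // [0, 200, 1000, 1200, 2000]
// Fast upstream (50 ms per call): the rate cap dominates.
console.log(simulateStarts(5, '5/s', 2, 50)); // [0, 200, 400, 600, 800]
```

With the same declaration, which cap makes a call wait depends on how long the upstream takes to answer, which is why the two are worth setting separately.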

One limiter across stitches that share a provider

A provider's rate limit is almost never per-endpoint. It's per account, per token, per host, and your code probably hits that one provider from several different stitches. If each stitch enforces its own budget, three stitches at 5/s apiece can put 15/s on a provider that only allows five, and you're back to meeting 429s in production.

scope: 'host' fixes this. It pools one limiter across every stitch that targets the same host, so the budget is shared the way the provider actually counts it.

import { seam } from 'stitchapi';

const api = seam({
    baseUrl: 'https://api.example.com',
    // One account-wide budget, shared by every member below.
    throttle: { rate: '5/s', concurrency: 4, scope: 'host' },
});

const search = api.stitch({ path: '/search' });
const getItem = api.stitch({ path: '/items/{id}' });
const listItems = api.stitch({ path: '/items' });

Now search, getItem, and listItems draw from one 5/s budget against api.example.com, no matter which one a given caller reaches for. This is exactly the case a seam is built for — a group of stitches sharing one base, one auth, and one runtime — but scope: 'host' pools the budget across separate stitches too, whenever they hit the same host. Host-scoped pooling works in-process with no extra setup.
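
The idea behind host-scoped pooling can be sketched as a registry keyed by host. This is an illustration only, not how StitchAPI stores its limiters: `Budget`, `budgetsByHost`, and `sharedBudgetFor` are hypothetical names, and the "first declaration wins" rule is an assumption made to keep the sketch short.

```typescript
// Hypothetical shape for a declared budget; not a StitchAPI type.
interface Budget {
    rate: string;
    concurrency: number;
}

// One entry per host, shared by every caller in this process.
const budgetsByHost = new Map<string, Budget>();

// Return the budget shared by everything that targets the same host.
// Assumption: the first declaration for a host wins; later ones reuse it.
function sharedBudgetFor(baseUrl: string, declared: Budget): Budget {
    const host = new URL(baseUrl).host; // e.g. 'api.example.com'
    let budget = budgetsByHost.get(host);
    if (!budget) {
        budget = { ...declared };
        budgetsByHost.set(host, budget);
    }
    return budget;
}

const a = sharedBudgetFor('https://api.example.com/search', { rate: '5/s', concurrency: 4 });
const b = sharedBudgetFor('https://api.example.com/items', { rate: '5/s', concurrency: 4 });
const c = sharedBudgetFor('https://other.example.org/', { rate: '5/s', concurrency: 4 });

console.log(a === b); // true: same host, same budget object
console.log(a === c); // false: a different host gets its own budget
```

Because the registry lives in memory, the sharing it provides stops at the process boundary.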

Proactive and reactive, together

Going proactive doesn't mean throwing away the reactive path — it means the reactive path stops being your primary defense and becomes the safety net it should have been all along. Both live on the same declaration:

const search = stitch({
    baseUrl: 'https://api.example.com',
    path: '/search',
    // Proactive: stay under the limit so most 429s never happen.
    throttle: { rate: '5/s', concurrency: 2, scope: 'host' },
    // Reactive: if one slips through anyway, back off and honor Retry-After.
    retry: { attempts: 3, on: [429, 503], respectRetryAfter: true },
});

The two cover different failure shapes. The throttle handles the limit you know about — the documented account ceiling you can pace yourself under. The retry handles the limit you don't: a burst from another process sharing the same token, a provider that tightened its limit without telling you, a transient 503 that has nothing to do with rate at all. When a 429 does land, respectRetryAfter defers to the server's Retry-After header instead of guessing. Throttle reduces how often you reach for the net; the net is still there for when you do.
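
For reference, the Retry-After header comes in two forms: a delay in whole seconds or an HTTP date. The sketch below shows one way a client could turn either form into a wait time. `retryAfterMs` is a hypothetical helper written for this example, not the code behind respectRetryAfter, and returning null for a missing or unparseable header is an assumption about what the caller wants.

```typescript
// Convert a Retry-After header value into a delay in milliseconds.
// Returns null when the header is absent or cannot be parsed, so the
// caller can fall back to its own backoff schedule.
function retryAfterMs(header: string | null, nowMs: number): number | null {
    if (header === null) return null;
    const value = header.trim();
    // Form 1: delay-seconds, e.g. "120".
    if (/^\d+$/.test(value)) return Number(value) * 1000;
    // Form 2: HTTP-date, e.g. "Wed, 21 Oct 2015 07:28:00 GMT".
    const dateMs = Date.parse(value);
    if (Number.isNaN(dateMs)) return null;
    // A date in the past means "retry now", never a negative wait.
    return Math.max(0, dateMs - nowMs);
}

const retryAt = Date.parse('Wed, 21 Oct 2015 07:28:00 GMT');
console.log(retryAfterMs('2', Date.now())); // 2000
console.log(retryAfterMs('Wed, 21 Oct 2015 07:28:00 GMT', retryAt - 5000)); // 5000
console.log(retryAfterMs(null, Date.now())); // null
```

Passing the current time in as a parameter keeps the function deterministic, which makes the date branch easy to test.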

When this isn't worth it

A proactive throttle is only as good as the number you give it, and it carries one sharp caveat.

You have to know, or estimate, the real limit. The throttle paces you to the cap you declare, not to the provider's actual ceiling. Set it too high and it does nothing the 429 path wasn't already doing; set it too low and you've throttled yourself below a limit that was never going to bite. If a provider documents its limit, use that. If it doesn't, start conservative, watch for 429s, and tighten further if they appear. This is the work the control asks of you, and there's no way around it.
The default limiter is per-process. The in-process throttle counts in memory, in one process. Run two instances and each keeps its own count — together, N workers can hit a host at up to N times the rate you declared, and a host-scoped budget silently stops honoring an account-wide limit. For a single process or a CLI run, that's fine. To make the budget span workers you need a shared store; the mechanics are a topic of their own, covered in the distributed throttle write-up.
Low-volume callers won't notice. If you make a handful of calls and never come close to the limit, the throttle is dead weight — accurate, but solving a problem you don't have. The reactive 429 path alone is enough until your volume starts brushing the ceiling.

None of this makes the reactive path obsolete. It just stops you from leaning on it for a limit you could have stayed under in the first place.

Try it

npm i stitchapi

Declare a throttle on a stitch, pair it with retry as a backstop, and the next time you brush a provider's rate limit you'll pace under it instead of bouncing off it. The full field list and scope semantics are in the Throttle guide; the reactive side is in Retry & backoff.
