
Schema Drift Is the Bug You Ship to Production

Oleksandr Zhuravlov

A provider you depend on renames total to total_amount, or turns a number into a string, or starts returning null where it never did before. Nobody told you. Your build is green. Your tests are green. The deploy goes out. And then, hours later, a chart renders NaN, an order total comes out undefined, and a pager goes off — for a code path you didn't touch.

This is schema drift, and it's a peculiarly nasty class of bug because every guardrail you'd expect to catch it was looking the wrong way. The change happened on a machine you don't own, to a contract you only assumed. The first runtime that sees the new shape is production.

Why types don't save you here

The intuitive defense is TypeScript. You typed the response, so surely a shape change is a compile error?

It isn't. Your types describe what you expected the response to be — they're a claim you wrote down once, frozen at the moment you wrote it. The compiler checks your code against that claim, not the claim against reality. When the upstream JSON stops matching your interface, nothing recompiles, because nothing in your source changed. The types are a fossil of an old agreement, and the bytes on the wire have quietly moved on.
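A minimal sketch of that gap, using only JSON.parse and a type assertion. The Order interface and the payload are made up for illustration; the point is that the assertion compiles whatever the bytes say.

```typescript
// Hypothetical type, written back when the provider still returned `total`.
interface Order {
    id: number;
    total: number;
}

// What the provider sends today: `total` has been renamed to `total_amount`.
const body = '{"id": 7, "total_amount": 42}';

// The assertion is erased at compile time; nothing inspects the parsed value.
const order = JSON.parse(body) as Order;

// This type-checks, but `order.total` is undefined at runtime.
const withTax = order.total * 1.2;

console.log(typeof order.total); // "undefined"
console.log(Number.isNaN(withTax)); // true
```

Nothing here is a compile error, and nothing throws. The NaN just travels downstream until something renders it.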

Tests don't catch it either, and for a sharper reason: most suites mock the upstream. The fixture is a snapshot of the old shape — the shape you captured back when you wrote the test. So the mock and the types agree perfectly, the assertions pass, and the green check tells you your code is correct about a response that no longer exists. The one thing that would catch the drift — a real call to the real API returning the real new shape — is exactly what a unit test is designed not to do.
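To make that concrete, here is a small sketch of a check that stays green against its fixture while the same code fails on a drifted response. The formatTotal function, the fixture, and the "live" payload are all hypothetical; the drift shown is the number-to-string change from the opening paragraph.

```typescript
interface Order {
    id: number;
    total: number;
}

// Code under test: formats an order total for display.
function formatTotal(order: Order): string {
    return `$${order.total.toFixed(2)}`;
}

// Fixture captured when the test was written: the old shape.
const fixture: Order = { id: 7, total: 42 };

// Stand-in for today's live response: `total` is now a string.
const live = JSON.parse('{"id": 7, "total": "42.00"}') as Order;

// The assertion a unit test would make against the mock. It passes.
const fixturePasses = formatTotal(fixture) === "$42.00";

// The same code against the live shape throws, because strings have no toFixed.
let liveError: unknown = null;
try {
    formatTotal(live);
} catch (err) {
    liveError = err;
}

console.log(fixturePasses); // true
console.log(liveError instanceof TypeError); // true
```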

So the breakage slips through the net by construction. Types check source against a frozen claim; tests check code against a frozen fixture. Drift is a change to neither of those — it's a change to the live response, and the only place a live response shows up is at runtime, in production, on the unlucky request.

Make the silent change loud

The fix isn't more types or more mocks. It's to compare the response you actually receive against a baseline you committed, on the calls you actually make, and to treat any difference as a signal you can route.

StitchAPI does this with leveled drift detection. You wrap a stitch's output schema in drift(), point it at a committed snapshot file, and every live response is diffed against that baseline. Each difference becomes a finding, classified by how much you care:

import { drift, stitch } from 'stitchapi';
import { z } from 'zod';

const getOrder = stitch({
    baseUrl: 'https://api.example.com',
    path: '/orders/{id}',
    output: drift(
        z.object({
            id: z.number(),
            total: z.number(),
            note: z.string().optional(),
        }),
        {
            critical: ['id', 'total'], // breaking these is fatal
            watch: ['note'], // changes here are a warning
            snapshotFile: 'orders.contract.json',
        },
    ),
});

The orders.contract.json file is the load-bearing part. It's a baseline you check into the repo — the shape you agreed to — and it's the thing the live response gets measured against. Generate it deliberately with stitch drift generate, commit it, and now "the upstream changed" has a place to register. The committed snapshot is what turns an invisible delta into a visible one: without a baseline, a renamed field is just undefined three layers downstream; with one, it's a named finding on a known path.

Drift detects the three changes that actually break integrations — a field that went missing, one whose type changed, and one that became nullable. Those are exactly the moves that slip past a frozen type and a stale mock.
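The comparison itself is simple to picture. The sketch below is not StitchAPI's implementation: it is a hand-rolled diff over flat objects, with a hypothetical baseline format (field name mapped to JSON type) and hypothetical change names apart from type-changed, which appears in the event example later in this post.

```typescript
type Change = "missing" | "type-changed" | "became-nullable" | "added";

interface Finding {
    path: string;
    change: Change;
}

// Hypothetical baseline format: field name -> JSON type recorded at snapshot time.
type Baseline = Record<string, string>;

// JSON-flavoured typeof: distinguishes null and arrays from plain objects.
function jsonType(value: unknown): string {
    if (value === null) return "null";
    if (Array.isArray(value)) return "array";
    return typeof value;
}

// Compare one flat response object against the committed baseline.
function diffShape(baseline: Baseline, response: Record<string, unknown>): Finding[] {
    const findings: Finding[] = [];
    for (const [path, expected] of Object.entries(baseline)) {
        if (!(path in response)) {
            findings.push({ path, change: "missing" });
            continue;
        }
        const actual = jsonType(response[path]);
        if (actual === "null" && expected !== "null") {
            findings.push({ path, change: "became-nullable" });
        } else if (actual !== expected) {
            findings.push({ path, change: "type-changed" });
        }
    }
    // Fields the baseline has never seen.
    for (const path of Object.keys(response)) {
        if (!(path in baseline)) findings.push({ path, change: "added" });
    }
    return findings;
}

const baseline: Baseline = { id: "number", total: "number", note: "string" };
const response = { id: 7, total_amount: 42, note: null };
const findings = diffShape(baseline, response);

console.log(findings.map((f) => `${f.path}: ${f.change}`));
// [ 'total: missing', 'note: became-nullable', 'total_amount: added' ]
```

A rename shows up as two findings, one missing path and one added path, which is usually enough to spot what happened.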

The leveling is the point

Detecting a change is half the value. The other half is deciding what it means — and not every change means the same thing.

If total goes missing, you're shipping broken order totals; that should stop the call cold. If an optional note field changes type, you'd like to know, but it shouldn't take down checkout. If the provider adds a brand-new field you've never used, that's not a problem at all; it might even be an opportunity. Collapsing all three into one "drift!" alarm gives you something you'll learn to ignore.

So drift findings carry a level, and you set the policy:

  • error: a path you listed as critical broke. The finding rides the event stream as a drift event, and then the call fails with STITCH_DRIFT, because you declared this field load-bearing and it moved.
  • warn: a watched (or otherwise non-critical) field changed. You get a drift event and the call still succeeds, so you can watch the field erode for a release or two before the change becomes fatal.
  • info: a new field appeared that your contract doesn't mention. It is streamed as a drift event and is never fatal; treat it as a nudge that the API grew something you might want.

You choose which paths are fatal. That's the design: the same upstream change is a page-me emergency on one field and a shrug on another, and only you know which is which.
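The three levels above reduce to a small decision rule. This is a sketch of that rule as described in this post, not the library's code: the Finding and Policy types and the levelFor function are hypothetical names, and "added" is an assumed label for a new field.

```typescript
type Level = "error" | "warn" | "info";

interface Finding {
    path: string;
    change: string; // e.g. "missing", "type-changed", "added" (assumed labels)
}

// Mirrors the critical / watch options from the drift() example.
interface Policy {
    critical: string[];
    watch: string[];
}

function levelFor(finding: Finding, policy: Policy): Level {
    // A field the contract doesn't mention is informational, never fatal.
    if (finding.change === "added") return "info";
    // A break on a path you declared critical fails the call.
    if (policy.critical.includes(finding.path)) return "error";
    // Watched and otherwise non-critical paths both land on warn.
    return "warn";
}

const policy: Policy = { critical: ["id", "total"], watch: ["note"] };

const levels = [
    { path: "total", change: "missing" },
    { path: "note", change: "type-changed" },
    { path: "coupon", change: "added" },
].map((finding) => levelFor(finding, policy));

// The call fails only if at least one finding reached error level.
const callFails = levels.includes("error");

console.log(levels); // [ 'error', 'warn', 'info' ]
console.log(callFails); // true
```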

Reacting to a drift event

Because every call is an event stream — start → progress → drift → result → done, with await as sugar over it — drift isn't a separate channel you have to wire up. It's an event in the same stream you already consume. When you only want the value, await the stitch and an error-level finding throws STITCH_DRIFT for you. When you want to see findings as they happen, iterate the stream:

// `log` and `renderOrder` stand in for your own logger and rendering code.
for await (const event of getOrder({ params: { id: 7 } }).stream()) {
    if (event.type === 'drift') {
        const { level, path, change } = event.finding;
        // e.g. level: 'warn', path: 'note', change: 'type-changed'
        log.warn(`drift on ${path}: ${change}`, { level });
    }
    if (event.type === 'result') {
        renderOrder(event.value); // typed, validated, post-drift
    }
}

In practice a warn finding flows to your logs (or a Sentry breadcrumb, or a Pino line) where you'll see a field start to wobble before it breaks; an error finding fails the call so the broken path can't quietly hand back garbage. The change you'd otherwise have discovered from a production incident becomes a line in your trace, attached to the exact path and change that moved.

When this isn't worth it — and the honest limits

Drift detection earns its keep against APIs you don't control and can't pin: third-party vendors, internal services owned by another team, anything with a habit of shipping "minor" response tweaks. For an endpoint you own and version yourself, a plain output schema is usually enough — you'll change both sides in the same commit.

And the caveats matter, because drift is a real tool with real edges:

  • It observes the responses you actually receive — it is not a contract test. Drift checks the live calls your code makes; it doesn't probe every endpoint or every field with synthetic traffic. A field that only appears in a response path you never exercise won't be checked until something calls it. If you need exhaustive coverage of an API surface, that's a contract-testing job, not a drift job.
  • Snapshots are maintenance. A committed baseline is a thing you own. When the upstream change is intended, you have to regenerate (stitch drift generate --force) and re-commit the snapshot, or every call will keep flagging the new-but-correct shape. A stale baseline is noise, and noise gets muted.
  • The severity policy is yours, including the mistakes. Mark too many paths critical and you'll fail calls over cosmetic changes; mark too few and a real break slips through as a warn nobody reads. Drift gives you the dial — it doesn't tell you where to set it.
  • First-call baselining is a footgun in prod. With no committed snapshot, the first live call writes one and reports nothing — convenient in dev, dangerous in production where it silently persists whatever the first response happened to be. Set readonly: true so a missing baseline surfaces a finding instead of being written as a side effect.
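The last caveat is easiest to see as a branch. The sketch below models it with an in-memory Map instead of the filesystem; resolveBaseline and its return shape are hypothetical, and the readOnly parameter only mirrors the intent of the readonly: true option, not how the library implements it.

```typescript
type Baseline = Record<string, string>;

interface Resolution {
    baseline: Baseline | null;
    wrote: boolean; // true if this call persisted a new baseline
    finding: string | null; // set when the baseline is missing in read-only mode
}

function resolveBaseline(
    store: Map<string, Baseline>,
    file: string,
    firstResponseShape: Baseline,
    readOnly: boolean,
): Resolution {
    const existing = store.get(file);
    if (existing !== undefined) {
        return { baseline: existing, wrote: false, finding: null };
    }
    if (readOnly) {
        // Surface the gap instead of trusting whatever arrived first.
        return { baseline: null, wrote: false, finding: `missing baseline: ${file}` };
    }
    // Writable mode: the first response silently becomes the contract.
    store.set(file, firstResponseShape);
    return { baseline: firstResponseShape, wrote: true, finding: null };
}

const shape: Baseline = { id: "number", total: "string" }; // possibly already drifted

const devStore = new Map<string, Baseline>();
const dev = resolveBaseline(devStore, "orders.contract.json", shape, false);

const prodStore = new Map<string, Baseline>();
const prod = resolveBaseline(prodStore, "orders.contract.json", shape, true);

console.log(dev.wrote, devStore.size); // true 1
console.log(prod.finding, prodStore.size); // missing baseline: orders.contract.json 0
```

In the writable branch, a response that had already drifted would be recorded as the agreed shape, which is exactly the failure the read-only mode is meant to prevent.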

None of that changes the core trade. Without a baseline, a quietly renamed field is an undefined you discover from a production incident. With one, it's a drift event on a named path, leveled by a policy you set — caught on the request that actually drifted, not on the one that finally fell over.

Try it

npm i stitchapi

Wrap an output in drift(), commit the snapshot, and pick which fields are fatal. The full mechanics — readonly baselines, generating snapshots in CI, and the STITCH_DRIFT error — are in the Leveled drift guide, with the broader picture in Validation and the event stream.