StitchAPI

Catch a silent selector rename in scraped HTML

Scrape an HTML page into a structured object with transform, then drift the structured shape so a markup or selector rename becomes a loud contract error instead of a silently missing field.

Task

You scrape a page that has no API — a catalog provider returns HTML, and you parse it into a structured object with a selector. The danger is silent: the provider renames a CSS class or moves a cell, your selector quietly stops matching, and the field comes back undefined. Nothing throws; downstream code keeps running on corrupted data until someone notices a ranking is wrong. You want that markup rename to surface the moment it happens, as a loud error on the structured shape — not the raw HTML.

Example

The response pipeline is transform → unwrap → validation, so transform reshapes the HTML into a structured object before drift ever runs — drift then compares the parsed value, not the raw markup. List the path that must not move under critical.

import { stitch, drift } from 'stitchapi'; // export names assumed (`drift` is named in the text, `stitch` is a guess)
import { z } from 'zod';

// A trivial hand-rolled scraper: the `score` selector keys off `td.score`,
// exactly the class a markup rename silently breaks.
function scrape(body: unknown): { items: Record<string, unknown>[] } {
    const html = String(body);
    // Everything after each `<tr class="row1">` opening tag is one row.
    const rows = html.split(/<tr[^>]*class="row1"[^>]*>/i).slice(1);
    const items = rows.map((row) => {
        const item: Record<string, unknown> = {};
        const titleMatch = /<td[^>]*class="title"[^>]*>([^<]+)<\/td>/i.exec(row);
        const score = /<td[^>]*class="score"[^>]*>\s*(\d+)\s*<\/td>/i.exec(
            row,
        )?.[1];
        if (titleMatch) item['title'] = titleMatch[1]!.trim();
        if (score !== undefined) item['score'] = Number(score); // omitted when the selector misses
        return item;
    });
    return { items };
}

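// Keys marked "assumed" are reconstructions; confirm them against the StitchAPI reference.
const getCatalog = stitch({
    baseUrl: 'https://api.example.com', // key assumed
    path: '/catalog', // key assumed
    transform: scrape, // HTML string -> { items: [...] }
    unwrap: 'items',
    output: drift( // key assumed
        // `score` is taken to be optional, so schema validation alone would not catch its loss.
        z.array(z.object({ title: z.string(), score: z.number().optional() })),
        {
            // `[].score` is a path on the STRUCTURED shape, not the raw HTML.
            critical: ['[].score'],
            snapshot: 'catalog.contract.json', // key assumed
        },
    ),
});

const catalog = await getCatalog();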

How it works

The first call records the baseline into catalog.contract.json (check it into the repo) from the transformed shape — [{ title, score }], not the HTML body. Every call after parses the page and compares that structured value against the snapshot.

When the provider renames the score cell's class (scorerank), the selector stops matching and scrape drops score from each item. Because [].score is critical, drift emits an error-level finding — { level: 'error', change: 'missing', path: '[].score' } — which breaks the contract and throws STITCH_DRIFT. The path is the proof: [].score exists only on the parsed object, so a finding on it can only come from drift inspecting the transform output, not the raw markup. The silent undefined is now a loud, immediate failure.
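To see the failure mode in isolation, the sketch below runs a standalone scraper in the style of the one above against the same row before and after a class rename. The sample markup, the `scrapeRows` helper, and the choice of `scorerank` as the renamed class are illustrative assumptions; none of this calls StitchAPI.

```typescript
type Item = { title?: string; score?: number };

// Standalone scraper for illustration: keys off td.title and td.score.
function scrapeRows(html: string): Item[] {
    return html
        .split(/<tr[^>]*class="row1"[^>]*>/i)
        .slice(1)
        .map((row) => {
            const item: Item = {};
            const title = /<td[^>]*class="title"[^>]*>([^<]+)<\/td>/i.exec(row)?.[1];
            const score = /<td[^>]*class="score"[^>]*>\s*(\d+)\s*<\/td>/i.exec(row)?.[1];
            if (title !== undefined) item.title = title.trim();
            if (score !== undefined) item.score = Number(score); // omitted when the selector misses
            return item;
        });
}

// Hypothetical markup before the provider's change.
const before =
    '<table><tr class="row1"><td class="title">Widget</td><td class="score">42</td></tr></table>';
// The same markup after the score cell's class is renamed.
const after = before.replace('class="score"', 'class="scorerank"');

const parsedBefore = scrapeRows(before);
const parsedAfter = scrapeRows(after);

console.log(parsedBefore); // the item carries both title and score
console.log(parsedAfter); // the item carries only title, and nothing threw
```

The second result is the silent case: the scraper returns normally with a thinner object, which is why the check has to run on the parsed shape.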

warn and info findings (a non-critical field changing, a new field appearing) don't throw — they ride the event stream as drift events, so you can watch a field erode before it breaks. Reserve critical for the few paths whose loss corrupts your data.
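One way to picture how findings get their levels is the rough sketch below. It is not StitchAPI's implementation: the `diffFields` helper, the `'added'` change value, and the exact mapping (critical loss gives `error`, non-critical loss gives `warn`, new field gives `info`) are assumptions made for illustration. Only the finding shape `{ level, change, path }` and the `'missing'` change come from the text above.

```typescript
type Level = 'error' | 'warn' | 'info';
type Finding = { level: Level; change: 'missing' | 'added'; path: string };

// Collect the distinct field names that appear on any row.
function fieldNames(rows: Record<string, unknown>[]): string[] {
    const names: string[] = [];
    rows.forEach((row) => {
        Object.keys(row).forEach((key) => {
            if (names.indexOf(key) === -1) names.push(key);
        });
    });
    return names;
}

// Hypothetical comparison of a baseline array shape against the current one.
function diffFields(
    baseline: Record<string, unknown>[],
    current: Record<string, unknown>[],
    critical: string[],
): Finding[] {
    const was = fieldNames(baseline);
    const now = fieldNames(current);
    const findings: Finding[] = [];
    was.forEach((key) => {
        if (now.indexOf(key) !== -1) return; // field still present
        const path = '[].' + key;
        const level: Level = critical.indexOf(path) !== -1 ? 'error' : 'warn';
        findings.push({ level, change: 'missing', path });
    });
    now.forEach((key) => {
        if (was.indexOf(key) === -1) {
            findings.push({ level: 'info', change: 'added', path: '[].' + key });
        }
    });
    return findings;
}

const findings = diffFields(
    [{ title: 'Widget', score: 42, badge: 'new' }], // baseline shape
    [{ title: 'Widget', rank: 3 }], // current shape
    ['[].score'], // critical paths
);
// Only an error-level finding would break the contract in this sketch.
const shouldThrow = findings.some((f) => f.level === 'error');
console.log(findings, shouldThrow);
```

Here the lost `score` is the only finding that would stop the call; the lost `badge` and the new `rank` are reported without failing.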

Ship it read-only in production

The first call with no committed snapshot writes one and reports nothing — a convenience while you develop the contract, but a footgun in a deployed run, where the first request would silently persist a baseline and "pass" instead of detecting drift. Generate the baseline once, commit catalog.contract.json, then ship with readonly: true so drift detects but never writes:

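// Keys marked "assumed" are reconstructions; confirm them against the StitchAPI reference.
const getCatalog = stitch({
    baseUrl: 'https://api.example.com', // key assumed
    path: '/catalog', // key assumed
    transform: scrape,
    unwrap: 'items',
    output: drift( // key assumed
        z.array(z.object({ title: z.string(), score: z.number().optional() })),
        {
            critical: ['[].score'],
            readonly: true, // detect-but-never-write
            onMissing: 'error', // key assumed; a missing baseline fails CI instead of slipping past
            snapshot: 'catalog.contract.json', // key assumed
        },
    ),
});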

Anti-pattern: don't drift the raw HTML body and skip transform. Snapshotting the markup makes every cosmetic edit — a reordered attribute, a whitespace change — look like drift, burying the one rename that matters. Parse to the structured shape first, then drift the few critical fields you actually depend on.
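The difference is easy to demonstrate with a small standalone sketch. The `parseRow` helper and both markup strings are hypothetical; the point is only that a cosmetic edit changes the raw string while the parsed shape stays identical.

```typescript
type Row = { title?: string; score?: number };

// Hypothetical one-row parser keyed on td.title and td.score.
function parseRow(html: string): Row {
    const title = /<td[^>]*class="title"[^>]*>([^<]+)<\/td>/i.exec(html)?.[1];
    const score = /<td[^>]*class="score"[^>]*>\s*(\d+)\s*<\/td>/i.exec(html)?.[1];
    const row: Row = {};
    if (title !== undefined) row.title = title.trim();
    if (score !== undefined) row.score = Number(score);
    return row;
}

const original =
    '<tr class="row1"><td class="title">Widget</td><td class="score" data-col="2">42</td></tr>';
// Cosmetic edit: attributes reordered and whitespace added around the number.
const cosmetic =
    '<tr class="row1"><td class="title">Widget</td><td data-col="2" class="score"> 42 </td></tr>';

// A snapshot of the raw body would flag this edit...
const rawChanged = original !== cosmetic;
// ...while a snapshot of the parsed shape would not.
const parsedChanged =
    JSON.stringify(parseRow(original)) !== JSON.stringify(parseRow(cosmetic));

console.log(rawChanged, parsedChanged);
```

A raw-body snapshot would report the first comparison as drift on every such edit, while the structured comparison stays quiet until a field you depend on actually changes.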
