Cut Your LLM Bill From Both Ends
Oleksandr Zhuravlov
StitchAPI attacks AI cost on two axes — the tokens you spend describing tools to a model, and the redundant upstream calls your agents make. Here's the mechanism behind each.
Notes on API stitching, agent-native integrations, and cutting AI cost.
The one-tool-per-endpoint pattern re-sends every endpoint's JSON schema on every turn of an agent loop. Here's why that cost compounds, and how code-mode keeps the standing context flat as your catalog grows.
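A back-of-envelope model makes the compounding concrete. It assumes every tool schema costs the same number of tokens, the full tool list is re-sent on each turn, and the code-mode tool description has a fixed size; the function names and numbers are illustrative placeholders, not measurements.

```typescript
// Hypothetical cost model: every number here is an illustrative assumption.
interface LoopShape {
  endpoints: number;       // tools exposed to the model
  tokensPerSchema: number; // average JSON-schema size per tool
  turns: number;           // model calls in one agent loop
}

// One tool per endpoint: every schema rides along on every turn.
function perEndpointSchemaTokens(s: LoopShape): number {
  return s.endpoints * s.tokensPerSchema * s.turns;
}

// Code-mode: one fixed-size tool description, whatever the catalog size.
// `codeToolTokens` is an assumed constant for that single description.
function codeModeSchemaTokens(s: LoopShape, codeToolTokens: number): number {
  return codeToolTokens * s.turns;
}

const shape: LoopShape = { endpoints: 40, tokensPerSchema: 250, turns: 8 };
console.log(perEndpointSchemaTokens(shape)); // 80000
console.log(codeModeSchemaTokens(shape, 600)); // 4800
```

Doubling `endpoints` doubles the first figure and leaves the second unchanged, which is the "flat" behavior the post describes.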
When an AI agent is the caller, raw fetch() is the wrong primitive — it leaks the credential, returns unvalidated bytes, and has no retry, throttle, timeout, or trace. Here's the case for handing the agent a stitch instead.
Handing an agent your API key leaks the secret into prompts, logs, and traces. StitchAPI keeps the credential behind the seam — the caller gets the ability to make one call, never the key itself.
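A minimal sketch of the idea using a plain closure, assuming a hypothetical `Transport` function in place of a real HTTP client. The key is captured once; the agent-facing function takes no credential argument and returns only the response body. This is not StitchAPI's implementation, and a closure only keeps the secret out of what the agent sees (tool descriptions, arguments, return values); it is not a sandbox against other code in the same process.

```typescript
// Hypothetical transport: stands in for whatever actually performs the HTTP call.
type Transport = (url: string, headers: Record<string, string>) => Promise<string>;

// Returns a function that can make exactly one kind of call. The key lives only
// in this closure: it is never a parameter, a return value, or a property.
function makeCapability(apiKey: string, url: string, transport: Transport) {
  return async function callUpstream(): Promise<string> {
    return transport(url, { Authorization: `Bearer ${apiKey}` });
  };
}
```

The agent is handed `callUpstream` (or a tool wrapping it); serializing that tool for a prompt or a log carries no trace of the key.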
A third-party field quietly changes shape and your code keeps compiling, your tests keep passing, and the breakage only shows up live. Here's how leveled drift detection turns that silent failure into a visible signal.
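One way to make drift leveled rather than pass/fail, sketched under assumptions: the expected shape is a hypothetical map of field name to `typeof` result, and the levels `none`, `additive`, and `breaking` are illustrative names, not StitchAPI's. New fields are reported as additive; missing fields and type changes are reported as breaking.

```typescript
// Hypothetical drift levels; the names are illustrative.
type DriftLevel = "none" | "additive" | "breaking";

interface DriftReport {
  level: DriftLevel;
  findings: string[];
}

// Expected shape: field name -> the `typeof` result we expect to observe.
type Shape = Record<string, string>;

function detectDrift(expected: Shape, observed: Record<string, unknown>): DriftReport {
  const findings: string[] = [];
  let level: DriftLevel = "none";

  // Missing or retyped fields are what silently break downstream code.
  for (const [field, type] of Object.entries(expected)) {
    if (!(field in observed)) {
      findings.push(`missing field "${field}"`);
      level = "breaking";
    } else if (typeof observed[field] !== type) {
      findings.push(`field "${field}" changed from ${type} to ${typeof observed[field]}`);
      level = "breaking";
    }
  }
  // Extra fields are worth knowing about but should not fail a call.
  for (const field of Object.keys(observed)) {
    if (!(field in expected)) {
      findings.push(`new field "${field}"`);
      if (level === "none") level = "additive";
    }
  }
  return { level, findings };
}

const report = detectDrift({ id: "number", price: "number" }, { id: 7, price: "19.99" });
// report.level is "breaking"; report.findings[0] is 'field "price" changed from number to string'
```

A caller could log `additive` reports and alert on `breaking` ones instead of throwing, so the live response still flows while the change becomes visible.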
How StitchAPI's read-through cache and in-process request coalescing actually work — the derived, principal-scoped cache key, the coalescing window, and what's safe to put behind them.
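A minimal sketch of both mechanisms, assuming a hypothetical `cacheKey` helper that folds the principal into the key and a fixed TTL. It shows the generic pattern, not StitchAPI's code: concurrent callers for the same key share one pending promise, and completed results are served from the cache until they expire.

```typescript
type Fetcher<T> = () => Promise<T>;

// Derive the key from the principal as well as the request, so two users asking
// for the "same" URL never share a cached response. (Hypothetical helper.)
function cacheKey(principal: string, method: string, url: string): string {
  return JSON.stringify([principal, method.toUpperCase(), url]);
}

class CoalescingCache<T> {
  private readonly inFlight = new Map<string, Promise<T>>();
  private readonly cache = new Map<string, { value: T; expiresAt: number }>();
  private readonly ttlMs: number;
  private readonly now: () => number;

  constructor(ttlMs: number, now: () => number = Date.now) {
    this.ttlMs = ttlMs;
    this.now = now;
  }

  get(key: string, fetcher: Fetcher<T>): Promise<T> {
    // Read-through: a fresh cached value skips the upstream entirely.
    const hit = this.cache.get(key);
    if (hit && hit.expiresAt > this.now()) return Promise.resolve(hit.value);

    // Coalescing: callers arriving while a fetch is pending share its promise.
    const pending = this.inFlight.get(key);
    if (pending) return pending;

    const p = fetcher()
      .then((value) => {
        this.cache.set(key, { value, expiresAt: this.now() + this.ttlMs });
        return value;
      })
      .finally(() => this.inFlight.delete(key)); // failures are not cached
    this.inFlight.set(key, p);
    return p;
  }
}
```

Only idempotent reads belong behind this: coalescing two POSTs would silently drop one of them. Because a rejected fetch is never written to the cache, the next caller simply tries again.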
Most clients only find a rate limit by hitting it. A declared throttle keeps you under the limit instead — here's the difference between a rate cap and a concurrency cap, and how it composes with the reactive retry path.
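To make the distinction concrete, here is a sketch of each cap as a small standalone class. The class and method names are hypothetical, and the rate cap uses a sliding log where a token bucket would also work. A rate cap bounds how many calls start per window, however long they run; a concurrency cap bounds how many are in flight, however fast they start.

```typescript
// Rate cap: at most `limit` call starts in any `windowMs` window (sliding log).
class RateCap {
  private starts: number[] = [];
  private readonly limit: number;
  private readonly windowMs: number;
  private readonly now: () => number;

  constructor(limit: number, windowMs: number, now: () => number = Date.now) {
    this.limit = limit;
    this.windowMs = windowMs;
    this.now = now;
  }

  // Returns 0 and records the start if a slot is free; otherwise returns the
  // milliseconds until the oldest start leaves the window (nothing is recorded).
  tryStart(): number {
    const t = this.now();
    this.starts = this.starts.filter((s) => s > t - this.windowMs);
    if (this.starts.length < this.limit) {
      this.starts.push(t);
      return 0;
    }
    return this.starts[0] + this.windowMs - t;
  }
}

// Concurrency cap: at most `max` calls in flight at once.
class ConcurrencyCap {
  private active = 0;
  private readonly waiters: Array<() => void> = [];
  private readonly max: number;

  constructor(max: number) {
    this.max = max;
  }

  async run<T>(task: () => Promise<T>): Promise<T> {
    if (this.active < this.max) {
      this.active++;
    } else {
      // Wait until a finishing call hands its slot over.
      await new Promise<void>((resolve) => this.waiters.push(resolve));
    }
    try {
      return await task();
    } finally {
      const next = this.waiters.shift();
      if (next) next(); // transfer the slot; `active` stays the same
      else this.active--;
    }
  }
}
```

One way to compose these with a reactive retry path is to send every attempt, retries included, through both caps, so retries stay under the same declared limits as first attempts.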
When an upstream dependency is already down or crawling, naive retries make its outage yours. Here's how a circuit breaker plus layered timeouts turn that failure mode into a bounded, declared policy.
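A minimal circuit breaker and per-attempt timeout, sketched generically; this is not StitchAPI's API, and the names and thresholds are assumptions. The breaker counts consecutive failures, opens after `failureThreshold` of them, fails fast until `cooldownMs` has passed, then lets a probe through and closes again on success. Note that `withTimeout` only stops waiting; it does not cancel the underlying request, which would need an `AbortSignal`.

```typescript
type BreakerState = "closed" | "open" | "half-open";

class CircuitBreaker {
  private state: BreakerState = "closed";
  private failures = 0;
  private openedAt = 0;
  private readonly failureThreshold: number;
  private readonly cooldownMs: number;
  private readonly now: () => number;

  constructor(failureThreshold: number, cooldownMs: number, now: () => number = Date.now) {
    this.failureThreshold = failureThreshold;
    this.cooldownMs = cooldownMs;
    this.now = now;
  }

  getState(): BreakerState {
    return this.state;
  }

  async call<T>(fn: () => Promise<T>): Promise<T> {
    if (this.state === "open") {
      if (this.now() - this.openedAt < this.cooldownMs) {
        // Fail fast instead of adding load to an upstream that is already struggling.
        throw new Error("circuit open");
      }
      this.state = "half-open"; // cooldown elapsed: allow a probe
    }
    try {
      const result = await fn();
      this.state = "closed";
      this.failures = 0;
      return result;
    } catch (err) {
      this.failures++;
      if (this.state === "half-open" || this.failures >= this.failureThreshold) {
        this.state = "open";
        this.openedAt = this.now();
      }
      throw err;
    }
  }
}

// Per-attempt timeout: rejects if `p` has not settled within `ms`.
function withTimeout<T>(p: Promise<T>, ms: number): Promise<T> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<never>((_, reject) => {
    timer = setTimeout(() => reject(new Error(`timed out after ${ms} ms`)), ms);
  });
  return Promise.race([p, timeout]).finally(() => clearTimeout(timer));
}
```

Layering here means two timeouts: a short one per attempt inside the breaker and a longer overall deadline around the whole retry loop. The sketch also admits every concurrent caller while half-open; a production breaker would let only a single probe through.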
The same stitch declaration is callable as an in-process function, a CLI command, an HTTP endpoint, and an MCP tool — no rewrite, and the resilience, auth, and drift you declared once apply identically through all four.
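A sketch of the "declare once, expose many" shape, assuming a hypothetical `Declaration` with an input parser and a handler. Every surface funnels through one `invoke`, so whatever policy lives there applies identically; only validation is shown. The function, CLI, and HTTP adapters are illustrative, and the MCP surface is omitted because it would depend on an MCP SDK this sketch does not assume.

```typescript
// Raised on invalid input, so each surface can map it to its own error form.
class InputError extends Error {}

// Hypothetical declaration shape: a name, an input parser, and a handler.
interface Declaration<I, O> {
  name: string;
  parseInput(raw: unknown): I; // throws InputError on bad input
  handler(input: I): Promise<O>;
}

// Single choke point: every surface calls this, so policy applied here
// (only validation in this sketch) behaves identically on all of them.
async function invoke<I, O>(d: Declaration<I, O>, raw: unknown): Promise<O> {
  return d.handler(d.parseInput(raw));
}

// Surface 1: an in-process function.
function asFunction<I, O>(d: Declaration<I, O>): (raw: unknown) => Promise<O> {
  return (raw) => invoke(d, raw);
}

// Surface 2: a CLI entry point taking `--key value` pairs.
async function asCli<I, O>(d: Declaration<I, O>, argv: string[]): Promise<string> {
  const raw: Record<string, string> = {};
  for (let i = 0; i + 1 < argv.length; i += 2) raw[argv[i].replace(/^--/, "")] = argv[i + 1];
  return JSON.stringify(await invoke(d, raw));
}

// Surface 3: an HTTP handler over Web-standard Request/Response.
async function asHttp<I, O>(d: Declaration<I, O>, req: Request): Promise<Response> {
  try {
    const raw = await req.json().catch(() => {
      throw new InputError("body is not JSON");
    });
    const out = await invoke(d, raw);
    return new Response(JSON.stringify(out), { headers: { "Content-Type": "application/json" } });
  } catch (err) {
    const status = err instanceof InputError ? 400 : 500;
    return new Response(JSON.stringify({ error: String(err) }), { status });
  }
}

// Example declaration: add two numbers (accepts numeric strings from the CLI).
const add: Declaration<{ a: number; b: number }, number> = {
  name: "add",
  parseInput(raw) {
    const obj = (raw ?? {}) as Record<string, unknown>;
    const a = Number(obj.a);
    const b = Number(obj.b);
    if (Number.isNaN(a) || Number.isNaN(b)) throw new InputError("a and b must be numbers");
    return { a, b };
  },
  async handler({ a, b }) {
    return a + b;
  },
};
```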
Spec-based generators turn an OpenAPI document into a fully typed client. Runtime stitching turns a single URL into a validated function. Here's an honest, axis-by-axis comparison — and when each one is the right call.
The reflex when wiring an agent to a service is to stand up a dedicated MCP server for it. Here's the cost of N servers, when one runtime exposing many stitches is the better shape, and when a bespoke server still wins.
Hosted workflow platforms and a library like StitchAPI solve overlapping but different problems. A neutral comparison on deployment model, where the logic lives, operational surface, and the unit of work — including when each is the better fit.
Take a streaming stitch and expose it as a text/event-stream Response from a Next.js App Router route handler with @stitchapi/next, then consume the deltas on the client.
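A minimal sketch of the same flow using only Web-standard APIs (`ReadableStream`, `TextEncoder`, `TextDecoder`, `Response`), so it also runs outside Next.js. `streamDeltas` is a hypothetical stand-in for the streaming stitch, and the helpers @stitchapi/next actually provides are not shown. Each delta is JSON-encoded into one `data:` event so that newlines inside the text cannot break SSE framing.

```typescript
// Hypothetical stand-in for a streaming stitch: yields text deltas.
async function* streamDeltas(): AsyncGenerator<string> {
  for (const piece of ["Hel", "lo", ", world"]) yield piece;
}

// App Router route handler shape (e.g. app/api/chat/route.ts).
export async function GET(): Promise<Response> {
  const encoder = new TextEncoder();
  const body = new ReadableStream<Uint8Array>({
    async start(controller) {
      for await (const delta of streamDeltas()) {
        // One SSE event per delta, terminated by a blank line.
        controller.enqueue(encoder.encode(`data: ${JSON.stringify(delta)}\n\n`));
      }
      controller.close();
    },
  });
  return new Response(body, {
    headers: { "Content-Type": "text/event-stream", "Cache-Control": "no-cache, no-transform" },
  });
}

// Client side: read the body incrementally and decode each `data:` event.
async function* readDeltas(res: Response): AsyncGenerator<string> {
  const reader = res.body!.getReader();
  const decoder = new TextDecoder();
  let buffer = "";
  while (true) {
    const { done, value } = await reader.read();
    if (done) break;
    buffer += decoder.decode(value, { stream: true });
    let sep: number;
    while ((sep = buffer.indexOf("\n\n")) !== -1) {
      const frame = buffer.slice(0, sep);
      buffer = buffer.slice(sep + 2);
      if (frame.startsWith("data: ")) yield JSON.parse(frame.slice(6)) as string;
    }
  }
}
```

For GET endpoints, the browser's `EventSource` is an alternative to reading the body by hand; the manual reader is useful when the request needs a POST body or custom headers.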
How to hand a model a stitch as a callable tool with @stitchapi/vercel-ai — so it gets typed, validated data while the credential stays behind the seam, and your declared resilience comes along for free.
A throttle that's correct on one process quietly multiplies by N when you scale out, and per-process logins re-authenticate on every worker. StitchAPI's pluggable state store moves both into shared storage so the limiter and cookieSession coordinate fleet-wide.
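A sketch of why shared state fixes the multiplication, assuming a hypothetical `StateStore` interface with a single atomic `incr`. The in-memory `MemoryStore` stands in for real shared storage so the example runs on its own; this is a generic fixed-window limiter, not StitchAPI's state-store API.

```typescript
// Hypothetical state-store interface: the one operation this limiter needs.
interface StateStore {
  // Atomically increment `key` and return the new count; start a TTL on first write.
  incr(key: string, ttlMs: number): Promise<number>;
}

// In-memory stand-in. In a fleet, every worker would talk to one shared store.
class MemoryStore implements StateStore {
  private readonly data = new Map<string, { count: number; expiresAt: number }>();
  private readonly now: () => number;

  constructor(now: () => number = Date.now) {
    this.now = now;
  }

  async incr(key: string, ttlMs: number): Promise<number> {
    const t = this.now();
    const entry = this.data.get(key);
    if (!entry || entry.expiresAt <= t) {
      this.data.set(key, { count: 1, expiresAt: t + ttlMs });
      return 1;
    }
    entry.count++;
    return entry.count;
  }
}

// Fixed-window limiter: every process derives the same window key, so the
// count is shared across the fleet instead of multiplied by the worker count.
async function allow(
  store: StateStore,
  name: string,
  limit: number,
  windowMs: number,
  now: number,
): Promise<boolean> {
  const windowIndex = Math.floor(now / windowMs);
  const count = await store.incr(`throttle:${name}:${windowIndex}`, windowMs);
  return count <= limit;
}
```

Against Redis, `incr` would typically be `INCR` followed by `PEXPIRE` when the count is 1, ideally inside one transaction or script. A fixed window also admits bursts at window boundaries; a sliding window smooths that out at the cost of extra bookkeeping.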
The same stitch plugs into each major UI framework via a thin binding, surfacing loading, error, drift, and streaming as idiomatic reactive state. Here's the shared core that makes that consistent.
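A sketch of what such a shared core could look like, assuming a hypothetical `CallStore` that owns one call's state as a tagged union: idle, loading, streaming, success (with any drift findings), and error. Bindings only subscribe and read snapshots. The `subscribe`/`getSnapshot` pair follows the contract React's `useSyncExternalStore` expects, and other frameworks could wrap the same pair in their own reactive primitives.

```typescript
// Hypothetical state shape shared by every binding.
type CallState<T> =
  | { status: "idle" }
  | { status: "loading" }
  | { status: "streaming"; partial: string }
  | { status: "success"; data: T; drift: string[] }
  | { status: "error"; error: unknown };

class CallStore<T> {
  private state: CallState<T> = { status: "idle" };
  private readonly listeners = new Set<() => void>();

  // Registers a listener and returns an unsubscribe function.
  subscribe = (listener: () => void): (() => void) => {
    this.listeners.add(listener);
    return () => {
      this.listeners.delete(listener);
    };
  };

  // Returns the current state object; it is replaced, never mutated.
  getSnapshot = (): CallState<T> => this.state;

  private set(next: CallState<T>): void {
    this.state = next;
    for (const l of this.listeners) l();
  }

  // Drives one streaming call: `deltas` yields text, `finish` turns it into data.
  async run(
    deltas: AsyncIterable<string>,
    finish: (text: string) => { data: T; drift: string[] },
  ): Promise<void> {
    this.set({ status: "loading" });
    try {
      let text = "";
      for await (const d of deltas) {
        text += d;
        this.set({ status: "streaming", partial: text });
      }
      const { data, drift } = finish(text);
      this.set({ status: "success", data, drift });
    } catch (error) {
      this.set({ status: "error", error });
    }
  }
}
```

In React, for example, a binding could reduce to `useSyncExternalStore(store.subscribe, store.getSnapshot)`; because every update replaces the state object, snapshot identity changes exactly when something changed.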