Release candidate — 1.0.0-rc.1
← Back to blog

You Might Not Need an MCP Server Per Integration

Oleksandr Zhuravlov

You wire your agent to its first service and it goes well: install (or stand up) an MCP server, point the agent at it, watch the tools light up. Then you add a second service. A third. Now you have three MCP servers to launch, three sets of credentials to inject, three processes whose health you have to watch — and the agent's context window is carrying every tool each of them exposes, on every turn, whether it uses them or not.

That's the pattern this post is about: one MCP server per integration, each exposing a fat catalog of tools. It's the path of least resistance, and for a single rich vendor server it's often the right one. But as the count climbs, "one server per service" turns into an operational and context cost that has nothing to do with the work the agent is actually doing. The alternative isn't fewer integrations — it's one runtime exposing many of them through a single MCP surface.

What "a server per integration" actually costs

A dedicated MCP server is a real process with a real surface area. The first one is free; the cost is in the multiplier.

  • N processes to run and supervise. Each server is something that starts, can crash, needs a port or a stdio pipe, and shows up in your deployment story. Three integrations is three things to keep alive; twelve is twelve.
  • N auth boundaries to wire and secure. Every server needs its upstream credential — a token, a key pair, an OAuth client — configured, rotated, and scoped. Each one is a place a secret can leak. The agent, meanwhile, is trusted to call all of them.
  • N tool catalogs in the context window. This is the quiet one. Each server advertises its tools, and an MCP client hands those schemas to the model. A server with twenty tools is twenty JSON schemas; five such servers is a hundred. That catalog is re-sent on every turn of the loop, used or not. Tool-schema sprawl is a tax you pay per turn, and it scales with the number of servers times the tools each one ships.

None of these is fatal on its own. Together, "add an integration" quietly means "add a process, an auth boundary, and a slice of every future turn's context." The reflex optimizes for the first integration and pays for it at the tenth.

The other shape: one runtime, many stitches, one MCP surface

StitchAPI is API stitching — you declare an endpoint once (its types, auth, and resilience) and call it like a local function, from your code, the CLI, or an agent over MCP. A stitch is that one typed callable; a seam groups stitches that share a base URL, auth, throttle budget, and store.

The relevant part here is how it reaches an agent. Instead of a server per service exposing a tool per endpoint, one StitchAPI runtime exposes a single, fixed code-mode surface over MCP — three tools, no matter how many stitches sit behind them:

  • run_stitch — execute a stitch by name with its input
  • list_stitches — discover what's available, on demand
  • describe_stitch — pull one stitch's shape, schema, and auth scheme only when the model needs it

Adding an integration is adding a stitch, not standing up a server. The MCP surface stays three tools whether you have four stitches or four hundred:

import { seam } from 'stitchapi';
import { bearer, env } from 'stitchapi';

// One seam, many endpoints — all reachable behind the same MCP surface.
const upstream = seam({
    baseUrl: 'https://api.example.com',
    auth: bearer(env('UPSTREAM_TOKEN')),
});

export const getUser = upstream.stitch({ path: '/users/{id}' });
export const listOrders = upstream.stitch({ path: '/users/{id}/orders' });
// Add the next endpoint here — the agent's tool list doesn't grow.

Start the surface once with stitch mcp, and the agent drives every stitch through run_stitch:

stitch mcp
{
    "name": "run_stitch",
    "input": { "name": "getUser", "input": { "params": { "id": 1 } } }
}

The model learns about a new stitch through list_stitches only on the turns it cares about, so the per-turn context stays flat as the catalog grows. (The token economics of catalogs versus code-mode is its own subject — there's a companion post on that, and the catalog-tax deep dive goes deeper still; here the point is the architecture, not the math.)

Where the three costs go

Map the same three pressures onto one runtime and they collapse:

  • One process, not N. stitch mcp is a single surface in front of every stitch you've declared. Adding the eleventh integration adds a declaration, not an eleventh thing to launch and watch.
  • One auth boundary, declared per stitch — and the secret never crosses it. Auth is a field on the stitch (bearer, apiKey, basic, cookieSession, oauth2), resolved at call time. The agent calls a stitch and gets data; it names a capability, the runtime holds the credential. describe_stitch even reports the auth scheme without ever exposing the token. So the agent boundary is a capability boundary, not a pile of credentials handed to the model.
  • One fixed tool surface. Three tools, constant. The schema you'd otherwise re-send for every endpoint of every server is replaced by on-demand discovery.

And because the stitch is the unit — not the server — the same definition is also an in-process function, a CLI command (stitch run), and an HTTP endpoint (stitch serve). The agent isn't getting a second-class copy of an integration that also exists elsewhere; it's calling the exact thing your code calls, with the same validation, retries, throttle, and drift detection declared on it.

Be fair: when a dedicated MCP server is the right call

This isn't an argument against MCP servers — code-mode is an MCP surface. It's an argument against reflexively running one per integration when the integrations are HTTP APIs you could declare. A dedicated server earns its keep when:

  • A vendor ships a rich, official MCP server. If the service you're integrating already maintains a well-built server — with curated tools, server-side logic, and ergonomics tuned for the model — adopting it is usually better than re-deriving the surface from raw endpoints. Don't rebuild what someone maintains for you.
  • The capability isn't an HTTP request. A stitch sits above a transport — strongest fit is HTTP, with graphql, sse, stream, download, llm, shell, and postmessage as peer surfaces. If a server's value is something outside that — driving a database session, orchestrating local processes, wrapping a non-HTTP protocol, holding rich server-side state — a purpose-built MCP server may model it better than a stitch would.
  • You specifically want that tool's ergonomics. Sometimes the curated, named-tool-per-action shape is the interface you want the model to see — a small, deliberately designed menu beats code-mode's "discover and call." If a tightly authored tool surface is the product, build it as one.

The honest framing: a dedicated server is right when someone else maintains a good one, when the capability isn't a plain HTTP call, or when the exact tool ergonomics are the point. A single runtime is right when you're otherwise about to stand up your own thin server around a handful of endpoints — and then another, and another.

When this isn't worth it

If you have exactly one integration, this is a non-decision: stand up the one server (or the one stitch) and move on. The N-versus-one math only bites once N grows.

Two more caveats, because overclaiming helps no one:

  • Consolidating doesn't make the integrations themselves simpler. Each upstream still has its own auth, its own quirks, its own failure modes. One runtime gives you one place to declare those, not a way to wish them away.
  • A great official server beats a hand-declared stitch for that service. If the vendor's MCP server is genuinely better than the endpoints you'd stitch by hand, use it — and let the StitchAPI runtime carry the long tail of services that don't have one.

The decision isn't "MCP servers, yes or no." It's "how many processes, auth boundaries, and tool catalogs do you want to carry to reach N services?" For a pile of HTTP APIs you control or consume, the answer can be one runtime, one surface, one credential boundary — and adding the next service is adding a stitch.

Try it

npm i stitchapi

Declare a couple of endpoints, run stitch mcp, and point your agent at the three-tool surface. See Use from an agent, the run_stitch & code-mode tool, and the MCP surface for starting the server and the transport details.