API errors your agents can fix themselves.

72% of production MCP agents fail a reliability benchmark. SelfHeal is a drop-in proxy that catches MCP and API failures, analyzes them with an LLM, and returns a structured fix your agent can retry with. Timeout cascades, malformed tool calls, auth drift — gone. Works with any MCP-compatible framework.

Successful calls = free. You only pay credits when SelfHeal actually fixes a failure.

Install: pip install graceful-fail or npm install graceful-fail

$29/team/mo flat, pay-per-heal via x402, or 500 calls free. No credit card required.

Three failure modes your framework doesn't catch

They happen at the protocol boundary. Retry loops and fallbacks miss them. SelfHeal operates exactly there.

  • Tool call validation errors (36.7%) — Production MCP servers accept malformed tool calls and return 400s with no actionable detail. Your agent retries the same bad payload until it burns through its token budget.
  • Timeout cascades (60-sec stalls) — Routine when 8+ MCP servers are chained. One slow upstream poisons every downstream call. Your agent sits idle waiting for a response that's never coming.
  • Auth drift mid-session (silent) — Token refreshes fail silently. Your agent keeps sending expired creds. Upstream returns 401 → agent retries → same 401 → burns through the retry budget → customer sees "agent offline."

Key stats

  • ~5ms pass-through latency on successful requests
  • Zero credentials exposed — stripped before LLM analysis
  • 100% pass-through transparency on 2xx/3xx
  • Free on successful requests — credits only used on failures

Intercept. Analyze. Return a structured fix.

POST through SelfHeal, get results back. No API key required for the free tier.

  1. Your agent sends API calls through the proxy. Set X-Destination-URL to point at the real MCP server or API.
  2. Successful responses (2xx/3xx) pass through verbatim with ~5ms added latency. No credits consumed.
  3. Failed responses (4xx/5xx) are intercepted, analyzed by an LLM, and returned with a structured JSON envelope containing retriability, error category, human-readable explanation, and a suggested payload diff.

Credential-safe: Authorization, Cookie, API keys, and OAuth tokens are stripped before any error reaches the LLM. Framework-agnostic: works with LangGraph, CrewAI, AutoGen, custom agents, or raw HTTP.

Two ways to pay. Same engine.

x402 for agents, flat-fee for teams. Your choice.

  • Reliability Plan — $29/team/mo: Flat fee. Unlimited calls under fair use (10K/mo soft cap). Up to 10 developers, Slack + email alerts on circuit-open, priority support, monthly reliability report. 14-day free trial.
  • x402 Outcome Pricing — $0.003/successful heal (USDC): Pay-per-heal via HTTP 402 + Base L2 settlement. Zero subscription, zero auth, zero billing setup. Your agent's wallet IS the API key. Best for high-variance workloads and agent-to-agent commerce.
  • Hobby — Free: 500 requests/month. All features, capped volume. 1 developer, community Discord. For testing and personal agents.

Stop paying the LLM tax

Ran npx llm-tax and saw the bloat? See the one-line fix on the SelfHeal llm-tax page — SelfHeal's normalize feature returns only the fields your agent uses. Free for 500 calls/month.

Ship the integration today

Three setups cover 90% of production agent stacks.

  • LangGraph agents — Drop-in integration. 5-minute setup. Works with any LangGraph chain that touches MCP or external APIs.
  • CrewAI crews — Existing integration shipped in our SDK. Wrap any Crew's tool calls without changing agent logic.
  • Custom agents (OpenAI / Anthropic SDK) — HTTP proxy pattern with a 3-line setup. Works with raw httpx/fetch calls to any MCP server or HTTP API.

Python SDK

Install: pip install 'graceful-fail[langchain]'. Available on PyPI. Supports sync, async, LangChain, and CrewAI.

Node.js / TypeScript SDK

Install: npm install graceful-fail. Available on npm. Full TypeScript support, ESM and CJS.

Your MCP agent just hit a 422. Does it crash — or fix itself?

Stop babysitting agent retries. SelfHeal gives them a structured fix before the retry budget burns.

Start 14-day trial