Where tools fail in production
If you've debugged a stuck agent, you know the moment: the model says it called the tool, the trace says the tool returned 200, and the customer says nothing happened. The model got JSON it didn't recognize and improvised; the integration looks fine; only the eval suite (if you have one) catches it.
Tool design is where agents stop being demos. Five engineering moves prevent the bulk of these failures.
- Strongly-typed input + output schemas
Zod or equivalent on both sides. Reject malformed inputs from the model; reject malformed outputs from upstream. Both fail closed.
- Idempotency keys on every state-changing tool
Retries are inevitable. Idempotency keys make them safe. The agent generates a stable key per logical action; the tool deduplicates server-side.
- Explicit failure modes mapped to retry / escalate / refuse
Every tool returns a structured failure code; the orchestrator knows which codes retry, which escalate, which refuse. No silent failures.
- Bounded retries with exponential backoff and circuit-breakers
Bound retries to 3; backoff geometrically; trip a circuit-breaker after N failures across a window. Runaway retry loops are how agents bill $400.
- Versioned tool contracts
Tools have versions. The agent calls a specific version. Upstream changes don't silently break the agent - they require a documented contract bump.
export const issueRefund = defineTool({
name: "issue_refund",
version: "v2.1", // contract version
description: "Issue a refund to a customer's order",
// Strongly typed input
input: z.object({
order_id: z.string().uuid(),
amount_cents: z.number().int().positive().max(500_00),
reason: z.enum(["damaged", "wrong_item", "other"]),
idempotency_key: z.string().min(16), // required for retries
}),
// Strongly typed output - model can't pretend success
output: z.object({
refund_id: z.string().uuid(),
status: z.enum(["completed", "pending_review", "rejected"]),
}),
// Explicit failure modes mapped to orchestrator behavior
errors: {
NOT_FOUND: { policy: "refuse", userMsg: "Order not found." },
AMOUNT_OVER_CAP: { policy: "escalate", userMsg: "Refund requires review." },
UPSTREAM_5XX: { policy: "retry", maxRetries: 3, backoffMs: [1000, 3000, 9000] },
IDEMPOTENT_DUPE: { policy: "succeed", reuseLastResponse: true },
},
// Server-side handler (idempotent)
async handle(input) {
const existing = await findRefundByIdempotencyKey(input.idempotency_key);
if (existing) return existing;
return await processRefund(input);
},
});Fuzz your tools with adversarial model output
An underused engineering practice: fuzz-test your tools by sampling 1,000 model outputs against the schema. Look for the malformed cases the model produces (missing fields, wrong types, partial JSON). Most production failures are in this set.
We run this at engagement kickoff: it surfaces the schema bugs that would have hit production in week 3 and pushes them into week 1 where they're cheap to fix.
| Pattern | Frequency | Fix |
|---|---|---|
| Missing required field | ~12% of attempts | Schema rejects; orchestrator retries with explicit field requirement |
| Wrong type (string for number) | ~5% | Schema rejects; coerce safely or fail closed |
| Hallucinated field | ~3% | Strict schemas reject extra fields |
| Partial JSON (truncated) | ~2% | Stream completion; reject if not closed |
| Markdown wrapper around JSON | ~9% | Lenient parser strips fenced blocks |
References
- [1]Anthropic tool-use best practices - Anthropic docs (2025)
- [2]MCP (Model Context Protocol) spec - Anthropic (2024)
Frequently asked questions
Should we let the model see schema validation errors?
Yes - return the validation error to the model so it can correct on retry. Cap retries to 2 to bound the loop.
Does this slow agents down?
Marginally - schema validation is microseconds. The latency added is dwarfed by the LLM round-trip. The reliability gain is large.
What about MCP (Model Context Protocol)?
MCP standardizes how agents discover and call tools across vendors - useful for portability. The principles on this page (schemas, idempotency, failure modes) sit on top of MCP regardless.