
Tool design for agents that survive ambiguity

Tool contracts are where most agent projects die in production. Strongly-typed schemas, idempotency keys, retry semantics, and failure-mode mapping turn fragile tool integrations into ones that hold up under adversarial inputs and upstream drift.

Techimax Engineering · Forward-deployed engineering team · 10 min read

Where tools fail in production

If you've debugged a stuck agent, you know the moment: the model says it called the tool, the trace says the tool returned 200, and the customer says nothing happened. The model got JSON it didn't recognize and improvised; the integration looks fine; only the eval suite (if you have one) catches it.

Tool design is where agents stop being demos. Five engineering moves prevent the bulk of these failures.

Five tool-design moves that compound
  • Strongly-typed input + output schemas

    Zod or equivalent on both sides. Reject malformed inputs from the model; reject malformed outputs from upstream. Both fail closed.

  • Idempotency keys on every state-changing tool

    Retries are inevitable. Idempotency keys make them safe. The agent generates a stable key per logical action; the tool deduplicates server-side.

  • Explicit failure modes mapped to retry / escalate / refuse

    Every tool returns a structured failure code; the orchestrator knows which codes retry, which escalate, which refuse. No silent failures.

  • Bounded retries with exponential backoff and circuit-breakers

    Cap retries at 3, back off geometrically, and trip a circuit-breaker after N failures within a window. Runaway retry loops are how an agent racks up a $400 bill.

  • Versioned tool contracts

    Tools have versions. The agent calls a specific version. Upstream changes don't silently break the agent - they require a documented contract bump.

Production-ready tool definition with all five moves
import { z } from "zod";

// defineTool is our tool-registration wrapper; the dispatch sketch
// after this definition shows how the orchestrator consumes it.
export const issueRefund = defineTool({
  name: "issue_refund",
  version: "v2.1",                  // contract version
  description: "Issue a refund to a customer's order",

  // Strongly typed input
  input: z.object({
    order_id: z.string().uuid(),
    amount_cents: z.number().int().positive().max(500_00),
    reason: z.enum(["damaged", "wrong_item", "other"]),
    idempotency_key: z.string().min(16),    // required for retries
  }),

  // Strongly typed output - model can't pretend success
  output: z.object({
    refund_id: z.string().uuid(),
    status: z.enum(["completed", "pending_review", "rejected"]),
  }),

  // Explicit failure modes mapped to orchestrator behavior
  errors: {
    NOT_FOUND:        { policy: "refuse",   userMsg: "Order not found." },
    AMOUNT_OVER_CAP:  { policy: "escalate", userMsg: "Refund requires review." },
    UPSTREAM_5XX:     { policy: "retry",    maxRetries: 3, backoffMs: [1000, 3000, 9000] },
    IDEMPOTENT_DUPE:  { policy: "succeed",  reuseLastResponse: true },
  },

  // Server-side handler (idempotent)
  async handle(input) {
    const existing = await findRefundByIdempotencyKey(input.idempotency_key);
    if (existing) return existing;
    return await processRefund(input);
  },
});
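
The errors map above only pays off if the orchestrator enforces it. Here is a minimal sketch of that dispatch loop, wiring the retry / escalate / refuse policies to the bounded backoff and circuit-breaker from the list above. callTool and escalateToHuman are hypothetical stand-ins for your transport and paging layers, not a published API.

declare function callTool(
  tool: { name: string },
  input: unknown,
): Promise<
  | { ok: true; output: unknown }
  | { ok: false; code: string; lastResponse?: unknown }
>;
declare function escalateToHuman(tool: string, input: unknown, code: string): Promise<void>;

type ErrorPolicy =
  | { policy: "refuse"; userMsg: string }
  | { policy: "escalate"; userMsg: string }
  | { policy: "retry"; maxRetries: number; backoffMs: number[] }
  | { policy: "succeed"; reuseLastResponse: true };

// One breaker per tool in practice; a single one keeps the sketch short.
const breaker = { failures: 0, openUntil: 0 };
const BREAKER_THRESHOLD = 5;        // trip after N exhausted-retry failures
const BREAKER_COOLDOWN_MS = 60_000; // stay open for this window

async function dispatch(
  tool: { name: string; errors: Record<string, ErrorPolicy> },
  input: unknown,
) {
  if (Date.now() < breaker.openUntil) {
    return { kind: "refused" as const, userMsg: "Tool temporarily unavailable." };
  }

  for (let attempt = 0; ; attempt++) {
    const result = await callTool(tool, input);
    if (result.ok) {
      breaker.failures = 0; // a healthy call resets the breaker
      return { kind: "ok" as const, output: result.output };
    }

    const rule = tool.errors[result.code];
    if (!rule) {
      // Unknown failure code: fail closed, never improvise.
      return { kind: "refused" as const, userMsg: "Tool call failed." };
    }

    switch (rule.policy) {
      case "succeed": // idempotent duplicate: reuse the stored response
        return { kind: "ok" as const, output: result.lastResponse };
      case "refuse":
        return { kind: "refused" as const, userMsg: rule.userMsg };
      case "escalate":
        await escalateToHuman(tool.name, input, result.code);
        return { kind: "escalated" as const, userMsg: rule.userMsg };
      case "retry":
        if (attempt >= rule.maxRetries) {
          if (++breaker.failures >= BREAKER_THRESHOLD) {
            breaker.openUntil = Date.now() + BREAKER_COOLDOWN_MS;
          }
          return { kind: "refused" as const, userMsg: "Upstream unavailable." };
        }
        await new Promise((r) => setTimeout(r, rule.backoffMs[attempt]));
    }
  }
}

Note the unknown-code branch fails closed: a failure code you haven't mapped is treated as a refusal, never a retry.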

Fuzz your tools with adversarial model output

An underused engineering practice: fuzz-test your tools by sampling 1,000 model outputs against the schema. Look for the malformed cases the model produces (missing fields, wrong types, partial JSON). Most production failures are in this set.

We run this at engagement kickoff: it surfaces the schema bugs that would have hit production in week 3 and pushes them into week 1 where they're cheap to fix.
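
A minimal sketch of that harness, under two assumptions: samples is a corpus of raw tool-call strings you've logged from real model runs (hypothetical here), and schema is the zod input schema from the tool definition.

import { z } from "zod";

// Replay captured model outputs against the tool's input schema and
// tally the failure patterns.
function fuzzSchema(schema: z.ZodTypeAny, samples: string[]) {
  const tally: Record<string, number> = {};
  const bump = (k: string) => (tally[k] = (tally[k] ?? 0) + 1);

  for (const raw of samples) {
    // Lenient pre-parse: strip a markdown fence if the model wrapped one.
    const stripped = raw.trim().replace(/^```(?:json)?\s*|\s*```$/g, "");
    let parsed: unknown;
    try {
      parsed = JSON.parse(stripped);
    } catch {
      bump("partial_or_invalid_json"); // truncated or non-JSON output
      continue;
    }
    const result = schema.safeParse(parsed);
    if (result.success) {
      bump("valid");
      continue;
    }
    for (const issue of result.error.issues) {
      if (issue.code === "invalid_type" && issue.received === "undefined") {
        bump("missing_required_field");
      } else if (issue.code === "invalid_type") {
        bump("wrong_type");
      } else if (issue.code === "unrecognized_keys") {
        bump("hallucinated_field"); // only fires on .strict() schemas
      } else {
        bump(issue.code);
      }
    }
  }
  return tally; // e.g. fuzzSchema(issueRefund.input, samples)
}

The tallies we see usually match the distribution in the table below.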

Pattern                          Frequency           Fix
Missing required field           ~12% of attempts    Schema rejects; orchestrator retries with explicit field requirement
Markdown wrapper around JSON     ~9%                 Lenient parser strips fenced blocks
Wrong type (string for number)   ~5%                 Schema rejects; coerce safely or fail closed
Hallucinated field               ~3%                 Strict schemas reject extra fields
Partial JSON (truncated)         ~2%                 Stream completion; reject if not closed

Common malformed outputs we see during tool-fuzz testing

References

  1. Anthropic tool-use best practices - Anthropic docs (2025)
  2. MCP (Model Context Protocol) spec - Anthropic (2024)

Frequently asked questions

Should we let the model see schema validation errors?

Yes - return the validation error to the model so it can correct on retry. Cap retries to 2 to bound the loop.
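
A sketch of that correction loop, assuming a hypothetical generate() wrapper around your LLM call:

import { z } from "zod";

declare function generate(prompt: string): Promise<string>;

// Feed the validation error back to the model, capped at 2 retries.
async function getValidInput(schema: z.ZodTypeAny, prompt: string) {
  let feedback = "";
  for (let attempt = 0; attempt <= 2; attempt++) {
    const raw = await generate(prompt + feedback);
    let candidate: unknown;
    try {
      candidate = JSON.parse(raw);
    } catch {
      feedback = "\nYour last output was not valid JSON. Return only JSON.";
      continue;
    }
    const result = schema.safeParse(candidate);
    if (result.success) return result.data;
    // Return the validation error verbatim so the model can self-correct.
    feedback = `\nYour last call failed validation:\n${result.error.message}`;
  }
  throw new Error("Tool input failed validation after 2 retries");
}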

Does this slow agents down?

Marginally - schema validation takes microseconds, and that latency is dwarfed by the LLM round-trip. The reliability gain is large.

What about MCP (Model Context Protocol)?

MCP standardizes how agents discover and call tools across vendors - useful for portability. The principles on this page (schemas, idempotency, failure modes) sit on top of MCP regardless.

Talk to engineering

Ready to ship the patterns from this post?

Tell us where you are. A senior forward-deployed engineer replies within 24 hours with a written plan tailored to your stack - never an SDR.

  • Practical engineering review of your current setup
  • Eval discipline + observability + cost controls
  • Free 60-min working session, no sales pitch
