What HIPAA actually asks for
HIPAA's Privacy Rule and Security Rule together govern protected health information (PHI). For LLM-powered agents this resolves to five engineering deliverables: BAA-covered processing, PHI redaction, minimum-necessary access control, audit logging, and a calibrated eval suite. None of these is optional; all of them are tractable.
The 2024–2026 wave of healthcare AI guidance - HHS Office for Civil Rights bulletins on tracking technologies [1], FDA's Good Machine Learning Practice principles [4], and NIST's AI Risk Management Framework Generative AI Profile [5] - converges on the same engineering checklist. Build to it once; satisfy multiple regulators.
- BAA-covered model providers only
Anthropic Claude on AWS Bedrock, Azure OpenAI, and GCP Vertex AI all offer BAA tiers. Outside these, no PHI.
- PHI redaction at the SDK boundary
Inbound prompts and tool inputs are redacted before they reach the model unless the BAA covers them. PHI tokens get round-tripped via a secure enclave.
- RBAC at retrieval
Tenant + role + clinical relationship on every retrieval query. The patient's chart is filtered server-side before any LLM sees it.
- Audit log per agent action
Append-only log: who asked, what data, what was returned, what happened next. Retained per state retention rules (typically 6 years).
- Calibrated clinical eval suite
Cases include refusal cases for clinical advice, citation cases for any medical claim, and red-team cases for de-identification leakage.
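The redaction bullet above hinges on stable, re-linkable tokens. The sketch below shows the round-trip pattern under loose assumptions: `tokenize` and `rehydrate` are hypothetical names, and an in-memory Map stands in for the secure enclave or KMS-backed vault a production system would use.

```typescript
// Minimal sketch of PHI tokenization with stable, re-linkable tokens.
// The Maps below are illustrative stand-ins for a secure enclave.
const vault = new Map<string, string>();   // token -> raw PHI value
const reverse = new Map<string, string>(); // raw PHI value -> token
let counter = 0;

// Replace a raw identifier with a stable token. The same input always
// yields the same token, so retrieval and audit can re-link records.
function tokenize(raw: string, kind: string): string {
  const existing = reverse.get(raw);
  if (existing) return existing;
  const token = `<${kind}:${(counter++).toString(36).padStart(6, "0")}>`;
  vault.set(token, raw);
  reverse.set(raw, token);
  return token;
}

// Restore tokens in model output only at a BAA-covered or
// customer-controlled sink, never in transit to a non-covered model.
function rehydrate(text: string): string {
  return text.replace(/<[a-z]+:[0-9a-z]{6}>/g, (t) => vault.get(t) ?? t);
}

// Round-trip: redact before the model, re-link after.
const prompt = `Summarize chart for ${tokenize("Jane Doe", "name")}, MRN ${tokenize("443-21-9987", "mrn")}`;
// prompt now carries tokens only; rehydrate(prompt) re-links the raw values.
```

The stable-token property is the point: the audit log and retrieval layer see the same token for the same patient across sessions, without ever storing raw identifiers alongside model traffic.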
Where engineering effort goes in a healthcare agent engagement (share of effort, %). Source: Techimax healthcare engagement scope analysis, 2024–2026

| Work item | Share of effort (%) |
|---|---|
| PHI redaction + BAA wiring | 22 |
| Eval suite (clinical) | 28 |
| RBAC at retrieval | 14 |
| Audit logs + retention | 12 |
| Reviewer queues | 11 |
| Governance docs | 13 |
PHI redaction: what to redact and where
The intuitive redaction pattern - "strip names and dates before sending to the model" - is wrong twice. First, it doesn't strip enough: HIPAA's Safe Harbor standard lists 18 identifier categories. Second, when the BAA covers the model, redaction strips information the model needs to do its job.
The right pattern: classify the deployment surface (BAA-covered? non-covered?) and redact only what isn't covered for that surface. Redact at the SDK boundary, not in application code; replace with stable tokens so retrieval and audit can re-link.
| Surface | Strategy | Reasoning |
|---|---|---|
| Anthropic on Bedrock (BAA) | Pass PHI; log and audit | BAA covers; minimum necessary still applies |
| OpenAI direct API (no BAA in path) | Redact 18 identifiers; tokenize | Not BAA-covered; PHI cannot transit |
| Open-weight on customer GPUs | Pass PHI; encrypt at rest; rotate logs | Customer-owned; covered by infrastructure controls |
| Third-party tool calls | Tokenize before tool call; rehydrate after | Tool vendors typically not BAA-covered |
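The surface table above is easy to encode as a policy map so no call path can reach a model without an explicit redaction decision. A minimal sketch, assuming illustrative surface and strategy names that mirror the table:

```typescript
// Route each model call through the redaction strategy for its
// deployment surface. Surface names and policies are illustrative.
type Surface = "bedrock_baa" | "openai_direct" | "self_hosted" | "third_party_tool";
type Strategy = "pass_phi" | "redact_identifiers" | "tokenize";

const policy: Record<Surface, Strategy> = {
  bedrock_baa: "pass_phi",             // BAA covers; minimum necessary still applies
  openai_direct: "redact_identifiers", // no BAA in path: strip all 18 categories
  self_hosted: "pass_phi",             // customer-owned infrastructure controls apply
  third_party_tool: "tokenize",        // tool vendors typically not BAA-covered
};

function strategyFor(surface: Surface): Strategy {
  return policy[surface];
}

// Hard guard: refuse to let raw PHI transit a non-covered surface.
function assertPhiAllowed(surface: Surface): void {
  if (strategyFor(surface) !== "pass_phi") {
    throw new Error(`PHI must not transit surface "${surface}" unredacted`);
  }
}
```

Making the policy exhaustive over the `Surface` union means adding a new provider without a redaction decision is a compile-time error, not a runtime leak.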
Clinical evals: what 'good' means
Healthcare evals demand a different bar. A general-purpose agent eval might accept a 90% pass rate as production-ready; a clinical-adjacent agent often needs 99%+ on safety-critical refusals (no diagnosis without provider review; no medication advice; cite all clinical claims).
Calibrate against clinician review. We default to: every clinical-adjacent eval graded by an LLM grader, then sampled (typically 10%) for clinician review. Disagreement rate between LLM grader and clinician is reported; > 5% disagreement triggers a re-calibration sprint.
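The disagreement check is a few lines once grades are structured. A sketch under stated assumptions: `GradedCase` and the function names are hypothetical, and `clinicianPass` is set only on the ~10% sampled subset.

```typescript
// Compute LLM-grader vs clinician disagreement on the sampled subset.
// A rate above the 5% threshold triggers a re-calibration sprint.
type GradedCase = {
  llmPass: boolean;        // LLM grader's verdict, set on every case
  clinicianPass?: boolean; // clinician verdict, set on the sampled ~10%
};

function disagreementRate(cases: GradedCase[]): number {
  const sampled = cases.filter((c) => c.clinicianPass !== undefined);
  if (sampled.length === 0) return 0;
  const disagreements = sampled.filter((c) => c.llmPass !== c.clinicianPass).length;
  return disagreements / sampled.length;
}

const RECALIBRATION_THRESHOLD = 0.05;

function needsRecalibration(cases: GradedCase[]): boolean {
  return disagreementRate(cases) > RECALIBRATION_THRESHOLD;
}
```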
Clinical eval pass rates before and after calibration. Source: Techimax healthcare engagement data, 2024–2026

| Eval category | Before calibration (%) | After calibration (%) |
|---|---|---|
| Refusal of clinical advice | 84 | 99 |
| Citation accuracy on medical claims | 71 | 96 |
| PHI de-identification on outputs | 82 | 99 |
| Drug-interaction recognition | 76 | 94 |
Human-in-loop calibration for clinical workflows
Healthcare is the canonical case for human-in-the-loop. The decision rule: agents draft, clinicians decide. Anything that materially affects diagnosis, treatment, medication, or risk classification routes to a human reviewer with a documented SLA. Anything that doesn't (intake summarization, scheduling, administrative notes) can be fully automated with a quality eval gate.
We track three metrics on the reviewer queue: queue depth (alarm at 1.5× SLA), override rate (alarm at any 2× sustained spike), and reviewer time per item (drift indicator for agent quality). Override rate is the canary: when it climbs, the agent is drifting; when it drops below 5%, the agent is over-cautious and routing too conservatively.
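The three alarms reduce to simple threshold checks. A minimal sketch with hypothetical names; the thresholds (1.5× SLA depth, 2× override spike, 5% floor) mirror the text, and the "sustained" qualifier on the spike is simplified to an instantaneous comparison against a trailing baseline.

```typescript
// Reviewer-queue health checks for the three tracked metrics.
type QueueSnapshot = {
  depth: number;            // items currently waiting
  slaCapacity: number;      // items the SLA window can absorb
  overrideRate: number;     // fraction of reviewed items overridden
  baselineOverride: number; // trailing baseline override rate
};

function queueAlarms(q: QueueSnapshot): string[] {
  const alarms: string[] = [];
  if (q.depth > 1.5 * q.slaCapacity) alarms.push("queue_depth");
  if (q.overrideRate >= 2 * q.baselineOverride) alarms.push("override_spike"); // agent drifting
  if (q.overrideRate < 0.05) alarms.push("over_cautious"); // routing too conservatively
  return alarms;
}
```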
| Decision class | Default policy | SLA | Reviewer |
|---|---|---|---|
| Diagnosis suggestion | Required review | < 4h business | Licensed clinician |
| Medication recommendation | Required review | < 1h | Pharmacist or clinician |
| Triage / risk classification | Required review on high-risk | < 30 min | Clinical lead |
| Intake summarization | Sample review | < 24h | Care coordinator |
| Scheduling action | Auto with audit | - | Audit only |
| Patient-facing message | Required review | < 2h | Care coordinator |
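The routing table above is also just data. A sketch under stated assumptions: the `DecisionClass` union values and field names are illustrative, and SLAs are expressed in minutes (`null` for audit-only actions).

```typescript
// The decision-class routing table as data.
type DecisionClass =
  | "diagnosis_suggestion" | "medication_recommendation" | "triage"
  | "intake_summary" | "scheduling" | "patient_message";
type ReviewPolicy = "required" | "required_high_risk" | "sample" | "auto_with_audit";

const routing: Record<DecisionClass, { policy: ReviewPolicy; slaMinutes: number | null }> = {
  diagnosis_suggestion: { policy: "required", slaMinutes: 240 },       // < 4h business
  medication_recommendation: { policy: "required", slaMinutes: 60 },
  triage: { policy: "required_high_risk", slaMinutes: 30 },
  intake_summary: { policy: "sample", slaMinutes: 1440 },
  scheduling: { policy: "auto_with_audit", slaMinutes: null },
  patient_message: { policy: "required", slaMinutes: 120 },
};

// Agents draft, clinicians decide: gate every action on this check.
function requiresHumanReview(cls: DecisionClass, highRisk: boolean): boolean {
  const p = routing[cls].policy;
  return p === "required" || (p === "required_high_risk" && highRisk);
}
```

Keeping the table in code (rather than scattered `if` statements) means the governance document and the runtime behavior come from one artifact, which is exactly what an auditor will ask to see.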
What an audit trail actually contains
OCR investigators ask for specific artifacts. "We have a log" is insufficient. The audit trail must let an investigator reconstruct any decision - what data was retrieved, who could see it, what the agent did, what the clinician decided, when the decision was reversed if applicable. Each event needs an immutable timestamp, an actor (agent ID or user ID), and a payload sufficient to reconstruct context.
Storage: append-only with cryptographic hashing for tamper-evidence (we use Merkle-tree style chaining); 6-year retention default; encrypted at rest with customer-managed keys; access-logged at the row level. The same artifact serves SOC 2 (CC7.2 audit logging), HITRUST, and the OCR's investigation request.
```typescript
// Append-only event store. Every agent action emits one of these
// before the user sees any UI confirmation. Reconstruction is the
// design goal: an investigator must be able to recreate context.

type DecisionClass = string; // e.g. "diagnosis_suggestion", "medication_recommendation"

type AuditEvent = {
  event_id: string;        // UUIDv7 - sortable, immutable
  timestamp: string;       // ISO-8601 UTC
  actor_kind: "agent" | "user" | "system";
  actor_id: string;
  patient_token: string;   // tokenized PHI key - not raw MRN
  agent_id?: string;
  agent_version?: string;
  model_id?: string;
  retrieved_docs: { source: string; doc_id: string; version: string }[];
  prompt_hash: string;
  output_hash: string;
  decision_class: DecisionClass;
  reviewer_id?: string;    // set when human review applied
  override?: boolean;      // reviewer overrode the agent
  prev_event_id: string;   // chain link - Merkle-style integrity
  signature: string;       // HMAC of the canonical event body
};
```

Healthcare agents are not a strategy problem. They're an engineering problem with five known deliverables. Ship the deliverables and the compliance conversation gets shorter.
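The `prev_event_id` and `signature` fields are what make the log tamper-evident. A minimal sketch of the chaining and verification, using Node's `crypto` module; the `ChainEvent` shape is simplified from the full event, and key management (rotation, customer-managed keys) is elided.

```typescript
// Hash-chained append and verification for a tamper-evident audit log.
import { createHmac } from "crypto";

type ChainEvent = { event_id: string; prev_event_id: string; body: string; signature: string };

function sign(body: string, key: string): string {
  return createHmac("sha256", key).update(body).digest("hex");
}

// Append: each event signs its canonical body together with its
// predecessor's id, so reordering or editing breaks the chain.
function appendEvent(chain: ChainEvent[], event_id: string, body: string, key: string): void {
  const prev = chain.length ? chain[chain.length - 1].event_id : "genesis";
  chain.push({
    event_id,
    prev_event_id: prev,
    body,
    signature: sign(`${prev}|${event_id}|${body}`, key),
  });
}

// An investigator (or a nightly job) replays the chain and confirms
// every link: correct predecessor and a valid signature.
function verifyChain(chain: ChainEvent[], key: string): boolean {
  let prev = "genesis";
  for (const e of chain) {
    if (e.prev_event_id !== prev) return false;
    if (e.signature !== sign(`${prev}|${e.event_id}|${e.body}`, key)) return false;
    prev = e.event_id;
  }
  return true;
}
```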
When does an agent become a medical device?
FDA jurisdiction kicks in when an agent's output materially affects diagnosis, treatment, or prevention. Decision support that surfaces information for a clinician to interpret is generally not regulated as a medical device under the 21st Century Cures Act exclusions. Software that recommends a specific course of treatment may be Software as a Medical Device (SaMD) and require clearance [4].
Engineering implication: classify your agent's outputs against the SaMD framework before shipping. Most enterprise healthcare agents we deploy fall outside SaMD because they surface citations and route to clinicians for interpretation. The line moves with how the agent is used, not just what it says - be intentional about the framing in your UI.
References
- [1] HIPAA Security Rule - HHS (2024)
- [2] Anthropic BAA + Trust Center - Anthropic (2025)
- [3] AWS HIPAA-eligible services - AWS (2025)
- [4] FDA Good Machine Learning Practice for medical devices - FDA / Health Canada / MHRA (2024)
- [5] AI Risk Management Framework Generative AI Profile - NIST (2024)
- [6] 21st Century Cures Act information-blocking rule - ONC (2024)
- [7] HITRUST CSF v11 - HITRUST Alliance (2024)
Frequently asked questions
Are open-weight models acceptable for PHI?
Yes, when self-hosted in customer-owned infrastructure with appropriate controls (encryption, RBAC, audit). The BAA question only applies when a third-party vendor processes PHI on your behalf.
What about state laws (CCPA, NY SHIELD)?
Layer on top of HIPAA. State laws typically add notice and rights obligations; the engineering deliverables on this page satisfy most state requirements as long as the audit trail is queryable.
Can we use these patterns for HITRUST certification?
Yes. HITRUST CSF includes the controls covered above. We pair with HITRUST assessors during engagement and have shipped agents into HITRUST-certified environments.
How long does this take?
6–10 weeks for a single clinical-adjacent agent, including governance documentation. We've seen organizations try to compress to 2 weeks; the result is debt that surfaces during audit.
Which model providers offer BAA coverage in 2026?
Anthropic Claude on AWS Bedrock (BAA via AWS), Azure OpenAI (BAA via Microsoft), and GCP Vertex AI (BAA via Google). Direct OpenAI API and direct Anthropic API also have BAA tiers for enterprise customers. Open-weight models self-hosted on customer infrastructure don't require a BAA - no third-party vendor processes PHI on your behalf.
Do we need IRB review for AI features in research contexts?
If the AI is used in research with human subjects, yes - your IRB will want to review. Engineering implication: build the audit log to support research data extracts (de-identified per Safe Harbor or Expert Determination) without re-engineering.
How do we handle minor patient data?
Standard HIPAA pediatric considerations apply (parent/guardian consent, age-out at majority for some states). Engineering implication: tag patient records with age category; route minor-related actions through a stricter reviewer queue; log consent status with each access.
What about TEFCA and information-blocking rules?
ONC's information-blocking rules apply to certified health IT and may govern how an AI agent surfaces or withholds information [6]. Practical pattern: an agent that summarizes records visible to the patient via a portal must not selectively suppress the same information from the patient's view. Build to parity.