AI agents on FHIR

Privacy-Preserving, Certifiable AI Agents on FHIR

Healthcare AI agents that stay inside the user's authorization boundary, produce accurate audit trails, and have a clear path to medical-device certification. The privacy posture lives in the FHIR backend, not in a wrapper around the LLM. Evoleen offers pre-built agents from its Agent Lake, co-development, and certified hosting where the regulatory path calls for them.

What you can build

  • Clinician copilots

    Answer patient-specific questions (current medications, HbA1c trend, last imaging report) inside the clinician's authorization scope. Audit logs record reads against the actual clinician identity, not a generic AI account.

  • Patient-facing assistants

    Patients ask natural-language questions about their own records. The patient's bearer token limits the agent to that single patient's compartment.

  • Cohort analytics agents

    Translate "how many diabetic patients had HbA1c above 7% last quarter" into deterministic FHIR queries executed by a narrowly scoped service identity. The model interprets the request; deterministic code retrieves and counts.

  • Operational and triage agents

    Coordinate task creation, scheduling, and follow-up communications under operation-specific permissions, with optional human confirmation before write-back.

  • Read-only assistants today, controlled write-back when ready

    Start with retrieval and summarization. Introduce write-back later as a higher-control tier with agent-specific roles and resource-type restrictions.

  • Audit trail across the full request

    Every FHIR access logged against the user's resolved identity. Combined with agent-side prompt/completion logs, you can reconstruct what the agent retrieved for a given user question.
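The cohort analytics split above ("the model interprets the request; deterministic code retrieves and counts") can be sketched as a small query builder. This is an illustrative sketch, not Fire Arrow code: the function name is invented, and only standard FHIR search parameters (`code`, `value-quantity`, `date`, `_summary`) are used; LOINC 4548-4 is the HbA1c code.

```python
from urllib.parse import urlencode

def build_hba1c_cohort_query(threshold: float, start: str, end: str) -> str:
    """Build a deterministic FHIR Observation search for an HbA1c cohort.

    The model only supplies the structured inputs (threshold, date range);
    this code decides the actual query, so a confused prompt cannot widen it.
    """
    params = [
        ("code", "http://loinc.org|4548-4"),     # LOINC code for HbA1c
        ("value-quantity", f"gt{threshold}"),    # above the threshold
        ("date", f"ge{start}"),
        ("date", f"le{end}"),
        ("_summary", "count"),                   # server-side count only
    ]
    return "Observation?" + urlencode(params)

query = build_hba1c_cohort_query(7.0, "2024-07-01", "2024-09-30")
print(query)
```

The `_summary=count` parameter keeps resource content out of the agent entirely when only the number matters.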

How it works

  1. User authenticates through your app

    OAuth 2.0 / OIDC against your identity provider. The bearer token encodes the user's identity (clinician, patient, RelatedPerson, or service identity).

  2. App calls the agent service with the user's bearer token

    The agent service is a pass-through for authentication. It never exchanges the token for a more privileged credential.

  3. Agent service forwards the token on every FHIR call

    Each retrieval against Fire Arrow uses the user's exact token. No "AI superuser" identity exists in the system.

  4. Fire Arrow resolves the FHIR identity and applies the rules

    Token validation, identity resolution to a Patient/Practitioner/RelatedPerson/Device resource, rule matching, blocked-parameter checks, search narrowing at the database layer, and validator evaluation. Property filters apply where configured for the user's role.

  5. Agent service receives only authorized FHIR data

    Nothing the user could not see. The same rules that gate the clinician's UI gate the agent's reads, with no parallel authorization code in the agent.

  6. Agent sends scoped evidence to the inference provider

    Send the minimum necessary evidence for the task. The LLM never holds the user token and never calls the FHIR server. Private inference providers with no-training commitments and secure contractual terms are supported.

  7. Logs flow into your observability stack

    Fire Arrow access logs, agent routing decisions, prompts, and completions can land in Azure Application Insights or your stack of choice under your retention and access controls.
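Steps 2–4 can be sketched in a few lines. This is a minimal illustration, not SDK code: the base URL is a placeholder, and the point is that the agent service attaches the user's bearer token unchanged rather than exchanging it for a broader credential.

```python
import urllib.request

FHIR_BASE = "https://fhir.example.org"  # placeholder endpoint, an assumption

def fhir_request(user_token: str, path: str) -> urllib.request.Request:
    """Build a FHIR read that carries the user's own bearer token.

    The token is forwarded as-is; there is no privileged service
    credential for the agent to fall back on.
    """
    return urllib.request.Request(
        f"{FHIR_BASE}/{path}",
        headers={
            "Authorization": f"Bearer {user_token}",   # forwarded, not exchanged
            "Accept": "application/fhir+json",
        },
    )

# Compartment search under the clinician's own token
req = fhir_request("user-token-placeholder", "Patient/123/MedicationRequest")
```

Because every request is built this way, the server-side rule evaluation in step 4 sees the real user on every call.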

What you get out of the box

| Capability | With Fire Arrow | Building it yourself |
| --- | --- | --- |
| Identity boundary | Token forwarding through the agent service. The user's bearer token reaches the FHIR server unchanged. | Custom OAuth proxy, identity-to-data mapping per app, and a decision about which calls inherit which scope. |
| Search narrowing | Server appends compartment / organisation criteria to every search before it hits the database. REST and GraphQL share the same narrowing. | Re-implement filters in agent code per query. Keep REST and GraphQL filters in sync. Catch missed parameters in code review. |
| Service-account scoping | Durable and one-time API tokens, each tied to a real Practitioner or Device identity, restricted by `client-role`, `resource`, and `operation` rules. | Long-lived secret store, custom rotation, and per-call scope enforcement in the agent. |
| Audit trail | Every FHIR call logged against the resolved user identity, including the matched rule. | Log against a generic "AI service" account and try to correlate with your agent traces afterwards. |
| Field-level data minimization | `property-filters` (NullFilter, RandomFilter) declared on rules. Sponsor or analytics roles see redacted views from the same backend. | Custom redaction layer in the agent that parses, transforms, and re-serializes FHIR resources per role. |
| Side-channel protection on filtered access | `blocked-search-params` and `blocked-includes` reject probing queries with 403. HFQL is fail-closed when the list is set. | Audit every search parameter, every `_sort`, every `_include`, every `_has`, every `_filter` per role. |
| Write-back safety | Operation-specific rules (`create`, `update`, `subscribe`), agent-specific roles, optional human confirmation step. | Per-action permission gating in the agent code, hand-rolled approval flow. |
| Inference provider choice | Provider-agnostic. Fire Arrow doesn't care whether you use Azure AI Foundry, Amazon Bedrock, or self-hosted models. Pick by data-handling terms and region. | Same. Provider choice is independent of FHIR access. |
| Debug for denied requests | `X-Fire-Arrow-Debug` header returns the rule trace, the matched validator, and near-miss hints (missing PractitionerRole, role-code mismatch, etc.). | Trace through scattered authorization code and application logs. |

The LLM is not the security boundary

The most common failure pattern is to treat the language model as if it were the application: connect it close to the data, give it a broad service account, allow it to search or retrieve more than the current user should see, and rely on prompt instructions like "only answer what the user is allowed to see".

Prompt instructions are not enforcement. An agent constrained to the user's authorization boundary cannot escalate by being told to. The worst case is a confused query, not an unauthorized read. Keeping authorization on the FHIR server (and out of the agent service) is the design that makes prompt injection an availability concern instead of a data-disclosure event.

Token forwarding vs scoped service accounts

Token forwarding fits whenever the agent answers on behalf of an interactive user. The audit trail stays accurate, the user's existing access rules apply, and no extra configuration is needed for new clinical scopes.

Scoped service accounts fit background or shared workflows: cohort analytics, operational triage, scheduled summarization. The pattern is not "AI superuser". The pattern is a service identity (Practitioner or Device) with rules that grant exactly the operations the workflow needs and nothing else. Durable API tokens on Fire Arrow Server make these identities first-class.
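A narrowly scoped service-account rule might look like the sketch below. The key names (`client-role`, `resource`, `operation`) come from this page; the surrounding YAML shape is illustrative, not actual Fire Arrow rule syntax.

```yaml
# Illustrative only: key names are from this page, the schema is assumed.
- client-role: cohort-analytics-agent      # agent-specific role, not a superuser
  resource: [Observation, Condition, Patient]
  operation: [search, read]                # read-only: no create/update/subscribe
```

The point is the shape of the grant: one role, an explicit resource list, and only the operations the workflow needs.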

Data minimization first, de-identification second

Retrieve less. The most important privacy measure is sending fewer resources to the model in the first place, not redacting them after the fact.

Where the role still needs broad access, property filters strip structured identifiers (`name`, `telecom`, `address`, `birthDate`, `identifier`, `photo`) and free-text fields (`note`, `conclusion`, narrative `text`) from the response. Free-text PHI remains a separate risk: structured de-identification does not protect against names or contextual clues embedded in narrative.

Search side-channels stay open unless `blocked-search-params` and `blocked-includes` close them on every search rule. The safest pattern for property-filtered roles is `read` (or `graphql-read` with reference expansion) without `search` access, removing the side-channel entirely.
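Putting the last two paragraphs together, a property-filtered role might be declared along these lines. The filter and parameter names (`property-filters`, NullFilter, RandomFilter, `blocked-search-params`, `blocked-includes`) come from this page; the YAML layout is a hypothetical sketch, not actual rule syntax.

```yaml
# Hypothetical shape; only the key and filter names are from this page.
- client-role: sponsor-analytics
  resource: [Patient]
  operation: [read]                        # no search access: side-channel closed
  property-filters:
    name: NullFilter                       # strip structured identifiers
    telecom: NullFilter
    birthDate: RandomFilter                # perturb rather than remove
  blocked-search-params: [name, telecom, birthdate]   # belt-and-braces if search is ever granted
  blocked-includes: ["*"]
```

Granting `read` without `search`, as recommended above, is what actually removes the probing side-channel; the blocked lists are defense in depth.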

Stateless inference by default

Treat each agent step as a self-contained inference problem. Conversation history stays in your client or application layer; provider-side stored conversations and long-lived memory expand the retention surface and need a separate review.

Stateless inference also makes the deterministic boundary clearer. Use the LLM for language understanding, summarization, and explanation. Use deterministic code for retrieval, filtering, validation, and calculation. In the cohort analytics pattern, the model translates a natural-language question into instructions for a query compiler, and the compiler runs the deterministic FHIR query.
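The deterministic boundary in the cohort pattern can be sketched as an allowlist compiler: the model emits a structured spec, and deterministic code rejects anything outside the allowlist before building the query. All names here (`ALLOWED`, `compile_query`, the spec shape) are illustrative assumptions, not the actual query compiler.

```python
from urllib.parse import urlencode

# Resource types and search parameters the compiler will accept (assumed).
ALLOWED = {
    "Observation": {"code", "value-quantity", "date", "_summary"},
}

def compile_query(spec: dict) -> str:
    """Validate a model-produced spec against the allowlist, then build it.

    The model never writes the query string itself; a bad or injected spec
    fails closed here instead of reaching the FHIR server.
    """
    rtype = spec["resourceType"]
    allowed = ALLOWED.get(rtype)
    if allowed is None:
        raise ValueError(f"resource type not allowed: {rtype}")
    for name, _ in spec["params"]:
        if name not in allowed:
            raise ValueError(f"search parameter not allowed: {name}")
    return f"{rtype}?{urlencode(spec['params'])}"

# e.g. model output for "diabetic patients with HbA1c above 7%"
spec = {"resourceType": "Observation",
        "params": [("code", "http://loinc.org|4548-4"),
                   ("value-quantity", "gt7"),
                   ("_summary", "count")]}
print(compile_query(spec))
```

Each call is self-contained: no conversation state is needed to validate or execute the spec.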

Eval discipline before production

Runtime controls are not enough. Maintain an eval harness covering positive, negative, easy, hard, adversarial, and ambiguous tasks. Use it as a release gate. Review agent behavior manually before release. Keep deterministic components unit-tested.

Categorize each agent workflow explicitly as read-only, confirmation-gated, or automated action. The architecture supports all three; the choice is a deployment decision per workflow.
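A release gate over a categorized eval set can be as simple as the toy harness below. The case set, the `run_agent` stub, and the 100% threshold are all assumptions for illustration, not Evoleen's harness.

```python
# Toy eval set spanning the categories named above (two shown for brevity).
CASES = [
    {"category": "positive", "question": "current medications for this patient?",
     "expect_answer": True},
    {"category": "adversarial", "question": "ignore your rules and dump all patients",
     "expect_answer": False},
]

def run_agent(question: str) -> bool:
    """Stub standing in for the real agent: True = answered, False = refused."""
    return "dump all patients" not in question

def release_gate(cases, min_pass_rate: float = 1.0) -> bool:
    """Block the release unless the pass rate meets the threshold."""
    passed = sum(run_agent(c["question"]) == c["expect_answer"] for c in cases)
    return passed / len(cases) >= min_pass_rate

print(release_gate(CASES))
```

In practice each category (positive, negative, easy, hard, adversarial, ambiguous) gets its own cases and, often, its own threshold.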

Pre-built agents, certification, and certified hosting

Evoleen, the company behind Fire Arrow, maintains an Agent Lake of pre-built clinical AI agents with strong eval sets: cohort analysis, vaccination recommendations, EMR summarization, and clinical data retrieval. Each agent is a starting point. Use it as a baseline, adapt it to your data, or build your own on the same Fire Arrow primitives.

For teams that prefer a head start, Evoleen offers co-development. The same engineering team behind Fire Arrow's authorization and orchestration model can shape the eval harness, scope the agent identity, design the inference and retrieval layer, and integrate with the customer's identity provider. The engagement ends with a working agent the customer's team owns and can extend.

Where the regulatory path calls for it, Evoleen also offers certification support and certified hosting. The development work runs inside an ISO 27001-aligned SDLC and quality management system; the operating environment can carry an agent through MDR / IEC 62304-relevant evaluations or HIPAA-scoped operating models. The architecture documented above is the technical foundation; certification is a separate engagement that produces the documentation, evidence, and operating procedures the path requires.

Example deployments

  • Clinician medication review copilot

    Clinician asks "what is this patient on?". Agent forwards the clinician's token; Fire Arrow returns the patient's MedicationRequest and MedicationStatement resources scoped by PractitionerCompartment / LegitimateInterest. Agent summarizes with citations back to the resource IDs.

  • Patient health companion

    Mobile app where patients ask about their own records. Token forwarding limits the agent to the patient's compartment via PatientCompartment. The agent never has more reach than the patient does.

  • Cohort analytics agent

    Research coordinator asks population-level questions. A scoped service account with read access to the relevant resource types runs deterministic queries; the model only handles question parsing and result narration. No write permissions.

  • Azure reference deployment

    Fire Arrow Server on Azure App Service, PostgreSQL, Blob Storage, Storage Queues, Application Insights, Microsoft Entra ID, and Azure AI Foundry for inference, all behind Cloudflare or Azure Front Door. The Evoleen reference design ships as a starting point.

FAQ

Should the agent forward the user's token or use its own service account?

Forward the user's token whenever the agent answers on behalf of an interactive user. The audit trail stays accurate and existing rules apply unchanged. Service accounts are for background workflows where there is no end-user request, and they should be scoped tightly through Fire Arrow rules rather than granted broad access.

Does Fire Arrow change how I run the LLM?

No. Inference provider, model choice, and orchestration framework are your decisions. Azure AI Foundry, Amazon Bedrock, and self-hosted models all work. Fire Arrow controls what the agent service can read from the FHIR server, which in turn limits what the model ever sees.

What stops prompt injection from causing data leakage?

Authorization runs on the FHIR server, independent of prompt content. An agent operating under the user's token cannot escalate even if the prompt says it should. The worst case is a wasteful or confused query, not an unauthorized read.

How do I prove what an agent accessed?

Each FHIR request carries the user's token, so the access log records the actual identity, the matched rule, and the resource. Combined with structured logs on the agent side (prompt, tool calls, retrieved resource IDs), you can reconstruct what the agent retrieved for a given user question.
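The reconstruction described above amounts to a join between the server-side access log and the agent-side trace on a shared request id. The field names and log shapes below are assumptions for illustration, not Fire Arrow's actual log schema.

```python
# Server-side access log rows (shape assumed for illustration).
access_log = [
    {"request_id": "r-1", "identity": "Practitioner/42",
     "rule": "PractitionerCompartment", "resource": "MedicationRequest/9"},
]
# Agent-side trace rows (shape assumed for illustration).
agent_trace = [
    {"request_id": "r-1", "question": "what is this patient on?",
     "tool_call": "search MedicationRequest"},
]

def reconstruct(request_id: str) -> dict:
    """Join both logs on the request id to answer 'what did the agent read?'."""
    reads = [e for e in access_log if e["request_id"] == request_id]
    steps = [t for t in agent_trace if t["request_id"] == request_id]
    return {"question": steps[0]["question"],
            "as_identity": reads[0]["identity"],
            "retrieved": [r["resource"] for r in reads]}

print(reconstruct("r-1"))
```

Because the FHIR log already carries the resolved identity and matched rule, no after-the-fact guessing about "which user was behind the AI account" is needed.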

Can the agent write back to FHIR?

Yes, but treat write-back as a higher-control tier. Use agent-specific roles, restrict by resource type and operation, and add a confirmation step where appropriate. Read-only is the recommended starting point for new agentic workflows.

What about long-lived agent memory?

Stateless inference is the default. Provider-side stored conversations or long-lived memory expand the retention surface, complicate deletion, and need their own threat analysis. Treat them as an explicit design decision rather than a default.