Executive Summary

Healthcare teams increasingly want software agents that can answer clinical questions, prepare summaries, assist with triage, and help clinicians or researchers work faster with complex data. Many of these use cases are reasonable. Many are also implemented unsafely.

The common failure pattern is to treat the language model as if it were the application. A model is connected too closely to the data, given a broad service account, allowed to search or retrieve more than the current user should see, and then asked to “behave.” In healthcare, that is the wrong starting point.

The central design principle of this paper is simple: the LLM is not the security boundary. The FHIR server and surrounding application architecture must remain the place where identity, authorization, data minimization, de-identification, auditability, and operational control are enforced.

Fire Arrow is built for this. It enables agentic access to FHIR data while keeping access control inside the server boundary. Agent services forward the real user token to Fire Arrow, or use narrowly scoped service identities where appropriate. Fire Arrow then applies authorization rules, request validation, search narrowing, and optional response-time filtering before any data reaches the model.

This paper walks through a reference architecture for that approach. It covers read-only assistant patterns, controlled write-back, privacy and de-identification, inference-provider choices, auditability, and an Azure-based reference deployment. It also explains why stateless inference is the right starting point and why more advanced memory or autonomous workflows need additional architectural review.

The goal is not to claim that agentic healthcare systems are “automatically compliant.” It is to lay out a concrete way to build them so that privacy, governance, audit, and least-privilege security are there from the start.


1. Audience and Scope

This paper is intended for:

  • CTOs, CIOs, and technical product leaders evaluating AI-enabled healthcare architectures.
  • Solution and platform architects responsible for clinical or research systems.
  • Security, privacy, and compliance stakeholders reviewing agentic access to protected data.
  • Engineering teams building clinician, patient, research, or operational assistants on top of FHIR.

In scope

This paper covers:

  • Agent-mediated access to FHIR data for retrieval, search, summarization, and orchestration.
  • Security and privacy controls for model-assisted workflows.
  • Deployment and operating patterns using Fire Arrow as the FHIR security and control plane.
  • Read-only assistant scenarios.
  • Controlled write-back and workflow automation scenarios.

Out of scope

This paper does not:

  • Provide legal advice or make blanket claims of regulatory compliance.
  • Classify every deployment under a single regulatory category.
  • Assert that all LLM-based healthcare systems are appropriate for every workflow.
  • Recommend unrestricted direct model access to clinical data sources.

2. The Problem

There is real demand for agentic systems that work with clinical and research data:

  • clinician-facing copilots that answer patient-specific questions,
  • patient-facing assistants that explain records and next steps,
  • research assistants that work over controlled or anonymized datasets,
  • operational agents that coordinate tasks, scheduling, communications, or follow-up actions.

These use cases are attractive because FHIR data is rich, standardized, and hard for humans to navigate quickly.

The most obvious implementation patterns are also the most dangerous.

Common failure modes

2.1 The model is given too much authority

A generic AI service account is granted broad read access or even write access across the entire dataset. This may make the first demo easy, but it breaks least privilege and makes the audit trail useless.

2.2 The model becomes the effective policy engine

Instead of enforcing policy in the server, implementers rely on prompt instructions like “only answer what the user is allowed to see.” That does not work: a prompt is a suggestion the model may follow, not a control the system enforces.

2.3 Data minimization happens too late

Raw resource bundles, full narratives, or broad search results are sent to the model even when a narrower evidence set would have been sufficient.

2.4 De-identification is treated as a simple field-removal exercise

Structured demographic removal is useful, but it does not fully solve the problem. Search parameters and free-text narrative fields can still leak sensitive information if they are not controlled carefully.

2.5 Auditability collapses at the AI boundary

Once all FHIR access is performed using a generic AI identity, it becomes difficult to preserve the link between a user action, an agent step, a server request, and a generated response.

The answer is not to avoid agentic systems altogether. It is to build them on an architecture that preserves security and traceability.


3. Design Principles

3.1 The LLM is not the security boundary

The model should not authenticate directly against the FHIR server and should never hold the user’s access token. Authorization remains a server-side responsibility.

3.2 Reuse the application’s identity and authorization model

Wherever possible, the agent should operate in the same user context as the application. If a user is allowed to see a resource in the application, that same user context should govern what the agent can retrieve.

3.3 Prefer stateless inference by default

The simplest and safest starting point: treat each agent step as a self-contained inference problem. Conversation history stays in the client or application layer — no long-lived provider-side memory. More advanced memory patterns are possible, but they need separate review.

3.4 Send the minimum necessary evidence to the model

The model should receive only the evidence needed for the current task. In many cases that means a scoped subset of FHIR data or a de-identified extract, not a broad dump of clinical records.

3.5 Use deterministic code wherever practical

Natural language is useful for interpreting requests, summarizing, and explaining. Retrieval, filtering, authorization, validation, and calculation should stay deterministic.

3.6 Make higher-risk capabilities opt-in

Write-back, workflow automation, stateful memory, and broader cross-system actions should be enabled only intentionally, with stronger controls than a basic read-only assistant.

3.7 Preserve auditability end to end

Any regulated deployment needs traceability from user request through agent planning, data retrieval, model inference, and final response.


4. Reference Architecture

The diagram below shows how Fire Arrow acts as the control plane for agentic FHIR access.

Figure: Reference architecture for secure agentic access to FHIR data with Fire Arrow

4.1 How the architecture works

  1. A user authenticates through the application’s normal identity flow.
  2. The client calls the agent service with the user’s request and the user context.
  3. The agent service forwards the user token to Fire Arrow when it needs FHIR access.
  4. Fire Arrow resolves identity, applies authorization rules, validates the request, blocks unsafe parameters if configured, narrows search results, and filters response properties where appropriate.
  5. The agent service receives only authorized data — scoped to what the current user is allowed to see.
  6. The agent service sends that evidence to the model provider for summarization, explanation, or reasoning.
  7. Logs, telemetry, and operational traces go to the organization’s logging stack under customer-configurable retention and access controls.
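The flow above can be sketched in code. This is a minimal illustration of steps 2–6, not a real Fire Arrow client: the type and helper names are assumptions introduced for this example, and the key point is that the agent service forwards the user's own token and passes only the authorized evidence set onward to the model.

```python
# Minimal sketch of the agent-service side of the request flow.
# All names here (AgentRequest, build_fhir_headers, build_model_payload)
# are illustrative, not part of any real Fire Arrow API.
from dataclasses import dataclass

@dataclass
class AgentRequest:
    user_token: str   # the user's own token from the application's identity flow
    question: str     # the user's natural-language request

def build_fhir_headers(req: AgentRequest) -> dict:
    """Forward the real user token: Fire Arrow, not the agent, decides access."""
    return {
        "Authorization": f"Bearer {req.user_token}",
        "Accept": "application/fhir+json",
    }

def build_model_payload(question: str, evidence: list[dict]) -> dict:
    """Only the authorized, already-scoped evidence set reaches the provider."""
    return {"task": question, "evidence": evidence}
```

Because the agent service never holds its own broad credential in this pattern, a prompt-injection failure in the model cannot widen data access beyond what the current user could already see.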

4.2 Why this architecture matters

The key difference from naive LLM architectures is that the model is downstream of access control rather than upstream of it. Fire Arrow remains the place where security policy is enforced.


5. Security and Authorization Model

5.1 Token forwarding

Start with token forwarding. The agent does not authenticate to Fire Arrow with its own broad identity. Instead, it forwards the user’s token when retrieving data on that user’s behalf.

This preserves two critical properties:

  • the user’s existing access rights are reused,
  • the audit trail stays tied to the actual user and request context.

5.2 Scoped service accounts

Sometimes a service identity fits better than direct user context — background operations, shared analytics tasks, or tightly bounded write-back workflows. In those cases, scope the service account narrowly.

The correct pattern is not “AI superuser.” The correct pattern is a service identity with only the exact resource and operation permissions required for a well-defined task.

5.3 Deny by default

A secure FHIR platform does not rely on permissive defaults. Fire Arrow uses explicit authorization rules, validators, and request constraints so that allowed behavior is defined intentionally.

5.4 Resource and data scoping

For agentic workflows, Fire Arrow restricts access at multiple levels:

  • operation type,
  • resource type,
  • subsets of patients,
  • subsets of organization data,
  • property-level response filtering,
  • blocked search vectors or query parameters.

Think of it as row-level security expressed through FHIR-native authorization patterns.
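The semantics of these scoping levels can be illustrated with a deny-by-default check. The rule structure below is a hypothetical simplification written for this paper; it is not Fire Arrow's actual configuration format, but it shows the intended behavior: nothing is permitted unless an explicit rule allows it.

```python
# Illustrative deny-by-default scoping check mirroring the levels listed above.
# The allow-list structure is a hypothetical simplification, not Fire Arrow's
# real configuration format.
ALLOWED_OPERATIONS = {
    ("read", "Observation"),    # (operation type, resource type) pairs
    ("search", "Observation"),
}
ALLOWED_PATIENTS = {"Patient/123", "Patient/456"}   # patient subset
BLOCKED_SEARCH_PARAMS = {"_content", "_text"}        # free-text search vectors

def is_allowed(op: str, resource_type: str, patient: str, params: dict) -> bool:
    if (op, resource_type) not in ALLOWED_OPERATIONS:
        return False   # nothing is permitted implicitly
    if patient not in ALLOWED_PATIENTS:
        return False   # outside the patient subset
    if BLOCKED_SEARCH_PARAMS & params.keys():
        return False   # unsafe search vector
    return True
```

Note that the check rejects on the search surface itself, not only on the response: a blocked parameter never reaches the data layer at all.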

5.5 Optional user confirmation for actions

For higher-risk workflows, you can require user confirmation before an action executes. The clinician gets LLM assistance; the system does not get autonomous write-back.

Confirmation can be introduced selectively. Examples include:

  • confirming a generated communication draft,
  • approving a task creation or status change,
  • approving a write-back to specific resource types.

This does not replace authorization. It adds a second control layer for selected workflows.
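The shape of such a confirmation gate is simple. This sketch uses hypothetical names; the point is that the LLM-assisted draft is surfaced to the user, and the write executes only after explicit approval, as a layer on top of (not instead of) authorization.

```python
# Sketch of a confirmation gate for higher-risk actions. Names are
# illustrative; authorization is still enforced separately by the server.
def execute_with_confirmation(draft: dict, approved: bool, write_fn) -> str:
    """Run write_fn(draft) only after explicit user approval."""
    if not approved:
        return "pending-approval"   # surface the draft for review instead
    write_fn(draft)
    return "executed"
```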


6. Privacy and De-identification

6.1 Data minimization first

The most important privacy measure is not post-processing. It is retrieving less data in the first place. If the agent only needs a narrow subset of evidence, the system should avoid retrieving broader clinical context.
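In practice, minimization means building narrow searches rather than fetching whole records. The parameter names below follow standard FHIR search syntax; the helper itself, and the choice of LOINC code, are illustrative.

```python
# Data-minimization sketch: a narrow FHIR search instead of a record dump.
# Parameter names follow standard FHIR search syntax; the helper is illustrative.
def scoped_observation_query(patient_id: str, loinc_code: str, since: str) -> dict:
    """Only what the current task needs: one patient, one code,
    a bounded date range, a capped page size, and trimmed properties."""
    return {
        "patient": patient_id,
        "code": f"http://loinc.org|{loinc_code}",
        "date": f"ge{since}",
        "_count": "20",
        "_elements": "code,value,effectiveDateTime",
    }
```

A query like this retrieves, for example, recent HbA1c observations for a single patient rather than the full clinical history, so the evidence set sent to the model stays small by construction.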

6.2 Structured de-identification

Fire Arrow supports de-identification through response-time property filtering and authorization policy design. For controlled research analytics or summarization over limited views, this is often sufficient.

6.3 Free-text PHI remains a special risk

Narrative fields can contain names, locations, dates, identifiers, and contextual clues even when structured patient demographics are removed. This is why privacy-sensitive agent workflows should review how fields such as notes, conclusions, and human-readable text are handled.
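One simple mitigation is to drop narrative fields from the evidence set before inference. The field list below is an illustrative starting point only; which fields carry free text varies by resource type and must be reviewed per deployment.

```python
# Sketch: strip top-level free-text fields before evidence reaches the model.
# The field list is an illustrative starting point, not a complete inventory;
# narrative handling must be reviewed per resource type.
NARRATIVE_FIELDS = {"text", "note", "comment", "conclusion"}

def strip_narratives(resource: dict) -> dict:
    """Return a copy of the resource without known free-text fields."""
    return {k: v for k, v in resource.items() if k not in NARRATIVE_FIELDS}
```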

6.4 Search side channels matter

A system can appear anonymized at the response level while still leaking information through search behavior. If the server permits searching on fields that contain identifying or quasi-identifying content, the search surface itself can become a disclosure channel.

Anonymized or de-identified access patterns need to be evaluated case by case. In some deployments, property filters and blocked parameters are enough. In others, expert review is the safer choice.

6.5 Logging retention and access

The Evoleen reference design logs Fire Arrow requests, Fire Arrow telemetry, agent routing decisions, task decisions, prompts, and completions. These logs are stored in Azure under strong RBAC controls using Entra ID and Azure roles.

Retention windows are customer-configurable — privacy expectations and regulatory requirements vary by project, use case, and geography.


7. Threat Model

7.1 Prompt injection

A user or a retrieved document may contain instructions intended to manipulate the model. This can lead to disclosure attempts, unsafe tool selection, or attempts to bypass policy.

Control approach: the model never becomes the authorization boundary, tool access is constrained, and Fire Arrow enforces data access independently of prompt content.

7.2 Indirect prompt injection through retrieved content

Clinical documents, imported notes, or external data may contain hidden or malicious instructions.

Control approach: treat retrieved content as untrusted, constrain tool use, keep retrieval deterministic where possible, and minimize model authority.

7.3 Over-privileged service identities

A broad agent service account can turn any model error or prompt attack into a large data-exposure event.

Control approach: prefer user-context access and use narrowly scoped service accounts only where justified.

7.4 Cross-patient or cross-organization leakage

A poorly designed authorization model may allow the agent to pull more data than the current context requires.

Control approach: explicit Fire Arrow rules that constrain resources, operations, patient subsets, organizational subsets, and searchable fields.

7.5 Free-text leakage

Even carefully filtered structured data may be undermined by narrative content.

Control approach: review and restrict narrative fields separately; do not assume that structured de-identification alone is enough.

7.6 Unsafe write-back or autonomous actions

An agent that can write without restrictions can create or update the wrong data, trigger incorrect downstream workflows, or act outside organizational policy.

Control approach: introduce write-back only as an advanced control tier, with narrow roles, operation-specific permissions, and optional user confirmation.

7.7 Stateful-memory expansion of risk

Long-lived memory and provider-side stored conversation state can improve usability, but they also expand the retention surface, complicate deletion and review, and require separate threat analysis.

Control approach: start with stateless inferencing and treat stored memory as an explicit design decision requiring further review.


8. Inference Provider Strategy

Fire Arrow is inference-provider agnostic. What matters is not branding, but the provider’s data-handling commitments, regional controls, storage behavior, and operational fit.

8.1 Azure AI Foundry as the reference deployment

Azure is the reference deployment in the Evoleen architecture. It provides a strong baseline:

  • enterprise identity integration,
  • regional deployment options,
  • no-training commitments for customer prompts and outputs,
  • content safety controls,
  • telemetry and operational integration into the wider Azure estate.

Europe is the reference region in Evoleen deployments, but region selection is customer- and project-specific.

8.2 Amazon Bedrock as an approved alternative

Amazon Bedrock works where its regional and contractual terms fit the project.

The recommended governance model is an approved model allow-list rather than a blanket provider approval. This is especially important where Bedrock exposes third-party serverless models with model-specific license terms.

8.3 Self-hosted models

Self-hosted inference gives the most control, especially for customers with strict data residency or isolation requirements. In practice, it also shifts responsibility for model operations, patching, safety controls, and runtime resilience to the deployment team.

8.4 Stateless by default, stateful by exception

Start with stateless inference. The client or application carries the active conversation history; each task is a fresh retrieval-and-inference step.

Provider-side stored conversations, threads, or memory can be valuable for advanced assistants, but introduce questions that need answering first:

  • retention and deletion behavior,
  • access boundaries,
  • regional processing,
  • threat model expansion,
  • regulatory expectations for the workflow.
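The stateless pattern can be sketched as follows. The payload shape is generic rather than any specific provider's API; the point is that the client-held history is replayed on every call and nothing asks the provider to store state.

```python
# Stateless-by-default sketch: the application layer carries the conversation
# history and replays it on each call. The payload shape is generic, not any
# specific provider's API; "store": False marks the no-server-side-state intent.
def build_stateless_request(history: list[dict], evidence: list[dict],
                            user_msg: str) -> dict:
    messages = list(history)   # copy: the caller's history is not mutated
    messages.append({"role": "user", "content": user_msg})
    return {"messages": messages, "evidence": evidence, "store": False}
```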

9. Azure Reference Deployment

The Evoleen reference deployment uses Azure as the primary hosting and control environment.

9.1 Core platform components

  • Fire Arrow Server hosted as Docker containers on Azure App Service.
  • Azure Database for PostgreSQL for primary data storage.
  • Azure Blob Storage for external file and attachment storage.
  • Azure Storage Queues for queue-driven event handling and asynchronous processing.
  • Azure Container Registry for image storage and deployment workflows.
  • Azure Application Insights for telemetry, logging, and alerting.
  • Microsoft Entra ID and Azure role assignments for infrastructure and access governance.
  • Azure VNets for protected service-to-service communication.
  • Azure AI Foundry for model hosting and inference.
  • Cloudflare or Azure Front Door as the web-facing protection layer.

9.2 Client-side options

Evoleen commonly uses:

  • Flutter as the default client technology,
  • CopilotKit where a Node.js-based copilot framework is appropriate,
  • terminal-based tooling for developer and operator workflows.

9.3 Agent framework

Agents in the reference design are built with the Microsoft Agents Framework. This provides a consistent orchestrator layer while keeping data retrieval, query compilation, validation, and business logic deterministic.


10. Assurance, Evaluation, and Release Discipline

Runtime controls are not enough. How agents are designed, tested, and released matters just as much.

10.1 Strong eval harnesses

Evoleen uses strict evaluation harnesses during agent development and release. These include test sets that cover:

  • positive tasks,
  • negative tasks,
  • easy tasks,
  • hard tasks,
  • adversarial tasks,
  • ambiguous tasks.

The point is not to produce a universal score. It is to establish release gates and expose failure modes before production.
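A release gate over such category results can be as small as this. The thresholds and structure are illustrative assumptions for this paper, not Evoleen's actual harness; the principle is that every category must clear its own bar, with adversarial tasks held to the strictest one.

```python
# Minimal eval release-gate sketch over the task categories above.
# Thresholds are illustrative, not Evoleen's actual release criteria.
def release_gate(results: dict) -> bool:
    """results maps a category name to a list of pass/fail outcomes;
    every category must meet its bar before release."""
    bars = {"positive": 0.95, "negative": 0.95,
            "adversarial": 1.0, "ambiguous": 0.8}
    for category, outcomes in results.items():
        if sum(outcomes) / len(outcomes) < bars.get(category, 0.9):
            return False
    return True
```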

10.2 Manual review before release

Agent behavior is reviewed manually before release. Pre-release governance matters as much as runtime controls.

10.3 Deterministic execution where possible

Evoleen uses LLMs where language understanding or explanation adds value, and keeps core execution deterministic everywhere else.

A good example is the cohort analytics pattern: a natural-language request is translated into instructions for a query compiler, but the actual execution is deterministic. This reduces the attack surface and makes behavior easier to test.
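The compiler half of that pattern looks roughly like this. All names are illustrative: the model proposes only a structured intent, and deterministic code validates it against an allow-list before anything executes.

```python
# Sketch of the cohort-analytics pattern: the LLM proposes a structured
# intent; a deterministic compiler validates and builds the actual query.
# All names and the allow-lists are illustrative.
ALLOWED_METRICS = {"count", "mean"}
ALLOWED_FIELDS = {"age", "hba1c"}

def compile_query(intent: dict) -> dict:
    """Reject anything outside the allow-list instead of trusting the model."""
    metric, field = intent.get("metric"), intent.get("field")
    if metric not in ALLOWED_METRICS or field not in ALLOWED_FIELDS:
        raise ValueError(f"unsupported intent: {intent!r}")
    return {"select": f"{metric}({field})", "from": "cohort"}
```

Because execution never interprets free text, the model can at worst request a query the compiler refuses, which is exactly the reduced attack surface the pattern is after.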

10.4 Unit testing and SDLC discipline

Agents are released inside an ISO 27001-aligned software development lifecycle and quality management system. That does not replace project-specific risk review, but it provides a disciplined baseline.


11. Write-Back and Workflow Automation

Read-only retrieval and summarization are the recommended entry point for agentic clinical and research systems. That said, many useful systems eventually need to perform actions.

11.1 Why write-back deserves a higher control tier

Once an agent can create, update, or trigger downstream workflows, the risk profile changes. The system is no longer presenting information — it is acting. That does not make write-back unacceptable, but it must be designed more carefully.

11.2 Safe write-back design patterns

Fire Arrow supports the core ingredients needed for controlled action patterns:

  • agent-specific identities,
  • role-based permissions,
  • resource-type restrictions,
  • patient and organizational scoping,
  • property-level restrictions where relevant,
  • validator-based safeguards,
  • optional confirmation steps.

11.3 Confirmation as a selectable control

Some customers will want a human confirmation step before the system performs a write or workflow action. Others may approve selected automated actions for narrowly defined scenarios.

The architecture supports both patterns. Confirmation is a deployment decision, not a sign that the underlying authorization model is weak.

11.4 Example write-back scenarios

Examples of controlled write-back patterns include:

  • creating or updating a task in a limited workflow lane,
  • drafting a communication for approval,
  • updating a narrow resource type under an agent-specific role,
  • coordinating an operational step using queue-backed background processing.

The common rule is that the agent should only have the exact write scope it needs and nothing more.


12. Implementation Checklist

Use this checklist before taking an agentic FHIR workflow live.

Security checklist

  • Have you decided whether the workflow uses token forwarding or a scoped service account?
  • Is the access model deny-by-default?
  • Are resource types, operations, patient subsets, and organization subsets explicitly constrained?
  • Are unsafe search vectors blocked where needed?
  • Are write permissions absent unless they are intentionally required?

Privacy checklist

  • What is the minimum evidence the model actually needs?
  • Are narrative fields reviewed separately from structured data?
  • Is this a case where property filters are sufficient, or is expert de-identification review needed?
  • Are prompt, completion, and telemetry retention settings explicitly configured?
  • Are access rights to logs and traces tightly controlled?

Provider checklist

  • Which inference provider and specific models are approved?
  • Does the deployment use stateless inferencing by default?
  • Are any provider-side storage or memory features disabled unless explicitly approved?
  • Are regional or cross-region behaviors understood and documented?
  • Have model-specific license terms been reviewed where applicable?

Evaluation and release checklist

  • Does the agent have a structured eval harness?
  • Are adversarial and ambiguous tasks included?
  • Are deterministic components unit tested?
  • Is there a manual review step before release?
  • Is the workflow clearly categorized as read-only, confirmation-gated, or automated action?

13. Conclusion

Agentic use of FHIR data is compatible with security, privacy, and operational control. What matters is where the control plane lives.

If the model is allowed to become the policy engine, the architecture is fragile. If the FHIR server remains the enforcement point, if the model is downstream of authorization, if data minimization happens before inference, and if memory and write-back are treated as higher-control tiers, then useful and defensible healthcare agent architectures become possible.

Fire Arrow is designed for that model. It lets teams build clinician assistants, patient-facing systems, research tools, and operational agents without bypassing the access-control and audit foundations that healthcare systems require.

The practical recommendation is straightforward:

  • start with stateless, read-only, least-privilege architectures,
  • use token forwarding whenever practical,
  • minimize the evidence sent to the model,
  • introduce write-back only with explicit roles and narrow scopes,
  • treat long-lived memory and autonomous workflows as advanced features that deserve their own review.

That is the path from impressive demo to deployable system.


References

  1. Fire Arrow Docs — Agentic LLM Access. https://docs.firearrow.io/docs/server/how-to/agentic-llm-access

  2. Fire Arrow Docs — Minimal Configuration. https://docs.firearrow.io/docs/server/getting-started/minimal-configuration

  3. Fire Arrow Docs — First Run. https://docs.firearrow.io/docs/server/getting-started/first-run

  4. Fire Arrow Docs — About Fire Arrow. https://docs.firearrow.io/docs/general/about

  5. Fire Arrow Docs — Authorization Concepts. https://docs.firearrow.io/docs/server/authorization/concepts

  6. Fire Arrow Docs — Property Filters. https://docs.firearrow.io/docs/server/authorization/property-filters

  7. Fire Arrow Docs — Automatic Anonymization. https://docs.firearrow.io/docs/server/how-to/automatic-anonymization

  8. Microsoft Learn — Data, privacy, and security for Azure AI Foundry models. https://learn.microsoft.com/en-us/azure/machine-learning/concept-data-privacy?view=azureml-api-2

  9. Microsoft Learn — Azure OpenAI data, privacy, and security. https://learn.microsoft.com/en-us/azure/ai-foundry/openai/concepts/data-privacy

  10. AWS — Amazon Bedrock FAQs. https://aws.amazon.com/bedrock/faqs/

  11. AWS — Amazon Bedrock third-party model terms. https://aws.amazon.com/legal/bedrock/third-party-models/

  12. AWS — AWS Service Terms. https://aws.amazon.com/service-terms/

  13. HHS — Minimum Necessary Requirement. https://www.hhs.gov/hipaa/for-professionals/privacy/guidance/minimum-necessary-requirement/index.html

  14. HHS — Guidance Regarding Methods for De-identification of Protected Health Information. https://www.hhs.gov/hipaa/for-professionals/privacy/special-topics/de-identification/index.html

  15. NIST — AI Risk Management Framework: Generative AI Profile. https://www.nist.gov/publications/artificial-intelligence-risk-management-framework-generative-artificial-intelligence

  16. EUR-Lex — Regulation (EU) 2024/1689 (AI Act). https://eur-lex.europa.eu/eli/reg/2024/1689/oj/eng