Live FHIR Anonymization for Clinical Trials, Research, and External APIs

Different audiences read different views of the same FHIR data. Investigators see full PHI; sponsors see anonymized records; partner apps see resources with internal tags stripped; analytics services see filtered bulk exports. Field-level redaction and search side-channel protection are part of the authorization rules themselves, so privacy review and audit inspect one configuration instead of a custom gateway plus a nightly de-identification job.

What you can build

  • Live investigator + sponsor views from one data set

    No de-identification copy, no scheduled export job, no synchronisation delay. Sponsor monitors see updates as they happen.

  • Field-level redaction declared in authorization

    `NullFilter` removes a field. `RandomFilter` replaces HumanName or ContactPoint with synthetic values of the same type. Filters live on the rule, not in custom gateway code.

  • Search side-channels closed with one configuration

    `blocked-search-params` and `blocked-includes` reject probes that would reverse-engineer redacted values. `_filter`, `_has`, `_text`, and `_content` are rejected fail-closed across REST and GraphQL whenever blocked parameters are set.

  • Bulk exports automatically filtered

    When the export client authenticates with a sponsor identity, the same property filters apply to every resource returned. No second pipeline, no second storage location.

  • Blinded vs unblinded monitors from the same role

    Identity filters (FHIRPath on the caller's Practitioner resource) select the anonymized rule for blinded monitors and the catch-all rule for unblinded ones. No parallel role hierarchy.

  • External client privilege limits on standard FHIR APIs

    Partner apps and patient-facing apps see resources with internal workflow tags and administrative extensions stripped. Stock FHIR clients, no custom SDKs.

  • Internal data hidden from external views

    Implementation-specific extensions, workflow routing tags, and operational metadata removed per role. Internal tools see the full resource; external apps do not.

How it works

  1. Give the investigator role normal access

    Investigators (the treating physicians at each site) read, search, create, and update the trial's clinical resources (Patient, Observation, Condition, MedicationRequest, and the rest) with no redaction. Their authorisation rules look like a normal clinician role. This is the baseline; everything that follows is a constrained variant of it.

  2. Configure the sponsor role with field-level redaction

    The sponsor monitor role uses the same authorisation pattern as the investigator role, but with a redaction list attached. On Patient: replace the name with a synthetic placeholder of the same shape (so downstream code that expects a name still works) and remove contact details, address, date of birth, identifiers, and photo entirely. On clinical resources like Observation, Condition, and DiagnosticReport: strip the free-text notes, conclusions, and narrative, which are the fields where clinicians often type names, locations, or other PHI in prose. The redaction list lives on the authorisation rule, not in a downstream gateway.

  3. Apply the redaction whether the resource is fetched directly or through search

    Reading a single resource by id and searching for it are separate operations under separate rules. Attach the same redaction list to both, so the sponsor's view of a patient is the same regardless of how they retrieved it.

  4. Block search parameters that could probe redacted fields

    Filtering hides the field on the way out, but the underlying data stays indexed and searchable. A search for patients named "Smith" still returns results even if the response no longer shows a name. For the sponsor's search rules, list every parameter that could reveal a redacted field (name, family name, given name, phonetic name, phone, email, every address component, date of birth, identifier), and the server rejects any search that uses one of them. The mapping from a redacted field to the parameters that could probe it is many-to-many, so the list has to be enumerated explicitly.

  5. Block reference traversals that could leak around the redaction

    A single FHIR query can also pull in linked resources by reference: a patient's general practitioner, the encounter that produced an observation, the questionnaire behind a response. If a linked resource carries something the sponsor should not see, block that traversal. Each path that could carry data around the redaction has to be closed deliberately.

  6. Prefer fetch-by-id over search for the sponsor role where possible

    If the sponsor's workflow does not actually need ad-hoc search, grant only fetch-by-id (and the GraphQL equivalent that resolves references by id rather than by query). With no search surface there is nothing for the side-channels to attack. If search is required, plan for a careful audit of every parameter and traversal path that touches a redacted field.

  7. Optional: split blinded from unblinded monitors without inventing a second role

    Blinded and unblinded monitors are both "sponsor monitors", but blinded monitors must not see treatment-arm-revealing data. Tag the blinded monitors' practitioner records with a marker, and add a small expression to the redacted rule that only matches when the caller carries that tag. Unblinded monitors fall through to the unredacted version. Both audiences keep the same role name; rule order ensures the redacted rule wins for blinded monitors.

  8. Verify what each role actually sees

    During development, ask the server for a debug trace on a test request. The response shows which rule matched, which validator allowed access, and which redactions were applied. Use it to confirm that an investigator account sees the full resource and that a sponsor account sees the redacted version, before either view reaches a real audit. Turn it off in production.
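The rule-ordering idea in step 7 can be sketched in a few lines of Python. This is an illustration of first-match rule selection, not Fire Arrow's configuration format: the `Rule` class, `has_tag`, `select_rule`, the role name `sponsor-monitor`, and the `blinded` tag code are all hypothetical, and a plain Python predicate stands in for the FHIRPath identity filter.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional

Practitioner = dict  # a FHIR Practitioner resource as parsed JSON


@dataclass
class Rule:
    name: str
    role: str
    # Stand-in for a FHIRPath identity filter: a predicate on the caller's
    # Practitioner resource. None means "matches every caller with this role".
    identity_filter: Optional[Callable[[Practitioner], bool]] = None
    redactions: list = field(default_factory=list)


def has_tag(code: str) -> Callable[[Practitioner], bool]:
    """Build a predicate that looks for a given code in meta.tag."""
    def check(practitioner: Practitioner) -> bool:
        tags = practitioner.get("meta", {}).get("tag", [])
        return any(tag.get("code") == code for tag in tags)
    return check


def select_rule(role: str, practitioner: Practitioner, rules: list) -> Optional[Rule]:
    """Return the first rule whose role and identity filter both match."""
    for rule in rules:
        if rule.role != role:
            continue
        if rule.identity_filter is None or rule.identity_filter(practitioner):
            return rule
    return None  # no rule matched, so access is denied


# Order matters: the redacted rule is listed before the catch-all rule.
RULES = [
    Rule("sponsor-blinded", "sponsor-monitor", has_tag("blinded"),
         redactions=["Patient.name", "Patient.birthDate"]),
    Rule("sponsor-unblinded", "sponsor-monitor"),
]

blinded = {"resourceType": "Practitioner", "meta": {"tag": [{"code": "blinded"}]}}
unblinded = {"resourceType": "Practitioner"}

print(select_rule("sponsor-monitor", blinded, RULES).name)
print(select_rule("sponsor-monitor", unblinded, RULES).name)
```

Both callers hold the same role; only the tag on the caller's own practitioner record decides which rule applies. A caller with a role that no rule covers gets `None`, which the sketch treats as a denial.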

What you get out of the box

| Capability | With Fire Arrow | Building it yourself |
| --- | --- | --- |
| Per-role field redaction | A redaction list attached to the authorisation rule. One option removes a field outright; another replaces names or contact details with synthetic values of the same shape, so downstream code that expects a name or a phone number still works. Configured once per role and applied across REST, GraphQL, and bulk export. | A custom API gateway that parses each FHIR response, applies role-specific transformations, and re-serialises the output, repeated for every resource type, every endpoint, every access path. |
| Search side-channel safety | An explicit list of blocked search parameters per role. Listed parameters are rejected, sort orders that match them are rejected, and the more permissive expression-based search forms are rejected outright as soon as any parameter is blocked, so an unaudited expression cannot quietly leak. | Audit every search parameter, every sort key, every reference traversal, every reverse-chain expression. The expression-based search forms can embed parameter names inside chains, so the audit has to be deep, not just enumerative. |
| SQL-on-FHIR safety | Fail-closed: as soon as any search parameters are blocked for a resource type, all SQL-on-FHIR queries on that type are rejected outright. No partial-leakage path. | Most stacks have no SQL-on-FHIR alternative; the equivalent would be a separate analytics database with its own access policy to keep in sync. |
| One backend, two audiences | The same server handles both the investigator and sponsor views. Sponsor reads hit live data through the redacted rules. | Two systems: an operational FHIR store and a de-identified copy. Schedule the de-identification job. Live with the sync delay. Maintain two security boundaries. |
| Bulk-export filtering | The bulk export client authenticates with a sponsor or analytics service account, hits the same redacted rules, and the redaction applies to every exported resource automatically. | Build a separate export pipeline. Run a de-identification step. Store the result in a second location. Serve from there with its own access controls. |
| Identity-conditional rules (blinded vs unblinded) | A small expression on the rule, evaluated against the caller's own practitioner record, decides whether the redacted rule applies. Same role name, two views. | Two role names plus branching logic in the application, or a separate authorisation layer that handles per-user variants. |
| External vs internal API surface | Strip internal workflow tags, administrative extensions, and operational metadata per role through the same redaction mechanism. Configuration change, not code change. | Per-resource filter logic in the gateway, maintained as new resource types and extensions are added. |
| Discovery and debug | A debug trace on any request shows which rule matched, why it matched, and which redactions were applied. Near-miss hints surface common configuration mistakes when a request is denied. | Trace through scattered redaction code, gateway middleware, and access logs to figure out which check did what. |

Field-level visibility belongs in authorisation

Healthcare systems are pulled in two directions at once. Clinical workflows, study protocols, and digital monitoring programs need comprehensive data capture. Regulations like HIPAA's Minimum Necessary rule and GDPR's data-minimisation principle require that each user sees only what their role justifies.

The common workaround where the two pressures meet is a custom API gateway that strips fields per endpoint. Every resource type, every access path (REST, GraphQL, bulk export, subscriptions), every new search parameter has to be covered by gateway code, and the filtering logic ends up scattered across controllers and middleware. A single missed parameter is a leak.

Treating field-level visibility as an authorisation rule keeps the policy in one place. Same configuration for REST and GraphQL. Same configuration for live access and bulk export. One configuration that an audit reviewer can read directly, instead of tracing missing fields back through application code.
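To make "one redaction list, every access path" concrete, here is a minimal Python sketch. It is not the product's filter implementation: the `SPONSOR_PATIENT_REDACTIONS` mapping, the `remove` and `synthetic` action names, and both functions are hypothetical, and a fixed placeholder name stands in for randomly generated synthetic values.

```python
import copy

# Hypothetical redaction list: top-level Patient field -> action.
SPONSOR_PATIENT_REDACTIONS = {
    "name": "synthetic",
    "telecom": "remove",
    "address": "remove",
    "birthDate": "remove",
    "identifier": "remove",
    "photo": "remove",
}

# Fixed placeholder with the shape of a FHIR HumanName list.
SYNTHETIC_NAME = [{"family": "Redacted", "given": ["Patient"]}]


def redact(resource: dict, redactions: dict) -> dict:
    """Return a redacted copy; the stored resource is left untouched."""
    out = copy.deepcopy(resource)
    for field_name, action in redactions.items():
        if field_name not in out:
            continue
        if action == "remove":
            del out[field_name]
        elif action == "synthetic":
            out[field_name] = copy.deepcopy(SYNTHETIC_NAME)
    return out


def redact_bundle(bundle: dict, redactions: dict) -> dict:
    """Apply the same list to every entry of a search-result Bundle."""
    out = copy.deepcopy(bundle)
    for entry in out.get("entry", []):
        entry["resource"] = redact(entry["resource"], redactions)
    return out


patient = {
    "resourceType": "Patient",
    "id": "p1",
    "name": [{"family": "Smith", "given": ["Anna"]}],
    "birthDate": "1980-04-02",
    "gender": "female",
}

read_view = redact(patient, SPONSOR_PATIENT_REDACTIONS)
search_view = redact_bundle(
    {"resourceType": "Bundle", "type": "searchset", "entry": [{"resource": patient}]},
    SPONSOR_PATIENT_REDACTIONS,
)
```

Because both paths call the same function with the same list, the read view and the search view of a patient cannot drift apart, which is the property an audit reviewer wants to confirm from the configuration alone.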

The search side-channel problem

Redaction hides response fields, but the underlying data stays indexed. A determined client could try to infer redacted values through targeted searches, sort orders, reference traversals, or general filter expressions if those routes are left open.

When the response no longer carries a patient's name, a client can still ask the server "search Patients where name = Smith". If the search returns results, a patient named Smith exists in the system, even though the response itself has no name. Sorting alphabetically by name reveals the ordering across patients. Following a reference (for example, the general practitioner attached to a patient) can expose values through the linked resource. More general filter and reverse-chain expressions can reach any indexed field whether the regular search parameters listed it or not.

Each route has to be closed deliberately. The blocked-parameter list rejects the unsafe parameters and refuses sort orders that match them. A separate blocked-include list rejects unsafe reference traversals. The general filter and reverse-chain expressions are rejected outright as soon as any parameters are blocked for a resource type, because they could reach any parameter and a partial check would create a false sense of security.
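The checks described above can be sketched as a small query guard. This is a simplified illustration in Python, not the server's actual logic: `check_search`, `SearchRejected`, and `is_allowed` are hypothetical names, the query is a plain REST query string, and treating `_revinclude` like `_include` is an assumption of the sketch.

```python
from urllib.parse import parse_qsl

# Expression-style search forms that can reach any indexed parameter.
EXPRESSION_PARAMS = {"_filter", "_has", "_text", "_content"}


class SearchRejected(Exception):
    """Raised when a query touches a blocked parameter or traversal."""


def check_search(query: str, blocked_params: set, blocked_includes: set) -> None:
    """Raise SearchRejected if the query could probe a redacted field."""
    for key, value in parse_qsl(query):
        name = key.split(":", 1)[0]  # drop modifiers such as ':exact'
        if name in blocked_params:
            raise SearchRejected(f"blocked search parameter: {name}")
        # Fail closed: expression forms are refused once anything is blocked.
        if blocked_params and name in EXPRESSION_PARAMS:
            raise SearchRejected(f"{name} is rejected while any parameter is blocked")
        if name == "_sort":
            for sort_key in value.split(","):
                if sort_key.strip().lstrip("-") in blocked_params:
                    raise SearchRejected(f"blocked sort key: {sort_key}")
        # Assumption: _revinclude is checked against the same list as _include.
        if name in ("_include", "_revinclude") and value in blocked_includes:
            raise SearchRejected(f"blocked include: {value}")


BLOCKED = {"name", "family", "given", "phonetic", "phone", "email",
           "address", "address-city", "address-postalcode",
           "birthdate", "identifier"}
BLOCKED_INCLUDES = {"Patient:general-practitioner"}


def is_allowed(query: str) -> bool:
    try:
        check_search(query, BLOCKED, BLOCKED_INCLUDES)
        return True
    except SearchRejected:
        return False
```

With these lists, `gender=female` and `_sort=gender` pass, while `name=Smith`, `name:exact=Smith`, `_sort=-birthdate`, `_include=Patient:general-practitioner`, and any `_has:` expression are refused. With an empty block-list the expression forms are not refused, which mirrors the "as soon as any parameters are blocked" condition.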

Risk profile, by access pattern

Fetching a single resource by id is the safest pattern. The client retrieves what it already has the id of; redaction applies unconditionally; there is nothing for a search side-channel to attack.

Following references through GraphQL is almost as safe, as long as references are resolved by id rather than by client-controlled search arguments. Redaction applies to every resource that comes out of the traversal automatically.

The SQL-on-FHIR query language is the most aggressive read surface and is therefore handled most strictly: as soon as any search parameters are blocked for a resource type, all SQL-on-FHIR queries on that type are rejected outright. There is no partial-allow path.

Search (both REST search and GraphQL search) carries the highest residual risk because it accepts client-controlled search parameters. It requires explicit blocked-parameter and blocked-include lists covering every parameter that could reveal a redacted field.

The recommended pattern for a redacted role is therefore: grant fetch-by-id (REST and GraphQL) only. Add search if the workflow truly needs it, and budget time for a thorough audit when you do.
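As a rough illustration of that recommendation, the sketch below models a role's granted operations and the fail-closed SQL-on-FHIR rule. The `Op` enum, the grant sets, and `is_permitted` are hypothetical names chosen for this example, not identifiers from the product.

```python
from enum import Enum


class Op(Enum):
    READ = "read"                  # REST fetch-by-id
    GRAPHQL_READ = "graphql-read"  # GraphQL fetch-by-id
    SEARCH = "search"
    GRAPHQL_SEARCH = "graphql-search"
    SQL_ON_FHIR = "sql-on-fhir"


def is_permitted(op: Op, granted: set, blocked_params_for_type: set) -> bool:
    """Allow only granted operations; SQL-on-FHIR fails closed."""
    if op not in granted:
        return False
    # Any blocked search parameter on the type rejects SQL-on-FHIR outright.
    if op is Op.SQL_ON_FHIR and blocked_params_for_type:
        return False
    return True


# Recommended starting point for a redacted role: fetch-by-id only.
SPONSOR_GRANTS = {Op.READ, Op.GRAPHQL_READ}
# A hypothetical analytics role that was also granted SQL-on-FHIR.
ANALYTICS_GRANTS = {Op.READ, Op.SQL_ON_FHIR}

print(is_permitted(Op.READ, SPONSOR_GRANTS, {"name"}))
print(is_permitted(Op.SEARCH, SPONSOR_GRANTS, {"name"}))
print(is_permitted(Op.SQL_ON_FHIR, ANALYTICS_GRANTS, {"name"}))
```

Starting from the smallest grant set means that adding search later is an explicit decision, made at the same time as the block-list audit it requires.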

The same mechanics, beyond clinical trials

The same mechanics apply to scenarios that come up in day-to-day operations beyond trial sponsor anonymisation.

Bulk-export filtering: the FHIR Bulk Data Access specification leaves field-level filtering to each implementation. When a bulk-export client authenticates with a sponsor or analytics identity that matches a redacted rule, the filtering applies to every exported resource automatically. No separate de-identification pipeline.

External-client privilege limiting: a partner app may need to read Observations but not search Patients by name. A patient-facing app may need to see Encounters but not the internal workflow tags clinicians use behind the scenes. The same blocked-parameter and field-redaction mechanisms enforce these limits inside the authorisation pipeline, so the partner uses standard FHIR APIs and does not need a custom SDK.

Internal-data hiding: workflow routing tags, system-generated identifiers, and operational extensions are not PHI but should not be visible to external consumers. Field-level redaction strips them per role, so the internal teams keep the full resource and the external consumers see only what is meant for them.
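A minimal sketch of internal-data hiding, assuming internal tags share one tag system and internal extensions share a URL prefix. Both constants and the `strip_internal` function are hypothetical, and only top-level extensions are handled here; nested extensions would need a recursive walk.

```python
import copy

# Hypothetical markers for implementation-specific data.
INTERNAL_TAG_SYSTEM = "https://example.org/fhir/tags/workflow"
INTERNAL_EXTENSION_PREFIX = "https://example.org/fhir/internal/"


def strip_internal(resource: dict) -> dict:
    """Return a copy without internal meta.tag entries and extensions."""
    out = copy.deepcopy(resource)

    meta = out.get("meta")
    if meta and "tag" in meta:
        meta["tag"] = [t for t in meta["tag"]
                       if t.get("system") != INTERNAL_TAG_SYSTEM]
        if not meta["tag"]:
            del meta["tag"]  # FHIR JSON does not allow empty arrays

    if "extension" in out:
        out["extension"] = [e for e in out["extension"]
                            if not e.get("url", "").startswith(INTERNAL_EXTENSION_PREFIX)]
        if not out["extension"]:
            del out["extension"]

    return out


encounter = {
    "resourceType": "Encounter",
    "id": "e1",
    "status": "finished",
    "meta": {"tag": [{"system": INTERNAL_TAG_SYSTEM, "code": "needs-review"}]},
    "extension": [
        {"url": INTERNAL_EXTENSION_PREFIX + "routing-queue", "valueString": "team-b"},
        {"url": "https://example.org/fhir/public/visit-type", "valueString": "remote"},
    ],
}

external_view = strip_internal(encounter)
```

The internal tool keeps reading `encounter` as stored; the external role receives `external_view`, which retains the public extension and drops the routing data.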

What field-level redaction is and is not

Field-level redaction is a practical de-identification mechanism. It is not a statistical disclosure control system. Removing direct identifiers does not automatically prevent re-identification through small cohorts, rare clinical combinations, or temporal patterns. If a use case requires k-anonymity-style guarantees, that is a higher-level design that sits on top of the field-level redaction described here.
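A small worked example shows why redaction alone does not give a k-anonymity guarantee. The `smallest_group` helper and the sample rows are made up for illustration: the function counts how many records share each combination of quasi-identifiers and returns the size of the smallest group, which is the k the data set actually achieves.

```python
from collections import Counter


def smallest_group(records: list, quasi_identifiers: list) -> int:
    """Size of the smallest group sharing the same quasi-identifier values."""
    groups = Counter(
        tuple(record.get(q) for q in quasi_identifiers) for record in records
    )
    return min(groups.values(), default=0)


# Already-redacted rows: no names, no birth dates, no identifiers.
rows = [
    {"gender": "female", "age_band": "40-49", "condition": "asthma"},
    {"gender": "female", "age_band": "40-49", "condition": "asthma"},
    {"gender": "male", "age_band": "40-49", "condition": "asthma"},
    {"gender": "male", "age_band": "70-79", "condition": "rare-disorder"},
]

print(smallest_group(rows, ["gender"]))
print(smallest_group(rows, ["gender", "age_band", "condition"]))
```

On gender alone every row hides in a group of two, but once age band and condition are combined, the smallest group has a single member. That record is unique even though every direct identifier has been removed.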

Free-text fields are a separate concern. Structured demographic fields (name, address, phone, date of birth) are easy targets for an explicit redaction list. Names, locations, and dates also turn up inside narrative notes, conclusions, and prose-style fields where clinicians write freely. Removing those entirely is straightforward; selectively redacting only the identifying parts of free text requires an NLP pipeline outside this authorisation layer.

Example deployments

  • Multi-site clinical trial: investigator and sponsor tiers

    Site investigators have full access to the trial's clinical resources, the same as a normal treating clinician. Sponsor monitors share the same access pattern but with a redaction list that strips Patient identifiers and free-text notes, and a corresponding block-list on the search side so identifiers cannot be probed back out through search.

  • Blinded vs unblinded sponsor monitors

    Both sets of monitors use the same role. Blinded monitors are tagged on their practitioner record. The redacted rule only matches when the caller carries that tag; unblinded monitors fall through to the unredacted version. One role name, two views, no parallel role hierarchy.

  • Patient app, analytics service, and external partners on one backend

    A patient app accesses the patient's own data through the standard patient-scoped rules. An analytics service uses a service account whose rules carry the redaction list, so its bulk-export downloads are de-identified at source. External partner apps use stock FHIR clients with the same search-parameter protections and stripped internal extensions. Three audiences, one server, no separate API surfaces.

  • Internal vs external API surface

    Internal practitioners see the full resources, including administrative tags, workflow extensions, and operational metadata. External applications authenticate with a different role; the same FHIR endpoints return resources with those internal fields stripped. No separate "public API" project to maintain alongside the main one.

FAQ

Can the same patient appear in both views?

Yes. Both roles read the same FHIR resources; what differs is the redaction applied to the response. Patient identities stay stable through the resource id, while the identifying fields (name, address, contact details, date of birth) are removed or replaced with synthetic values in the sponsor view.

What if a sponsor wants to slice on a redacted field, like age?

Expose a derived field in a separate resource that is safe to query (for example, an Observation that records an age band rather than a date of birth) and grant search access on that resource. The original date of birth stays redacted and out of search reach.
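One way to derive such an age band is sketched below in Python. The function names, the ten-year band width, and the `https://example.org/...` code system are assumptions made for this example; the sketch also assumes `birthDate` is a full `YYYY-MM-DD` date, although FHIR permits partial dates.

```python
from datetime import date


def age_in_years(birth_date: date, on: date) -> int:
    """Completed years between birth_date and the reference date."""
    years = on.year - birth_date.year
    if (on.month, on.day) < (birth_date.month, birth_date.day):
        years -= 1  # birthday has not happened yet this year
    return years


def age_band(age: int, width: int = 10) -> str:
    """Map an age to a band label such as '40-49'."""
    low = (age // width) * width
    return f"{low}-{low + width - 1}"


def age_band_observation(patient: dict, on: date) -> dict:
    """Build an Observation carrying the band instead of the birth date."""
    birth = date.fromisoformat(patient["birthDate"])
    return {
        "resourceType": "Observation",
        "status": "final",
        "code": {"coding": [{
            # Hypothetical local code system, not a standard terminology.
            "system": "https://example.org/fhir/CodeSystem/derived",
            "code": "age-band",
        }]},
        "subject": {"reference": f"Patient/{patient['id']}"},
        "effectiveDateTime": on.isoformat(),
        "valueString": age_band(age_in_years(birth, on)),
    }


patient = {"resourceType": "Patient", "id": "p1", "birthDate": "1984-06-15"}
observation = age_band_observation(patient, date(2024, 6, 14))
```

The sponsor role can then search the derived Observation by its band value while `birthdate` stays on the blocked-parameter list for Patient. Wider bands reduce the re-identification risk further, at the cost of analytical resolution.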

Does it work for the SQL-on-FHIR query language?

Yes, and strictly. As soon as any search parameters are blocked for a resource type, all SQL-on-FHIR queries on that type are rejected outright. There is no partial-leakage path.

How do I handle free-text fields like clinical notes?

Remove the field entirely for the redacted role. Free-text PHI is not removed by structured-field redaction, so any field where clinicians might type names or locations needs to be reviewed and stripped explicitly. Selective NLP-based redaction is not built in; if a workflow needs to keep narrative content for the redacted role, that processing belongs in a separate pipeline.

Does this work alongside bulk export?

Yes. The export client authenticates with a service account that matches a redacted rule, and the redaction applies to every resource the export returns. No separate de-identification pipeline.
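Conceptually, the export path applies the caller's redaction to each resource as it is written out. The sketch below illustrates that idea over NDJSON lines, the line-delimited format used by FHIR bulk export; `drop_fields` and `filtered_export` are hypothetical helpers, not part of any real export API.

```python
import json
from typing import Callable, Iterable, Iterator


def drop_fields(*names: str) -> Callable[[dict], dict]:
    """Build a redaction function that removes the named top-level fields."""
    def apply(resource: dict) -> dict:
        return {k: v for k, v in resource.items() if k not in names}
    return apply


def filtered_export(lines: Iterable[str],
                    redact: Callable[[dict], dict]) -> Iterator[str]:
    """Yield one redacted NDJSON line per non-empty input line."""
    for line in lines:
        if line.strip():
            yield json.dumps(redact(json.loads(line)))


source = [
    '{"resourceType": "Patient", "id": "p1", "birthDate": "1980-04-02", "gender": "female"}',
    '{"resourceType": "Patient", "id": "p2", "birthDate": "1975-11-30", "gender": "male"}',
]

exported = list(filtered_export(source, drop_fields("birthDate", "name")))
```

Because the redaction runs inside the generator, no unredacted copy of the export is ever materialised on the way to the client.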

Is this a replacement for HIPAA Safe Harbor or GDPR pseudonymisation?

No. Field-level redaction is an enforcement mechanism. Whether the resulting view qualifies as de-identified, pseudonymised, or anonymised under a specific regulatory standard depends on the data, the recipient, and the broader processing context. A privacy assessment is still required.