FHIR Bulk Data ($export)

FHIR Bulk Data Access ($export) is the standardized way to extract large datasets from a FHIR server: an asynchronous operation, NDJSON output, three well-defined scopes. The same operation covers two cases that come up routinely in practice. Patient-level export turns the GDPR right of access into a built-in endpoint instead of a per-product project. System-level export, paired with the corresponding import path, turns vendor lock-in into an operational decision: a full export from one FHIR server reloads into another, including a different vendor's product, on a different cloud, in a different region.

What you can build

  • One operation, three scopes

    $export at the system level (everything the identity can read), the Group level (a defined cohort), and the Patient level (the patient compartment). Same async machinery, same NDJSON output.

  • Built-in path for GDPR Article 15 and Article 20

    Patient-level $export returns the patient's record in a structured, machine-readable format. No custom exporter, no per-product DSAR pipeline, no hand-maintained list of resources to include.

  • Vendor-neutral data portability

    System-level $export plus an inbound NDJSON path moves a full dataset between any two FHIR R4 servers. Same vendor, different vendor, different cloud, different operating model.

  • NDJSON output for analytics

    One file per resource type, one resource per line. Loads directly into BigQuery, Snowflake, Redshift, Databricks, Spark, DuckDB, or any tool that reads JSON Lines.

  • Incremental exports via _since

    Repeated exports return only resources changed after a given timestamp. Warehouses and downstream stores stay in sync without re-exporting the full dataset each time.

  • Authorization through the rule chain

    Bulk operations resolve through the same rule chain regular reads do. A research role with property filters produces a redacted export automatically; a sponsor identity's group export contains only the cohort that role is permitted to see.

How it works

  1. Client kicks off the export

    POST to $export at the system, Group, or Patient endpoint with a service-account or user identity, and a Prefer: respond-async header. Optional parameters narrow the scope: _type for resource types, _since for incremental, _typeFilter for FHIR-search-style filters, includeAssociatedData for Provenance.

  2. Server accepts and queues the operation

    The server responds 202 Accepted with a Content-Location header pointing at a status URL. The export runs in the background and writes NDJSON files to the operator's configured object storage.

  3. Authorization runs against every resource

    The rule chain evaluates the requesting identity. The export contains exactly the resources and fields that identity would see through normal reads, with the same property filters and search-side protections applied.

  4. Client polls the status URL

    The status URL returns 202 while running and 200 when complete. The completion response is a JSON manifest listing the produced NDJSON files, one entry per resource type, with byte counts and resource counts.

  5. Client downloads the files

    Each file URL in the manifest can be fetched directly. Pre-signed URLs against the operator's object storage are typical, so downloading does not hold a connection to the FHIR endpoint open during a multi-gigabyte transfer.

  6. Audit captures the export

    The kick-off, the resolved identity, the rule path, and the produced manifest are recorded in the same audit stream as regular FHIR access. Deleting the staged files after download is also a recorded operation.
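The six steps above can be sketched from the client side. This is a minimal illustration, not any server's SDK: `run_export`, `http_post`, and `http_get` are placeholder names, and the transport is any callable returning a `(status, headers, body)` tuple, so the sketch stays testable without a live FHIR server.

```python
import json
import time

def run_export(base_url, http_post, http_get, max_wait=3600):
    """Kick off a system-level $export and poll until the manifest arrives.
    `http_post`/`http_get` are injected HTTP callables returning
    (status_code, headers_dict, body_str); names are illustrative."""
    # 1. Kick off: the async protocol requires the Prefer header.
    status, headers, _ = http_post(
        f"{base_url}/$export",
        headers={"Prefer": "respond-async", "Accept": "application/fhir+json"},
    )
    if status != 202:
        raise RuntimeError(f"kick-off rejected: {status}")
    status_url = headers["Content-Location"]

    # 2. Poll the status URL with a capped exponential backoff.
    delay, waited = 1, 0
    while waited < max_wait:
        status, headers, body = http_get(status_url)
        if status == 200:
            return json.loads(body)     # completion manifest with file URLs
        if status != 202:
            raise RuntimeError(f"export failed: {status}")
        time.sleep(delay)
        waited += delay
        delay = min(delay * 2, 30)      # back off, cap at 30 seconds
    raise TimeoutError("export did not complete in time")
```

In production the injected callables would carry the service-account token and honor any Retry-After header the server sends on the 202 responses.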

Who this is for

Engineers building analytics pipelines on FHIR data, integration architects moving datasets between systems, data protection officers handling Article 15 and Article 20 requests, and operations teams planning backups, migrations, or vendor transitions.

The three scopes and what each is for

System-level export ($export at the root) returns the full dataset the requesting identity is authorized to read. In practice this is the operation behind backups, full-tenant migrations, vendor-to-vendor transitions, and bulk data warehousing for an entire deployment. It is the broadest scope and is typically reserved for service identities with broad read permission.

Group-level export ($export on a Group resource) returns the data for the patients listed in a Group. Group resources define cohorts: a clinical-trial arm, a population the public-health team is monitoring, the patients enrolled in a specific care program. Group exports feed analytics warehouses, research datasets, and cross-system population workflows.

Patient-level export ($export on the Patient endpoint, /Patient/$export) returns the patient compartment for each patient the identity can read. For a service identity this can be every patient in the system; for a patient-scoped identity it is the patient's own record. The same operation handles both bulk patient-data extraction and the single-patient case behind data subject access requests.

Patient-level export covers GDPR Article 15 and Article 20

GDPR Article 15 gives a data subject the right to a copy of their personal data. Article 20 gives the right to receive that data in a structured, commonly used, machine-readable format and to transmit it to another controller. For a healthcare product, those obligations arrive a few times a year per active user and a smaller number of times per ex-user. Without a built-in path, every product writes its own exporter and maintains the list of resources it covers as the product evolves.

Patient-level $export covers both rights at once. The patient compartment is the FHIR-defined set of resources tied to a Patient. The output is FHIR R4 in NDJSON, both of which are structured, commonly used, and machine-readable by any conformant FHIR consumer. A patient-scoped token (or a service identity acting on the patient's behalf) calls $export, the server produces the NDJSON, and the controller hands the result to the data subject.

The set of resources covered by the export is decided by the authorization rule for the patient role, not by an extra exporter component. Adding a new resource type to the product does not require updating a parallel DSAR pipeline; the resource is in scope as soon as the role can read it. The audit log records the export the same way it records any other access, which is the evidence the controller needs to demonstrate the request was fulfilled.

System-level export and the end of FHIR vendor lock-in

A FHIR server holds a standardized resource graph, not a vendor-specific schema. System-level $export produces the full graph as NDJSON. Any other FHIR R4 server can ingest the same NDJSON via the corresponding $import operation or via transaction Bundles posted to the standard endpoint.

The receiving server can be the same product running in a different environment, a different deployment of the same product, or a completely different vendor's FHIR server. The wire format and the resource definitions are the same on both sides. Switching vendors becomes an operational project, not an architectural rewrite of the integration layer.

Concrete scenarios this enables: a disaster-recovery snapshot to a cold environment, a migration from one cloud region to another, a vendor transition that does not require rewriting the integration layer, splitting a tenant out of a multi-tenant deployment into its own server, and promoting a known-good dataset from staging to production. Each scenario uses the same two operations, $export then the corresponding import, regardless of source and target.

Caveats worth planning for: standard $export returns current state; resource history, Binary attachment payloads, and any non-standard extensions need to be handled explicitly if they are part of the system of record. Current-state portability is the baseline; full historical fidelity is a per-deployment decision driven by what the source server exposes and what the target accepts.

Cohort exports for analytics and research

Group-level export is the standard path for moving a defined population into an analytics environment. The Group resource lists the patients in the cohort; the export returns the patient-compartment data for that cohort as NDJSON.

Two parameters do most of the day-to-day work. _type narrows the export to specific resource types (Observation and Condition only, say, instead of every type in the cohort). _since requests only resources updated after a timestamp, which turns a once-a-week warehouse load into an incremental sync rather than a re-export of the full cohort. _typeFilter applies FHIR search expressions to narrow further: only Observations of a specific code, only Conditions in a given clinical status.
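As a sketch of how these parameters combine on the wire (the base URL and group id are placeholders; the parameter names are the standard ones from the Bulk Data spec):

```python
from urllib.parse import urlencode

def group_export_url(base_url, group_id, types=None, since=None, type_filters=None):
    """Build a Group-level $export URL with the common narrowing parameters.
    `base_url` and `group_id` are placeholders for your deployment."""
    params = []
    if types:
        params.append(("_type", ",".join(types)))   # e.g. Observation,Condition
    if since:
        params.append(("_since", since))            # FHIR instant, e.g. 2024-01-01T00:00:00Z
    for f in type_filters or []:
        params.append(("_typeFilter", f))           # repeatable FHIR search expression
    query = f"?{urlencode(params)}" if params else ""
    return f"{base_url}/Group/{group_id}/$export{query}"
```

A weekly warehouse load for a trial cohort might call this with `types=["Observation", "Condition"]` and `since` set to the previous run's timestamp, so each run exports only the changed slice.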

Combined with the rule chain, the export is the intersection of what the client requested and what the role permits. A research role with property filters produces a redacted export automatically. A sponsor identity reading a clinical-trial Group sees the cohort with the redactions configured for the sponsor role, applied to every exported resource.

Why NDJSON

The output format is one NDJSON file per resource type, with one resource per line. Each line is a complete, parseable FHIR resource. The format is sometimes called JSON Lines.

NDJSON is stream-friendly: producer and consumer never need to hold the full dataset in memory. A single file can be processed line by line, and large files can be split across worker processes for parallel ingestion. Every mainstream data engineering tool reads it natively, including BigQuery, Snowflake, Redshift, Databricks, Spark, DuckDB, and the standard line-oriented Unix toolchain.
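The line-by-line property makes consumer code trivially simple. A minimal sketch (the function name is illustrative; any iterable of lines works, including an open file handle):

```python
import json
from collections import Counter

def count_resources(ndjson_lines):
    """Stream an NDJSON export file line by line, tallying resources by type.
    Memory use stays constant regardless of file size."""
    counts = Counter()
    for line in ndjson_lines:
        line = line.strip()
        if not line:
            continue
        resource = json.loads(line)   # each line is a complete FHIR resource
        counts[resource["resourceType"]] += 1
    return counts
```

Because a file handle is itself an iterable of lines, `count_resources(open("Observation.ndjson"))` processes a multi-gigabyte file without ever loading it whole.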

NDJSON is the format the Bulk Data spec picked for bulk movement. A single FHIR Bundle is what FHIR uses for synchronous, smaller exchanges. The two formats coexist for different jobs.

Asynchronous by design

Bulk operations are asynchronous. The client kicks off the export, the server returns immediately with a status URL, the client polls the status URL on a backoff, and the manifest of file URLs comes back when the export completes. The same protocol covers exports that take seconds for a single patient and hours for a large tenant.

Output files are typically staged through object storage (S3-compatible, Azure Blob, Google Cloud Storage) configured by the operator. Pre-signed download URLs let the client fetch the data directly from storage rather than through the FHIR endpoint, which keeps the server free to serve regular traffic during a large export and removes the FHIR endpoint as a bandwidth bottleneck.
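Fetching the staged files from the manifest can be sketched as follows. The `fetch` callable stands in for whatever performs the pre-signed GET (names are illustrative, not a particular storage SDK):

```python
import os

def download_outputs(manifest, fetch, dest_dir):
    """Fetch each file listed in a Bulk Data completion manifest.
    `fetch` is any callable returning the file bytes for a URL, e.g. a
    pre-signed object-storage GET; the naming scheme below is illustrative."""
    paths = []
    for i, entry in enumerate(manifest.get("output", [])):
        # One file per resource type; the index keeps multi-file types distinct.
        path = os.path.join(dest_dir, f"{entry['type']}_{i}.ndjson")
        with open(path, "wb") as f:
            f.write(fetch(entry["url"]))
        paths.append(path)
    return paths
```

For genuinely large files, a production version would stream each response to disk in chunks rather than buffering the bytes, but the manifest-walking shape is the same.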

The inbound side: $import and transaction Bundles

The Bulk Data Import specification is the symmetric inbound operation. Same NDJSON shape, same resource-type-per-file layout, same async pattern with status polling. The two main flavours under discussion are Ping-and-Pull (the server fetches files from URLs the client provides) and direct upload. The specification is still progressing toward normative status, so support varies by server.

For smaller payloads, or for environments where standing up an import endpoint is more work than it is worth, transaction or batch Bundles posted to the standard FHIR endpoint cover the same ground. Both paths run under a service identity scoped to the resource types being written, and the audit log captures the import the same way it captures any other write traffic.
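Turning exported NDJSON into a transaction Bundle for the fallback path is mechanical. A minimal sketch (the function name is illustrative; PUT entries keep the exported ids, which makes the load an upsert):

```python
import json

def ndjson_to_transaction(ndjson_lines):
    """Wrap exported NDJSON lines in a FHIR transaction Bundle for servers
    without $import. PUT with the exported id makes each entry an upsert."""
    entries = []
    for line in ndjson_lines:
        line = line.strip()
        if not line:
            continue
        resource = json.loads(line)
        entries.append({
            "resource": resource,
            "request": {
                "method": "PUT",
                "url": f"{resource['resourceType']}/{resource['id']}",
            },
        })
    return {"resourceType": "Bundle", "type": "transaction", "entry": entries}
```

In practice the lines would be chunked into Bundles of a few hundred entries each, and ordering matters when resources reference each other across files.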

The export-import pair is what makes portability operational. Moving a dataset between two FHIR servers becomes the cost of running one operation on each side, plus the planning around history and binary payloads, rather than the cost of writing a one-off ETL.

Bulk export is not a database dump

$export goes through the rule chain. Every resource it returns has been authorized by the same rules that gate regular reads. A sponsor identity's export is automatically the redacted view of the cohort; a patient identity's export is automatically the patient's own data; a service identity's system export is whatever the service role can read.

A database dump bypasses the access boundary. It captures the storage layer directly, which is appropriate for backup and disaster recovery (where the storage layer is the access boundary, secured by the operator's infrastructure) and inappropriate for application-level data movement (where authorization is the boundary). The two operations answer different questions: $export is for application-level movement under the rule chain; a dump is for operational continuity under storage controls.

FAQ

How is $export different from Patient/$everything?

$everything is a synchronous operation that returns a single FHIR Bundle for one patient. $export is asynchronous, produces NDJSON, and scales from a single patient to a full tenant. For a small single-patient record where a Bundle is convenient, $everything fits. For anything bulk, anything large, or anything a downstream analytics tool will load, $export is the operation to use.

Can a patient-facing app trigger the patient's own export?

Yes. With a patient-scoped token, $export at the Patient endpoint returns the patient's compartment. The same access rules a regular Patient read uses also apply to the export, so the result is exactly the data the patient is permitted to see. Self-service Article 15 access in a patient app rests on exactly this pattern.

Does the receiving server need to be the same product as the source?

No. NDJSON output of standard FHIR R4 resources is consumable by any conformant FHIR server that accepts $import or transaction Bundles. Migrations between vendors run on the same two operations as migrations between deployments of the same product.

What about resource history?

Standard $export returns current state. Some servers expose history through additional parameters or auxiliary endpoints; whether to include history is a per-export decision driven by the consumer's needs. For full historical fidelity in a migration, plan the history transfer alongside the bulk export.

What about Binary attachments and DocumentReference content?

DocumentReference resources export as JSON like any other resource. The actual binary content (PDFs, images, attachments) is referenced by URL and may live in separate storage. A complete dataset move plans the binary payloads alongside the NDJSON, typically by transferring the underlying object storage or by including Binary resources explicitly in the export.

Can I run an incremental export?

Yes. The _since parameter requests only resources updated after a given timestamp. Combined with _type and _typeFilter, an incremental export returns a small, targeted slice that downstream warehouses and search indexes can apply without a full re-load.
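A safe watermark for the next run is the `transactionTime` field of the previous run's completion manifest, rather than a client-side clock, since it marks the instant the last export was consistent to. A minimal sketch (the function name is illustrative):

```python
from urllib.parse import urlencode

def incremental_params(previous_manifest, types=None):
    """Build the query string for the next incremental $export run.
    `transactionTime` from the previous completion manifest is the
    gap-free `_since` watermark."""
    params = [("_since", previous_manifest["transactionTime"])]
    if types:
        params.append(("_type", ",".join(types)))
    return urlencode(params)
```

A scheduled sync job would persist each run's manifest (or just its `transactionTime`) and feed it into the next kick-off.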

How long does an export take?

Patient-level exports of a single patient complete in seconds. Group exports range from seconds for small cohorts to minutes for large ones. System-level exports of large tenants are typically scheduled rather than interactive and can take from minutes to hours depending on dataset size and the configured object storage throughput.

Can I export only specific resource types?

Yes. The _type parameter on $export filters the output. Combined with the rule chain, the export is the intersection of what was requested and what the role permits.

How is this different from a database dump?

$export goes through the rule chain. A database dump bypasses access controls and is reserved for backup and disaster recovery, where the operator's storage is the access boundary. Bulk Data is the appropriate path for application-level data movement and for any export that has to be authorized per requesting identity.

Is the import operation as standardized as the export?

Not yet. The export side of the Bulk Data Access spec is widely implemented and stable. The import side (Bulk Data Import, in its Ping-and-Pull and direct-upload flavours) is still progressing toward normative status, so the inbound path is best planned around the specific capabilities of the receiving server. Where $import is not available, transaction or batch Bundles posted to the standard FHIR endpoint are the practical fallback.