🔥 How to Export FHIR Data to Databricks Delta Lake

FHIR resources are optimized for interoperability, while analytics platforms such as Databricks work best with tables. Healthcare analytics teams constantly run into that gap — they need FHIR data inside Databricks for reporting, research, machine learning, and population health analytics, but the data arrives as nested JSON instead of rows.

Since Aidbox 2605, you can close that gap directly: export SQL-on-FHIR ViewDefinitions straight into Databricks Unity Catalog managed Delta tables. This article shows how to build that pipeline using:

$viewdefinition-export for snapshots and backfills
AidboxTopicDestination for continuous delivery with automatic initial export

Why Databricks for FHIR Analytics?

FHIR is a good format for exchanging healthcare data, but it is not a convenient shape for analytics. Analytics teams want to query cohorts, build dashboards, join clinical data with operational data, and prepare datasets for research or machine learning — and Databricks gives them a lakehouse platform built for exactly that:

Delta Lake tables for analytical storage
Unity Catalog for governance
Databricks SQL for querying
Notebooks and Jobs for data processing
ML and AI workflows on top of the same tables

The missing piece is the FHIR-to-table transformation, and that is what SQL-on-FHIR ViewDefinitions provide: they define how FHIR resources become analytical rows before those rows land in Databricks.

Convert FHIR Resources into Analytics Tables

FHIR resources are nested JSON, but most analytics workloads need relational tables. That mismatch is usually where the pain starts — someone builds an ETL pipeline, someone else maintains it, and six months later the pipeline has become a product of its own.

Aidbox solves this with SQL-on-FHIR: define a ViewDefinition and FHIR resources turn into rows.

{
  "resourceType": "ViewDefinition",
  "id": "patient_flat",
  "resource": "Patient",
  "select": [
    {
      "column": [
        { "name": "id", "path": "id" },
        { "name": "ts", "path": "getAidboxTs()" },
        { "name": "gender", "path": "gender" },
        { "name": "birth_date", "path": "birthDate" },
        { "name": "family_name", "path": "name.where(use = 'official').family.first()" },
        { "name": "given_name", "path": "name.where(use = 'official').given.first()" }
      ]
    }
  ]
}

What this does

This ViewDefinition takes Patient resources and turns them into flat analytical rows.

resource: Patient selects the source FHIR resource type.
id, gender, birth_date, family_name, and given_name become table columns.
getAidboxTs() adds a timestamp column used for incremental exports.
After materialization, Aidbox exposes the result as sof.patient_flat.

Materialize the view:

POST /fhir/ViewDefinition/patient_flat/$materialize

A Patient resource:

{
  "resourceType": "Patient",
  "id": "patient-1",
  "gender": "male",
  "birthDate": "1990-01-15",
  "name": [
    {
      "use": "official",
      "family": "Smith",
      "given": ["John"]
    }
  ]
}

becomes a row you can query:

SELECT id, ts, gender, birth_date, family_name, given_name
FROM sof.patient_flat;

id	ts	gender	birth_date	family_name	given_name
patient-1	2026-06-09T10:15:00Z	male	1990-01-15	Smith	John

Aidbox exposes the result as the sof.patient_flat PostgreSQL view, which becomes the source for both batch exports and continuous delivery.

Not familiar with SQL-on-FHIR?

Read our articles:

👉 What is a ViewDefinition

👉 SQL on FHIR: How It Works, Benefits & Use Cases

Continuous Delivery from FHIR to Databricks

Aidbox topic-based subscriptions are the event layer behind continuous delivery — at-least-once semantics, append-only target. You define an AidboxSubscriptionTopic for the resource changes you care about, then attach an AidboxTopicDestination that decides where those events should go. In this case, the destination is not a webhook or a queue — it is a Data Lakehouse writer that takes rows from the SQL-on-FHIR ViewDefinition and delivers them to Databricks.

Create a topic that listens for Patient changes:

POST /fhir/AidboxSubscriptionTopic

{
  "resourceType": "AidboxSubscriptionTopic",
  "url": "http://example.org/subscriptions/patient-updates",
  "status": "active",
  "trigger": [
    {
      "resource": "Patient",
      "supportedInteraction": ["create", "update", "delete"]
    }
  ]
}

Create a Data Lakehouse destination:

{
  "resourceType": "AidboxTopicDestination",
  "id": "patient-databricks",
  "topic": "http://example.org/subscriptions/patient-updates",
  "kind": "data-lakehouse-at-least-once",
  "parameter": [
    // ... databricks params ...
    { "name": "viewDefinition", 
      "valueString": "patient_flat" 
    },
    { "name": "batchSize", 
      "valueUnsignedInt": 50 
    },
    { "name": "sendIntervalMs", 
      "valueUnsignedInt": 5000 
    }
  ]
}

What this does

This destination sends SQL-on-FHIR rows to Databricks whenever a matching FHIR resource changes.

topic — the subscription topic to listen to.
viewDefinition — the flattened table shape to send.
batchSize — how many rows go into one delivery batch.
sendIntervalMs — how often pending rows are flushed.
// ... databricks params ... — Databricks-specific connection fields, omitted for clarity. See Data Lakehouse AidboxTopicDestination.

Create a Patient:

POST /fhir/Patient

{
  "name": [
    {
      "use": "official",
      "family": "Smith",
      "given": ["John"]
    }
  ],
  "gender": "male",
  "birthDate": "1990-01-15"
}

Check Databricks:

SELECT id, ts, gender, birth_date, family_name, given_name, is_deleted
FROM aidbox_export.fhir.patients;

id	ts	gender	birth_date	family_name	given_name	is_deleted
patient-1	2026-06-09T10:15:00Z	male	1990-01-15	Smith	John	0

The export engine adds an is_deleted column to the target table so downstream queries can filter out deleted resources without losing their history.

Initial FHIR Export Comes for Free

Most systems already contain data before the first stream starts. When AidboxTopicDestination starts, it automatically exports the current state of every row from the ViewDefinition before switching to live delivery — no separate bootstrap job, no custom migration script, no dual pipeline. Historical rows become the starting point, and new updates continue through the live stream.

This is the default. If you only want new data from the moment the destination is created, add skipInitialExport: true to the destination's parameter array:

{ "name": "skipInitialExport", "valueBoolean": true }

Batch Export from FHIR to Delta Lake

Need a one-time snapshot instead of a continuous stream? Use $viewdefinition-export.

POST /fhir/ViewDefinition/$viewdefinition-export
Prefer: respond-async

{
  "resourceType": "Parameters",
  "parameter": [
    {
      "name": "view",
      "part": [
        { 
          "name": "name", 
          "valueString": "patient_flat" 
        },
        {
          "name": "viewReference",
          "valueReference": {
            "reference": "ViewDefinition/patient_flat"
          }
        }
      ]
    },
    {
      "name": "kind",
      "valueString": "data-lakehouse"
    }
    // ... databricks parameters ...
  ]
}

One-shot export

Use $viewdefinition-export when you need a controlled batch job instead of continuous delivery.

Typical cases:

one-time snapshots
historical backfills
scheduled exports
recovery jobs
incremental export loops with _since

The operation runs asynchronously and returns a status URL in the Content-Location header.

Response:

202 Accepted
Content-Location: /fhir/ViewDefinition/$viewdefinition-export/status/<export-id>

Check status:

GET /fhir/ViewDefinition/$viewdefinition-export/status/<export-id>

When the export completes:

{
  "resourceType": "Parameters",
  "parameter": [
    {
      "name": "status",
      "valueCode": "completed"
    },
    {
      "name": "output",
      "part": [
        {
          "name": "location",
          "valueUri": "databricks-uc:aidbox_export.fhir.patients"
        }
      ]
    }
  ]
}

Query the resulting table from Databricks SQL:

SELECT id, ts, gender, birth_date, family_name, given_name, is_deleted
FROM aidbox_export.fhir.patients;

id	ts	gender	birth_date	family_name	given_name
patient-1	2026-06-09T10:15:00Z	male	1990-01-15	Smith	John
patient-2	2026-06-09T10:18:00Z	female	1984-07-22	Garcia	Maria
patient-3	2026-06-09T10:24:00Z	other	2001-03-08	Chen	Alex
patient-4	2026-06-09T10:31:00Z	female	1976-11-30	Okafor	Amara

Incremental FHIR Exports

Need a nightly incremental export instead of a continuous stream? Use _since. Aidbox filters rows by the ViewDefinition timestamp column generated from getAidboxTs(), so the export only contains rows changed after the watermark you pass in. Reuse the same $viewdefinition-export request shown above and add one more parameter:

{
  "name": "_since",
  "valueInstant": "2026-01-01T00:00:00Z"
}

To run incremental exports on a schedule, advance _since after each run:

Run $viewdefinition-export with _since.
Wait until the export status becomes completed.
Read exportEndTime from the completed response and store it.
Use that value as _since in the next run.

This gives you scheduled incremental exports without building a separate change-tracking pipeline.

AidboxTopicDestination vs $viewdefinition-export

	`AidboxTopicDestination`	`$viewdefinition-export`
Delivery	Continuous, event-driven	One-shot, run on demand
Initial export	Automatic	Manual
Updates	Near real-time, per resource change	Batch, per request
Incremental	Built-in via subscription topics	Manual via `_since`
Best fit	Production pipelines keeping Databricks tables continuously updated	Snapshots, historical backfills, scheduled jobs, recovery and replay

How FHIR to Delta Lake Export Works

FHIR to Databricks Delta Lake export architecture using SQL-on-FHIR ViewDefinitions

Both the AidboxTopicDestination initial export and $viewdefinition-export use the same export engine, which:

Reads rows from the materialized sof.<view> PostgreSQL view.
Splits the export into multiple chunks.
Writes Delta staging tables to S3.
Runs a final MERGE INTO operation against the Unity Catalog managed table.

The result is a Databricks Delta Lake table you can query from Databricks SQL, notebooks, dashboards, or downstream analytics tools.

For large exports, chunks can be processed in parallel across multiple Aidbox pods.

Other FHIR Analytics Destinations

Databricks is not the only supported analytics destination. If you're using ClickHouse, see ClickHouse AidboxTopicDestination; for BigQuery, see BigQuery AidboxTopicDestination. Both destinations use the same SQL-on-FHIR ViewDefinition approach and continuous delivery model.

Beyond Exports

For a tighter integration, Aidbox can run natively on Databricks Lakebase — see Building a FHIR-native health data platform on Databricks Lakebase for the architecture, or Health Samurai + Databricks for partnership details.

Conclusion

FHIR resources are optimized for interoperability and Databricks is optimized for analytics. SQL-on-FHIR ViewDefinitions bridge that gap by transforming FHIR resources into analytical tables. Use $viewdefinition-export when you need snapshots, backfills, or incremental exports, and AidboxTopicDestination when you need continuous delivery and an automatic initial load. Either way, Databricks Delta Lake receives flattened analytical rows instead of nested FHIR JSON.