FHIR resources are optimized for interoperability, while analytics platforms such as Databricks work best with tables. Healthcare analytics teams constantly run into that gap — they need FHIR data inside Databricks for reporting, research, machine learning, and population health analytics, but the data arrives as nested JSON instead of rows.
Since Aidbox 2605, you can close that gap directly: export SQL-on-FHIR ViewDefinitions straight into Databricks Unity Catalog managed Delta tables. This article shows how to build that pipeline using:
$viewdefinition-exportfor snapshots and backfillsAidboxTopicDestinationfor continuous delivery with automatic initial export
Why Databricks for FHIR Analytics?
FHIR is a good format for exchanging healthcare data, but it is not a convenient shape for analytics. Analytics teams want to query cohorts, build dashboards, join clinical data with operational data, and prepare datasets for research or machine learning — and Databricks gives them a lakehouse platform built for exactly that:
- Delta Lake tables for analytical storage
- Unity Catalog for governance
- Databricks SQL for querying
- Notebooks and Jobs for data processing
- ML and AI workflows on top of the same tables
The missing piece is the FHIR-to-table transformation, and that is what SQL-on-FHIR ViewDefinitions provide: they define how FHIR resources become analytical rows before those rows land in Databricks.
Convert FHIR Resources into Analytics Tables
FHIR resources are nested JSON, but most analytics workloads need relational tables. That mismatch is usually where the pain starts — someone builds an ETL pipeline, someone else maintains it, and six months later the pipeline has become a product of its own.
Aidbox solves this with SQL-on-FHIR: define a ViewDefinition and FHIR resources turn into rows.
{
"resourceType": "ViewDefinition",
"id": "patient_flat",
"resource": "Patient",
"select": [
{
"column": [
{ "name": "id", "path": "id" },
{ "name": "ts", "path": "getAidboxTs()" },
{ "name": "gender", "path": "gender" },
{ "name": "birth_date", "path": "birthDate" },
{ "name": "family_name", "path": "name.where(use = 'official').family.first()" },
{ "name": "given_name", "path": "name.where(use = 'official').given.first()" }
]
}
]
}
What this does
This ViewDefinition takes Patient resources and turns them into flat analytical rows.
resource: Patientselects the source FHIR resource type.id,gender,birth_date,family_name, andgiven_namebecome table columns.getAidboxTs()adds a timestamp column used for incremental exports.- After materialization, Aidbox exposes the result as
sof.patient_flat.
Materialize the view:
POST /fhir/ViewDefinition/patient_flat/$materialize
A Patient resource:
{
"resourceType": "Patient",
"id": "patient-1",
"gender": "male",
"birthDate": "1990-01-15",
"name": [
{
"use": "official",
"family": "Smith",
"given": ["John"]
}
]
}
becomes a row you can query:
SELECT id, ts, gender, birth_date, family_name, given_name
FROM sof.patient_flat;
| id | ts | gender | birth_date | family_name | given_name |
|---|---|---|---|---|---|
| patient-1 | 2026-06-09T10:15:00Z | male | 1990-01-15 | Smith | John |
Aidbox exposes the result as the sof.patient_flat PostgreSQL view, which becomes the source for both batch exports and continuous delivery.
Not familiar with SQL-on-FHIR?
Read our articles:
Continuous Delivery from FHIR to Databricks
Aidbox topic-based subscriptions are the event layer behind continuous delivery — at-least-once semantics, append-only target. You define an AidboxSubscriptionTopic for the resource changes you care about, then attach an AidboxTopicDestination that decides where those events should go. In this case, the destination is not a webhook or a queue — it is a Data Lakehouse writer that takes rows from the SQL-on-FHIR ViewDefinition and delivers them to Databricks.
Create a topic that listens for Patient changes:
POST /fhir/AidboxSubscriptionTopic
{
"resourceType": "AidboxSubscriptionTopic",
"url": "http://example.org/subscriptions/patient-updates",
"status": "active",
"trigger": [
{
"resource": "Patient",
"supportedInteraction": ["create", "update", "delete"]
}
]
}
Create a Data Lakehouse destination:
{
"resourceType": "AidboxTopicDestination",
"id": "patient-databricks",
"topic": "http://example.org/subscriptions/patient-updates",
"kind": "data-lakehouse-at-least-once",
"parameter": [
// ... databricks params ...
{ "name": "viewDefinition",
"valueString": "patient_flat"
},
{ "name": "batchSize",
"valueUnsignedInt": 50
},
{ "name": "sendIntervalMs",
"valueUnsignedInt": 5000
}
]
}
What this does
This destination sends SQL-on-FHIR rows to Databricks whenever a matching FHIR resource changes.
topic — the subscription topic to listen to.viewDefinition — the flattened table shape to send.batchSize — how many rows go into one delivery batch.sendIntervalMs — how often pending rows are flushed.// ... databricks params ... — Databricks-specific connection fields, omitted for clarity. See Data Lakehouse AidboxTopicDestination.
Create a Patient:
POST /fhir/Patient
{
"name": [
{
"use": "official",
"family": "Smith",
"given": ["John"]
}
],
"gender": "male",
"birthDate": "1990-01-15"
}
Check Databricks:
SELECT id, ts, gender, birth_date, family_name, given_name, is_deleted
FROM aidbox_export.fhir.patients;
| id | ts | gender | birth_date | family_name | given_name | is_deleted |
|---|---|---|---|---|---|---|
| patient-1 | 2026-06-09T10:15:00Z | male | 1990-01-15 | Smith | John | 0 |
The export engine adds an is_deleted column to the target table so downstream queries can filter out deleted resources without losing their history.
Initial FHIR Export Comes for Free
Most systems already contain data before the first stream starts. When AidboxTopicDestination starts, it automatically exports the current state of every row from the ViewDefinition before switching to live delivery — no separate bootstrap job, no custom migration script, no dual pipeline. Historical rows become the starting point, and new updates continue through the live stream.
This is the default. If you only want new data from the moment the destination is created, add skipInitialExport: true to the destination's parameter array:
{ "name": "skipInitialExport", "valueBoolean": true }
Read more in Aidbox Docs:
👉 Initial export for Data Lakehouse AidboxTopicDestination
Batch Export from FHIR to Delta Lake
Need a one-time snapshot instead of a continuous stream? Use $viewdefinition-export.
POST /fhir/ViewDefinition/$viewdefinition-export
Prefer: respond-async
{
"resourceType": "Parameters",
"parameter": [
{
"name": "view",
"part": [
{
"name": "name",
"valueString": "patient_flat"
},
{
"name": "viewReference",
"valueReference": {
"reference": "ViewDefinition/patient_flat"
}
}
]
},
{
"name": "kind",
"valueString": "data-lakehouse"
}
// ... databricks parameters ...
]
}
One-shot export
Use $viewdefinition-export when you need a controlled batch job instead of continuous delivery.
Typical cases:
- one-time snapshots
- historical backfills
- scheduled exports
- recovery jobs
- incremental export loops with
_since
The operation runs asynchronously and returns a status URL in the Content-Location header.
Response:
202 Accepted
Content-Location: /fhir/ViewDefinition/$viewdefinition-export/status/<export-id>
Check status:
GET /fhir/ViewDefinition/$viewdefinition-export/status/<export-id>
When the export completes:
{
"resourceType": "Parameters",
"parameter": [
{
"name": "status",
"valueCode": "completed"
},
{
"name": "output",
"part": [
{
"name": "location",
"valueUri": "databricks-uc:aidbox_export.fhir.patients"
}
]
}
]
}
Query the resulting table from Databricks SQL:
SELECT id, ts, gender, birth_date, family_name, given_name, is_deleted
FROM aidbox_export.fhir.patients;
| id | ts | gender | birth_date | family_name | given_name | is_deleted |
|---|---|---|---|---|---|---|
| patient-1 | 2026-06-09T10:15:00Z | male | 1990-01-15 | Smith | John | 0 |
| patient-2 | 2026-06-09T10:18:00Z | female | 1984-07-22 | Garcia | Maria | 0 |
| patient-3 | 2026-06-09T10:24:00Z | other | 2001-03-08 | Chen | Alex | 0 |
| patient-4 | 2026-06-09T10:31:00Z | female | 1976-11-30 | Okafor | Amara | 0 |
Incremental FHIR Exports
Need a nightly incremental export instead of a continuous stream? Use _since. Aidbox filters rows by the ViewDefinition timestamp column generated from getAidboxTs(), so the export only contains rows changed after the watermark you pass in. Reuse the same $viewdefinition-export request shown above and add one more parameter:
{
"name": "_since",
"valueInstant": "2026-01-01T00:00:00Z"
}
To run incremental exports on a schedule, advance _since after each run:
- Run
$viewdefinition-exportwith_since. - Wait until the export status becomes
completed. - Read
exportEndTimefrom the completed response and store it. - Use that value as
_sincein the next run.
This gives you scheduled incremental exports without building a separate change-tracking pipeline.
AidboxTopicDestination vs $viewdefinition-export
AidboxTopicDestination | $viewdefinition-export | |
|---|---|---|
| Delivery | Continuous, event-driven | One-shot, run on demand |
| Initial export | Automatic | Manual |
| Updates | Near real-time, per resource change | Batch, per request |
| Incremental | Built-in via subscription topics | Manual via _since |
| Best fit | Production pipelines keeping Databricks tables continuously updated | Snapshots, historical backfills, scheduled jobs, recovery and replay |
How FHIR to Delta Lake Export Works
Both the AidboxTopicDestination initial export and $viewdefinition-export use the same export engine, which:
- Reads rows from the materialized
sof.<view>PostgreSQL view. - Splits the export into multiple chunks.
- Writes Delta staging tables to S3.
- Runs a final
MERGE INTOoperation against the Unity Catalog managed table.
The result is a Databricks Delta Lake table you can query from Databricks SQL, notebooks, dashboards, or downstream analytics tools.
For large exports, chunks can be processed in parallel across multiple Aidbox pods.
Other FHIR Analytics Destinations
Databricks is not the only supported analytics destination. If you're using ClickHouse, see ClickHouse AidboxTopicDestination; for BigQuery, see BigQuery AidboxTopicDestination. Both destinations use the same SQL-on-FHIR ViewDefinition approach and continuous delivery model.
Beyond Exports
For a tighter integration, Aidbox can run natively on Databricks Lakebase — see Building a FHIR-native health data platform on Databricks Lakebase for the architecture, or Health Samurai + Databricks for partnership details.
Conclusion
FHIR resources are optimized for interoperability and Databricks is optimized for analytics. SQL-on-FHIR ViewDefinitions bridge that gap by transforming FHIR resources into analytical tables. Use $viewdefinition-export when you need snapshots, backfills, or incremental exports, and AidboxTopicDestination when you need continuous delivery and an automatic initial load. Either way, Databricks Delta Lake receives flattened analytical rows instead of nested FHIR JSON.
Further reading:
🤔 Have a question?
Ask us in Zulip chat: https://connect.health-samurai.io/




