---
{
  "title": "How to Export FHIR Data to Databricks Delta Lake",
  "description": "Export SQL-on-FHIR ViewDefinitions into Databricks Unity Catalog managed Delta tables using continuous delivery or one-shot exports.",
  "date": "2026-06-15",
  "author": "Sviatoslav Krivosheev",
  "reading-time": "8 min read",
  "tags": ["SQL on FHIR", "Analytics", "Aidbox", "Integrations", "Databricks"],
  "tldr": "Aidbox 2605 can export FHIR data to Databricks Delta Lake as flattened SQL-on-FHIR rows. Use [$viewdefinition-export](https://www.health-samurai.io/docs/aidbox/modules/sql-on-fhir/operation-viewdefinition-export) for snapshots, backfills, and incremental exports, or [AidboxTopicDestination](https://www.health-samurai.io/docs/aidbox/tutorials/subscriptions-tutorials/data-lakehouse-aidboxtopicdestination) for continuous delivery with automatic initial export.",
  "utm-campaign": "analytics",
  "utm-content": "delta-lake-export",
  "hide-comments": true,
  "hero-image": "fhir-databricks-delta-export.svg",
  "hero-image-alt": "FHIR data export from Aidbox to Databricks Delta Lake",
  "hero-image-position": "before-tldr",
}
---

FHIR resources are optimized for interoperability, while analytics platforms such as Databricks work best with tables. Healthcare analytics teams constantly run into that gap — they need FHIR data inside Databricks for reporting, research, machine learning, and population health analytics, but the data arrives as nested JSON instead of rows.

Since Aidbox 2605, you can close that gap directly: export SQL-on-FHIR ViewDefinitions straight into Databricks Unity Catalog managed Delta tables. This article shows how to build that pipeline using:

- `$viewdefinition-export` for snapshots and backfills
- `AidboxTopicDestination` for continuous delivery with automatic initial export

## Why Databricks for FHIR Analytics?

FHIR is a good format for exchanging healthcare data, but it is not a convenient shape for analytics. Analytics teams want to query cohorts, build dashboards, join clinical data with operational data, and prepare datasets for research or machine learning — and Databricks gives them a lakehouse platform built for exactly that:

- Delta Lake tables for analytical storage
- Unity Catalog for governance
- Databricks SQL for querying
- Notebooks and Jobs for data processing
- ML and AI workflows on top of the same tables

The missing piece is the FHIR-to-table transformation, and that is what SQL-on-FHIR ViewDefinitions provide: they define how FHIR resources become analytical rows before those rows land in Databricks.

## Convert FHIR Resources into Analytics Tables

FHIR resources are nested JSON, but most analytics workloads need relational tables. That mismatch is usually where the pain starts — someone builds an ETL pipeline, someone else maintains it, and six months later the pipeline has become a product of its own.

Aidbox solves this with SQL-on-FHIR: define a ViewDefinition and FHIR resources turn into rows.

```json
{
  "resourceType": "ViewDefinition",
  "id": "patient_flat",
  "resource": "Patient",
  "select": [
    {
      "column": [
        { "name": "id", "path": "id" },
        { "name": "ts", "path": "getAidboxTs()" },
        { "name": "gender", "path": "gender" },
        { "name": "birth_date", "path": "birthDate" },
        { "name": "family_name", "path": "name.where(use = 'official').family.first()" },
        { "name": "given_name", "path": "name.where(use = 'official').given.first()" }
      ]
    }
  ]
}
```

### What this does

This ViewDefinition takes `Patient` resources and turns them into flat analytical rows.

- `resource: Patient` selects the source FHIR resource type.
- `id`, `gender`, `birth_date`, `family_name`, and `given_name` become table columns.
- `getAidboxTs()` adds a timestamp column used for incremental exports.
- After materialization, Aidbox exposes the result as `sof.patient_flat`.

Materialize the view:

```http
POST /fhir/ViewDefinition/patient_flat/$materialize
```

A Patient resource:

```json
{
  "resourceType": "Patient",
  "id": "patient-1",
  "gender": "male",
  "birthDate": "1990-01-15",
  "name": [
    {
      "use": "official",
      "family": "Smith",
      "given": ["John"]
    }
  ]
}
```

becomes a row you can query:

```sql
SELECT id, ts, gender, birth_date, family_name, given_name
FROM sof.patient_flat;
```

| id        | ts                   | gender | birth_date | family_name | given_name |
| --------- | -------------------- | ------ | ---------- | ----------- | ---------- |
| patient-1 | 2026-06-09T10:15:00Z | male   | 1990-01-15 | Smith       | John       |

Aidbox exposes the result as the `sof.patient_flat` PostgreSQL view, which becomes the source for both batch exports and continuous delivery.

{% hint style="info" %}
Not familiar with SQL-on-FHIR?

Read our articles:

👉 [What is a ViewDefinition](https://www.health-samurai.io/articles/what-is-a-viewdefinition)

👉 [SQL on FHIR: How It Works, Benefits & Use Cases](https://www.health-samurai.io/articles/sql-on-fhir-an-inside-look)
{% endhint %}

## Continuous Delivery from FHIR to Databricks

Aidbox [topic-based subscriptions](https://www.health-samurai.io/docs/aidbox/modules/topic-based-subscriptions/aidbox-topic-based-subscriptions) are the event layer behind continuous delivery — at-least-once semantics, append-only target. You define an `AidboxSubscriptionTopic` for the resource changes you care about, then attach an `AidboxTopicDestination` that decides where those events should go. In this case, the destination is not a webhook or a queue — it is a Data Lakehouse writer that takes rows from the SQL-on-FHIR ViewDefinition and delivers them to Databricks.

Create a topic that listens for Patient changes:

```http
POST /fhir/AidboxSubscriptionTopic

{
  "resourceType": "AidboxSubscriptionTopic",
  "url": "http://example.org/subscriptions/patient-updates",
  "status": "active",
  "trigger": [
    {
      "resource": "Patient",
      "supportedInteraction": ["create", "update", "delete"]
    }
  ]
}
```

Create a Data Lakehouse destination:

```json
{
  "resourceType": "AidboxTopicDestination",
  "id": "patient-databricks",
  "topic": "http://example.org/subscriptions/patient-updates",
  "kind": "data-lakehouse-at-least-once",
  "parameter": [
    // ... databricks params ...
    { "name": "viewDefinition", 
      "valueString": "patient_flat" 
    },
    { "name": "batchSize", 
      "valueUnsignedInt": 50 
    },
    { "name": "sendIntervalMs", 
      "valueUnsignedInt": 5000 
    }
  ]
}
```

### What this does

This destination sends SQL-on-FHIR rows to Databricks whenever a matching FHIR resource changes.

- `topic` — the subscription topic to listen to.
- `viewDefinition` — the flattened table shape to send.
- `batchSize` — how many rows go into one delivery batch.
- `sendIntervalMs` — how often pending rows are flushed.
- `// ... databricks params ...` — Databricks-specific connection fields, omitted for clarity. See [Data Lakehouse AidboxTopicDestination](https://www.health-samurai.io/docs/aidbox/tutorials/subscriptions-tutorials/data-lakehouse-aidboxtopicdestination).

Create a Patient:

```http
POST /fhir/Patient

{
  "name": [
    {
      "use": "official",
      "family": "Smith",
      "given": ["John"]
    }
  ],
  "gender": "male",
  "birthDate": "1990-01-15"
}
```

Check Databricks:

```sql
SELECT id, ts, gender, birth_date, family_name, given_name, is_deleted
FROM aidbox_export.fhir.patients;
```

| id        | ts                   | gender | birth_date | family_name | given_name | is_deleted |
| --------- | -------------------- | ------ | ---------- | ----------- | ---------- | ---------- |
| patient-1 | 2026-06-09T10:15:00Z | male   | 1990-01-15 | Smith       | John       | 0          |

The export engine adds an `is_deleted` column to the target table so downstream queries can filter out deleted resources without losing their history.

## Initial FHIR Export Comes for Free

Most systems already contain data before the first stream starts. When AidboxTopicDestination starts, it automatically exports the current state of every row from the ViewDefinition before switching to live delivery — no separate bootstrap job, no custom migration script, no dual pipeline. Historical rows become the starting point, and new updates continue through the live stream.

This is the default. If you only want new data from the moment the destination is created, add `skipInitialExport: true` to the destination's `parameter` array:

```json
{ "name": "skipInitialExport", "valueBoolean": true }
```

{% hint style="info" %}
Read more in Aidbox Docs:

👉 [Initial export for Data Lakehouse AidboxTopicDestination](https://www.health-samurai.io/docs/aidbox/tutorials/subscriptions-tutorials/data-lakehouse-aidboxtopicdestination#initial-export)
{% endhint %}

## Batch Export from FHIR to Delta Lake

Need a one-time snapshot instead of a continuous stream? Use `$viewdefinition-export`.

```http
POST /fhir/ViewDefinition/$viewdefinition-export
Prefer: respond-async

{
  "resourceType": "Parameters",
  "parameter": [
    {
      "name": "view",
      "part": [
        { 
          "name": "name", 
          "valueString": "patient_flat" 
        },
        {
          "name": "viewReference",
          "valueReference": {
            "reference": "ViewDefinition/patient_flat"
          }
        }
      ]
    },
    {
      "name": "kind",
      "valueString": "data-lakehouse"
    }
    // ... databricks parameters ...
  ]
}
```

### One-shot export

Use `$viewdefinition-export` when you need a controlled batch job instead of continuous delivery.

Typical cases:

- one-time snapshots
- historical backfills
- scheduled exports
- recovery jobs
- incremental export loops with `_since`

The operation runs asynchronously and returns a status URL in the `Content-Location` header.

Response:

```http
202 Accepted
Content-Location: /fhir/ViewDefinition/$viewdefinition-export/status/<export-id>
```

Check status:

```http
GET /fhir/ViewDefinition/$viewdefinition-export/status/<export-id>
```

When the export completes:

```json
{
  "resourceType": "Parameters",
  "parameter": [
    {
      "name": "status",
      "valueCode": "completed"
    },
    {
      "name": "output",
      "part": [
        {
          "name": "location",
          "valueUri": "databricks-uc:aidbox_export.fhir.patients"
        }
      ]
    }
  ]
}
```

Query the resulting table from Databricks SQL:

```sql
SELECT id, ts, gender, birth_date, family_name, given_name, is_deleted
FROM aidbox_export.fhir.patients;
```

| id        | ts                   | gender | birth_date | family_name | given_name | is_deleted |
| --------- | -------------------- | ------ | ---------- | ----------- | ---------- | ---------- |
| patient-1 | 2026-06-09T10:15:00Z | male   | 1990-01-15 | Smith       | John       | 0          |
| patient-2 | 2026-06-09T10:18:00Z | female | 1984-07-22 | Garcia      | Maria      | 0          |
| patient-3 | 2026-06-09T10:24:00Z | other  | 2001-03-08 | Chen        | Alex       | 0          |
| patient-4 | 2026-06-09T10:31:00Z | female | 1976-11-30 | Okafor      | Amara      | 0          |

## Incremental FHIR Exports

Need a nightly incremental export instead of a continuous stream? Use `_since`. Aidbox filters rows by the ViewDefinition timestamp column generated from `getAidboxTs()`, so the export only contains rows changed after the watermark you pass in. Reuse the same `$viewdefinition-export` request shown above and add one more parameter:

```json
{
  "name": "_since",
  "valueInstant": "2026-01-01T00:00:00Z"
}
```

To run incremental exports on a schedule, advance `_since` after each run:

1. Run `$viewdefinition-export` with `_since`.
2. Wait until the export status becomes `completed`.
3. Read `exportEndTime` from the completed response and store it.
4. Use that value as `_since` in the next run.

This gives you scheduled incremental exports without building a separate change-tracking pipeline.

## AidboxTopicDestination vs $viewdefinition-export

|                    | `AidboxTopicDestination`                                            | `$viewdefinition-export`                                             |
| ------------------ | ------------------------------------------------------------------- | -------------------------------------------------------------------- |
| **Delivery**       | Continuous, event-driven                                            | One-shot, run on demand                                              |
| **Initial export** | Automatic                                                           | Manual                                                               |
| **Updates**        | Near real-time, per resource change                                 | Batch, per request                                                   |
| **Incremental**    | Built-in via subscription topics                                    | Manual via `_since`                                                  |
| **Best fit**       | Production pipelines keeping Databricks tables continuously updated | Snapshots, historical backfills, scheduled jobs, recovery and replay |

## How FHIR to Delta Lake Export Works

![FHIR to Databricks Delta Lake export architecture using SQL-on-FHIR ViewDefinitions](aidbox-databricks-bulk.svg)

Both the AidboxTopicDestination initial export and `$viewdefinition-export` use the same export engine, which:

1. Reads rows from the materialized `sof.<view>` PostgreSQL view.
2. Splits the export into multiple chunks.
3. Writes Delta staging tables to S3.
4. Runs a final `MERGE INTO` operation against the Unity Catalog managed table.

The result is a Databricks Delta Lake table you can query from Databricks SQL, notebooks, dashboards, or downstream analytics tools.

**For large exports, chunks can be processed in parallel across multiple Aidbox pods.**

## Other FHIR Analytics Destinations

Databricks is not the only supported analytics destination. If you're using ClickHouse, see [ClickHouse AidboxTopicDestination](https://www.health-samurai.io/docs/aidbox/tutorials/subscriptions-tutorials/clickhouse-aidboxtopicdestination); for BigQuery, see [BigQuery AidboxTopicDestination](https://www.health-samurai.io/docs/aidbox/tutorials/subscriptions-tutorials/bigquery-aidboxtopicdestination). Both destinations use the same SQL-on-FHIR ViewDefinition approach and continuous delivery model.

## Beyond Exports

For a tighter integration, Aidbox can run natively on **Databricks Lakebase** — see [Building a FHIR-native health data platform on Databricks Lakebase](https://www.databricks.com/blog/building-fhir-native-health-data-platform-databricks-lakebase) for the architecture, or [Health Samurai + Databricks](https://www.health-samurai.io/partners/databricks) for partnership details.

## Conclusion

FHIR resources are optimized for interoperability and Databricks is optimized for analytics. SQL-on-FHIR ViewDefinitions bridge that gap by transforming FHIR resources into analytical tables. Use `$viewdefinition-export` when you need snapshots, backfills, or incremental exports, and `AidboxTopicDestination` when you need continuous delivery and an automatic initial load. Either way, Databricks Delta Lake receives flattened analytical rows instead of nested FHIR JSON.

Further reading:

- [Data Lakehouse AidboxTopicDestination](https://www.health-samurai.io/docs/aidbox/tutorials/subscriptions-tutorials/data-lakehouse-aidboxtopicdestination)
- [$viewdefinition-export](https://www.health-samurai.io/docs/aidbox/modules/sql-on-fhir/operation-viewdefinition-export)

{% hint style="info" %}
🤔 Have a question?

Ask us in Zulip chat: https://connect.health-samurai.io/
{% endhint %}
