$viewdefinition-export operation
Available in Aidbox versions 2605 and later. Requires fhir-schema mode. Implements SQL-on-FHIR v2 $viewdefinition-export.
Exports ViewDefinition results to a destination. Use kind to choose the Aidbox module backend.
Registered backends
| kind | Sink | Module |
|---|---|---|
| data-lakehouse | Databricks Unity Catalog managed Delta table | Data Lakehouse module |
Databricks Delta table
The Databricks-side setup (catalog, schema, target table, staging schema, service principal, grants, warehouse) must be in place before you kick off an export. The Data Lakehouse tutorial documents it, and the same setup applies here.
Kick-off
POST /fhir/ViewDefinition/$viewdefinition-export
Content-Type: application/fhir+json
Prefer: respond-async
{
"resourceType": "Parameters",
"parameter": [
{"name": "view",
"part": [{"name": "name", "valueString": "patient_flat"},
{"name": "viewReference", "valueReference": {"reference": "ViewDefinition/patient_flat"}}]},
{"name": "kind", "valueString": "data-lakehouse"},
{"name": "writeMode", "valueString": "managed-zerobus"},
{"name": "databricksWorkspaceUrl", "valueString": "https://xxx.cloud.databricks.com"},
{"name": "databricksWorkspaceId", "valueString": "xxx"},
{"name": "databricksRegion", "valueString": "xxx"},
{"name": "tableName", "valueString": "xxx.xxx.patient_flat"},
{"name": "databricksWarehouseId", "valueString": "xxx"},
{"name": "awsRegion", "valueString": "xxx"},
{"name": "stagingTablePath", "valueString": "s3://xxx/staging/patient_flat/"},
{"name": "chunkCount", "valueUnsignedInt": 1},
{"name": "_since", "valueInstant": "2026-01-01T00:00:00Z"}
]
}
Response:
202 Accepted
Content-Location: /fhir/ViewDefinition/$viewdefinition-export/status/<export-id>
{
"resourceType": "Parameters",
"parameter": [
{"name": "exportId", "valueString": "<uuid>"},
{"name": "status", "valueCode": "in-progress"},
{"name": "location", "valueUri": "/fhir/ViewDefinition/$viewdefinition-export/status/<uuid>"}
]
}
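The kick-off body above can also be assembled programmatically. The following is a minimal sketch, not part of Aidbox: the function name build_kickoff_parameters and its arguments are hypothetical, and the choice of valueString for strings and valueUnsignedInt for integers simply mirrors the types used in the request example.

```python
from typing import Optional


def build_kickoff_parameters(
    view_name: str,
    view_reference: str,
    kind: str,
    backend_params: dict,
    since: Optional[str] = None,
) -> dict:
    """Build a Parameters body shaped like the kick-off example (hypothetical helper)."""
    parameters = [
        {
            "name": "view",
            "part": [
                {"name": "name", "valueString": view_name},
                {
                    "name": "viewReference",
                    "valueReference": {"reference": view_reference},
                },
            ],
        },
        {"name": "kind", "valueString": kind},
    ]
    for name, value in backend_params.items():
        if isinstance(value, bool):
            # bool is a subclass of int in Python; reject it explicitly.
            raise TypeError(f"unsupported boolean value for {name}")
        if isinstance(value, int):
            parameters.append({"name": name, "valueUnsignedInt": value})
        else:
            parameters.append({"name": name, "valueString": str(value)})
    if since is not None:
        parameters.append({"name": "_since", "valueInstant": since})
    return {"resourceType": "Parameters", "parameter": parameters}
```

Serialize the result with json.dumps and send it with Content-Type: application/fhir+json and Prefer: respond-async, as in the request above.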
Spec parameters
| Parameter | Required | Notes |
|---|---|---|
| view | yes | Exactly one entry. viewReference must point at a server-stored ViewDefinition. Inline viewResource is not yet supported. |
| kind | yes | Selects the backend (e.g. data-lakehouse). |
| clientTrackingId | no | Echoed back in the status response. |
| _format | no | ndjson, parquet, json, or omitted. Ignored: the backend picks the sink format (Delta for kind=data-lakehouse). |
| header | no | Echoed; not meaningful for non-CSV sinks. |
| _since | no | ISO-8601 instant. Filters the source view by its timestamp column (the column whose ViewDefinition path is getAidboxTs()). 400 :no-timestamp-column if _since is set but the view exposes no such column. See _since (incremental export). |
| patient (0..*) | no | List of Patient references. Accepted but not yet applied; the module exports the full view. |
| group (0..*) | no | List of Group references. Same status as patient. |
| source | no | External data source URI. Not supported; the server rejects it. |
Data Lakehouse backend parameters
| Parameter | Required | Notes |
|---|---|---|
| writeMode | yes | managed-zerobus (default; REST row-insert) or managed-sql (SQL warehouse INSERT). |
| tableName | yes | Full name of the managed UC table: catalog.schema.table. |
| databricksWorkspaceUrl | yes | https://<workspace>.cloud.databricks.com. |
| databricksWorkspaceId | yes | Numeric workspace id (for Zerobus URL). |
| databricksRegion | yes | Workspace AWS region. |
| databricksWarehouseId | yes | SQL warehouse id. Used at setup for schema sync and at finalize for the MERGE. |
| awsRegion | yes | AWS region of the S3 staging bucket. |
| stagingTablePath | yes | s3://bucket/staging/<table>/. Must start with s3:// or s3a://. |
| chunkCount | no | Positive integer (default 1). Splits the export into N per-chunk staging tables and N concurrent writers. See Large-scale and multi-pod execution for sizing. |
The module reads OAuth M2M credentials from Aidbox-wide settings: the BOX_DATABRICKS_DATA_LAKEHOUSE_CLIENT_ID and _CLIENT_SECRET env vars or the corresponding settings registry entries. Per-request parameters do NOT accept them. See the Data Lakehouse tutorial for the full Databricks-side setup.
Status polling
GET /fhir/ViewDefinition/$viewdefinition-export/status/<export-id>
Response codes:
- 202 Accepted: still in progress. The server returns the same Content-Location so the client can keep polling.
- 200 OK: terminal. Body is a Parameters resource with the final shape (status=completed, status=failed, or status=cancelled, plus output[].location on success).
- 404 Not Found: unknown export-id.
Completed output for kind=data-lakehouse:
{
"resourceType": "Parameters",
"parameter": [
{ "name": "exportId", "valueString": "<uuid>" },
{ "name": "status", "valueCode": "completed" },
{ "name": "clientTrackingId", "valueString": "..." },
{ "name": "exportStartTime", "valueInstant": "2026-05-22T00:00:00Z" },
{ "name": "exportEndTime", "valueInstant": "2026-05-22T00:01:30Z" },
{
"name": "output",
"part": [
{ "name": "name", "valueString": "patient_flat" },
{
"name": "location",
"valueUri": "databricks-uc:catalog.schema.patient_flat"
}
]
}
]
}
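A polling client built on the response codes and the completed body above might look like the sketch below. It is illustrative only: poll_until_terminal, parameter_value, and output_locations are hypothetical names, and fetch is an assumed caller-supplied callable that performs the status GET and returns the HTTP status code together with the parsed JSON body. The interval and poll limit are arbitrary defaults, not values recommended by Aidbox.

```python
import time
from typing import Callable, Tuple


def parameter_value(body: dict, name: str):
    """Return the first value[x] of the named top-level parameter, or None."""
    for param in body.get("parameter", []):
        if param.get("name") == name:
            for key, value in param.items():
                if key.startswith("value"):
                    return value
    return None


def output_locations(body: dict) -> dict:
    """Map each output entry's name to its location URI."""
    result = {}
    for param in body.get("parameter", []):
        if param.get("name") != "output":
            continue
        parts = {p.get("name"): p for p in param.get("part", [])}
        name = parts.get("name", {}).get("valueString")
        if name is not None:
            result[name] = parts.get("location", {}).get("valueUri")
    return result


def poll_until_terminal(
    fetch: Callable[[], Tuple[int, dict]],
    interval_seconds: float = 5.0,
    max_polls: int = 120,
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Poll until the server answers 200 OK and return the terminal body."""
    for _ in range(max_polls):
        status_code, body = fetch()
        if status_code == 200:
            return body  # completed, failed, or cancelled
        if status_code == 404:
            raise LookupError("unknown export-id")
        if status_code != 202:
            raise RuntimeError(f"unexpected status code {status_code}")
        sleep(interval_seconds)  # 202: still in progress
    raise TimeoutError("export did not reach a terminal state")
```

Passing fetch and sleep as arguments keeps the loop independent of any particular HTTP library and makes it easy to exercise with canned responses.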
Cancellation
DELETE /fhir/ViewDefinition/$viewdefinition-export/status/<export-id>
Stops an in-flight export and triggers backend cleanup (the data-lakehouse backend drops per-chunk staging tables and recursively deletes the S3 staging prefixes it created, best-effort). Response codes:
- 202 Accepted: cancel acknowledged, async cleanup running. Body reports status=canceling. Subsequent status polls return 200 OK with status=cancelled once cleanup completes.
- 200 OK: operation already terminal (completed, failed, or cancelled). DELETE is idempotent.
- 404 Not Found: unknown export-id.
What cancellation does not do:
- It does not roll back rows already merged into the target managed table. The merge is the last step of finalize-export. If cancel arrives before finalize, no rows have landed; if after finalize, the operation is already terminal.
- It does not delete history rows of chunks that already completed.
- It does not hard-kill a running JVM task. Running chunks stop when they observe the cancel marker.
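On the client side, the DELETE response codes listed above translate into a simple decision about whether to keep polling. A minimal sketch follows; the function name next_step_after_cancel is hypothetical.

```python
from typing import Tuple


def next_step_after_cancel(status_code: int) -> Tuple[bool, str]:
    """Return (keep_polling, explanation) for the response to a cancel DELETE."""
    if status_code == 202:
        # Cleanup runs asynchronously; the status endpoint reports the outcome.
        return True, "cancel acknowledged; poll until status=cancelled"
    if status_code == 200:
        return False, "operation already terminal; nothing left to cancel"
    if status_code == 404:
        return False, "unknown export-id"
    raise ValueError(f"unexpected status code {status_code}")
```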
Failure model
- Input validation failures (missing view, missing kind, multiple views, source set, etc.): synchronous 400 OperationOutcome from the kick-off POST. The server does not allocate an export-id.
- Backend-side failures: async. Kick-off returns 202 with an export-id; status polling later reports status=failed with the error in the error parameter. Common causes:
  - No backend registered for kind (typo, module not deployed).
  - Databricks auth (bad client-id or client-secret).
  - Missing target table or missing required parameter (e.g., no tableName).
  - Schema mismatch the module can't auto-ALTER.
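The two failure paths call for different client handling: a synchronous 400 has to be handled where the POST is issued, while a backend failure only surfaces later through the status endpoint. The sketch below illustrates that split. KickoffRejected, handle_kickoff_response, and async_failure are hypothetical names, and because this page does not specify the shape of the error parameter, the sketch returns it unparsed.

```python
from typing import Optional


class KickoffRejected(Exception):
    """Raised when the kick-off POST returns a synchronous 400."""


def handle_kickoff_response(status_code: int, headers: dict, body: dict) -> str:
    """Return the status URL of an accepted kick-off, or raise on rejection."""
    lowered = {key.lower(): value for key, value in headers.items()}
    if status_code == 202:
        location = lowered.get("content-location")
        if not location:
            raise RuntimeError("202 response without Content-Location")
        return location
    if status_code == 400:
        issues = body.get("issue", []) if body.get("resourceType") == "OperationOutcome" else []
        messages = [
            issue.get("diagnostics")
            or issue.get("details", {}).get("text")
            or issue.get("code", "unknown")
            for issue in issues
        ]
        raise KickoffRejected("; ".join(messages) or "validation failed")
    raise RuntimeError(f"unexpected status code {status_code}")


def async_failure(status_body: dict) -> Optional[dict]:
    """Return the raw error parameter when status=failed, otherwise None."""
    params = {p.get("name"): p for p in status_body.get("parameter", [])}
    if params.get("status", {}).get("valueCode") != "failed":
        return None
    return params.get("error")
```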
How it works (kind=data-lakehouse)
With chunkCount = N (default N=1), the module creates one external Delta staging table per chunk and fills each from its hash partition of sof.<view>. Once all chunks are done, it runs a single MERGE INTO target that materializes the union of the staging tables into the managed target table, then drops every staging table. Failed chunks retry with backoff; if retries are exhausted, the operation reports status=failed and runs best-effort cleanup on the staging tables.
The MERGE is idempotent on id. A retried export after a lost response inserts nothing instead of duplicating. Your ViewDefinition must select an id column.
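The following in-memory sketch illustrates why a merge keyed on id makes a retried export harmless. It is a toy model, not the module's implementation: CRC32 stands in for whatever hash the module actually uses to partition sof.<view>, the target table is a plain dict, and overwriting matched rows is an assumption made for the illustration.

```python
import zlib
from typing import Dict, List

Row = Dict[str, object]


def partition_rows(rows: List[Row], chunk_count: int) -> List[List[Row]]:
    """Assign each row to one of chunk_count staging buckets by hashing its id."""
    staging: List[List[Row]] = [[] for _ in range(chunk_count)]
    for row in rows:
        bucket = zlib.crc32(str(row["id"]).encode("utf-8")) % chunk_count
        staging[bucket].append(row)
    return staging


def merge_into(target: Dict[str, Row], staging: List[List[Row]]) -> int:
    """Merge the union of all staging buckets into target, keyed on id.

    Returns the number of newly inserted rows. Matched rows are overwritten,
    which is an assumption of this sketch.
    """
    inserted = 0
    for chunk in staging:
        for row in chunk:
            key = str(row["id"])
            if key not in target:
                inserted += 1
            target[key] = row
    return inserted
```

Running merge_into a second time with the same staging data inserts nothing, which mirrors the retry behavior described above. It also shows why the view needs an id column: without a key there is nothing to match on.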
Staging cleanup on S3
The module treats stagingTablePath as reusable scratch space:
- at setup, it recursively deletes the root stagingTablePath before chunk fan-out;
- before each chunk creates its staging table, it recursively deletes that chunk's sub-prefix;
- after successful finalize, it drops the per-chunk Unity Catalog staging tables and recursively deletes both the per-chunk prefixes and the root staging prefix;
- on cancellation and failed chunks, the module runs the same cleanup best-effort.
Consequences worth knowing:
- If you rotate stagingTablePath between runs (for example, a date-stamped prefix), Aidbox can only clean the prefix used by the current run. Older prefixes are your responsibility.
- Cleanup is best-effort. If S3, Databricks credential vending, or grants fail during cleanup, the export still reports according to the data path result and logs the cleanup problem. A later run that reuses the same prefix retries at setup.
- The auto-cleanup uses Unity Catalog temporary-path-credentials, so the principal needs EXTERNAL_USE_LOCATION on the External Location that covers stagingTablePath. Without that grant the module skips cleanup and emits a staging-s3-cleanup-skipped event; the export itself still runs. See the Data Lakehouse tutorial, Databricks-side setup for the grant table.
_since (incremental export)
_since makes the export incremental. Instead of materializing the full view, the module filters the source query by the view's timestamp column.
Which column it filters on. The column in your ViewDefinition whose path is getAidboxTs() (this FHIRPath returns meta.lastUpdated). For example:
{
"resourceType": "ViewDefinition",
"name": "patient_flat",
"select": [
{
"column": [
{ "name": "id", "path": "id" },
{ "name": "ts", "path": "getAidboxTs()" },
{ "name": "family", "path": "name.family.first()" }
]
}
]
}
Here _since=2026-01-01T00:00:00Z filters as WHERE ts >= '2026-01-01T00:00:00Z'.
Failure mode. If the ViewDefinition exposes no column whose path resolves to getAidboxTs(), kick-off fails synchronously with 400 OperationOutcome and reason :no-timestamp-column. Either add such a column to the view or omit _since.
Typical use case. Cron-driven incremental exports. Persist the exportEndTime from the completed status response, then pass it as _since on the next kick-off. Each run materializes only rows changed since the previous run finished.
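The hand-off between runs can be sketched as follows. The helper names next_since, passes_since_filter, and parse_instant are hypothetical; the second function only mirrors the WHERE ts >= ... comparison on the client side for illustration, since the actual filtering happens in the module's source query.

```python
from datetime import datetime


def parse_instant(value: str) -> datetime:
    """Parse an ISO-8601 instant; a trailing Z is normalized for older Pythons."""
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def next_since(completed_status: dict) -> str:
    """Return exportEndTime from a completed status body, for use as the next _since."""
    params = {p.get("name"): p for p in completed_status.get("parameter", [])}
    if params.get("status", {}).get("valueCode") != "completed":
        raise ValueError("only a completed export yields a usable _since")
    end_time = params.get("exportEndTime", {}).get("valueInstant")
    if end_time is None:
        raise ValueError("status response has no exportEndTime")
    return end_time


def passes_since_filter(row_ts: str, since: str) -> bool:
    """Client-side mirror of the ts >= _since comparison."""
    return parse_instant(row_ts) >= parse_instant(since)
```

A scheduled job would store the returned instant after each successful run and send it as the _since valueInstant on the next kick-off.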
Large-scale and multi-pod execution
Chunks run on async-api, so they distribute across every pod in the Aidbox cluster. Status polling and cancellation work from any pod. If a pod fails mid-chunk, another pod picks up the chunk.
Capacity caps
The effective number of chunks running concurrently across the cluster is the smallest of three ceilings: the requested chunkCount, the total scheduler capacity across pods, and the total DB headroom across pods. The relevant per-pod knobs:
- scheduler-executor-threads: hard cap on async-api task execution per pod. Excess chunks wait in the async-api queue until a slot frees up.
- db.pool.maximum-pool-size: chunks may claim at most pool size − 4 slots per pod, because each running chunk holds a long-lived PostgreSQL cursor on sof.<view> for its entire lifetime. The other 4 slots stay reserved for normal request traffic, sender workers, status polling, and the async-api coordinator.
The kick-off handler validates only the receiving pod's pool size − 4 and returns 400 parallelism-exceeds-pool if chunkCount doesn't fit. It does not validate the scheduler-thread cap or the cluster-wide DB sum at kick-off; surplus chunks queue and the export runs slower than expected.
Sizing rule of thumb: start with chunkCount = pods × 4. To go higher, raise scheduler-executor-threads and db.pool.maximum-pool-size by the same factor on every pod; otherwise kick-off rejects the request or excess chunks queue.
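The capacity arithmetic above can be written out as a small calculation. This is a planning sketch under the rules stated in this section, not code from Aidbox; the Pod class and both function names are hypothetical.

```python
from dataclasses import dataclass
from typing import List

RESERVED_DB_SLOTS = 4  # slots kept free per pod for non-export traffic


@dataclass
class Pod:
    scheduler_executor_threads: int
    db_pool_maximum_pool_size: int

    @property
    def db_headroom(self) -> int:
        """Pool slots that chunks may claim on this pod (pool size - 4)."""
        return max(self.db_pool_maximum_pool_size - RESERVED_DB_SLOTS, 0)


def effective_concurrency(chunk_count: int, pods: List[Pod]) -> int:
    """Smallest of: requested chunks, total scheduler capacity, total DB headroom."""
    scheduler_capacity = sum(pod.scheduler_executor_threads for pod in pods)
    db_capacity = sum(pod.db_headroom for pod in pods)
    return min(chunk_count, scheduler_capacity, db_capacity)


def kickoff_accepts(chunk_count: int, receiving_pod: Pod) -> bool:
    """Mirror of the kick-off check, which looks only at the receiving pod."""
    return chunk_count <= receiving_pod.db_headroom
```

For example, with three pods that each have 4 scheduler threads and a pool size of 10, a request for 20 chunks would run at most 12 at a time, and the kick-off check on any single pod would reject it because 20 exceeds that pod's headroom of 6.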
JVM heap
Each running chunk buffers a Parquet file in memory up to targetFileSizeMb (default 128 MiB) before flushing. Peak heap from staging buffers on a pod is roughly that times the number of chunks running on it. The default Aidbox heap fits a single-cursor (chunkCount=1) export. For higher parallelism, raise the JVM heap (via JAVA_OPTS) or lower targetFileSizeMb.
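As a rough planning aid, the heap estimate above is a single multiplication. The function names below are hypothetical, and the result covers only the staging buffers, not the rest of the Aidbox heap.

```python
def peak_staging_heap_mib(chunks_on_pod: int, target_file_size_mb: int = 128) -> int:
    """Approximate peak heap used by staging buffers on one pod, in MiB."""
    return chunks_on_pod * target_file_size_mb


def max_chunks_for_heap(heap_budget_mib: int, target_file_size_mb: int = 128) -> int:
    """Number of concurrent chunks whose buffers fit in the given heap budget."""
    return heap_budget_mib // target_file_size_mb
```

With the default buffer size, four chunks on one pod need roughly 512 MiB for staging buffers alone.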
Differences vs AidboxTopicDestination initial export
Both flows produce the same output (flattened FHIR rows merged into a managed Delta table) and use the same parallelism sizing. They differ in how you start them and where you read progress from:
| | $viewdefinition-export | AidboxTopicDestination initial export |
|---|---|---|
| When it runs | On demand, when you POST this operation | Once, when you create a destination with skipInitialExport=false |
| Progress polling | GET /fhir/ViewDefinition/$viewdefinition-export/status/<export-id> | GET /fhir/AidboxTopicDestination/<id>/$status |
| After completion | The operation is done. No follow-up writes. | The destination keeps streaming live FHIR changes into the same target table. |
Use this operation for one-shot snapshots and backfills. Use the destination flow when you want the same initial fill plus continuous replication afterwards. For tutorial-specific operational notes on the destination flow, see Large-scale initial export.
Limitations
- The Data Lakehouse backend's staging path supports AWS S3 only (s3://... and s3a://...).
- One view per request (spec allows 1..*).
- patient and group filters are extracted but not yet applied to the SQL.
- estimatedTimeRemaining is not computed.