There are two mature worlds that barely talk to each other.
On one side, FHIR. We finally have a lot of FHIR data — the large majority of US hospitals expose FHIR APIs and the share grows every year, while FHIR-first systems store clinical data natively in resources. On the other side, a very mature ecosystem of analytical tools: modern databases, BI platforms, dataframes, the whole modern data stack.
You can already load FHIR into these tools — modern databases handle nested data well: Postgres with binary JSON, BigQuery, Spark, DuckDB. In fact, the first version of SQL on FHIR was exactly that: query nested FHIR right in the database. It worked — and it didn't standardize. Every engine has its own dialect for nested data; a query written for BigQuery has nothing in common with the same query in Postgres. Nothing to standardize.
Underneath sits the old object-relational impedance mismatch: FHIR resources are nested documents, and the analytical world — SQL, BI tools, dataframes — still wants flat tables. The naive fix, generating a table for every FHIR element, just doesn't work: you get a thousand tables nobody can comprehend. So everyone ends up writing use-case-specific ETL to flatten resources — repeating the same work, slightly differently, with the same subtle bugs around arrays, choice types, and references.
SQL on FHIR version two is our attempt to build a standard bridge between these two worlds — this time by standardizing the flat views instead of the nested queries. It started as a community draft; today it's a full specification developed in the open, adopted into the HL7 ballot process, published under CC0 — with a peer-reviewed paper in npj Digital Medicine that validates the approach on ~300,000 patients and replicates a published clinical study on two independent implementation stacks. And in FHIR R6, ViewDefinition is on track to become a standard FHIR resource — an additional resource, R6's mechanism for growing the core specification in modules.
What the standard really buys you is separation: authoring is separated from implementation, and implementation from usage. One person describes the analytics; anyone's engine runs it; any tool consumes the result.
And it's no longer just view definitions you author. You describe your entire analytics as portable, standard artifacts — flat views, the transformation layers on top of them, queries, whole data marts and conversion pipelines. Run them on different engines, over different data sets, against different vendors' servers — and even distribute the whole thing as an implementation guide, like any other piece of the FHIR ecosystem. That's what "interoperable analytics" means, and the spec now delivers it at three levels: views, queries, and the API.
ViewDefinition: a language for flattening
A ViewDefinition is, simply put, a language for explaining to a server how to flatten resources into a flat table. It describes a tabular view of exactly one resource type — columns, filters, and unnesting — using a minimal FHIRPath subset:
{
"resourceType": "ViewDefinition",
"resource": "Patient",
"name": "patient_demographics",
"select": [
{
"column": [
{"name": "patient_id", "path": "getResourceKey()"},
{"name": "gender", "path": "gender"},
{"name": "dob", "path": "birthDate"},
{"name": "active", "path": "active", "type": "boolean"}
]
},
{
"forEach": "name.where(use = 'official').first()",
"column": [
{"path": "given.join(' ')", "name": "given_name"},
{"path": "family", "name": "family_name"}
]
}
]
}
Because the definition is declarative and engine-neutral, the same view runs on a JavaScript ETL runner, inside PostgreSQL, on Spark, or over a pile of ndjson files from Bulk Export. The whole model reduces to five composable functions, and a complete runner is small enough to read in an afternoon — the JavaScript reference implementation is roughly 400 lines:
| Function | What it does | SQL analogy |
|---|---|---|
column | Extract elements by FHIRPath into columns | SELECT |
where | Filter resources (e.g., by profile or status) | WHERE |
forEach / forEachOrNull | Unnest a collection into rows | INNER JOIN / LEFT OUTER JOIN against a nested table |
select | Cross-join parent columns with unnested rows | join of subselects |
unionAll | Concatenate rows from different branches | UNION ALL |
One honest disclaimer: this flattening is lossy, and we understand this. There is no universal tabular representation of FHIR, so views are use-case specific by design — I sometimes joke that a ViewDefinition is the CSV mode of FHIR. That's not a weakness; it's the contract: one engineer defines the view once, and other engineers and tools just use the flat table.
A natural unit for a view is a profile. Imagine a blood pressure profile — and on top of it a nice blood_pressure table with systolic and diastolic columns, one row per measurement. The profile pins down where the data lives in the resource; the view turns that knowledge into columns. Every well-defined profile is a flat table waiting to be declared.
Two recent additions are worth knowing:
repeathandles truly recursive structures — QuestionnaireResponse items, CodeSystem concepts — where you don't know the depth in advance. Give it the paths to traverse and every nested item becomes a row, whatever its level.%rowIndexcaptures the position of an element during iteration. SQL result sets are unordered, but FHIR arrays are not — the firstnameis not the same as the third. The index lets you preserve FHIR ordering and build surrogate keys.
Joins without joins
A single ViewDefinition never joins resources — by design. Instead, two functions emit keys that your database joins on: getResourceKey() for the row's primary key, and subject.getReferenceKey(Patient) for the foreign key (with a type filter that returns an empty collection — a null in the output — if the reference points elsewhere). A Condition view carries patient_id; joining conditions to patients is a plain SQL join in whatever engine you use.
How keys are actually derived — plain id, primary identifier, hash — is left to the implementation. That's deliberate: it's what makes the same view portable across systems with different data invariants.
And the things ViewDefinition deliberately doesn't do — no cross-resource joins, no aggregation, no sorting, no output formats — are not gaps. Every one of them was a trade-off we debated: should we complicate every view runner, or delegate to the database? Databases are specialized in joins; we don't cross the resource boundary. That restraint is what keeps a runner implementable everywhere from a 400-line library to a distributed SQL engine.
Runners themselves come in two flavors. ETL runners take a stream of resources and produce rows — trivial to write if you have a FHIRPath engine (implementers have reported writing one in Rust in about a month). ELT runners are more like transpilers than runners: they compile a ViewDefinition into a sophisticated SQL query over a FHIR-native database, so the view becomes a real database view and nothing is duplicated — hundreds of view definitions can run over the same resource table.
And the spec was built bottom-up, not top-down: implementations existed before the first release. A shared test suite is part of the specification — dataset, view definition, expected result — and the conformance matrix is implementers running the same tests and reporting back. That's how we guarantee the implementations stay compatible with each other.
SQLQuery and SQLView: queries become shareable
Flat tables are half the bridge. The other half: where do the queries live? The cohort definition, the quality measure, the dashboard query — these are the least portable artifacts in healthcare analytics today, scattered across wikis and notebooks, married to one site's schema.
So we paired the view definition with a query resource, and now you can share the whole thing. SQLQuery is a profile on the FHIR Library resource that packages one logical SQL query:
{
"resourceType": "Library",
"meta": {"profile": ["https://sql-on-fhir.org/ig/StructureDefinition/SQLQuery"]},
"type": {"coding": [{"system": "https://sql-on-fhir.org/ig/CodeSystem/LibraryTypesCodes", "code": "sql-query"}]},
"name": "DiagnosisByAgeSummary",
"status": "active",
"relatedArtifact": [
{"type": "depends-on", "resource": "https://example.org/ViewDefinition/patient_demographics", "label": "pt"},
{"type": "depends-on", "resource": "https://example.org/ViewDefinition/diagnoses_view", "label": "dg"}
],
"parameter": [
{"name": "from_date", "type": "date", "use": "in"}
],
"content": [{
"contentType": "application/sql",
"extension": [{
"url": "https://sql-on-fhir.org/ig/StructureDefinition/sql-text",
"valueString": "SELECT pt.gender, dg.code, count(*) FROM pt JOIN dg USING (patient_id) WHERE dg.onset >= :from_date GROUP BY 1, 2"
}],
"data": "..."
}]
}
The structure breaks down into three ideas:
- Dependencies as aliases. You declare which view definitions the query depends on, and the
labelbecomes the table name in your SQL. The query never hardcodes physical table names — the execution environment resolvesptanddgto whatever the views are materialized as. - Safe parameters. Parameters are declared in the Library and referenced as
:from_dateplaceholders. The spec is blunt: string interpolation MUST NOT be used — real binding only. - Dialect variants. One Library can carry a portable
application/sqldefault plus;dialect=postgresqlor;dialect=sparkattachments, as long as they're functionally equivalent.
There's also an authoring convenience the spec describes for tooling: write a plain .sql file with a few annotation comments (@name, @param, @relatedDependency) and let a builder turn it into the Library resource. SQL stays the source of truth; FHIR becomes the packaging.
SQLView is the newest profile, and it exists for one reason: so we can stack queries one on top of another. It's almost the same as SQLQuery but without parameters — and the intention is different. A query is something you run; a view describes a table. When you say "I need a view in a database", people understand what you're talking about — that's why it's a separate profile rather than a flag on SQLQuery.
How SQLViews stack
An SQLView is a Library with type = sql-view, a canonical URL, dependencies, and SQL. Here's one that defines "active patients" on top of the patient_demographics ViewDefinition:
{
"resourceType": "Library",
"meta": {"profile": ["https://sql-on-fhir.org/ig/StructureDefinition/SQLView"]},
"type": {"coding": [{"system": "https://sql-on-fhir.org/ig/CodeSystem/LibraryTypesCodes", "code": "sql-view"}]},
"url": "https://example.org/Library/ActivePatientsView",
"name": "ActivePatientsView",
"status": "active",
"relatedArtifact": [
{"type": "depends-on", "resource": "https://example.org/ViewDefinition/patient_demographics", "label": "pt"}
],
"content": [{
"contentType": "application/sql",
"extension": [{
"url": "https://sql-on-fhir.org/ig/StructureDefinition/sql-text",
"valueString": "SELECT patient_id, gender, dob FROM pt WHERE active = true"
}],
"data": "..."
}]
}
The crucial part is the url. Because the view has a canonical URL, anything can now depend on it the same way it would depend on a ViewDefinition — just add a relatedArtifact entry and use the label as a table name. A second view builds on the first:
-- SQLView: DiabeticPatientsView
-- depends on: .../Library/ActivePatientsView as ap
-- depends on: .../ViewDefinition/diagnoses_view as dg
SELECT ap.patient_id, ap.gender, ap.dob, dg.onset
FROM ap
JOIN dg USING (patient_id)
WHERE dg.code = '44054006' -- type 2 diabetes (SNOMED)
And a parameterized SQLQuery sits on top of both layers for the final report. The dependency rules are simple:
- A SQLView may depend on ViewDefinitions and other SQLViews — never on SQLQueries.
- A SQLQuery may depend on ViewDefinitions and SQLViews.
- The references form a directed graph, and you keep it acyclic — like views in any database.
Stack enough of these and you've described a complete analytical system — in the same terms a data team would use for any warehouse. ViewDefinitions are the staging layer: raw FHIR landed as flat tables. SQLViews are the intermediate models: cleaned, filtered, joined building blocks. The top of the graph is the data mart: the cohort registers, fact tables, and parameterized reports that analysts actually touch.
Each node is small, readable, and testable on its own: staging views only reshape, each intermediate model adds one transformation step, queries only parameterize the final cut. Complexity lives in the composition, not in any single artifact — which is exactly how mature analytical systems are built inside warehouses today. There's no ceiling here: a disease registry, a quality-measure suite, a dimensional model with facts and dimensions, a hundred-table conversion — anything you can express as layered SQL over flat views, you can express as this graph.
Two bonuses fall out for free. The dependency graph is your data lineage: for any column in the mart you can trace, artifact by artifact, the path back to the FHIRPath expression that produced it. And the whole graph ships as FHIR resources.
What the spec deliberately does not mandate is execution: whether ActivePatientsView becomes an inlined CTE, a database view, or a materialized table is the engine's call — the graph describes the logic, the engine picks the physics.
A full-featured DSL for ELT
Tell a data engineer "we have a DAG here" — and they'll understand immediately. The graph you just saw is ELT, the dominant pattern of the modern data stack: load raw data first, transform it in layers inside the engine. If you know dbt, you already know this shape. The moment we added the ability to build queries on top of queries, SQL on FHIR became a full-featured DSL for describing ELT pipelines: ViewDefinition gives you the EL — extract FHIR, load flat tables — and SQLView with SQLQuery add the T. The loop is closed.
So what's actually new here, compared to the ELT tooling data teams already run? The distribution model. A transformation project is normally code in your repository, assuming your warehouse. A SQL on FHIR pipeline is a set of resources with canonical URLs that you can publish in an implementation guide, version, and run against any conformant stack. And because the artifacts are technology-neutral, people will write translators from them into stack-specific assets — a warehouse view, a step in your orchestrator, a model in whatever transformation framework your team runs. We describe the data mart once; the target technology is a compilation detail.
This also completes the authoring story. Take the blood pressure example from earlier — the profile and its systolic/diastolic view — and now add a set of useful queries on top: hypertension reports, trend dashboards. Profile, view, queries — one shippable pack. We believe SQL on FHIR becomes part of authoring itself: an IG that defines data should ship the views and queries to analyze it.
The API: portable clients, no vendor lock-in
The third piece is a standard HTTP API, and it matters for the same reason the artifacts do: without it, your tooling is married to a vendor even if your views aren't. With it, a dashboard, a pipeline, a notebook — anything that speaks the API — works against any conformant server. Swap the backend, keep the workflow. Clients discover what a server supports through the standard CapabilityStatement.
The API is three verbs. Each answers a different question:
| Verb | Question it answers | Operations | Mode |
|---|---|---|---|
| run | "Give me the rows, now" | $viewdefinition-run, $sqlview-run*, $sqlquery-run | Synchronous, streamed |
| export | "Build the files, tell me when done" | $viewdefinition-export, $sqlview-export*, $sqlquery-export | Async, to file storage |
| materialize | "Keep a table fresh for me" | $materialize* | Async, server-managed |
* $sqlview-run, $sqlview-export, and $materialize are coming to the spec — the working group is standardizing them now.
Note the symmetry: every artifact in the dependency graph — ViewDefinition, SQLView, SQLQuery — gets the same verbs. You can run or export any node of your pipeline, whether it's a raw staging view, an intermediate model, or a final parameterized query. And materialize applies to both ViewDefinitions and SQLViews — any non-parameterized node can become a managed table.
run — for authoring and real-time
$viewdefinition-run — invoked as $run on the ViewDefinition resource — is a synchronous operation designed for authoring and small-dataset use cases. The simplest call is one GET on a stored view:
GET /ViewDefinition/patient-demographics/$run?_format=csv&_limit=100
Accept: text/csv
id,birthDate,family,given
pt-1,1990-01-15,Smith,John
pt-2,1985-03-22,Johnson,Mary
When the view isn't saved yet, you POST it inline as a Parameters body — optionally together with the resources to transform. That's the authoring loop: edit the definition, POST it, see rows, repeat. It's also the real-time path: a dashboard widget or an AI agent that wants a patient's conditions as a flat table rather than a resource graph. Runtime concerns stay at runtime: patient, group, _since, and _limit are operation parameters, not view properties — the same view definition serves the full-population export and the single-patient query.
$sqlquery-run is the same verb one layer up: execute a stored or inline SQLQuery against the materialized views, passing query parameters by name (a nested Parameters resource, bound safely to the declared Library.parameter entries), with _format and _limit for output control. And $sqlview-run (coming to the spec) fills the middle: evaluate any intermediate SQLView — with its whole dependency subgraph resolved by the server — and stream the rows back. Very handy when you're debugging one node of a deep pipeline.
export — bulk export, upgraded
Think of $viewdefinition-export as an upgraded FHIR Bulk Export. With classic Bulk Export you get all resources, as raw ndjson — and then the flattening is your problem: you stand up an ETL pipeline just to make the data queryable. With SQL on FHIR export you ask for the views you really need, and what lands in your bucket is already flat — csv, ndjson, or parquet, ready to be read by Spark, DuckDB, Athena, or loaded into a warehouse.
The flow is four steps:
- Kick off. POST a list of views with the async header:
POST /ViewDefinition/$viewdefinition-export HTTP/1.1
Prefer: respond-async
Content-Type: application/fhir+json
{
"resourceType": "Parameters",
"parameter": [
{"name": "view", "part": [{"name": "viewReference",
"valueReference": {"reference": "ViewDefinition/patient-demographics"}}]},
{"name": "view", "part": [{"name": "viewReference",
"valueReference": {"reference": "ViewDefinition/diagnoses"}}]},
{"name": "_format", "valueCode": "parquet"}
]
}
-
Get a ticket. The server replies
202 Acceptedwith aContent-Locationheader — your status URL. -
Poll. The status URL returns
202while the job runs (with optional progress). When the job completes, it answers303 See Otherwith aLocationheader pointing to the result. -
Collect the files. GET the result URL — the response lists an output URL per view:
{
"resourceType": "Parameters",
"parameter": [
{"name": "exportId", "valueString": "job-42"},
{"name": "status", "valueCode": "completed"},
{"name": "output", "part": [
{"name": "name", "valueString": "patient_demographics"},
{"name": "location", "valueUri": "https://storage.example.org/exports/patient_demographics.parquet"}
]},
{"name": "output", "part": [
{"name": "name", "valueString": "diagnoses"},
{"name": "location", "valueUri": "https://storage.example.org/exports/diagnoses.parquet"}
]}
]
}
If you've implemented Bulk Data Export, you already know this dance — same async choreography, but the payload is analysis-ready tables instead of raw resources. It's less stuff to export, you skip the ETL step entirely, and a server that supports export natively can optimize heavily under the hood.
The same filters apply as for run — patient, group, _since — so a payer can export one member's views just as easily as a whole population. And the verb extends up the graph: $sqlview-export (coming to the spec) materializes an intermediate model into files, $sqlquery-export does the same for query results too large for a synchronous response. Export the staging layer for your lake, or export the finished mart — your choice of cut point.
materialize — "hey server, keep it up to date"
$materialize is the operation where you tell the server: here is my view definition — build a managed view from it, and keep it up to date as the data changes. It's an asynchronous operation: you give it a target name and an update policy (manual, or scheduled with a cron expression), the server builds the view in the background, and when the job completes you get a reference to the materialized view you can query from then on:
POST /ViewDefinition/patient-demographics/$materialize HTTP/1.1
Prefer: respond-async
Content-Type: application/fhir+json
{
"resourceType": "Parameters",
"parameter": [
{"name": "targetName", "valueString": "patient_demographics"},
{"name": "updatePolicy", "valueCode": "scheduled"},
{"name": "schedule", "valueString": "0 0 * * *"}
]
}
When the job completes you get back a reference to the materialized view; how it's exposed for querying — schema, naming, access — is up to the implementation, with targetName as the requested handle. From that point the server owns freshness (nightly in this example), and any SQL tool just queries the result:
SELECT * FROM patient_demographics WHERE dob > '1990-01-01';
Your BI tool connects to a table and never knows FHIR was involved.
The same verb applies to SQLViews. Materializing an intermediate model is exactly what you do in a warehouse when a view gets hot: the server resolves the dependency subgraph, builds the table, and owns its refresh. Which nodes of your DAG are materialized and which stay virtual becomes a runtime tuning decision — the pipeline definition doesn't change.
This pattern is already proven in production. In Aidbox we implemented $materialize over PostgreSQL, plus adapters to ClickHouse, BigQuery, and Databricks: a very efficient initial load — millions of resources in seconds — and then the table is kept up to date in near real time using subscriptions. Same view definition, four different engines — which is rather the point. The operation is now being standardized so that "keep this table fresh" becomes a portable request, not a vendor feature.
Together: run for development and real-time access, export for feeding external engines, materialize for in-database analytics — all standard endpoints, all vendor-neutral.
Where this is heading: from the user's perspective the server becomes a magic box. You send it view definitions and queries; the views stay up to date; you just run your reports. And since LLMs are already quite good at authoring SQL and view definitions, the next interface on top of this box is plain language — with the standard API as what the agent drives underneath.
FHIR to OMOP: battle-proving the DSL
So that's the full toolkit: a DSL for staging (ViewDefinition), a DSL for transformations (SQLView/SQLQuery), and an API to run it all. The best way to find out whether a toolkit is real is to throw the toughest conversion we know at it — and that's FHIR to OMOP. Frankly, this project already dictated parts of the design: we needed views of views to express it, and that need is a big part of why SQLView exists.
OMOP CDM is the OHDSI standard for observational research, and what I like about OMOP is that it's extremely pragmatic: a fixed set of tables, all codes normalized to standard concepts, the whole infrastructure — terminology included — sitting right there in the database. FHIR is the transactional model; OMOP is the analytical one. As FHIR-first systems spread, "FHIR for OLTP, OMOP for OLAP" becomes the default architecture, and the conversion layer between them becomes critical infrastructure.
The OMOP community itself usually builds pipelines in ELT style. So do we — the conversion is a layered DAG of exactly the artifacts described above:
The basics, layer by layer.
Staging ViewDefinitions flatten each resource into exactly what OMOP needs — keys, dates, and source codings, one row per coding:
{
"resourceType": "ViewDefinition",
"name": "condition_staging",
"resource": "Condition",
"select": [
{
"column": [
{"name": "condition_id", "path": "getResourceKey()", "type": "string"},
{"name": "person_id", "path": "subject.getReferenceKey(Patient)", "type": "string"},
{"name": "start_date", "path": "onset.ofType(dateTime)", "type": "dateTime"}
]
},
{
"forEach": "code.coding",
"column": [
{"name": "source_system", "path": "system", "type": "uri"},
{"name": "source_code", "path": "code", "type": "code"}
]
}
]
}
Mapping SQLViews hold the semantics: the OMOP Athena vocabularies are loaded right into the database, and concept-id resolution is a JOIN, not a terminology-server call:
-- SQLView: ConditionMappedView
-- depends on: .../ViewDefinition/condition_staging as cs
SELECT cs.condition_id,
cs.person_id,
std.concept_id_2 AS condition_concept_id,
cs.start_date,
src.concept_id AS condition_source_concept_id,
cs.source_code AS condition_source_value,
std_c.domain_id AS target_domain
FROM cs
JOIN concept src
ON src.concept_code = cs.source_code AND src.vocabulary_id = 'ICD10CM'
JOIN concept_relationship std
ON std.concept_id_1 = src.concept_id AND std.relationship_id = 'Maps to'
JOIN concept std_c
ON std_c.concept_id = std.concept_id_2
It looks tricky, but it's kind of trivial: it does the mapping, it does the joining, it translates on the fly. And the joins naturally handle the genuinely hard parts — Maps to fan-out (one ICD code becoming several SNOMED rows), domain routing (a FHIR Condition landing in condition_occurrence, observation, or measurement depending on the target concept's domain), and discard rules for unmappable records.
Load SQLQueries read the mapped views and populate the CDM tables:
INSERT INTO condition_occurrence
SELECT condition_id, person_id, condition_concept_id,
start_date, condition_source_concept_id, condition_source_value
FROM cm
WHERE target_domain = 'Condition'
Why SQL and not FHIRPath or the FHIR Mapping Language? Because for this job you'd need lookups into mapping tables, splitting one record into many, conditional logic across vocabularies. You could in principle route every code through a terminology server's $translate — but that's not viable for bulk transformation; people will burn CPUs doing it row by row. And this has to be efficient at population scale — billions of records, not thousands. For us this is just SQL, and set-based joins over billions of rows is the one problem databases have spent fifty years getting good at.
A full disclaimer: this project — fhir2omop — is at a very early stage, a work in progress in the open. The ideas we're exploring: profile-gated conversion (a resource converts to an OMOP table iff it validates against a gating FHIR profile), golden test cases (a FHIR resource in, the exact OMOP rows out — because corner cases are exactly what samples make visible), and jurisdiction modules like US Core to OMOP or German ICD to OMOP that the community can extend.
But the goal is bigger than one converter. FHIR-to-OMOP is how we battle-prove SQL on FHIR: it's the toughest conversion around, and if the DSL can express it — the staging, the vocabulary joins, the domain routing, all of it as portable artifacts — then everything simpler comes for free. So we're inviting everybody: OMOP people, FHIR people, data engineers. Bring your mappings, your corner cases, your skepticism — the working group is dedicating sessions to OMOP, and the room is open.
Come build it with us
Join the #analytics-on-FHIR stream on chat.fhir.org and the weekly working group calls. Try the playground for a five-minute taste, or the full DevDays workshop — PostgreSQL, Grafana, Jupyter, Synthea data — for a hands-on start. And mark your calendar for FHIR Analytics — our free online conference dedicated to exactly this space.
And if you want to run all of this today: Aidbox is a transactional FHIR server with built-in real-time analytics on SQL on FHIR — and the first FHIR server to pass all SQL on FHIR tests. ViewDefinitions over PostgreSQL, a visual ViewDefinition builder and SQL view/query managers in the UI, $materialize, and adapters that keep your tables fresh in ClickHouse, BigQuery, and Databricks. The transactional and the analytical world, finally on one bridge.

![Terminology is fun: CodeableConcept.coding[]](/assets/images/articles/horizontal-3.avif?v=2nw7.mq9)



