---
{
  "title": "SQL on FHIR: Interoperable Analytics",
  "description": "How SQL on FHIR makes healthcare analytics interoperable: ViewDefinition, the new SQLQuery and SQLView profiles, the run/export/materialize API, and a dbt-style ELT pipeline for FHIR to OMOP.",
  "date": "2026-06-11",
  "author": "Nikolai Ryzhikov",
  "reading-time": "22 min read",
  "tags": [
    "SQL on FHIR",
    "Analytics",
    "FHIR Standard"
  ],
  "tldr": "SQL on FHIR started as a community spec for flattening FHIR resources. Today it's a full-featured DSL for describing ELT and interoperable analytics: ViewDefinition for staging, SQLQuery and SQLView for layered transformations, and a standard run/export/materialize API — every layer separating authoring from implementation and usage. Next steps: FHIR R6 core, where ViewDefinition becomes a standard resource, and an open FHIR-to-OMOP implementation guide built on this DSL.",
  "utm-campaign": "analytics",
  "utm-content": "interoperable-analytics"
}
---
There are two mature worlds that barely talk to each other.

On one side, FHIR. We finally have a lot of FHIR data — [the large majority of US hospitals expose FHIR APIs](https://healthit.gov/data/data-briefs/hospital-use-apis-enable-data-sharing-between-ehrs-and-apps/) and the share grows every year, while FHIR-first systems store clinical data natively in resources. On the other side, a very mature ecosystem of analytical tools: modern databases, BI platforms, dataframes, the whole modern data stack.

You can already load FHIR into these tools — modern databases handle nested data well: Postgres with binary JSON, BigQuery, Spark, DuckDB. In fact, the first version of SQL on FHIR was exactly that: query nested FHIR right in the database. It worked — and it didn't standardize. Every engine has its own dialect for nested data; a query written for BigQuery has nothing in common with the same query in Postgres. Nothing to standardize.

Underneath sits the old **object-relational impedance mismatch**: FHIR resources are nested documents, and the analytical world — SQL, BI tools, dataframes — still wants flat tables. The naive fix, generating a table for every FHIR element, just doesn't work: you get a thousand tables nobody can comprehend. So everyone ends up writing use-case-specific ETL to flatten resources — repeating the same work, slightly differently, with the same subtle bugs around arrays, choice types, and references.

[SQL on FHIR](https://build.fhir.org/ig/HL7/sql-on-fhir/) version two is our attempt to build a standard bridge between these two worlds — this time by standardizing the *flat views* instead of the nested queries. It started as a community draft; today it's a full specification developed in the open, adopted into the HL7 ballot process, published under CC0 — with a peer-reviewed paper in [npj Digital Medicine](https://www.nature.com/articles/s41746-025-01708-w) that validates the approach on ~300,000 patients and replicates a published clinical study on two independent implementation stacks. And in FHIR R6, ViewDefinition is on track to become a standard FHIR resource — an *additional resource*, R6's mechanism for growing the core specification in modules.

What the standard really buys you is separation: **authoring is separated from implementation, and implementation from usage**. One person describes the analytics; anyone's engine runs it; any tool consumes the result.

And it's no longer just view definitions you author. You describe your *entire analytics* as portable, standard artifacts — flat views, the transformation layers on top of them, queries, whole data marts and conversion pipelines. Run them on different engines, over different data sets, against different vendors' servers — and even distribute the whole thing as an implementation guide, like any other piece of the FHIR ecosystem. That's what "interoperable analytics" means, and the spec now delivers it at three levels: views, queries, and the API.

## ViewDefinition: a language for flattening

A ViewDefinition is, simply put, a language for explaining to a server how to flatten resources into a flat table. It describes a tabular view of exactly one resource type — columns, filters, and unnesting — using a [minimal FHIRPath subset](https://build.fhir.org/ig/HL7/sql-on-fhir/StructureDefinition-ViewDefinition.html#fhirpath-functionality):

```json
{
  "resourceType": "ViewDefinition",
  "resource": "Patient",
  "name": "patient_demographics",
  "select": [
    {
      "column": [
        {"name": "patient_id", "path": "getResourceKey()"},
        {"name": "gender", "path": "gender"},
        {"name": "dob", "path": "birthDate"},
        {"name": "active", "path": "active", "type": "boolean"}
      ]
    },
    {
      "forEach": "name.where(use = 'official').first()",
      "column": [
        {"path": "given.join(' ')", "name": "given_name"},
        {"path": "family", "name": "family_name"}
      ]
    }
  ]
}
```

Because the definition is declarative and engine-neutral, the same view runs on a JavaScript ETL runner, inside PostgreSQL, on Spark, or over a pile of ndjson files from Bulk Export. The whole model reduces to five composable functions, and a complete runner is small enough to read in an afternoon — the [JavaScript reference implementation](https://github.com/FHIR/sql-on-fhir.js) is roughly 400 lines:

| Function | What it does | SQL analogy |
|----------|-------------|-------------|
| `column` | Extract elements by FHIRPath into columns | `SELECT` |
| `where` | Filter resources (e.g., by profile or status) | `WHERE` |
| `forEach` / `forEachOrNull` | Unnest a collection into rows | `INNER JOIN` / `LEFT OUTER JOIN` against a nested table |
| `select` | Cross-join parent columns with unnested rows | join of subselects |
| `unionAll` | Concatenate rows from different branches | `UNION ALL` |

One honest disclaimer: this flattening is **lossy, and we understand this**. There is no universal tabular representation of FHIR, so views are use-case specific by design — I sometimes joke that a ViewDefinition is the CSV mode of FHIR. That's not a weakness; it's the contract: one engineer defines the view once, and other engineers and tools just use the flat table.

A natural unit for a view is a profile. Imagine a blood pressure profile — and on top of it a nice `blood_pressure` table with `systolic` and `diastolic` columns, one row per measurement. The profile pins down where the data lives in the resource; the view turns that knowledge into columns. Every well-defined profile is a flat table waiting to be declared.

Two recent additions are worth knowing:

- **`repeat`** handles truly recursive structures — QuestionnaireResponse items, CodeSystem concepts — where you don't know the depth in advance. Give it the paths to traverse and every nested item becomes a row, whatever its level.
- **`%rowIndex`** captures the position of an element during iteration. SQL result sets are unordered, but FHIR arrays are not — the first `name` is not the same as the third. The index lets you preserve FHIR ordering and build surrogate keys.

### Joins without joins

A single ViewDefinition never joins resources — by design. Instead, two functions emit keys that your database joins on: `getResourceKey()` for the row's primary key, and `subject.getReferenceKey(Patient)` for the foreign key (with a type filter that returns an empty collection — a null in the output — if the reference points elsewhere). A Condition view carries `patient_id`; joining conditions to patients is a plain SQL join in whatever engine you use.

How keys are actually derived — plain `id`, primary identifier, hash — is left to the implementation. That's deliberate: it's what makes the same view portable across systems with different data invariants.

And the things ViewDefinition deliberately *doesn't* do — no cross-resource joins, no aggregation, no sorting, no output formats — are not gaps. Every one of them was a trade-off we debated: should we complicate every view runner, or delegate to the database? Databases are specialized in joins; we don't cross the resource boundary. That restraint is what keeps a runner implementable everywhere from a 400-line library to a distributed SQL engine.

Runners themselves come in two flavors. **ETL runners** take a stream of resources and produce rows — trivial to write if you have a FHIRPath engine (implementers have reported writing one in Rust in about a month). **ELT runners** are more like transpilers than runners: they compile a ViewDefinition into a sophisticated SQL query over a FHIR-native database, so the view becomes a real database view and nothing is duplicated — hundreds of view definitions can run over the same resource table.

And the spec was built bottom-up, not top-down: [implementations existed *before* the first release](https://sql-on-fhir.org/extra/impls.html). A shared test suite is part of the specification — dataset, view definition, expected result — and the conformance matrix is implementers running the same tests and reporting back. That's how we guarantee the implementations stay compatible with each other.

## SQLQuery and SQLView: queries become shareable

Flat tables are half the bridge. The other half: where do the *queries* live? The cohort definition, the quality measure, the dashboard query — these are the least portable artifacts in healthcare analytics today, scattered across wikis and notebooks, married to one site's schema.

So we paired the view definition with a query resource, and now you can share the whole thing. **SQLQuery** is a profile on the FHIR Library resource that packages one logical SQL query:

```json
{
  "resourceType": "Library",
  "meta": {"profile": ["https://sql-on-fhir.org/ig/StructureDefinition/SQLQuery"]},
  "type": {"coding": [{"system": "https://sql-on-fhir.org/ig/CodeSystem/LibraryTypesCodes", "code": "sql-query"}]},
  "name": "DiagnosisByAgeSummary",
  "status": "active",
  "relatedArtifact": [
    {"type": "depends-on", "resource": "https://example.org/ViewDefinition/patient_demographics", "label": "pt"},
    {"type": "depends-on", "resource": "https://example.org/ViewDefinition/diagnoses_view", "label": "dg"}
  ],
  "parameter": [
    {"name": "from_date", "type": "date", "use": "in"}
  ],
  "content": [{
    "contentType": "application/sql",
    "extension": [{
      "url": "https://sql-on-fhir.org/ig/StructureDefinition/sql-text",
      "valueString": "SELECT pt.gender, dg.code, count(*) FROM pt JOIN dg USING (patient_id) WHERE dg.onset >= :from_date GROUP BY 1, 2"
    }],
    "data": "..."
  }]
}
```

The structure breaks down into three ideas:

- **Dependencies as aliases.** You declare which view definitions the query depends on, and the `label` becomes the table name in your SQL. The query never hardcodes physical table names — the execution environment resolves `pt` and `dg` to whatever the views are materialized as.
- **Safe parameters.** Parameters are declared in the Library and referenced as `:from_date` placeholders. The spec is blunt: string interpolation MUST NOT be used — real binding only.
- **Dialect variants.** One Library can carry a portable `application/sql` default plus `;dialect=postgresql` or `;dialect=spark` attachments, as long as they're functionally equivalent.

There's also an authoring convenience the spec describes for tooling: write a plain `.sql` file with a few annotation comments (`@name`, `@param`, `@relatedDependency`) and let a builder turn it into the Library resource. SQL stays the source of truth; FHIR becomes the packaging.

**SQLView** is the newest profile, and it exists for one reason: so we can stack queries one on top of another. It's almost the same as SQLQuery but without parameters — and the intention is different. A query is something you run; a view describes a table. When you say "I need a view in a database", people understand what you're talking about — that's why it's a separate profile rather than a flag on SQLQuery.

### How SQLViews stack

An SQLView is a Library with `type = sql-view`, a canonical URL, dependencies, and SQL. Here's one that defines "active patients" on top of the `patient_demographics` ViewDefinition:

```json
{
  "resourceType": "Library",
  "meta": {"profile": ["https://sql-on-fhir.org/ig/StructureDefinition/SQLView"]},
  "type": {"coding": [{"system": "https://sql-on-fhir.org/ig/CodeSystem/LibraryTypesCodes", "code": "sql-view"}]},
  "url": "https://example.org/Library/ActivePatientsView",
  "name": "ActivePatientsView",
  "status": "active",
  "relatedArtifact": [
    {"type": "depends-on", "resource": "https://example.org/ViewDefinition/patient_demographics", "label": "pt"}
  ],
  "content": [{
    "contentType": "application/sql",
    "extension": [{
      "url": "https://sql-on-fhir.org/ig/StructureDefinition/sql-text",
      "valueString": "SELECT patient_id, gender, dob FROM pt WHERE active = true"
    }],
    "data": "..."
  }]
}
```

The crucial part is the `url`. Because the view has a canonical URL, *anything* can now depend on it the same way it would depend on a ViewDefinition — just add a `relatedArtifact` entry and use the `label` as a table name. A second view builds on the first:

```sql
-- SQLView: DiabeticPatientsView
-- depends on: .../Library/ActivePatientsView as ap
-- depends on: .../ViewDefinition/diagnoses_view as dg
SELECT ap.patient_id, ap.gender, ap.dob, dg.onset
  FROM ap
  JOIN dg USING (patient_id)
 WHERE dg.code = '44054006'   -- type 2 diabetes (SNOMED)
```

And a parameterized SQLQuery sits on top of both layers for the final report. The dependency rules are simple:

- A **SQLView** may depend on ViewDefinitions and other SQLViews — never on SQLQueries.
- A **SQLQuery** may depend on ViewDefinitions and SQLViews.
- The references form a directed graph, and you keep it acyclic — like views in any database.

Stack enough of these and you've described a complete analytical system — in the same terms a data team would use for any warehouse. ViewDefinitions are the **staging layer**: raw FHIR landed as flat tables. SQLViews are the **intermediate models**: cleaned, filtered, joined building blocks. The top of the graph is the **data mart**: the cohort registers, fact tables, and parameterized reports that analysts actually touch.

```mermaid
flowchart LR
    subgraph staging ["Staging layer — ViewDefinitions"]
        VD1[patient_demographics]
        VD2[diagnoses_view]
        VD3[observations_view]
    end
    subgraph views ["Intermediate models — SQLViews"]
        SV1[ActivePatientsView]
        SV2[DiabeticPatientsView]
        SV3[LatestHbA1cView]
        SV4[DiabetesRegisterView]
    end
    subgraph queries ["Data mart — SQLQueries"]
        Q1["RegisterByAgeReport(:from_date)"]
        Q2["PatientDrilldown(:patient_id)"]
    end
    VD1 --> SV1
    SV1 --> SV2
    VD2 --> SV2
    VD3 --> SV3
    SV2 --> SV4
    SV3 --> SV4
    SV4 --> Q1
    SV4 --> Q2
```

Each node is small, readable, and testable on its own: staging views only reshape, each intermediate model adds one transformation step, queries only parameterize the final cut. Complexity lives in the *composition*, not in any single artifact — which is exactly how mature analytical systems are built inside warehouses today. There's no ceiling here: a disease registry, a quality-measure suite, a dimensional model with facts and dimensions, a hundred-table conversion — anything you can express as layered SQL over flat views, you can express as this graph.

Two bonuses fall out for free. The dependency graph *is* your data lineage: for any column in the mart you can trace, artifact by artifact, the path back to the FHIRPath expression that produced it. And the whole graph ships as FHIR resources.

What the spec deliberately does *not* mandate is execution: whether `ActivePatientsView` becomes an inlined CTE, a database view, or a materialized table is the engine's call — the graph describes the logic, the engine picks the physics.

### A full-featured DSL for ELT

Tell a data engineer "we have a DAG here" — and they'll understand immediately. The graph you just saw *is* ELT, the dominant pattern of the modern data stack: load raw data first, transform it in layers inside the engine. If you know dbt, you already know this shape. The moment we added the ability to build queries on top of queries, SQL on FHIR became a full-featured DSL for describing ELT pipelines: ViewDefinition gives you the **EL** — extract FHIR, load flat tables — and SQLView with SQLQuery add the **T**. The loop is closed.

So what's actually new here, compared to the ELT tooling data teams already run? The distribution model. A transformation project is normally code in *your* repository, assuming *your* warehouse. A SQL on FHIR pipeline is a set of resources with canonical URLs that you can publish in an implementation guide, version, and run against any conformant stack. And because the artifacts are technology-neutral, people will write translators from them into stack-specific assets — a warehouse view, a step in your orchestrator, a model in whatever transformation framework your team runs. We describe the data mart once; the target technology is a compilation detail.

This also completes the authoring story. Take the blood pressure example from earlier — the profile and its `systolic`/`diastolic` view — and now add a set of useful queries on top: hypertension reports, trend dashboards. Profile, view, queries — one shippable pack. We believe SQL on FHIR becomes part of authoring itself: an IG that defines data should ship the views and queries to analyze it.

## The API: portable clients, no vendor lock-in

The third piece is a standard HTTP API, and it matters for the same reason the artifacts do: without it, your *tooling* is married to a vendor even if your views aren't. With it, a dashboard, a pipeline, a notebook — anything that speaks the API — works against any conformant server. Swap the backend, keep the workflow. Clients discover what a server supports through the standard CapabilityStatement.

The API is three verbs. Each answers a different question:

| Verb | Question it answers | Operations | Mode |
|------|--------------------|------------|------|
| **run** | "Give me the rows, now" | `$viewdefinition-run`, `$sqlview-run`\*, `$sqlquery-run` | Synchronous, streamed |
| **export** | "Build the files, tell me when done" | `$viewdefinition-export`, `$sqlview-export`\*, `$sqlquery-export` | Async, to file storage |
| **materialize** | "Keep a table fresh for me" | `$materialize`\* | Async, server-managed |

\* `$sqlview-run`, `$sqlview-export`, and `$materialize` are coming to the spec — the working group is standardizing them now.

Note the symmetry: every artifact in the dependency graph — ViewDefinition, SQLView, SQLQuery — gets the same verbs. You can run or export any node of your pipeline, whether it's a raw staging view, an intermediate model, or a final parameterized query. And materialize applies to both ViewDefinitions and SQLViews — any non-parameterized node can become a managed table.

### run — for authoring and real-time

`$viewdefinition-run` — invoked as `$run` on the ViewDefinition resource — is a synchronous operation designed for authoring and small-dataset use cases. The simplest call is one GET on a stored view:

```http
GET /ViewDefinition/patient-demographics/$run?_format=csv&_limit=100
Accept: text/csv
```

```
id,birthDate,family,given
pt-1,1990-01-15,Smith,John
pt-2,1985-03-22,Johnson,Mary
```

When the view isn't saved yet, you POST it inline as a `Parameters` body — optionally together with the resources to transform. That's the authoring loop: edit the definition, POST it, see rows, repeat. It's also the real-time path: a dashboard widget or an AI agent that wants a patient's conditions as a flat table rather than a resource graph. Runtime concerns stay at runtime: `patient`, `group`, `_since`, and `_limit` are operation parameters, not view properties — the same view definition serves the full-population export and the single-patient query.

`$sqlquery-run` is the same verb one layer up: execute a stored or inline SQLQuery against the materialized views, passing query parameters by name (a nested `Parameters` resource, bound safely to the declared `Library.parameter` entries), with `_format` and `_limit` for output control. And `$sqlview-run` (coming to the spec) fills the middle: evaluate any intermediate SQLView — with its whole dependency subgraph resolved by the server — and stream the rows back. Very handy when you're debugging one node of a deep pipeline.

### export — bulk export, upgraded

Think of `$viewdefinition-export` as an upgraded [FHIR Bulk Export](https://hl7.org/fhir/uv/bulkdata/). With classic Bulk Export you get *all* resources, as raw ndjson — and then the flattening is your problem: you stand up an ETL pipeline just to make the data queryable. With SQL on FHIR export you ask for the views you really need, and what lands in your bucket is already flat — csv, ndjson, or parquet, ready to be read by Spark, DuckDB, Athena, or loaded into a warehouse.

The flow is four steps:

1. **Kick off.** POST a list of views with the async header:

```http
POST /ViewDefinition/$viewdefinition-export HTTP/1.1
Prefer: respond-async
Content-Type: application/fhir+json

{
  "resourceType": "Parameters",
  "parameter": [
    {"name": "view", "part": [{"name": "viewReference",
      "valueReference": {"reference": "ViewDefinition/patient-demographics"}}]},
    {"name": "view", "part": [{"name": "viewReference",
      "valueReference": {"reference": "ViewDefinition/diagnoses"}}]},
    {"name": "_format", "valueCode": "parquet"}
  ]
}
```

2. **Get a ticket.** The server replies `202 Accepted` with a `Content-Location` header — your status URL.

3. **Poll.** The status URL returns `202` while the job runs (with optional progress). When the job completes, it answers `303 See Other` with a `Location` header pointing to the result.

4. **Collect the files.** GET the result URL — the response lists an output URL per view:

```json
{
  "resourceType": "Parameters",
  "parameter": [
    {"name": "exportId", "valueString": "job-42"},
    {"name": "status", "valueCode": "completed"},
    {"name": "output", "part": [
      {"name": "name", "valueString": "patient_demographics"},
      {"name": "location", "valueUri": "https://storage.example.org/exports/patient_demographics.parquet"}
    ]},
    {"name": "output", "part": [
      {"name": "name", "valueString": "diagnoses"},
      {"name": "location", "valueUri": "https://storage.example.org/exports/diagnoses.parquet"}
    ]}
  ]
}
```

If you've implemented Bulk Data Export, you already know this dance — same async choreography, but the payload is analysis-ready tables instead of raw resources. It's less stuff to export, you skip the ETL step entirely, and a server that supports export natively can optimize heavily under the hood.

The same filters apply as for run — `patient`, `group`, `_since` — so a payer can export one member's views just as easily as a whole population. And the verb extends up the graph: `$sqlview-export` (coming to the spec) materializes an intermediate model into files, `$sqlquery-export` does the same for query results too large for a synchronous response. Export the staging layer for your lake, or export the finished mart — your choice of cut point.

### materialize — "hey server, keep it up to date"

`$materialize` is the operation where you tell the server: here is my view definition — build a managed view from it, and keep it up to date as the data changes. It's an asynchronous operation: you give it a target name and an update policy (`manual`, or `scheduled` with a cron expression), the server builds the view in the background, and when the job completes you get a reference to the materialized view you can query from then on:

```http
POST /ViewDefinition/patient-demographics/$materialize HTTP/1.1
Prefer: respond-async
Content-Type: application/fhir+json

{
  "resourceType": "Parameters",
  "parameter": [
    {"name": "targetName", "valueString": "patient_demographics"},
    {"name": "updatePolicy", "valueCode": "scheduled"},
    {"name": "schedule", "valueString": "0 0 * * *"}
  ]
}
```

When the job completes you get back a reference to the materialized view; how it's exposed for querying — schema, naming, access — is up to the implementation, with `targetName` as the requested handle. From that point the server owns freshness (nightly in this example), and any SQL tool just queries the result:

```sql
SELECT * FROM patient_demographics WHERE dob > '1990-01-01';
```

Your BI tool connects to a table and never knows FHIR was involved.

The same verb applies to SQLViews. Materializing an intermediate model is exactly what you do in a warehouse when a view gets hot: the server resolves the dependency subgraph, builds the table, and owns its refresh. Which nodes of your DAG are materialized and which stay virtual becomes a runtime tuning decision — the pipeline definition doesn't change.

This pattern is already proven in production. In Aidbox we implemented [$materialize over PostgreSQL](/articles/introducing-materialize-sql-interface-for-fhir-data?utm_source=blog&utm_medium=article&utm_campaign=analytics&utm_content=interoperable-analytics), plus adapters to ClickHouse, BigQuery, and Databricks: a very efficient initial load — millions of resources in seconds — and then the table is kept up to date in near real time using subscriptions. Same view definition, four different engines — which is rather the point. The operation is now being standardized so that "keep this table fresh" becomes a portable request, not a vendor feature.

Together: **run** for development and real-time access, **export** for feeding external engines, **materialize** for in-database analytics — all standard endpoints, all vendor-neutral.

Where this is heading: from the user's perspective the server becomes a magic box. You send it view definitions and queries; the views stay up to date; you just run your reports. And since LLMs are already quite good at authoring SQL and view definitions, the next interface on top of this box is plain language — with the standard API as what the agent drives underneath.

## FHIR to OMOP: battle-proving the DSL

So that's the full toolkit: a DSL for staging (ViewDefinition), a DSL for transformations (SQLView/SQLQuery), and an API to run it all. The best way to find out whether a toolkit is real is to throw the toughest conversion we know at it — and that's FHIR to OMOP. Frankly, this project already *dictated* parts of the design: we needed views of views to express it, and that need is a big part of why SQLView exists.

OMOP CDM is the OHDSI standard for observational research, and what I like about OMOP is that it's extremely pragmatic: a fixed set of tables, all codes normalized to standard concepts, the whole infrastructure — terminology included — sitting right there in the database. FHIR is the transactional model; OMOP is the analytical one. As FHIR-first systems spread, "FHIR for OLTP, OMOP for OLAP" becomes the default architecture, and the conversion layer between them becomes critical infrastructure.

The OMOP community itself usually builds pipelines in ELT style. So do we — the conversion is a layered DAG of exactly the artifacts described above:

```mermaid
flowchart LR
    A[FHIR resources] -->|"ViewDefinitions<br/>(staging layer)"| B[condition_staging<br/>patient_staging<br/>observation_staging]
    B -->|"SQLViews<br/>(mapping layer<br/>+ OMOP vocabularies)"| C[condition_mapped<br/>person_mapped<br/>domain_routed]
    C -->|"SQLQueries<br/>(load layer)"| D[OMOP CDM<br/>condition_occurrence<br/>person, measurement]
    D --> E[OHDSI tools<br/>Atlas, HADES]
```

The basics, layer by layer.

**Staging ViewDefinitions** flatten each resource into exactly what OMOP needs — keys, dates, and source codings, one row per coding:

```json
{
  "resourceType": "ViewDefinition",
  "name": "condition_staging",
  "resource": "Condition",
  "select": [
    {
      "column": [
        {"name": "condition_id", "path": "getResourceKey()", "type": "string"},
        {"name": "person_id", "path": "subject.getReferenceKey(Patient)", "type": "string"},
        {"name": "start_date", "path": "onset.ofType(dateTime)", "type": "dateTime"}
      ]
    },
    {
      "forEach": "code.coding",
      "column": [
        {"name": "source_system", "path": "system", "type": "uri"},
        {"name": "source_code", "path": "code", "type": "code"}
      ]
    }
  ]
}
```

**Mapping SQLViews** hold the semantics: the OMOP Athena vocabularies are loaded right into the database, and **concept-id resolution is a JOIN, not a terminology-server call**:

```sql
-- SQLView: ConditionMappedView
-- depends on: .../ViewDefinition/condition_staging as cs
SELECT cs.condition_id,
       cs.person_id,
       std.concept_id_2   AS condition_concept_id,
       cs.start_date,
       src.concept_id     AS condition_source_concept_id,
       cs.source_code     AS condition_source_value,
       std_c.domain_id    AS target_domain
  FROM cs
  JOIN concept src
    ON src.concept_code = cs.source_code AND src.vocabulary_id = 'ICD10CM'
  JOIN concept_relationship std
    ON std.concept_id_1 = src.concept_id AND std.relationship_id = 'Maps to'
  JOIN concept std_c
    ON std_c.concept_id = std.concept_id_2
```

It looks tricky, but it's kind of trivial: it does the mapping, it does the joining, it translates on the fly. And the joins naturally handle the genuinely hard parts — `Maps to` fan-out (one ICD code becoming several SNOMED rows), domain routing (a FHIR Condition landing in `condition_occurrence`, `observation`, or `measurement` depending on the target concept's domain), and discard rules for unmappable records.

**Load SQLQueries** read the mapped views and populate the CDM tables:

```sql
INSERT INTO condition_occurrence
SELECT condition_id, person_id, condition_concept_id,
       start_date, condition_source_concept_id, condition_source_value
  FROM cm
 WHERE target_domain = 'Condition'
```

Why SQL and not FHIRPath or the FHIR Mapping Language? Because for this job you'd need lookups into mapping tables, splitting one record into many, conditional logic across vocabularies. You could in principle route every code through a terminology server's `$translate` — but that's not viable for bulk transformation; people will burn CPUs doing it row by row. And this has to be efficient at population scale — billions of records, not thousands. For us this is just SQL, and set-based joins over billions of rows is the one problem databases have spent fifty years getting good at.

A full disclaimer: this project — [fhir2omop](https://github.com/lampadephoros/fhir2omop) — is at a very early stage, a work in progress in the open. The ideas we're exploring: profile-gated conversion (a resource converts to an OMOP table *iff* it validates against a gating FHIR profile), golden test cases (a FHIR resource in, the exact OMOP rows out — because corner cases are exactly what samples make visible), and jurisdiction modules like *US Core to OMOP* or *German ICD to OMOP* that the community can extend.

But the goal is bigger than one converter. FHIR-to-OMOP is how we **battle-prove SQL on FHIR**: it's the toughest conversion around, and if the DSL can express it — the staging, the vocabulary joins, the domain routing, all of it as portable artifacts — then everything simpler comes for free. So we're inviting everybody: OMOP people, FHIR people, data engineers. Bring your mappings, your corner cases, your skepticism — the working group is dedicating sessions to OMOP, and the room is open.

## Come build it with us

Join the [#analytics-on-FHIR stream](https://chat.fhir.org/#narrow/stream/179219-analytics-on-FHIR) on chat.fhir.org and the weekly working group calls. Try the [playground](https://sql-on-fhir.org/extra/playground.html) for a five-minute taste, or the [full DevDays workshop](/articles/lets-build-sql-on-fhir-video-nikolai-ryzhikov-at-fhir-devdays-2025?utm_source=blog&utm_medium=article&utm_campaign=analytics&utm_content=interoperable-analytics) — PostgreSQL, Grafana, Jupyter, Synthea data — for a hands-on start. And mark your calendar for [FHIR Analytics](/events/fhir-analytics-2025?utm_source=blog&utm_medium=article&utm_campaign=analytics&utm_content=interoperable-analytics) — our free online conference dedicated to exactly this space.

And if you want to run all of this today: [Aidbox](/fhir-server?utm_source=blog&utm_medium=article&utm_campaign=analytics&utm_content=interoperable-analytics) is a transactional FHIR server with built-in real-time analytics on SQL on FHIR — and the first FHIR server to pass all SQL on FHIR tests. ViewDefinitions over PostgreSQL, a visual ViewDefinition builder and SQL view/query managers in the UI, `$materialize`, and adapters that keep your tables fresh in ClickHouse, BigQuery, and Databricks. The transactional and the analytical world, finally on one bridge.
