@atomic-ehr/codegen: US Core Profiles in Python

Building US Core resources by hand is tedious. You stamp meta.profile, look up LOINC codes, hand-roll the us-core-race nested extension — every field is a typo waiting to happen, every profile is its own version of the same ceremony.

@atomic-ehr/codegen makes that boilerplate disappear. Point it at the US Core IG and you get one Pydantic model per base type plus a plain-Python wrapper class per profile, with typed accessors for fixed values, extensions, and slices, and a validate() that knows what the profile requires.

This tutorial walks through that end-to-end on two US Core profiles: US Core Patient and US Core Blood Pressure.

What You'll Build

A CSV-to-FHIR converter, built step by step:

  1. generate profile classes for US Core Patient and US Core Blood Pressure from hl7.fhir.us.core@8.0.1,
  2. turn each row into a US Core Patient — typed extension setters and apply(),
  3. turn each row into a US Core Blood Pressure — typed slices, fixed LOINC, and validate(),
  4. package them as a Bundle,
  5. read the bundle back with typed getters to compute an average BP,
  6. post the bundle to a local Aidbox server via fhirpy client.

Prerequisites

  • Node.js 20+ (or Bun) — the generator itself is the @atomic-ehr/codegen Node package. You run it once to emit Python; after that you don't need Node again. The generation script is a few lines of TypeScript (shown below).
  • Python 3.12+ — the generated code targets modern Python (PEP 604 X | None unions, generic models via typing_extensions).
  • Pydantic v2 (pydantic>=2.11) and fhirpy. The generated models are Pydantic v2, and with the default fhirpy client they also work with fhirpy's async client (Step 6). Both packages are pinned in the generated requirements.txt (see Step 1). Pass client: "none" to get plain Pydantic models with no client code.
  • Basic familiarity with FHIR and US Core (knowing what "profile" and "slice" mean is enough).

Step 1 — Generate Profile Classes

Code generation runs through the Node tool, so set up a small generator project alongside your Python app:

mkdir py-us-core-tutorial && cd py-us-core-tutorial
npm init -y
npm install --save-dev @atomic-ehr/codegen tsx typescript

Create generate.ts:

import { APIBuilder, mkCodegenLogger, prettyReport } from "@atomic-ehr/codegen";

const main = async () => {
  const logger = mkCodegenLogger({
    suppressTags: ["#fieldTypeNotFound", "#duplicateSchema", "#duplicateCanonical", "#largeValueSet"],
  });

  const builder = new APIBuilder({ logger })
    .fromPackage("hl7.fhir.us.core", "8.0.1")
    .typeSchema({
      treeShake: {
        "hl7.fhir.us.core": {
          "http://hl7.org/fhir/us/core/StructureDefinition/us-core-patient": {},
          "http://hl7.org/fhir/us/core/StructureDefinition/us-core-blood-pressure": {},
        },
        "hl7.fhir.r4.core": {
          "http://hl7.org/fhir/StructureDefinition/Bundle": {},
        },
      },
    })
    .python({
      generateProfile: true,
      allowExtraFields: false,
      primitiveTypeExtension: true,
    })
    .outputTo("./fhir_types")
    .cleanOutput(true);

  const report = await builder.generate();
  console.log(prettyReport(report));
  if (!report.success) process.exit(1);
};

main();

The knobs that matter here:

  • generateProfile: true — emit a wrapper class per profile with typed accessors for extensions, slices, and fixed values. Without it you get only the base R4 Pydantic models.
  • allowExtraFields: false — generated models use Pydantic's extra="forbid", so an unknown field raises at parse time instead of being silently dropped.
  • primitiveTypeExtension: true — also generate the FHIR primitive-extension siblings (the _field companions, e.g. birthDateExtension) so you can attach extensions and ids to primitive values.
  • treeShake: { ... } — only the listed canonicals and their transitive deps are generated (~20 files instead of hundreds).

Run it. prettyReport(report) prints a grouped summary so you see what got emitted without crawling the output dir:

$ npx tsx generate.ts
# generation logs omitted; this is the prettyReport summary
Generated files (24 files, 12 kloc):
  python (23 files, 3.6 kloc):
    - fhir_types/ (4 files, 622 loc)
    - fhir_types/hl7_fhir_r4_core/ (8 files, 1.1 kloc)
    - fhir_types/hl7_fhir_r4_core/profiles/ (2 files, 164 loc)
    - fhir_types/hl7_fhir_us_core/profiles/ (9 files, 1.8 kloc)
  ir-report (1 files, 8.2 kloc):
    - fhir_types/README.md (8223 loc)
Duration: 8097ms
Status: 🟩 Success

The on-disk layout looks like this:

fhir_types/
├── hl7_fhir_r4_core/                  # Base R4 Pydantic models
│   ├── base.py                        # Element, Coding, CodeableConcept, Quantity, ...
│   ├── resource.py                    # Resource, DomainResource, Meta
│   ├── patient.py
│   ├── observation.py
│   ├── bundle.py
│   ├── profiles/                      # base R4 profiles US Core builds on
│   │   └── observation_observation_vitalsigns.py   # vital-signs base (BP derives from it)
│   └── ...
├── hl7_fhir_us_core/
│   └── profiles/
│       ├── __init__.py                # re-exports the profile classes
│       ├── patient_uscore_patient_profile.py
│       ├── observation_uscore_blood_pressure_profile.py
│       ├── extension_uscore_race_extension.py
│       └── ...
├── fhirpy_base_model.py               # fhirpy client base model (default fhirpy client)
├── profile_helpers.py                 # Runtime helpers shared by all profile classes
├── README.md                          # IR report — human-readable dump of the generated types
└── requirements.txt                   # pydantic, fhirpy (+ pytest, requests for tests/Step 6)

Set up a Python virtual environment:

python3.14 -m venv venv
source venv/bin/activate

Point your Python app at the emitted fhir_types/ and install the dependencies:

pip install -r fhir_types/requirements.txt

The generated requirements.txt pins Pydantic and fhirpy plus pytest and requests for the tests and examples.

The full tutorial code lives in Aidbox/examples — generate.ts, load.py, avg.py, post.py, the CSV, and the committed fhir_types/ so you can browse the generated code without running the generator. For broader profile-API exploration, the codegen repo also has a python-r4-us-core test example. Both use the default camelCase attribute names, just like the snippets here. (Pass fieldFormat: "snake_case" if you'd rather spell attributes birth_date, effective_date_time; serialization always emits FHIR-correct camelCase JSON either way.)

Step 2 — Row to a US Core Patient

The input is patients.csv — basic demographics plus one BP reading per patient. Race uses the OMB-category codes US Core expects:

mrn,family,given,birthDate,gender,raceCode,raceDisplay,effectiveDateTime,systolic,diastolic
MRN-001,Lovelace,Ada,1815-12-10,female,2106-3,White,2026-04-15,120,80
MRN-002,Turing,Alan,1912-06-23,male,2106-3,White,2026-04-15,118,76
MRN-003,Curie,Marie,1867-11-07,female,2106-3,White,2026-04-16,125,82
MRN-004,Carver,George,1864-01-01,male,2054-5,Black or African American,2026-04-16,135,88
MRN-005,Ochoa,Ellen,1958-05-10,female,2054-5,Black or African American,2026-04-17,128,84

csv.DictReader hands each row over as a plain dict[str, str]; numeric parsing happens later, where we pass values to typed profile setters.
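
To see the all-strings behaviour concretely, here is a minimal sketch that parses the header and first row from an in-memory string. Using io.StringIO in place of the real patients.csv is an assumption made to keep the snippet standalone.

```python
import csv
import io

# Header and first data row copied from patients.csv above.
SAMPLE = (
    "mrn,family,given,birthDate,gender,raceCode,raceDisplay,effectiveDateTime,systolic,diastolic\n"
    "MRN-001,Lovelace,Ada,1815-12-10,female,2106-3,White,2026-04-15,120,80\n"
)

rows = list(csv.DictReader(io.StringIO(SAMPLE)))
row = rows[0]

# Every value arrives as str, including the numeric columns.
all_strings = all(isinstance(value, str) for value in row.values())

# Convert only at the point where a number is actually needed.
systolic = float(row["systolic"])
print(row["mrn"], type(row["systolic"]).__name__, systolic)
```

Deferring float() until the setter call keeps the row dict a faithful copy of the file, which makes error messages that quote the row easier to trace.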

The US Core Patient profile adds a few extensions and makes identifier and name required. The generated class has a typed setter for each:

from fhir_types.hl7_fhir_r4_core.base import Identifier, HumanName, Coding
from fhir_types.hl7_fhir_r4_core.patient import Patient
from fhir_types.hl7_fhir_us_core.profiles import UscorePatientProfile

def row_to_patient(row: dict[str, str]) -> UscorePatientProfile:
    base_patient = Patient(
        resourceType="Patient",
        identifier=[Identifier(system="http://hospital.example.org/mrn", value=row["mrn"])],
        name=[HumanName(family=row["family"], given=[row["given"]])],
        gender=row["gender"],          # gender is a Literal type — Pydantic validates the value
        birthDate=row["birthDate"],    # default camelCase attrs match the FHIR wire names
    )

    patient = UscorePatientProfile.apply(base_patient)

    patient.set_race({
        "ombCategory": {"system": "urn:oid:2.16.840.1.113883.6.238", "code": row["raceCode"], "display": row["raceDisplay"]},
        "text": row["raceDisplay"],
    })

    return patient

Two phases:

  1. Build the plain Patient — profile-required (identifier, name) and must-support (gender, birthDate) fields as a typed R4 Pydantic model. Construct with the default camelCase attribute names (birthDate, the FHIR wire names); values are validated immediately (e.g. gender is a Literal["male", "female", "other", "unknown"]).
  2. Then UscorePatientProfile.apply(base_patient) stamps meta.profile and returns a profile instance with typed accessors for the US Core extensions. apply() wraps the resource in place — the profile mutates the same Patient object.
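
The "stamp in place" idea can be sketched on a plain dict. The stamp_profile helper below is hypothetical and is not the generated apply(); in particular, skipping a URL that is already present is an assumption of this sketch.

```python
from typing import Any

US_CORE_PATIENT = "http://hl7.org/fhir/us/core/StructureDefinition/us-core-patient"


def stamp_profile(resource: dict[str, Any], canonical_url: str) -> dict[str, Any]:
    """Add canonical_url to meta.profile in place, without duplicating it."""
    profiles = resource.setdefault("meta", {}).setdefault("profile", [])
    if canonical_url not in profiles:
        profiles.append(canonical_url)
    return resource  # the same object, mutated in place


patient = {"resourceType": "Patient", "gender": "female"}
stamped = stamp_profile(patient, US_CORE_PATIENT)
stamp_profile(patient, US_CORE_PATIENT)  # second call changes nothing
```

Because the helper returns the object it received, any later change made through either name is visible through the other, which mirrors the in-place wrapping described above.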

Three notes on what the profile API does for you:

  • Three extension setter forms. set_race({ "ombCategory": ..., "text": ... }) takes flat input — note the sub-extension keys (ombCategory, detailed, text) are the camelCase slice names — and generates the nested extension[] plumbing. The same setter also accepts a typed extension-profile instance (UscoreRaceExtension) or a raw Extension, and raises if a raw extension's url doesn't match.
  • Single-value extensions take the value directly. us-core-individual-sex carries one valueCoding, so set_sex(Coding(code="female")) takes a Coding (or a raw Extension).
  • No setters for must-support base fields. gender, birthDate, and address aren't profiled further by US Core, so the profile class emits no .set_gender()-style wrappers — populate them as normal Patient fields. validate() still warns if a must-support field is missing.
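
For a sense of the nested plumbing the flat set_race input stands in for, here is a hand-rolled sketch of a us-core-race extension as a plain dict. The race_extension helper is hypothetical; the exact structure the generated setter emits may differ in detail.

```python
from typing import Any

RACE_URL = "http://hl7.org/fhir/us/core/StructureDefinition/us-core-race"


def race_extension(omb: dict[str, str], text: str) -> dict[str, Any]:
    """Build the nested us-core-race extension by hand (illustrative helper)."""
    return {
        "url": RACE_URL,
        "extension": [
            # Each sub-extension is keyed by its slice name in "url".
            {"url": "ombCategory", "valueCoding": omb},
            {"url": "text", "valueString": text},
        ],
    }


ext = race_extension(
    {"system": "urn:oid:2.16.840.1.113883.6.238", "code": "2106-3", "display": "White"},
    "White",
)
sub_urls = [sub["url"] for sub in ext["extension"]]
```

The slice names ombCategory and text reappear as the url of each sub-extension, which is why the flat input uses them as keys.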

Pydantic emits a UserWarning when an extension[] list holds plain dicts rather than Extension instances — expected with the current flat-dict plumbing. Silence it with warnings.filterwarnings("ignore", category=UserWarning, module="pydantic").
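
The filter's scope can be checked without Pydantic installed. This sketch simulates the two cases with warnings.warn_explicit; the module names "pydantic.main" and "my_app" and the message texts are stand-ins, not the real warning Pydantic emits.

```python
import warnings

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")  # record everything by default
    # The filter from the text: drop UserWarnings raised from pydantic modules.
    warnings.filterwarnings("ignore", category=UserWarning, module="pydantic")

    # Simulated warning attributed to a pydantic module: suppressed.
    warnings.warn_explicit("dict in extension[]", UserWarning, "main.py", 1, module="pydantic.main")
    # Warning attributed to application code: still reported.
    warnings.warn_explicit("something else", UserWarning, "app.py", 1, module="my_app")

messages = [str(w.message) for w in caught]
```

The module argument is a regular expression matched against the start of the issuing module's name, so "pydantic" also covers its submodules while leaving your own UserWarnings visible.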

Step 3 — Row to a US Core Blood Pressure

The BP profile is where codegen really earns its keep. The US Core Blood Pressure profile:

  • fixes code to LOINC 85354-9 ("Blood pressure panel"),
  • fixes a vital-signs category slice,
  • defines component[systolic] and component[diastolic] slices with specific LOINC discriminators (8480-6 and 8462-4),
  • requires an effectiveDateTime or effectivePeriod,
  • requires valueQuantity inside each slice.
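
For comparison, a hand-rolled version of those requirements as a plain dict might look like the sketch below. The helper names bp_component and hand_rolled_bp are hypothetical, and the patient reference and readings are placeholder values.

```python
from typing import Any

UCUM = "http://unitsofmeasure.org"
LOINC = "http://loinc.org"
BP_PROFILE = "http://hl7.org/fhir/us/core/StructureDefinition/us-core-blood-pressure"


def bp_component(loinc_code: str, value: float) -> dict[str, Any]:
    """One component slice: discriminator code plus the mmHg reading."""
    return {
        "code": {"coding": [{"system": LOINC, "code": loinc_code}]},
        "valueQuantity": {"value": value, "unit": "mmHg", "system": UCUM, "code": "mm[Hg]"},
    }


def hand_rolled_bp(patient_ref: str, when: str, systolic: float, diastolic: float) -> dict[str, Any]:
    return {
        "resourceType": "Observation",
        "meta": {"profile": [BP_PROFILE]},
        "status": "final",
        "category": [{"coding": [{
            "system": "http://terminology.hl7.org/CodeSystem/observation-category",
            "code": "vital-signs",
        }]}],
        "code": {"coding": [{"system": LOINC, "code": "85354-9"}]},
        "subject": {"reference": patient_ref},
        "effectiveDateTime": when,
        "component": [bp_component("8480-6", systolic), bp_component("8462-4", diastolic)],
    }


obs = hand_rolled_bp("urn:uuid:example", "2026-04-15", 120.0, 80.0)
component_codes = [c["code"]["coding"][0]["code"] for c in obs["component"]]
```

Every literal in this sketch is a place where a typo would produce a resource that looks fine but fails profile validation on the server.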

Hand-rolling that per row is the kind of thing codegen eliminates. The generated class collapses it to three setters:

from fhir_types.hl7_fhir_r4_core.base import Reference
from fhir_types.hl7_fhir_us_core.profiles import UscoreBloodPressureProfile

def row_to_bp(row: dict[str, str], patient_urn: str) -> UscoreBloodPressureProfile:
    bp = UscoreBloodPressureProfile.create(
        status="final",
        subject=Reference(reference=patient_urn),
    )

    (
        bp.set_effective_date_time(row["effectiveDateTime"])
          .set_systolic({"value": float(row["systolic"]), "unit": "mmHg", "system": "http://unitsofmeasure.org", "code": "mm[Hg]"})
          .set_diastolic({"value": float(row["diastolic"]), "unit": "mmHg", "system": "http://unitsofmeasure.org", "code": "mm[Hg]"})
    )

    errors = bp.validate()["errors"]
    if errors:
        raise ValueError(f"{row['mrn']}: {'; '.join(errors)}")

    return bp

What happens behind the scenes:

  • create() does the ceremony. It stamps meta.profile, fills the fixed code (LOINC 85354-9), appends the vital-signs category slice, and adds empty component[systolic] / component[diastolic] stubs with discriminator codes already set. create() takes keyword-only args; create_resource() is the same but returns a plain Observation instead of a profile wrapper.
  • set_systolic({ "value": ..., "unit": ... }) fills the valueQuantity inside the systolic slice. The discriminator code on that component is already there from create() — you only supply the reading.
  • validate() returns {"errors": [...], "warnings": [...]}. Errors block (required fields, excluded fields, disallowed choice variants, slice cardinality). Warnings surface must-support concerns. A malformed row fails fast with the MRN — you don't discover it at POST time.

You didn't type the discriminator codes. You didn't remember 85354-9. The setters chain fluently (each returns the profile), just like the TypeScript API.
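
To make the {"errors": [...], "warnings": [...]} shape concrete, here is a toy checker over a plain dict. It is not the generated validate(): check_bp is a hypothetical function, it covers only two of the rules listed above, and its must_support default is illustrative rather than taken from the profile.

```python
from typing import Any


def check_bp(obs: dict[str, Any], must_support: tuple[str, ...] = ("performer",)) -> dict[str, list[str]]:
    """Toy checker returning the same errors/warnings shape as described above."""
    errors: list[str] = []
    warns: list[str] = []
    if "effectiveDateTime" not in obs and "effectivePeriod" not in obs:
        errors.append("effectiveDateTime or effectivePeriod is required")
    for comp in obs.get("component", []):
        code = comp["code"]["coding"][0]["code"]
        if "valueQuantity" not in comp:
            errors.append(f"component {code} has no valueQuantity")
    for field in must_support:  # illustrative list, see the lead-in
        if field not in obs:
            warns.append(f"{field} is must-support but absent")
    return {"errors": errors, "warnings": warns}


incomplete = {
    "resourceType": "Observation",
    "component": [
        {"code": {"coding": [{"code": "8480-6"}]}, "valueQuantity": {"value": 120.0}},
        {"code": {"coding": [{"code": "8462-4"}]}},  # diastolic reading missing
    ],
}
result = check_bp(incomplete)
if result["errors"]:
    print("; ".join(result["errors"]))
```

Keeping errors and warnings in separate lists lets the caller decide that errors abort the row while warnings are only logged.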

Step 4 — Assemble the Bundle

Each row produces a Patient and a BP Observation linked by the Patient's urn:uuid placeholder. Package them as transaction entries. The generated Bundle and BundleEntry are generic over the contained resource, so a Bundle[Patient | Observation] keeps entry[].resource typed to that union. (row_to_patient and row_to_bp are the Step 2–3 functions; in the example they all live in one load.py, so they're already in scope here.)

import json
import csv
import uuid

from fhir_types.hl7_fhir_r4_core.patient import Patient
from fhir_types.hl7_fhir_r4_core.observation import Observation
from fhir_types.hl7_fhir_r4_core.bundle import Bundle, BundleEntry, BundleEntryRequest

def row_to_entries(row: dict[str, str]) -> list[BundleEntry[Patient | Observation]]:
    patient_urn = f"urn:uuid:{uuid.uuid4()}"
    patient = row_to_patient(row)
    bp = row_to_bp(row, patient_urn)

    return [
        BundleEntry(fullUrl=patient_urn, resource=patient.to_resource(),
                    request=BundleEntryRequest(method="POST", url="Patient")),
        BundleEntry(fullUrl=f"urn:uuid:{uuid.uuid4()}", resource=bp.to_resource(),
                    request=BundleEntryRequest(method="POST", url="Observation")),
    ]

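# newline="" is what the csv module recommends; the with-block closes the file.
with open("patients.csv", newline="") as f:
    rows = list(csv.DictReader(f))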
print(f"Loaded {len(rows)} rows")

entries = [entry for row in rows for entry in row_to_entries(row)]

bundle = Bundle[Patient | Observation](
    resourceType="Bundle",
    type="transaction",
    entry=entries,
)

with open("bundle.json", "w") as f:
    json.dump(bundle.model_dump(by_alias=True, exclude_none=True), f, indent=2)
print(f"Wrote bundle with {len(entries)} entries")

$ python load.py
Loaded 5 rows
Wrote bundle with 10 entries

Worth noticing:

  • to_resource() gives you the plain model — the underlying Pydantic resource, no wrapper, ready to drop into a BundleEntry.
  • model_dump(by_alias=True, exclude_none=True) produces FHIR JSON — by_alias serializes through the FHIR-wire aliases (so a snake_case build still emits effectiveDateTime) and exclude_none drops None-valued fields. The one serialization call you'll use everywhere.
  • urn:uuid references. The patient's fullUrl and the observation's subject.reference share one UUID; the server resolves it to a real id on commit.
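
A quick sanity check before posting is to confirm that every urn:uuid reference points at some entry's fullUrl. The sketch below works on the plain JSON form of a bundle; unresolved_references is a hypothetical helper and only inspects subject.reference.

```python
import uuid
from typing import Any


def unresolved_references(bundle: dict[str, Any]) -> list[str]:
    """Return urn:uuid subject references that match no entry fullUrl."""
    entries = bundle.get("entry", [])
    full_urls = {entry["fullUrl"] for entry in entries}
    missing = []
    for entry in entries:
        ref = entry["resource"].get("subject", {}).get("reference", "")
        if ref.startswith("urn:uuid:") and ref not in full_urls:
            missing.append(ref)
    return missing


patient_urn = f"urn:uuid:{uuid.uuid4()}"
bundle = {
    "resourceType": "Bundle",
    "type": "transaction",
    "entry": [
        {"fullUrl": patient_urn, "resource": {"resourceType": "Patient"},
         "request": {"method": "POST", "url": "Patient"}},
        {"fullUrl": f"urn:uuid:{uuid.uuid4()}",
         "resource": {"resourceType": "Observation", "subject": {"reference": patient_urn}},
         "request": {"method": "POST", "url": "Observation"}},
    ],
}
dangling = unresolved_references(bundle)
```

An empty result means the observation's subject and the patient's fullUrl share the same placeholder, which is the precondition for the server rewriting it on commit.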

Step 5 — Read Back: Average BP from the Bundle

Now read it back. Parse bundle.json and compute the average systolic/diastolic to exercise the read-side API:

import json
from typing import Any

from fhir_types.hl7_fhir_r4_core.observation import Observation
from fhir_types.hl7_fhir_us_core.profiles import UscoreBloodPressureProfile

with open("bundle.json") as f:
    bundle = json.load(f)

def is_us_core_bp(resource: dict[str, Any]) -> bool:
    return (
        resource.get("resourceType") == "Observation"
        and UscoreBloodPressureProfile.canonical_url in (resource.get("meta", {}).get("profile") or [])
    )

bps = [
    UscoreBloodPressureProfile.from_resource(Observation.model_validate(entry["resource"]))
    for entry in bundle.get("entry", [])
    if is_us_core_bp(entry["resource"])
]

def avg(xs: list[float]) -> float:
    return sum(xs) / len(xs)

# get_systolic()/get_diastolic() are Optional, so guard with a walrus before indexing.
systolic = [s["value"] for bp in bps if (s := bp.get_systolic()) is not None]
diastolic = [d["value"] for bp in bps if (d := bp.get_diastolic()) is not None]

print(f"Avg BP: {avg(systolic):.1f}/{avg(diastolic):.1f} mmHg (n={len(bps)})")

$ python avg.py
Avg BP: 125.2/82.0 mmHg (n=5)

Three things the profile does here:

  • from_resource(obs) validates as it wraps. It checks that meta.profile includes the canonical URL and returns a profile instance, raising if a resource that claims the profile is malformed — so a broken bundle fails at read time, not on the next field access.
  • No built-in type guard. Unlike the TypeScript API's is() predicate, the Python classes don't ship a .filter()-style guard. You select candidates yourself — check resourceType and canonical_url in meta.profile (above), or wrap from_resource() in try/except ValueError. Either way canonical_url is exposed as a class attribute for exactly this.
  • get_systolic() / get_diastolic() return the flat slice value. No walking component[].code.coding[].code to match LOINC codes — the profile already knows which slice is which, and hands you back the Quantity data as a plain dict.

That's the round-trip: CSV → typed profiles → validated Bundle → typed read-back with profile-aware getters. The same handful of lines would process BPs fetched from a FHIR server, loaded from a file, or received on a Subscription — the typed profile is the common shape, no matter the source.

Step 6 — Land Your Bundle on a FHIR Server

The typed pipeline is only half the story. To actually see the transaction commit — patient IDs assigned, urn:uuid references rewritten, resources stored and searchable — you need a FHIR server. Spin up and run Aidbox:

curl -JO https://aidbox.app/runme && docker compose up -d

Open http://localhost:8080 in your browser to grab a free developer license, then pull the root client secret out of docker-compose.yaml into an env var — reused by the curls and the Python script below:

export BOX_ROOT_CLIENT_SECRET=$(awk '/BOX_ROOT_CLIENT_SECRET:/{print $2}' docker-compose.yaml)

Verify the FHIR endpoint is up:

curl -u "root:$BOX_ROOT_CLIENT_SECRET" http://localhost:8080/fhir/metadata

You should see a JSON CapabilityStatement.

Send the bundle.json you just wrote with fhirpy's async client:

import asyncio
import base64
import json
import os

from fhirpy import AsyncFHIRClient
from fhir_types.hl7_fhir_r4_core import Bundle

secret = os.environ["BOX_ROOT_CLIENT_SECRET"]  # exported above
auth = base64.b64encode(f"root:{secret}".encode()).decode()


async def main() -> None:
    client = AsyncFHIRClient("http://localhost:8080/fhir", authorization=f"Basic {auth}")
    with open("bundle.json") as f:
        bundle = json.load(f)
    resp: Bundle = await client.execute("/", method="post", data=bundle)
    if resp.entry is None:
        return

    for entry in resp.entry:
        if entry.response is None:
            continue
        print(entry.response.status, entry.response.location)


asyncio.run(main())

Aidbox returns a transaction-response bundle — one entry per input, each with a 201 Created and a location pointing at the stored resource:

$ python post.py
201 Created Patient/<id>/_history/1
201 Created Observation/<id>/_history/1
...

Query an observation back and look at its subject:

$ curl -u "root:$BOX_ROOT_CLIENT_SECRET" \
  "http://localhost:8080/fhir/Observation?code=http://loinc.org|85354-9" \
  | jq '.entry[].resource.subject.reference'
"Patient/01J..."
"Patient/01J..."

No urn:uuid — Aidbox rewrote the placeholders atomically on commit.

Type-Check the Pipeline

The generated models are Pydantic v2, so the converter type-checks with mypy — already pinned in the generated requirements.txt. One requirement: enable the Pydantic mypy plugin (it ships with Pydantic, no extra install), or mypy can't tell that a Field(None, ...) default makes a field optional and floods you with false "missing argument" errors on every model you construct.

Drop a mypy.ini next to your code:

[mypy]
strict = True
plugins = pydantic.mypy

Then run it:

$ mypy .
Success: no issues found in 35 source files

Both your converter modules (load.py, avg.py, post.py) and the generated fhir_types/ package come back clean. The typed factories, to_resource(), and the Bundle[Patient | Observation] generic all check out, so a wrong field type or a missing required argument is caught before you ever reach the server. The generated profile layer type-checks under full --strict too, which is why mypy.ini needs only those two settings: no disable_error_code, no strict_optional = False, nothing hand-edited inside fhir_types/. The only thing your own code supplies is an ordinary None-guard when you read an optional field: the walrus in avg.py above, or if entries is None: ... before indexing a list. That's plain strict-mode Python, not a generator wart.

Where To Go Next

  • More of the profile API. Other factories, getters, and slice/extension forms are exercised in the codegen example tests: test_profile_patient.py, test_profile_bp.py, test_profile_bodyweight.py, and test_profile_typed_bundle.py.
  • Typed bundles with named entry slices. A profiled Bundle generates per-slice setters/getters (set_patient_entry, get_organization_entry) with single vs. unbounded (max: *) cardinality handled for you — see the typed-bundle test above.
  • Tune the output for your codebase. fieldFormat (snake_case/camelCase), client ("fhirpy"/"none"), allowExtraFields, and primitiveTypeExtension are all toggles on .python({ ... }).
  • Mix profiles from multiple packages. APIBuilder.fromPackage() chains — US Core alongside your custom IG or a regional base. localStructureDefinitions() pulls in profiles straight from a folder of StructureDefinition JSON.

Wrap Up

The generator emits both the base R4 Pydantic models and a thin profile-class layer on top — no runtime DSL, no ORM, no framework. to_resource() always gives you a plain Pydantic resource, and model_dump(by_alias=True, exclude_none=True) always gives you plain FHIR JSON you can send to any server.

@atomic-ehr/codegen is MIT-licensed; issues and PRs welcome.

GitHub | NPM | US Core IG
