Atom feed syndication in Termbox

Terminology content in healthcare ecosystems is inherently fragmented and heterogeneous. For a hospital network, a national health authority, or a regional HIE, ensuring that all participants work from the same consistent terminology is a significant operational challenge.

The content is diverse:

  • Maintained by different organizations, with different legal requirements.
  • With different release and update lifecycles as well as versioning strategies.
  • Formatted and distributed in different flavors: Implementation Guides (IGs), standalone FHIR resources, FHIR npm packages, proprietary binary formats, and release formats such as RF2, RRF, and CSV.

Content consumers share some general needs:

  • Staying up to date: receiving newly published content as quickly as possible.
  • Minimal dependency: selecting only the terminologies and versions they depend on.
  • Accuracy and completeness: ensuring that the terminology data is complete and correct according to the interpretation defined in the guides and standards.
  • Reproducibility: bringing the terminology server to a specific state consistently and predictably.
  • Operational agility: ensuring that the state change takes as little time and as few resources as possible.

FHIR Community Effort

To address these needs, terminology servers have adopted a content syndication mechanism based on the Atom Syndication Format (RFC 4287). Although this mechanism has been present in the healthcare ecosystem for some time, there was no formal effort to standardize it within FHIR until recently. As a result, server vendors have built private implementations and introduced proprietary features. In May 2026, a formal standardization effort was initiated, covering not just terminology but FHIR content distribution in general.

This topic is still under development. The Termbox team aims to participate actively in defining this standard and to enrich the discussion with our customers' use cases and the challenges we have faced while implementing support for Atom feed syndication.
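
For readers less familiar with the format, a syndication feed is an ordinary Atom XML document whose entries point at downloadable content. The following is a minimal sketch in Python, using only the standard library and the core Atom elements from RFC 4287. The sample feed, its URLs, and the `list_entries` helper are hypothetical; real terminology feeds typically carry additional extension elements that are not shown here.

```python
import xml.etree.ElementTree as ET

# Namespace of the core Atom elements defined in RFC 4287.
ATOM_NS = "{http://www.w3.org/2005/Atom}"

# Hypothetical feed with two entries; all identifiers and URLs are made up.
SAMPLE_FEED = """<feed xmlns="http://www.w3.org/2005/Atom">
  <title>Example terminology feed</title>
  <id>urn:uuid:00000000-0000-0000-0000-000000000001</id>
  <updated>2024-02-01T00:00:00Z</updated>
  <entry>
    <title>Example CodeSystem 1.0.0</title>
    <id>urn:uuid:00000000-0000-0000-0000-000000000002</id>
    <updated>2024-01-01T00:00:00Z</updated>
    <link rel="alternate" type="application/fhir+json"
          href="https://example.org/feed/example-cs-1.0.0.json"/>
  </entry>
  <entry>
    <title>Example CodeSystem 1.1.0</title>
    <id>urn:uuid:00000000-0000-0000-0000-000000000003</id>
    <updated>2024-02-01T00:00:00Z</updated>
    <link rel="alternate" type="application/fhir+json"
          href="https://example.org/feed/example-cs-1.1.0.json"/>
  </entry>
</feed>"""


def list_entries(feed_xml: str) -> list[dict]:
    """Return the id, title, updated timestamp, and first link of each entry."""
    root = ET.fromstring(feed_xml)
    entries = []
    for entry in root.findall(f"{ATOM_NS}entry"):
        link = entry.find(f"{ATOM_NS}link")
        entries.append({
            "id": entry.findtext(f"{ATOM_NS}id"),
            "title": entry.findtext(f"{ATOM_NS}title"),
            "updated": entry.findtext(f"{ATOM_NS}updated"),
            "href": link.get("href") if link is not None else None,
        })
    return entries


if __name__ == "__main__":
    for e in list_entries(SAMPLE_FEED):
        print(e["updated"], e["title"], e["href"])
```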

Content load in Termbox

Loading terminology content in Termbox is declarative, based on a definition file, data.yaml. This file is meant to be versioned and represents the state of the terminology server. This mechanism simplifies operations management, ensures platform reproducibility, and provides a declarative way to specify the terminologies a Termbox instance depends on. Termbox aims to integrate easily into pipelines, so that the state of a Termbox instance can be versioned in Git. Multiple types of terminology sources are supported: npm packages, curated pre-indexed terminologies, standalone FHIR resources, and Atom feed syndication.

Termbox Support for Atom Feed

Atom Feed Syndication was pioneered by Ontoserver, and Termbox is among the first terminology servers to support it, bringing its own additions on top of the base spec.

One of these features is filtering. Syndication feeds can contain hundreds of entries spanning dozens of terminologies and versions. Termbox supports declarative entry selection, with both whitelisting and blacklisting, and wildcard version matching as defined in the FHIR specification; see the example below.

Termbox also implements a sync mode. Terminology server instances change over time as clients install packages, create resources, remove content, and so on. Upstream syndication servers also retire content over time, removing packages and versions. Setting the sync: true option in data.yaml instructs Termbox to fully synchronize with the syndication server, keeping only the declared terminologies. This feature saves disk space and keeps the terminology server consistent.
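
One way to picture sync mode is as a reconciliation between two sets: what is currently installed and what is declared. The Python sketch below illustrates that idea only and is not Termbox's implementation; the `Content` type, the `plan_sync` helper, and the sample data are hypothetical.

```python
from typing import NamedTuple


class Content(NamedTuple):
    """A terminology identified by its canonical URL and version."""
    url: str
    version: str


def plan_sync(installed: set[Content],
              declared: set[Content]) -> tuple[list[Content], list[Content]]:
    """Return (to_install, to_remove) so that installed ends up equal to declared."""
    to_install = sorted(declared - installed)  # declared but not yet present
    to_remove = sorted(installed - declared)   # present but no longer declared
    return to_install, to_remove


if __name__ == "__main__":
    installed = {
        Content("http://hl7.org/fhir/sid/icd-10-uk", "4.0"),
        Content("http://example.org/cs/retired", "1.0.0"),
    }
    declared = {
        Content("http://hl7.org/fhir/sid/icd-10-uk", "4.0"),
        Content("http://example.org/cs/new", "2.0.0"),
    }
    to_install, to_remove = plan_sync(installed, declared)
    print("install:", to_install)
    print("remove:", to_remove)
```

Content that appears in both sets is left untouched, so repeated runs against an unchanged declaration produce an empty plan.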

Example: including only the SNOMED UK edition and ICD-10-UK version 4.0

sync: true
sources:
  - type: atom
    feed: https://ontology.nhs.uk/production1/synd/syndication.xml
    auth:
      type: client_credentials
      client_id_env: NHS_CLIENT_ID
      client_secret_env: NHS_SECRET
      token_url: https://ontology.nhs.uk/authorisation/auth/realms/nhs-digital-terminology/protocol/openid-connect/token
    include:
      - url: http://snomed.info/sct
        version: http://snomed.info/sct/83821000000107
      - url: http://hl7.org/fhir/sid/icd-10-uk
        version: '4.0'
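
To make the selection logic concrete, here is a minimal Python sketch of include and exclude rules applied to feed entries. It is an illustration under stated assumptions, not Termbox's matching code: the `Rule` type, the `select` helper, and the `exclude` parameter are hypothetical, and the wildcard handling assumes a dotted version in which a segment written as `x` matches any value. The authoritative wildcard semantics are those of the FHIR specification.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class Rule:
    url: str
    version: Optional[str] = None  # None matches every version of the URL


def version_matches(pattern: Optional[str], version: str) -> bool:
    """Compare dotted versions segment by segment; 'x' matches any segment (assumed syntax)."""
    if pattern is None:
        return True
    p_parts, v_parts = pattern.split("."), version.split(".")
    if len(p_parts) != len(v_parts):
        return False
    return all(p == "x" or p == v for p, v in zip(p_parts, v_parts))


def rule_matches(rule: Rule, url: str, version: str) -> bool:
    return rule.url == url and version_matches(rule.version, version)


def select(entries: Iterable[tuple[str, str]],
           include: list[Rule],
           exclude: list[Rule]) -> list[tuple[str, str]]:
    """Keep entries matched by an include rule (or all, if none given) and by no exclude rule."""
    selected = []
    for url, version in entries:
        included = not include or any(rule_matches(r, url, version) for r in include)
        excluded = any(rule_matches(r, url, version) for r in exclude)
        if included and not excluded:
            selected.append((url, version))
    return selected


if __name__ == "__main__":
    ICD = "http://hl7.org/fhir/sid/icd-10-uk"
    EXAMPLE = "http://example.org/cs"  # hypothetical code system
    feed = [(ICD, "4.0"), (ICD, "5.0"),
            (EXAMPLE, "1.2.0"), (EXAMPLE, "1.2.3"), (EXAMPLE, "1.3.0")]
    include = [Rule(ICD, "4.0"), Rule(EXAMPLE, "1.2.x")]
    exclude = [Rule(EXAMPLE, "1.2.0")]
    for item in select(feed, include, exclude):
        print(item)
```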

Streaming ingestion at scale

There are some challenges to overcome when integrating with a content syndication server. One of them is FHIR resource size. FHIR still lacks a streaming-friendly distribution format [1], so large JSON documents are common. Because a JSON document is normally parsed as a whole, ingesting a large CodeSystem like dm+d (394 MB) requires a large amount of RAM, and the cost grows as more CodeSystems are ingested. Termbox implements streamed FHIR resource ingestion, optimized for low RAM usage regardless of the number or size of the resources being ingested. Termbox also optimizes the time it takes to load large terminologies such as SNOMED, LOINC, and RxNorm.
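
To illustrate the general idea of streamed ingestion, the Python sketch below yields the top-level concepts of a CodeSystem one at a time while reading the input in small chunks, so the whole document is never held in memory. It is a simplified illustration using only the standard library and does not describe how Termbox is implemented. The `iter_concepts` helper is hypothetical, and it assumes that the first occurrence of the text `"concept"` in the document is the top-level `concept` key; a production parser would track JSON structure properly.

```python
import io
import json
from typing import Iterator, TextIO


def iter_concepts(stream: TextIO, chunk_size: int = 65536) -> Iterator[dict]:
    """Yield each element of the top-level "concept" array without loading the whole file."""
    decoder = json.JSONDecoder()
    marker = '"concept"'
    buf = ""

    # Phase 1: skip ahead to the opening bracket of the concept array,
    # discarding everything before it so the buffer stays small.
    while True:
        idx = buf.find(marker)
        if idx != -1:
            bracket = buf.find("[", idx + len(marker))
            if bracket != -1:
                buf = buf[bracket + 1:]
                break
            buf = buf[idx:]            # marker seen, bracket not read yet
        else:
            buf = buf[-len(marker):]   # keep a tail in case the marker is split
        chunk = stream.read(chunk_size)
        if not chunk:
            return                     # no concept array in this resource
        buf += chunk

    # Phase 2: decode one concept object at a time, reading more input
    # whenever the buffered text is not yet a complete JSON value.
    while True:
        buf = buf.lstrip(" \t\r\n,")
        if buf.startswith("]"):
            return                     # end of the concept array
        try:
            concept, end = decoder.raw_decode(buf)
        except json.JSONDecodeError:
            chunk = stream.read(chunk_size)
            if not chunk:
                raise                  # input ended in the middle of a concept
            buf += chunk
            continue
        yield concept
        buf = buf[end:]


if __name__ == "__main__":
    resource = json.dumps({
        "resourceType": "CodeSystem",
        "url": "http://example.org/cs",  # hypothetical code system
        "concept": [
            {"code": "a", "display": "A"},
            {"code": "b", "display": "B, with a comma"},
            {"code": "c", "concept": [{"code": "c1"}]},
        ],
    })
    for concept in iter_concepts(io.StringIO(resource), chunk_size=16):
        print(concept["code"])
```

Memory use in this sketch is bounded by the chunk size plus the largest single top-level concept (including its nested children), not by the size of the whole resource.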

To explore what else Termbox offers, the getting started guide is a quick entry point, and content loading is documented in depth at Loading Data.

Footnotes

  1. The FHIR community has started discussing proposals for streaming-friendly distribution formats. We'll explore this topic in future articles.
