🔥 Terminology is fun: CodeableConcept.coding[]

Strip a FHIR resource of its coded elements and what's left is plumbing: ids, references, dates, statuses. What the condition is, what was observed, what was prescribed — the meaning of healthcare data lives in terminology. The structure tells you that a Condition exists and whom it belongs to; the codes tell you what it actually is.

And almost everywhere meaning appears in FHIR, it appears as one data type: CodeableConcept — Condition.code, Observation.code, Procedure.code, MedicationRequest.medication, AllergyIntolerance.code. Learn to read this one type correctly and you've learned to read the semantics of FHIR; read it wrong and the structure stays perfectly valid while the meaning silently slips away.

So how do you read and interpret a CodeableConcept correctly? For example, when I'm looking for a diagnosis coded in SNOMED CT?

The first idea is to use code.coding.where(system='...snomed...').code.first(). But that's not correct, as I learned from Lloyd McKenzie at the HL7 FHIR-I working group meeting.

There can absolutely be two or three SNOMED codes, and you have to support that.
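
To make the difference from the .first() approach concrete, here is a minimal Python sketch over a plain dict shaped like the JSON form of a CodeableConcept. The function name snomed_codes and the sample data are illustrative assumptions, not part of any FHIR library.

```python
SNOMED = "http://snomed.info/sct"


def snomed_codes(concept: dict) -> list[str]:
    """Return every SNOMED CT code in a CodeableConcept, not just one of them."""
    return [
        c["code"]
        for c in concept.get("coding", [])
        if c.get("system") == SNOMED and "code" in c
    ]


# Illustrative Condition.code: a coarse and a detailed code, both SNOMED CT.
condition_code = {
    "coding": [
        {"system": SNOMED, "code": "73211009"},  # diabetes mellitus
        {"system": SNOMED, "code": "44054006"},  # type 2 diabetes mellitus
    ],
    "text": "Type 2 diabetes",
}

# Both codes survive; deciding what to do with them is a separate, explicit step.
print(snomed_codes(condition_code))
```

The point of returning a list is only that the caller can no longer pretend there is exactly one code.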

To understand why — and what to do instead — it helps to step back and look at coded data as a conversation between two parties who never meet: the one who writes the code and the one who reads it.

What it means to code data

To code is to replace meaning with a pointer into a shared dictionary: a concept from some code system. A code without a system is meaningless — the pair (system, code) is the unit of meaning. The goal is that two systems that know nothing about each other understand the data the same way.

Three things complicate the picture immediately:

Concepts have granularity. They live in hierarchies: "diabetes" ⊃ "type 2 diabetes" ⊃ "type 2 diabetes, well-controlled". Choosing a level is a decision about how much information you capture — and how much you lose.

Code systems are built on different principles. ICD is a classification: its goal is to sort every case into non-overlapping buckets with no remainder (for statistics and billing) — hence the "other" and "unspecified" bins. One case → exactly one bucket. SNOMED CT is an ontology: its goal is to describe clinical reality — multi-parent hierarchies, formal definitions, composable expressions. One case → as precise a description as you need. They answer different questions: ICD — "which bucket for the report", SNOMED — "what is this". Mapping between them is inherently lossy and many-to-many.

Sometimes the concept doesn't exist — but you can build it. That's post-coordination: the code system's grammar lets you compose a concept from existing ones. SNOMED CT expression 81745001:272741003=7771000 — "eye : laterality = left". The expression goes, whole, into a single Coding.code — the spec is explicit that "any expression defined by the code system is still regarded as a 'code' and represented as such":

{
  "system": "http://snomed.info/sct",
  "code": "81745001:272741003=7771000"
}

Note: the result is still one code. It is not two codings "eye" + "left" in an array — we'll see below why that's forbidden.

What CodeableConcept actually is

CodeableConcept carries one meaning and holds its representations in a coding array plus a human-readable text.

{
  "coding": [
    { "system": "http://loinc.org",      "code": "8867-4",    "display": "Heart rate" },
    { "system": "http://snomed.info/sct", "code": "364075005", "display": "Heart Rate" }
  ],
  "text": "Heart rate"
}

The key idea: multiple codings are not multiple concepts. One concept, written in different "languages" for different recipients. An ICD-10 code for the payer, a SNOMED CT code for the clinical system, a local code for debugging — all in one field.

Why would one field need all that? Because of who's on each end.

The gap: one writer, many readers

Here is the central asymmetry of coded data exchange: the producer writes once — the consumers are many, and they want different things. The regulator wants coarse buckets for a report, the researcher wants maximal detail, billing wants ICD, clinical decision support wants SNOMED. Lloyd McKenzie, at the FHIR-I session:

There can absolutely be two or three SNOMED codes, and you have to support that. Because you're gonna have situations where somebody has an implementation guide that says I'm capturing this with just high-level codes... I need you to send me that code so that I can make that government report. And then you've got another use case, which is I want to know exactly what type of cancer it was, I want to know what stage it is, I want to know whether it's a recurrence or not. And you've got two different systems that are both wanting slightly different information, and the only way to do that is to send both codes.

And the producer can't pick one "for everyone":

When you create an instance, you're creating it for all receivers — we spit out the same thing to everyone. So such a prohibition is not reasonable.

This asymmetry explains every "strange" property of CodeableConcept:

There is no "main" code — because "main" is a property of the pair (data, consumer), not of the data. The producer has no right to decide for all readers. I asked Lloyd directly whether there's a flag for the most precise code:

We have a core element that lets you say this is the code that the user selected. But in the situation where you're doing postcoding, there is no mechanism to indicate which one is the most specific. And there may be circumstances where they're equally specific, they're just telling you different things.

A single "most specific" code often doesn't even exist:

Both of them have more detail than the base code, but you can't mark one of them as this is the most detailed, and there is no code in SNOMED for sitting digital-artery right systolic blood pressure. That does not exist.

Order means nothing. The CI build of the spec states it normatively: "Ordering of codings is undefined and SHALL NOT be used to infer meaning." CodeableConcept.coding has no orderMeaning (Zulip thread). So .first() doesn't grab the main code; it grabs an arbitrary one.

Extra codes are not synonyms. Elliot Silver, at the same meeting, set up this exact trap:

let's say we've got your one codeable concept that's coded with lung cancer, cancer, and diseases of the lung. My system does not recognize lung cancer. It does recognize diseases of the lung and it does recognize cancers... I'm gonna see a match for diseases of the lung in there, and that's the one I'm gonna keep. And I'm gonna assume anything else in there is a synonym...

Lloyd's correction is the whole point:

It is encoding one concept, but it's not necessarily expressing the same information with each one... one code is conveying an aspect or a set of information that the other one is not, and vice versa. That's what happens with code systems: they have different levels of preciseness and different capabilities of expressiveness.

So when Elliot's system silently keeps "diseases of the lung" and discards "lung cancer" as a "synonym", the analytics database records "diseases of the lung" instead of "lung cancer", and nobody notices.

The gap extends through time. Today's consumer doesn't know tomorrow's codes in the same field. Lloyd:

receivers that might have only had to deal with one or two codes initially may be faced with 7 or 8 codes eventually. As a consumer, you can't presume that what you see today is going to be the same in the future.

Why did FHIR choose this model? Grahame Grieve:

the expectation/desire in FHIR is for the information to be robust in multiple environments absent of a specific context with a custom agreement, because that's how people actually use the information anyway. That has a price, and here we are.

The price lands asymmetrically: the producer pays cheap (put everything in the array), the consumer pays dear (a selection policy, a terminology server, mappings). CodeableConcept is, in essence, a protocol for communicating across the gap: the producer maximizes the chance of being understood; the consumer is obliged to know how to choose.

So let's write down the rules for each side.

The one hard rule that binds the array

Before the rules, the single constraint the spec puts on multiple codings. It was hammered out in the "Multiple codes in CodeableConcept" thread (#terminology, 113 messages) and at the FHIR-I meeting, producing two JIRA tickets, FHIR-50188 and FHIR-50189, and a rewritten chunk of the spec.

The published spec (up to R5) said codings "may have slightly different granularity", and that word confused implementers for years. Rutt Lindström:

Most discussions I've been in (usually about Medication.code) have worn itself out over the word "slightly". Is the following a slight difference or a bloody big difference?

Grahame Grieve replied that the rule isn't about size of difference:

legal is any amount of granularity difference as long as the intersection is not empty; good practice is not to push the envelope here, and vary granularity as little as possible.

"slightly" is gone: the CI build now reads "When multiple codes are present, the intersection of their meanings SHALL NOT be empty. (Strictly, based on ISO 704, the intersection of their 'extensions' is not empty.)"

Valid: LOINC 8867-4 (Heart rate) + SNOMED CT 364075005 (Heart Rate).

Invalid: LOINC 9279-1 (Respiratory Rate) + SNOMED CT 364075005 (Heart Rate). Michael Lawley:

there's no thing that is both a respiratory rate and a heart rate — strictly, bottom (⊥) is the concept that is the intersection of every concept, but by definition has no instances.

Note: the rule isn't machine-checkable. No validator can enforce it, so in practice it works as guidance for whoever writes the data. And there was no formal vote at the FHIR-I meeting, because quorum dropped at the end. The substance was agreed, though: the rewritten text from FHIR-50188 has already landed in the CI build, while published releases up to R5 still carry the old wording.

Rules for the producer

The producer's asymmetry: you are the last point where the knowledge exists. The clinician saw the patient; the device made the measurement. The moment of coding is the moment of contact with reality — after you write the resource, the window closes. Everything you don't capture now is gone, because:

Generalizing later is computable; specializing is not. Any consumer with a terminology server can roll "type 2 diabetes, well-controlled" up to "diabetes" ($subsumes, the hierarchy). The reverse direction is impossible in principle: the information isn't in the data.
Cross-system mapping later is a lottery. ConceptMaps between systems are many-to-many with "wider / narrower / inexact" matches. The downstream mapper sees only the code; you saw the case. You resolve the ambiguity with knowledge; $translate resolves it with a guess.

Hence the core rule:

P0. Capture everything only you know, at coding time:

The most precise code you can justify. Nobody downstream can ever add precision back.
Codes in alternative systems you know (ICD bucket and SNOMED concept). Your choice of bucket is knowledge; a mapping computed later is a guess.
Always text — the full meaning in words. It's the channel of last resort that 100% of receivers (including humans) understand, and the only place to preserve detail that didn't fit any code: "text": "Negative for Chlamydia Trachomatis rRNA" next to a coding that just says "Negative".
Mark provenance: userSelected = true on the code a human actually chose. In an array where all codes are equal, there is a hidden asymmetry of origin — one code was picked by a person looking at the patient, the rest were derived. This is undecidable from the data afterwards. (But remember: userSelected marks source, not granularity.)
Declare the rules you played by: meta.profile. Three codes in an array — is that "dumped what we had" or "US Core required binding + legacy code + regulator translation"? Structurally indistinguishable. The profile claim tells the reader which contract you were executing — and lets them validate it.
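
The P0 checklist can be sketched as a single resource, built here in Python as a plain dict. This is an illustration under stated assumptions: the profile URL and the patient reference are hypothetical placeholders, and the SNOMED CT / ICD-10-CM pair is a sample chosen for the example, not a recommended mapping.

```python
import json


def build_condition_code() -> dict:
    """Assemble Condition.code: precise code, alternative system, text, provenance."""
    return {
        "coding": [
            {
                "system": "http://snomed.info/sct",
                "code": "44054006",
                "display": "Diabetes mellitus type 2",
                "userSelected": True,  # the code the clinician actually picked
            },
            {
                "system": "http://hl7.org/fhir/sid/icd-10-cm",
                "code": "E11.9",
                "display": "Type 2 diabetes mellitus without complications",
            },
        ],
        # Detail that did not fit any code is kept in words.
        "text": "Type 2 diabetes, well-controlled",
    }


condition = {
    "resourceType": "Condition",
    # Hypothetical profile URL: declares which contract the producer followed.
    "meta": {"profile": ["http://example.org/fhir/StructureDefinition/my-condition"]},
    "code": build_condition_code(),
    "subject": {"reference": "Patient/example"},  # placeholder reference
}

print(json.dumps(condition, indent=2))
```

Note that userSelected sits on exactly one coding: it records which code a human chose, not which one is most specific.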

And three safeguards against distorting the meaning while packing it:

P1. Each coding expresses the whole concept on its own. Not parts, not modifiers. Lloyd, on the "eye" + "left" trick (thread):

You don't interpret them by combining them together. Because these codes are SNOMED, you can combine the concepts via post-coordination: 81745001:272741003=7771000 — "eye:laterality=left". There is no 'modifier' element in Coding.

Why this is rule number one: the consumer is guaranteed to take one code from the array — that's their algorithm. If your codes are parts, misinterpretation isn't a risk, it's a certainty.

And if the code system has no post-coordination grammar (say, a LOINC code that needs a SNOMED qualifier), you can't simulate it with two codings either, and you can't invent a combined code. Lloyd, answering exactly this question in the "Handling post-coordination in CodeableConcept" thread:

If you want to send a post-coordinated expression, the Coding.system must be a code system that defines the post-coordination syntax and how to interpret the expression... In your case, there's no published syntax by either LOINC or SNOMED for what you're doing, so you couldn't call the result a LOINC or SNOMED code... Sending a LOINC code with an extension (or using the 'method' or other elements on Observation) to convey your qualification is likely to be better understood.

The legal escape hatches live outside coding[]: an extension for the qualifier, dedicated resource elements (Observation.method, bodySite), or a separate linked resource (Observation.hasMember). The grammar belongs to the code system, not to you.

P2. Codes must not contradict each other — the non-empty-intersection rule above. It's what makes P1 possible at all: "any code is true" only works if all codes are about the same thing.

P3. Stay within the element's scope, and watch the granularity direction — especially under negation. Don't drop a hierarchy root like SNOMED 138875005 (SNOMED CT Concept) into Observation.code (that's FHIR-50189). And the new spec text carries a warning the old one didn't — mixed granularity is not just inconvenient, it can be unsafe:

a prescription for "APO Amoxicillin 250 mg capsule" (needed to fill the order) could safely also have an additional coding of "Broad-spectrum Penicillin" (perhaps needed to trigger antibiotic monitoring protocols). On the other hand, if there were an order that said "Do not administer APO Amoxicillin 250 mg capsule", adding a translation of "broad-spectrum Penicillin" could be incorrectly interpreted as a prohibition against all broad-spectrum penicillins.

Same two codes, same non-empty intersection — but flip the statement to a negation and the broader coding silently widens the prohibition. When the statement is negative, err narrow.

Rules for the consumer

The consumer's asymmetry mirrors the producer's: the producer knew the reality but not the application; you know the application — which analysis, which action, what an error costs. That knowledge isn't in the data, and only you can apply it.

C0. Interpretation is your job, and it runs on an explicit policy chosen in advance. Lloyd's algorithm, from the same FHIR-I exchange:

The first thing that you do is you discard all the ones you don't understand. Ideally you want everything, even if you don't know what it is, because somebody downstream might understand it. [If] I cannot interpolate the meaning from multiple concepts, then I have to pick which of these things is most preferred for the type of analysis that I'm doing. And you would have made that decision in advance in terms of which of these terminologies is my most preferred terminology.

The key phrase is in advance. The preference policy (systems → value sets → granularity → userSelected → text) lives in your mapping config — versioned and reviewed — not in a FHIRPath expression at read time. .first() is a policy nobody approved.
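
A minimal sketch of such a policy in Python, assuming only two of the tiers (preferred systems, then userSelected). The names PREFERRED_SYSTEMS and select_by_policy are hypothetical, and in a real pipeline the preference list would come from versioned configuration rather than a module constant.

```python
from typing import Sequence

# Hypothetical policy, most preferred code system first.
PREFERRED_SYSTEMS = (
    "http://snomed.info/sct",
    "http://hl7.org/fhir/sid/icd-10",
)


def select_by_policy(
    concept: dict, preferred: Sequence[str] = PREFERRED_SYSTEMS
) -> list[dict]:
    """Return the codings that survive the policy; array position is never consulted."""
    codings = [c for c in concept.get("coding", []) if c.get("system") and c.get("code")]
    for system in preferred:
        candidates = [c for c in codings if c["system"] == system]
        if not candidates:
            continue
        # Tie-break inside one system: prefer the code a human selected.
        selected = [c for c in candidates if c.get("userSelected") is True]
        return selected or candidates
    return []  # nothing in a preferred system: fall back to text or a human


diagnosis = {
    "coding": [
        {"system": "http://hl7.org/fhir/sid/icd-10", "code": "E11"},
        {"system": "http://snomed.info/sct", "code": "44054006"},
    ],
    "text": "Type 2 diabetes",
}

survivors = select_by_policy(diagnosis)
print(survivors)
```

Exactly one survivor is a decision. Several survivors mean the policy needs a finer rule (a value set, a granularity check) instead of a guess, and an empty list means the caller has to handle the "understood nothing" case explicitly.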

Classify your consumption first

Before picking codes, ask what operation you're actually performing on the array. There are only a few consumption classes, and each has its own semantics and its own risk:

Class	Codes needed	Lossy?	Main risk
Membership test — "does it fall under concept X?" (CDS trigger, cohort inclusion)	any one (∃)	no	negation
Projection — "give me exactly one code in vocabulary Y" (ETL, FHIR-to-OMOP, billing)	exactly one	yes	choosing without a policy
Aggregation — "which bucket?" (reporting, statistics)	exactly one bucket	yes	double counting
Transit — store & forward, gateway	all, including not-understood	no	"cleaning up" the array
Display — show a human	none (`text`)	no	empty text
Inference — subsumption reasoning, merging details	all + tx server	no	cost

Two observations fall out of this table.

First, membership is the safest class — no code selection, no loss: an OR over the array, where extra codings only help (more representations → higher chance of hitting your value set). All the pains of this article — "which code is main, which is most precise" — are pains of the projection class. Half the cure is realizing your task is actually membership, and you can stop choosing a code where checking containment suffices.

Second, .first() is projection performed without a policy — the worst class executed the worst way.

Membership means subsumption, not equality

And here our own P0 comes back around: we told the producer to code as precisely as possible — so a consumer testing equality will find almost nothing. You search for "diabetes" (73211009), but the data says "type 2 diabetes, well-controlled", because the producer honestly followed P0. Equality → false → the patient drops out of the cohort. The better the data, the worse the naive consumer performs.

So the membership test must be:

∃ coding : subsumes(X, coding)    — "the code is X or a descendant of X"

Mechanics, take your pick: $subsumes for a point check; a ValueSet with an is-a filter or SNOMED ECL << 73211009 plus $expand / $validate-code for a materialized set; in OMOP, concept_ancestor — the same transitive closure precomputed as a table.
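
The membership test can be sketched in Python against a hand-written toy hierarchy. The IS_A table below is a simplified assumption for illustration (real SNOMED CT has intermediate concepts and multiple parents); in production the subsumes function would be replaced by a terminology server call or a precomputed closure table.

```python
SNOMED = "http://snomed.info/sct"

# Toy fragment of an is-a hierarchy within one code system: child -> parents.
IS_A = {
    "44054006": {"73211009"},  # type 2 diabetes mellitus is-a diabetes mellitus
    "46635009": {"73211009"},  # type 1 diabetes mellitus is-a diabetes mellitus
}


def subsumes(ancestor: str, code: str) -> bool:
    """True if code is the ancestor itself or one of its descendants."""
    seen, stack = set(), [code]
    while stack:
        current = stack.pop()
        if current == ancestor:
            return True
        if current not in seen:
            seen.add(current)
            stack.extend(IS_A.get(current, ()))
    return False


def is_member(concept: dict, system: str, ancestor: str) -> bool:
    """Membership: does ANY coding in the given system fall under the anchor?"""
    return any(
        c.get("system") == system and subsumes(ancestor, c.get("code", ""))
        for c in concept.get("coding", [])
    )


patient_dx = {
    "coding": [
        {"system": SNOMED, "code": "44054006"},
        {"system": "http://hl7.org/fhir/sid/icd-10", "code": "E11"},
    ]
}

equality_hit = any(c["code"] == "73211009" for c in patient_dx["coding"])
subsumption_hit = is_member(patient_dx, SNOMED, "73211009")
print(equality_hit, subsumption_hit)
```

The equality check misses the patient while the subsumption check finds them. The system comparison matters too: the same code string in a different system is a different code.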

Where it breaks:

Hierarchy lives inside a system. There is no direct subsumes(SNOMED-X, ICD-code) — first map, then climb, and the losses of both steps multiply. One more argument for P0: a producer who included a SNOMED code gave you an entry into a good hierarchy.
Hierarchies are not equal. SNOMED's is-a is a formal ontology; ICD's "hierarchy" is chapter organization — fine for reporting, not for clinical logic. Choosing a system for membership = choosing a hierarchy you trust.
Post-coordination never matches by equality — but ECL-based subsumption catches it, if your terminology server handles expressions.
Negation amplifies again: an exclusion criterion via subsumption on a broad concept excludes all its descendants — remember the amoxicillin. Include by broad anchor is safe; exclude by broad anchor is dangerous.

One more thing $translate won't do for you — generalization. Michael Lawley, answering Lloyd:

There's nothing in the definition of $translate to suggest it does anything "semantic"... should $translate of B return both X and Y? That's likely to lead to very noisy and unwieldy behaviour.

If you need ancestor concepts, that's a value set with a generalizes filter (or ECL) — not a $translate call.

Read through the profile — and the projection gap

If the producer declared meta.profile, use it: a profile narrows "unknown array" down to "guaranteed minimum plus possible extras". A required binding tells you at least one coding comes from a known value set; slicing tells you the array has named roles ("the mandatory LOINC code" in Vital Signs) recognizable by discriminators.

Note what's missing in the tooling, though. The profile describes a projection — slicing partitions coding[] into named parts — but there's no standard operation to execute it. Validators resolve slices internally (they must, to check cardinalities) and throw the assignment away, returning only yes/no. There is no $project(profile) that returns the resource viewed through the profile, each coding annotated with its slice name. Today you reimplement discriminators by hand (coding.where(memberOf('http://...vs'))) or encode the selection policy in a SQL-on-FHIR ViewDefinition. A profile is a contract; validation is the producer-side check of the contract; projection would be the consumer-side execution of it — the first two exist in the spec, the third doesn't yet.
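
What a hand-rolled projection might look like, as a Python sketch. Everything here is hypothetical: assign_slices is not a standard operation, the slice names are invented, and the discriminator is simplified to Coding.system, whereas a real profile may discriminate on a pattern or on value set membership.

```python
def assign_slices(concept: dict, slices: dict[str, str]) -> dict[str, list[dict]]:
    """Partition coding[] by slice name, using Coding.system as the discriminator."""
    result: dict[str, list[dict]] = {name: [] for name in slices}
    result["@unsliced"] = []  # codings no slice claims are kept, not dropped
    for coding in concept.get("coding", []):
        for name, system in slices.items():
            if coding.get("system") == system:
                result[name].append(coding)
                break
        else:
            result["@unsliced"].append(coding)
    return result


# Invented slice names for a vital-signs style code element.
VITALS_SLICES = {
    "loincCode": "http://loinc.org",
    "snomedCode": "http://snomed.info/sct",
}

heart_rate = {
    "coding": [
        {"system": "urn:example:local", "code": "HR"},  # hypothetical local code
        {"system": "http://snomed.info/sct", "code": "364075005"},
        {"system": "http://loinc.org", "code": "8867-4"},
    ],
    "text": "Heart rate",
}

view = assign_slices(heart_rate, VITALS_SLICES)
print({name: [c["code"] for c in members] for name, members in view.items()})
```

The consumer then reads the coding by role ("loincCode") instead of by position, and the unsliced bucket keeps the codes it does not understand available for whoever reads next.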

The remaining safeguards

C1. The array's structure carries no meaning. Order, count, composition — none of it is a signal. The array reflects the producer's set of consumers, not your priorities.

C2. Choosing a code is a deliberate loss of the others. Each coding may carry detail its neighbors lack (lung cancer ≠ disease of the lung). Take one — know what you discarded. And keep the codes you don't understand when passing data on: you're not the last reader.

C3. Know your safe direction of error. For positive actions, interpreting by the broader code is conservative and safe; for negative ones (prohibitions, exclusions, "all except" filters) — only the narrow code, or you widen the prohibition.

C4. The "understood nothing" branch must exist and be meaningful. Unknown array length, unknown systems, new codes tomorrow. Nothing matched → text → human → explicit refusal. Not an exception and not a silent skip — otherwise .first() sneaks back in as "well, at least something".
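
One way to make that branch explicit is to return a tagged result instead of raising or skipping. The Python sketch below is an assumption about how this could look: the Interpretation type, its outcome labels, and the UNDERSTOOD set are all hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Interpretation:
    outcome: str         # "coded", "needs_human_review" or "refused"
    codings: tuple = ()  # every understood coding, untouched
    text: str = ""


def interpret(concept: dict, understood_systems: set[str]) -> Interpretation:
    """Walk the chain: understood codings, then text for a human, then refusal."""
    known = tuple(
        c
        for c in concept.get("coding", [])
        if c.get("system") in understood_systems and c.get("code")
    )
    if known:
        return Interpretation("coded", codings=known)
    text = (concept.get("text") or "").strip()
    if text:
        return Interpretation("needs_human_review", text=text)
    return Interpretation("refused")


UNDERSTOOD = {"http://snomed.info/sct"}  # hypothetical consumer capability

local_only = {
    "coding": [{"system": "urn:example:local", "code": "X1"}],
    "text": "Type 2 diabetes",
}
print(interpret(local_only, UNDERSTOOD).outcome)
```

Because every call returns one of three named outcomes, the "refused" and "needs_human_review" cases can be counted, routed, and audited like any other result.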

The bottom line

CodeableConcept is a protocol across the producer/consumer gap. The producer's core duty — capture the knowledge that exists only at the moment of coding: the most precise code, the codes in alternative systems, the full text, the provenance, the profile. Every one of those answers a consumer's question that is otherwise unanswerable: "how precise?", "which bucket?", "what does it mean in words?", "which code did the human pick?", "what rules were in play?".

The consumer's core duty — apply the knowledge that exists only on their side: the use case. Classify the consumption, test membership through subsumption rather than equality, project through profiles where they exist, and choose codes by an explicit policy — never by position in the array.

Terminology is fun, as Lloyd McKenzie likes to say. Especially when the fun arrives as the seventh code in an array your parser wasn't built for.

Sources

Chat.fhir.org threads (Zulip login required):

Multiple codes in CodeableConcept — #terminology, 113 messages. The main debate and the spec rewrite.
Will $translate do generalization? — #terminology. Why $translate doesn't do generalization.
Does array order ever matter? — #implementers. orderMeaning and ordering in arrays.
CodeableConcept.coding.userSelected — #terminology. userSelected vs coding-purpose.
Adding a tag to coding in CodeableConcept — #implementers. A coding-purpose use case.
Handling post-coordination in CodeableConcept — #implementers.
Multiple Coding in Codeable Concept — #implementers. The "eye + left" laterality case.
codings with null — #terminology. US Core, VA VUID, "do not send UNK".
CodeableConcept Validation — #terminology. "99% valid is still invalid".

HL7 JIRA:

FHIR-50188 — new language for multiple codings in CodeableConcept.
FHIR-50189 — scope/granularity for Observation.code.

FHIR specification:

CodeableConcept, CI build — the rewritten "Multiple Codings" section with the intersection rule and the negation warning.
CodeableConcept, R5 — the old "slightly different granularity" wording, for comparison.