🔥 Beyond Patient/$merge: A Resource-Agnostic, Client-Driven Merge for FHIR

We've been running Master Data Management in MDMbox across production deployments for the past two years — deduplicating patients, merging organizations, reconciling practitioners across facilities. Along the way, we learned that every organization merges differently, and no single server-side algorithm can cover the variety of real-world merge policies.

FHIR R5 introduced Patient/$merge, a long-awaited operation that lets you merge duplicate patient records. It defines input parameters (source-patient, target-patient, result-patient, plus identifier-based variants and a preview flag), expects the server to handle reference updates, and returns the merged result. The operation is currently at maturity level 0 and hasn't seen further development since its introduction.

It's a good start. But after implementing merge in production, we found the spec doesn't go far enough.

The problem with server-driven merge

The FHIR $merge spec assumes the server knows how to merge. The server deactivates the source, copies identifiers, updates references. But merge logic varies wildly between organizations:

Hospital A deletes the source patient entirely and rewrites all references
Hospital B keeps the source inactive and creates a Linkage resource for provenance
Hospital C merges Encounters and Observations into the target, but preserves separate AllergyIntolerance records for manual review
Health information exchange D needs to merge Organizations and Practitioners, not just Patients

The spec doesn't cover any of that — it's Patient-only and the "update all references" step is hand-waved.

[Figure: Merge flow]

What we needed was a merge operation where:

The client decides what happens (which resources to update, create, or delete)
The server guarantees atomicity, creates an audit trail, and enforces safety checks
It works for any resource type, not just Patient

Our approach: the plan parameter

We built a FHIR-like $merge that accepts a standard transaction Bundle as an additional plan parameter:

POST $merge

{
  "resourceType": "Parameters",
  "parameter": [
    {"name": "source", "valueReference": {"reference": "Patient/123"}},
    {"name": "target", "valueReference": {"reference": "Patient/456"}},
    {"name": "preview", "valueBoolean": false},
    {
      "name": "plan",
      "resource": {
        "resourceType": "Bundle",
        "type": "transaction",
        "entry": [
          {
            "resource": {"resourceType": "Patient", "id": "456", "...": "merged state"},
            "request": {"method": "PUT", "url": "Patient/456", "ifMatch": "W/\"3\""}
          },
          {
            "resource": {"resourceType": "Encounter", "id": "789",
                         "subject": {"reference": "Patient/456"}},
            "request": {"method": "PUT", "url": "Encounter/789", "ifMatch": "W/\"1\""}
          },
          {
            "request": {"method": "DELETE", "url": "Patient/123", "ifMatch": "W/\"2\""}
          }
        ]
      }
    }
  ]
}

The client sends every PUT, POST, and DELETE explicitly. The server wraps the plan with audit resources (Task + Provenance) and executes the whole thing as a single FHIR transaction. Either everything succeeds — including the audit trail — or nothing does.
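
A minimal sketch of what the server-side wrapping step could look like, in Python. The function names, the use of Task.focus for the target and a Task.input entry for the source, the requester reference, and reading each pre-merge version from the entry's ifMatch value are all assumptions for illustration, not MDMbox's actual implementation. The point it shows is that the Task and Provenance are appended as extra POST entries with urn:uuid fullUrls, so they commit or roll back together with the client's plan.

```python
import re
import uuid
from datetime import datetime, timezone
from typing import Optional

# Hypothetical sketch; not MDMbox's actual wrapping code.
_WEAK_ETAG = re.compile(r'^W/"(?P<version>[^"]+)"$')


def versioned_ref(entry: dict) -> Optional[str]:
    """Turn a PUT/DELETE entry with ifMatch W/"3" into 'Patient/456/_history/3'."""
    request = entry["request"]
    match = _WEAK_ETAG.match(request.get("ifMatch", ""))
    if request["method"] in ("PUT", "DELETE") and match:
        return f'{request["url"]}/_history/{match["version"]}'
    return None


def wrap_plan(plan: dict, source: str, target: str, requester: str) -> dict:
    """Return a copy of the client's transaction plan with a Task and a Provenance appended."""
    task_url = f"urn:uuid:{uuid.uuid4()}"
    provenance_url = f"urn:uuid:{uuid.uuid4()}"
    now = datetime.now(timezone.utc).isoformat()

    # Pre-merge versions of everything the plan modifies or deletes.
    # (Provenance.target is 1..*, so a real server would reject an empty plan.)
    targets = []
    for entry in plan["entry"]:
        ref = versioned_ref(entry)
        if ref is not None:
            targets.append({"reference": ref})

    task = {
        "resourceType": "Task",
        # Written only if the whole transaction commits, so recorded as completed here.
        "status": "completed",
        "intent": "order",
        "authoredOn": now,
        "requester": {"reference": requester},
        "focus": {"reference": target},
        "input": [{"type": {"text": "merge-source"},
                   "valueReference": {"reference": source}}],
        "relevantHistory": [{"reference": provenance_url}],
    }
    provenance = {
        "resourceType": "Provenance",
        "recorded": now,
        "target": targets,
        "agent": [{"who": {"reference": requester}}],
    }
    return {
        **plan,
        "entry": list(plan["entry"]) + [
            {"fullUrl": task_url, "resource": task,
             "request": {"method": "POST", "url": "Task"}},
            {"fullUrl": provenance_url, "resource": provenance,
             "request": {"method": "POST", "url": "Provenance"}},
        ],
    }
```

The Task points at the Provenance through Task.relevantHistory, so both audit resources resolve to each other inside the same transaction.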

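Notice the ifMatch value on every entry: this is optimistic locking, the transaction-Bundle equivalent of an HTTP If-Match header. If any resource was modified between the time the client built the plan and the time it's executed, its version no longer matches, the version check fails (412 Precondition Failed under FHIR's versioning rules), and because the Bundle is a transaction, nothing is committed. No silent overwrites, no lost updates.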
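
On the client side, each ifMatch value comes straight from the meta.versionId of the resource as it was read. Below is a minimal Python sketch of building a Parameters body like the one above, assuming the referencing resources have already been fetched. It implements just one possible policy (Hospital A's: repoint every reference to the target, then delete the source), and build_merge_params and its helpers are hypothetical names.

```python
# Hypothetical client-side helpers; one merge policy among many.

def etag(resource: dict) -> str:
    """Weak ETag for the version the client read, e.g. W/"3"."""
    return f'W/"{resource["meta"]["versionId"]}"'


def repoint(node, old: str, new: str):
    """Return a deep copy of node with every {"reference": old} rewritten to new."""
    if isinstance(node, dict):
        out = {key: repoint(value, old, new) for key, value in node.items()}
        if out.get("reference") == old:
            out["reference"] = new
        return out
    if isinstance(node, list):
        return [repoint(item, old, new) for item in node]
    return node


def put_entry(resource: dict) -> dict:
    """PUT entry guarded by the version the client read; meta is left to the server."""
    url = f'{resource["resourceType"]}/{resource["id"]}'
    body = {k: v for k, v in resource.items() if k != "meta"}
    return {"resource": body,
            "request": {"method": "PUT", "url": url, "ifMatch": etag(resource)}}


def build_merge_params(source: dict, merged_target: dict, referencing: list) -> dict:
    """merged_target must keep the meta.versionId of the target as it was read."""
    src = f'{source["resourceType"]}/{source["id"]}'
    tgt = f'{merged_target["resourceType"]}/{merged_target["id"]}'
    entries = [put_entry(merged_target)]
    entries += [put_entry(repoint(res, src, tgt)) for res in referencing]
    entries.append({"request": {"method": "DELETE", "url": src, "ifMatch": etag(source)}})
    return {
        "resourceType": "Parameters",
        "parameter": [
            {"name": "source", "valueReference": {"reference": src}},
            {"name": "target", "valueReference": {"reference": tgt}},
            {"name": "preview", "valueBoolean": False},
            {"name": "plan", "resource": {"resourceType": "Bundle",
                                          "type": "transaction", "entry": entries}},
        ],
    }
```

Other policies from the list earlier, such as keeping the source inactive or leaving AllergyIntolerance records untouched, only change which entries the client emits; the server-side contract stays the same.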

It's a superset, not a fork

You'll notice we use source, target, and result instead of the spec's source-patient, target-patient, and result-patient. This is intentional — the operation is resource-agnostic, so Patient-specific naming would be misleading. Merging duplicate Organizations? Practitioners? Same operation, same audit trail.

Drop the plan parameter and pass result instead — you get standard FHIR $merge behavior. Identifier-based lookup for source and target is supported. The plan is what makes it strictly more powerful: when present, the client takes full control of the merge logic.
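
To illustrate the superset idea, here is a hypothetical parameter-normalization step in Python. Accepting the spec's Patient-specific names as aliases, the *-identifier parameter names, and rejecting plan and result together are assumptions made for this sketch; the post itself only states that dropping plan and passing result gives standard $merge behavior.

```python
# Hypothetical: alias table and validation rules are illustrative assumptions.
SPEC_ALIASES = {
    "source-patient": "source",
    "source-patient-identifier": "source-identifier",
    "target-patient": "target",
    "target-patient-identifier": "target-identifier",
    "result-patient": "result",
}


def normalize(parameters: dict) -> dict:
    """Group Parameters entries by generic name (identifier params may repeat)."""
    grouped = {}
    for param in parameters.get("parameter", []):
        name = SPEC_ALIASES.get(param["name"], param["name"])
        grouped.setdefault(name, []).append(param)
    return grouped


def merge_mode(parameters: dict) -> str:
    """Return "plan" (client-driven) or "standard" (server-driven $merge)."""
    args = normalize(parameters)
    for role in ("source", "target"):
        if role not in args and f"{role}-identifier" not in args:
            raise ValueError(f"missing {role} or {role}-identifier")
    if "plan" in args and "result" in args:
        raise ValueError("pass either plan or result, not both")
    return "plan" if "plan" in args else "standard"
```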

The missing piece: $referencing

To build a good merge plan, the client needs to answer one question: "What resources reference Patient/123?"

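FHIR has _revinclude, which only follows the search parameters you name (for example _revinclude=Encounter:subject), and $everything, which is limited to a Patient's compartment, but no generic operation that says "give me every resource in the database that points at this reference." We built one.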

[Figure: Before and after merge]

Under the hood, it's a PostgreSQL query using JSONPath to search across resource tables:

-- Run once per resource table; @? is true when the JSONPath finds at least
-- one Reference to Patient/123 anywhere in the resource.
SELECT id, resource_type, resource
FROM <resource_table>
WHERE resource @? '$.** ? (@.reference == "Patient/123")'

The query above is simplified for clarity, but the core idea holds: $.** recursively traverses the entire resource structure, finding any nested object where reference equals the target — regardless of where in the resource tree it appears. No hardcoded paths, no per-resource-type configuration. One query finds references in Encounter.subject, Observation.performer, Claim.provider, or any other reference field.
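
The $.** idea is easy to reproduce outside the database, for instance when a client wants to know where in a resource each match sits before editing it. Here is a Python sketch (find_references is a hypothetical helper, not part of MDMbox) that walks any resource recursively and yields the path of every object whose reference equals the target. Like the simplified SQL, it matches the reference string exactly, so absolute URLs or versioned references such as Patient/123/_history/2 would need extra handling.

```python
from typing import Any, Iterator


def find_references(node: Any, target: str, path: str = "$") -> Iterator[str]:
    """Yield the path of every object whose "reference" equals target, at any depth.

    Same intent as '$.** ? (@.reference == "...")': no per-resource-type paths.
    """
    if isinstance(node, dict):
        if node.get("reference") == target:
            yield path
        for key, value in node.items():
            yield from find_references(value, target, f"{path}.{key}")
    elif isinstance(node, list):
        for index, item in enumerate(node):
            yield from find_references(item, target, f"{path}[{index}]")
```

For an Observation whose subject and second performer both point at Patient/123, it yields $.subject and $.performer[1].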

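On performance: with a GIN index on the JSONB column (the default jsonb_ops operator class), PostgreSQL can serve a JSONPath predicate written with the @? operator from the index. It narrows the candidates to rows that contain the key reference and the value "Patient/123", then rechecks only those rows against the full path, instead of scanning the whole table. The predicate form matters: comparing the output of a function such as jsonb_path_query_array() in the WHERE clause bypasses the index, and the jsonb_path_ops operator class does not support the .** accessor. For a typical merge, the operation completes in milliseconds. For resource types with millions of rows, we parallelize the search across tables and stream results back to the client.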
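
A sketch of the per-table fan-out, in Python. It builds one query per resource table with the @? form of the predicate, passes the JSONPath as a bound parameter cast to jsonpath (a psycopg-style %s placeholder is assumed), and runs the tables concurrently, yielding rows as each table finishes. run_query stands in for whatever database call you use and is hypothetical, as are the table names; the table-name check is a simple guard because identifiers cannot be bound as parameters.

```python
from __future__ import annotations

import json
import re
from concurrent.futures import ThreadPoolExecutor, as_completed
from typing import Callable, Iterable, Iterator

_TABLE = re.compile(r"^[a-z_][a-z0-9_]*$")


def referencing_query(table: str, reference: str) -> tuple[str, tuple[str]]:
    """One per-table query using @?, the form a jsonb_ops GIN index can serve."""
    if not _TABLE.match(table):
        raise ValueError(f"unexpected table name: {table!r}")
    # json.dumps yields a double-quoted, escaped string literal valid in jsonpath.
    jsonpath = f"$.** ? (@.reference == {json.dumps(reference)})"
    sql = f"SELECT id, resource FROM {table} WHERE resource @? %s::jsonpath"
    return sql, (jsonpath,)


def referencing(tables: Iterable[str], reference: str,
                run_query: Callable[[str, tuple], list],
                max_workers: int = 8) -> Iterator[tuple[str, dict]]:
    """Query every table concurrently; yield (table, row) as each table completes."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {pool.submit(run_query, *referencing_query(t, reference)): t
                   for t in tables}
        for future in as_completed(futures):
            for row in future.result():
                yield futures[future], row
```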

We've exposed this as a standalone $referencing operation in MDMbox — a building block that any merge (or impact analysis) workflow can use.

Task + Provenance: not just audit

Every merge creates a Task and a Provenance inside the same transaction. This is more than compliance:

[Figure: Audit trail]

Subscriptions. External systems can subscribe to merge Task creation and react immediately — update their caches, trigger downstream workflows, notify users. Standard FHIR Subscriptions, no custom plumbing.

Versioned snapshots. The Provenance records a versioned reference (e.g. Patient/456/_history/3) for every resource modified or deleted by the merge. Combined with FHIR's History API, this means the complete pre-merge state is recoverable for as long as the server retains resource history.

Lifecycle tracking. The Task records the merge status (requested → in-progress → completed or failed), who initiated it, when it happened, and links to both the source and target. Querying "show me all merges that affected this patient" is a single FHIR search on Task.
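
A small Python sketch of the lifecycle and lookup, under stated assumptions. The transition table enforces the requested → in-progress → completed or failed path named above; the status codes are standard FHIR Task.status values, but the enforcement itself is hypothetical. The search assumes the merge Task stores the affected record in Task.focus, which is a standard Task search parameter; a record referenced only from Task.input would need a custom SearchParameter.

```python
from urllib.parse import urlencode

# Hypothetical enforcement of the merge lifecycle described above.
TRANSITIONS = {
    "requested": {"in-progress", "failed"},
    "in-progress": {"completed", "failed"},
    "completed": set(),
    "failed": set(),
}


def advance(task: dict, new_status: str) -> dict:
    """Return a copy of task with the new status, rejecting illegal transitions."""
    current = task["status"]
    if new_status not in TRANSITIONS.get(current, set()):
        raise ValueError(f"illegal Task transition {current} -> {new_status}")
    return {**task, "status": new_status}


def merges_affecting(base: str, record_ref: str) -> str:
    """Search URL for merge Tasks whose focus is the given record (assumed layout)."""
    return f"{base}/Task?{urlencode({'focus': record_ref}, safe='/')}"
```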

What's next: $unmerge

FHIR doesn't define an unmerge operation yet. But consider this scenario: a merge is executed on Monday, and on Wednesday someone realizes the two patients were actually different people. By then, 50 resources have been modified — Encounters repointed, Observations reassigned, the source Patient deleted.

With Task tracking the merge lifecycle, Provenance capturing every affected resource with its pre-merge version, and the History API preserving all prior states — we have everything needed to reverse a merge reliably. The $unmerge operation reads the Provenance, retrieves the pre-merge versions from history, and constructs a reverse transaction Bundle that restores every resource to its prior state.
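
A hypothetical Python sketch of the reverse-transaction step: for each versioned reference in the Provenance, read that version (the vread callable stands in for GET [base]/{type}/{id}/_history/{version}) and emit a PUT that restores it. reverse_bundle and vread are illustrative names. Two assumptions are baked in: the server accepts a PUT that recreates a deleted resource, and nothing was edited after the merge. A fuller version would compare current versions and flag resources changed since the merge rather than overwrite them.

```python
from typing import Callable


def reverse_bundle(provenance: dict, vread: Callable[[str, str, str], dict]) -> dict:
    """Build a transaction that PUTs back each pre-merge version listed in Provenance."""
    entries = []
    for entry_ref in provenance["target"]:
        reference = entry_ref["reference"]
        parts = reference.split("/")
        if len(parts) != 4 or parts[2] != "_history":
            raise ValueError(f"not a versioned reference: {reference}")
        rtype, rid, _, version = parts
        previous = vread(rtype, rid, version)
        # Drop meta so the server assigns a fresh version on restore.
        body = {k: v for k, v in previous.items() if k != "meta"}
        entries.append({"resource": body,
                        "request": {"method": "PUT", "url": f"{rtype}/{rid}"}})
    return {"resourceType": "Bundle", "type": "transaction", "entry": entries}
```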

We'll cover $unmerge in the next post.

Want to try client-driven merge in your project? MDMbox is available today.