--- description: >- This page explains how the MPI matching model works, describing its structure, scoring logic, and configurable elements with an example. --- # Matching Model Explanation {% hint style="info" %} This page provides the **matching model code** and explains its elements.\ For an overview of probabilistic matching concepts and match score calculation, see our article [Master Patient Index and Record Linkage](https://www.health-samurai.io/articles/master-patient-index-and-record-linkage). {% endhint %} This model is used for **patient record matching**, but the same approach can be adapted to detect duplicates for any type of resource.\ If you are interested in applying this approach to your use case, please, [contact us](../../overview/contact-us.md). ## Core Idea The model compares selected fields from patient records and evaluates predefined comparison rules.\ Each rule in the **features** section contains an expression `expr` and an associated weight `bf` (Bayes Factor), indicating how strongly a match or mismatch on that field affects the total score. All weights are summed into a **total score**. If the score is above the defined threshold, the record pair is included in the match results; if it is below, it is excluded. ## Model Structure **Which fields to compare** and **how to compare** them is described in the example model:
{
"id": "model",
"vars": {
"dob": "(#.resource#>>'{birthDate}')",
"name": "((#.#family) || ' ' || (#.#given))",
"given": "(immutable_unaccent_upper(#.resource#>>'{name,0,given,0}'))",
"family": "(immutable_unaccent_upper(#.resource#>>'{name,0,family}'))",
"gender": "(#.resource#>>'{gender}')",
"address": "(#.resource#>>'{address,0,line,0}')",
"addressLength": "(length(#.resource#>>'{address,0,line,0}'))",
"telecomArray": "array(select jsonb_array_elements_text(jsonb_path_query_array( #.resource, '$.telecom[*] ? (@.value != \"\").value')))"
},
"blocks": {
"fn": {
"var": "name"
},
"dob": {
"var": "dob"
},
"addr": {
"sql": "(l.#address % r.#address)"
}
},
"features": {
"fn": [
{
"bf": 0,
"expr": " ( l.resource->'name' IS NULL OR r.resource->'name' IS NULL )"
},
{
"bf": 13.336495228175629,
"expr": "l.#name = r.#name"
},
{
"bf": 13.104401641242227,
"expr": "r.#given = l.#family AND l.#given = r.#family"
},
{
"bf": 9.288385498954133,
"expr": "levenshtein(l.#name, r.#name) <= 2"
},
{
"bf": 10.36329167966839,
"expr": "r.#given = l.#given AND string_to_array(l.#family, ' ') && string_to_array(r.#family, ' ')"
},
{
"bf": 10.36329167966839,
"expr": "r.#family = l.#family AND string_to_array(l.#given, ' ') && string_to_array(r.#given, ' ')"
},
{
"bf": 2.402276401131933,
"expr": "r.#given = l.#given"
},
{
"else": -12.37233293924643
}
],
"dob": [
{
"bf": 0,
"expr": " ( l.#dob IS NULL OR r.#dob IS NULL )"
},
{
"bf": 10.59415069916466,
"expr": "l.#dob = r.#dob"
},
{
"bf": 3.9911610470417744,
"expr": "levenshtein(l.#dob, r.#dob) <= 1"
},
{
"bf": 0.5164298695732575,
"expr": "levenshtein(l.#dob, r.#dob) <= 2"
},
{
"else": -10.322063538772698
}
],
"ext": [
{
"bf": 9.236771286242664,
"expr": "((l.#telecomArray && r.#telecomArray) AND (((l.#addressLength > r.#addressLength) and (l.#address %>> r.#address)) or ((l.#addressLength <= r.#addressLength) and (l.#address <<% r.#address))))"
},
{
"bf": 7.465648574292063,
"expr": "(((l.#addressLength > r.#addressLength) and (l.#address %>> r.#address)) or ((l.#addressLength <= r.#addressLength) and (l.#address <<% r.#address)))"
},
{
"bf": 6.465648574292063,
"expr": "l.#telecomArray && r.#telecomArray"
},
{
"else": -10.517360697819983
}
],
"sex": [
{
"bf": 0,
"expr": " ( l.#gender IS NULL OR r.#gender IS NULL )"
},
{
"bf": 1.8504082299552485,
"expr": " l.#gender = r.#gender"
},
{
"else": -4.842034404727677
}
]
},
"resource": "Patient",
"thresholds": {
"auto": 25,
"manual": 16
},
"resourceType": "AidboxLinkageModel"
}
### **Variables (`vars`)**
**Variables** defined in the model can **reference resource fields** directly or be composed from them using expressions (e.g., concatenating values, applying normalization, or calculating derived values). These variables are used in feature expressions and blocking rules.
* `dob` – patient birth date
* `name` – concatenation of family and given names
* `given` – normalized first name (accents removed, uppercase)
* `family` – normalized last name (accents removed, uppercase)
* `gender` – gender value
* `address` – normalized address line
* `telecomArray` – contact information (phone, email)
### **Comparison Blocks (`blocks`)**
Blocking rules **limit** the number of candidate record pairs by selecting only those that **share key characteristics** (e.g., similar names, matching birth dates, or addresses).\
This **reduces** the number of comparisons, which significantly **speeds up processing**, while still preserving potential matches for scoring.
* `fn`: blocks by patient name
* `dob`: blocks by date of birth
* `addr`: blocks by address
### **Matching Features and Scoring**
Features describe **how resource fields are compared** and **how much each comparison influences** the overall **match score**.
Each feature contains:
* `expr` – a logical expression that compares values of specific fields or variables between two records.
* `bf` (Bayes factor / weight) – a numeric value representing how strongly a match or mismatch on that feature affects the total score.
When records are compared, all satisfied feature expressions **add their weights** to the total score. If a mismatch is detected, **negative weights** may be applied. The result is an aggregated score reflecting the likelihood that two records refer to the same entity.
{% hint style="info" %}
The model uses **Levenshtein distance** to tolerate typos and small text differences. It counts how many single‑character edits (insertions, deletions, substitutions) are needed to make two strings equal.\
For example, levenshtein('Jonathan', 'Jonatan') = 1.
{% endhint %}
#### **Name Matching (`fn`)**:
* Exact match: 13.34 points
* Swapped first/last names: 13.10 points
* Levenshtein distance ≤ 2: 9.29 points
* Partial matches (same first name + matching parts of last name): 10.36 points
* Same first name only: 2.40 points
* No match: -12.37 points
```json
"fn": [
{
"bf": 0,
"expr": " ( l.resource->'name' IS NULL OR r.resource->'name' IS NULL )"
},
{
"bf": 13.336495228175629,
"expr": "l.#name = r.#name"
},
{
"bf": 13.104401641242227,
"expr": "r.#given = l.#family AND l.#given = r.#family"
},
{
"bf": 9.288385498954133,
"expr": "levenshtein(l.#name, r.#name) <= 2"
},
{
"bf": 10.36329167966839,
"expr": "r.#given = l.#given AND string_to_array(l.#family, ' ') && string_to_array(r.#family, ' ')"
},
{
"bf": 10.36329167966839,
"expr": "r.#family = l.#family AND string_to_array(l.#given, ' ') && string_to_array(r.#given, ' ')"
},
{
"bf": 2.402276401131933,
"expr": "r.#given = l.#given"
},
{
"else": -12.37233293924643
}
]
```
#### **Date of Birth Matching (`dob`)**:
* Exact match: 10.59 points
* Levenshtein distance ≤ 1: 3.99 points
* Levenshtein distance ≤ 2: 0.52 points
* No match: -10.32 points
```json
"dob": [
{
"bf": 0,
"expr": " ( l.#dob IS NULL OR r.#dob IS NULL )"
},
{
"bf": 10.59415069916466,
"expr": "l.#dob = r.#dob"
},
{
"bf": 3.9911610470417744,
"expr": "levenshtein(l.#dob, r.#dob) <= 1"
},
{
"bf": 0.5164298695732575,
"expr": "levenshtein(l.#dob, r.#dob) <= 2"
},
{
"else": -10.322063538772698
}
]
```
#### **Address Matching (`ext`)**:
* Exact address match: 7.47 points
* Matching contact information: 9.24 points
* No match: -10.52 points
```json
"ext": [
{
"bf": 9.236771286242664,
"expr": "((l.#telecomArray && r.#telecomArray) AND (((l.#addressLength > r.#addressLength) and (l.#address %>> r.#address)) or ((l.#addressLength <= r.#addressLength) and (l.#address <<% r.#address))))"
},
{
"bf": 7.465648574292063,
"expr": "(((l.#addressLength > r.#addressLength) and (l.#address %>> r.#address)) or ((l.#addressLength <= r.#addressLength) and (l.#address <<% r.#address)))"
},
{
"bf": 6.465648574292063,
"expr": "l.#telecomArray && r.#telecomArray"
},
{
"else": -10.517360697819983
}
]
```
#### **Gender Matching (`sex`)**:
* Exact match: 1.85 points
* No match: -4.84 points
```json
"sex": [
{
"bf": 0,
"expr": " ( l.#gender IS NULL OR r.#gender IS NULL )"
},
{
"bf": 1.8504082299552485,
"expr": " l.#gender = r.#gender"
},
{
"else": -4.842034404727677
}
]
```
### **Thresholds**
Thresholds define the **decision boundaries** for match results.\
After the total score is calculated based on all feature comparisons, it is compared against threshold values:
* `auto`: matching score ≥ 25 → automatic merge can be processed
* `manual`: 16 ≤ matching score < 25 → manual review required
* Below `manual` – score < 16 → non‑match