Configure MPI module
This page explains how to configure the Aidbox MPI module by adding a matching model. It includes an example model for testing, and notes on performance and model tuning.
The current implementation only supports the Patient resource. If you need support for additional resources, please contact us.
The example in the next section provides a basic model that allows you to start the MPI module and test its functionality. For a detailed explanation of all model elements and matching logic, see Matching Model Explanation.
Create OAuth Client for MDM Frontend
To enable authentication for the MDM frontend, create an OAuth client in Aidbox:
PUT /fhir/Client/mpi-dev
content-type: text/yaml
accept: text/yaml

id: mpi-dev
auth:
  authorization_code:
    redirect_uri: http://localhost:3000/api/auth/callback/aidbox
    token_format: jwt
    refresh_token: true
    secret_required: true
    access_token_expiration: 36000
    refresh_token_expiration: 864000
secret: pass
first_party: true
grant_types:
  - code
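If you prefer to script this step, the same request can be built from code. The following is a minimal sketch, not an official client: the base URL `http://localhost:8080`, the admin credentials `root` / `secret`, and the use of HTTP Basic authentication are all assumptions to replace with whatever your Aidbox instance actually uses. It sends the resource as JSON instead of YAML and only builds the request, so nothing is sent until you call `urlopen`.

```python
import base64
import json
import urllib.request

# Hypothetical values: replace with your Aidbox base URL and admin credentials.
AIDBOX_URL = "http://localhost:8080"
ADMIN_CLIENT_ID = "root"
ADMIN_CLIENT_SECRET = "secret"

# Same Client resource as the YAML request above, expressed as JSON.
MPI_CLIENT = {
    "id": "mpi-dev",
    "secret": "pass",
    "first_party": True,
    "grant_types": ["code"],
    "auth": {
        "authorization_code": {
            "redirect_uri": "http://localhost:3000/api/auth/callback/aidbox",
            "token_format": "jwt",
            "refresh_token": True,
            "secret_required": True,
            "access_token_expiration": 36000,
            "refresh_token_expiration": 864000,
        }
    },
}


def build_client_request(base_url, client, user, password):
    """Build (but do not send) the PUT request that creates or updates the Client."""
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    return urllib.request.Request(
        url=f"{base_url}/fhir/Client/{client['id']}",
        data=json.dumps(client).encode(),
        method="PUT",
        headers={
            "Content-Type": "application/json",
            "Accept": "application/json",
            "Authorization": f"Basic {token}",  # assumption: Basic auth is enabled
        },
    )


request = build_client_request(
    AIDBOX_URL, MPI_CLIENT, ADMIN_CLIENT_ID, ADMIN_CLIENT_SECRET
)
# To actually send it: urllib.request.urlopen(request)
```

Change the `secret` value before using this outside a local test environment.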
Add admin privileges to your user
Navigate to: Aidbox → IAM → Users → Your Admin
1. Open the Aidbox dashboard.
2. Go to the IAM (Identity and Access Management) section.
3. Select Users.
4. Find and open your Admin user profile.
Add the following section to the user configuration JSON:
{
  "data": {
    "groups": [
      "SIT_EMPI_ADMIN_DEV"
    ]
  }
}
Create SQL functions
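Create the following SQL functions in your Aidbox database. They wrap unaccent(), which is provided by the PostgreSQL unaccent extension, so that extension must be available in the database: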
CREATE OR REPLACE FUNCTION public.immutable_unaccent(x text)
RETURNS text
LANGUAGE sql
IMMUTABLE
AS $function$
SELECT
unaccent($1);
$function$;
CREATE OR REPLACE FUNCTION public.immutable_unaccent_upper(text)
RETURNS text
LANGUAGE plpgsql
IMMUTABLE
AS $function$
BEGIN
RETURN upper(public.unaccent($1));
END;
$function$;
CREATE OR REPLACE FUNCTION public.immutable_remove_spaces_unaccent_upper(text)
RETURNS text
LANGUAGE plpgsql
IMMUTABLE
AS $function$
BEGIN
RETURN replace(upper(public.unaccent($1)), ' ', '');
END;
$function$;
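To see what these helpers do to a value before it is indexed or compared, the sketch below mirrors them in Python. It is only an approximation: PostgreSQL's unaccent uses its own rules dictionary, while this version strips combining marks after Unicode NFKD decomposition, so results can differ for characters such as "ß" or "ø". The Python function names are illustrative and are not part of Aidbox.

```python
import unicodedata


def unaccent_approx(text: str) -> str:
    """Approximate unaccent(): decompose characters, then drop combining marks."""
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def unaccent_upper(text: str) -> str:
    # Mirrors immutable_unaccent_upper: remove accents, then uppercase.
    return unaccent_approx(text).upper()


def remove_spaces_unaccent_upper(text: str) -> str:
    # Mirrors immutable_remove_spaces_unaccent_upper: also drop plain spaces.
    return unaccent_upper(text).replace(" ", "")


print(unaccent_upper("José"))                              # JOSE
print(remove_spaces_unaccent_upper("12 Rue de l'Église"))  # 12RUEDEL'EGLISE
```

Because the example model compares the normalized values, "José" and "JOSE" are treated as the same given name, and address lines that differ only in spacing or accents compare as equal.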
Create database indexes
The indexes below are recommendations that work well with the example model from the "Add model to Aidbox" section. Since the matching model is customizable, you may need to adjust these indexes based on your specific model configuration and data requirements.
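Create the following indexes to optimize matching performance and resource reference lookups. The trigram indexes use the gin_trgm_ops operator class, which comes from the PostgreSQL pg_trgm extension: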
-- Patient indexes for matching and search
CREATE INDEX IF NOT EXISTS patient_full_name_idx_mpi ON public.patient USING btree ((((immutable_unaccent_upper((resource #>> '{name,0,family}'::text[])) || ' '::text) || immutable_unaccent_upper((resource #>> '{name,0,given,0}'::text[]))))); -- match blocks
CREATE INDEX IF NOT EXISTS patient_given_gin_idx_mpi ON public.patient USING gin (((resource #>> '{name,0,given,0}'::text[])) gin_trgm_ops); -- search by partial given
CREATE INDEX IF NOT EXISTS patient_family_gin_idx_mpi ON public.patient USING gin (((resource #>> '{name,0,family}'::text[])) gin_trgm_ops); -- search by partial family
CREATE INDEX IF NOT EXISTS patient_given_btree_idx_mpi ON public.patient USING btree (immutable_unaccent_upper((resource #>> '{name,0,given,0}'::text[]))); -- search by exact given
CREATE INDEX IF NOT EXISTS patient_family_btree_idx_mpi ON public.patient USING btree (immutable_unaccent_upper((resource #>> '{name,0,family}'::text[]))); -- search by exact family
CREATE INDEX IF NOT EXISTS patient_email_idx_mpi ON public.patient USING gin (jsonb_path_query_array(resource, '$."telecom"[*]?(@."system" == "email")."value"'::jsonpath) jsonb_path_ops); -- search by email
CREATE INDEX IF NOT EXISTS patient_identifier_idx_mpi ON public.patient USING gin (jsonb_path_query_array(resource, '$."identifier"[*]."value"'::jsonpath) jsonb_path_ops); -- search by identifier
CREATE INDEX IF NOT EXISTS patient_phone_idx_mpi ON public.patient USING gin (jsonb_path_query_array(resource, '$."telecom"[*]?(@."system" == "phone")."value"'::jsonpath) jsonb_path_ops); -- search by phone
CREATE INDEX IF NOT EXISTS patient_address_line_btree_idx_mpi ON public.patient USING btree (immutable_remove_spaces_unaccent_upper((resource #>> '{address,0,line,0}'::text[]))); -- match blocks
CREATE INDEX IF NOT EXISTS patient_identifier_idx2_mpi ON public.patient USING gin (((resource #> '{identifier}'::text[]))); -- for second model, review needed
CREATE INDEX IF NOT EXISTS patient_birthdate_idx_mpi ON public.patient USING btree (((resource #>> '{birthDate}'::text[]))); -- match blocks
-- Observation indexes for merge/unmerge operations
CREATE INDEX IF NOT EXISTS observation_encounter_references_idx_mpi ON public.observation USING gin (jsonb_path_query_array(resource, '$.**?(@."resourceType" == "Encounter")."id"'::jsonpath)); -- unmerge
CREATE INDEX IF NOT EXISTS observation_patient_references_idx_mpi ON public.observation USING gin (jsonb_path_query_array(resource, '$.**?(@."resourceType" == "Patient")."id"'::jsonpath)); -- merge
-- Specimen indexes for merge operations
CREATE INDEX IF NOT EXISTS specimen_patient_references_idx_mpi ON public.specimen USING gin (jsonb_path_query_array(resource, '$.**?(@."resourceType" == "Patient")."id"'::jsonpath)); -- merge
-- DiagnosticReport indexes for merge/unmerge operations
CREATE INDEX IF NOT EXISTS diagnosticreport_patient_references_idx_mpi ON public.diagnosticreport USING gin (jsonb_path_query_array(resource, '$.**?(@."resourceType" == "Patient")."id"'::jsonpath)); -- merge
CREATE INDEX IF NOT EXISTS diagnosticreport_encounter_references_idx_mpi ON public.diagnosticreport USING gin (jsonb_path_query_array(resource, '$.**?(@."resourceType" == "Encounter")."id"'::jsonpath)); -- unmerge
-- Encounter indexes for merge operations
CREATE INDEX IF NOT EXISTS encounter_patient_references_idx_mpi ON public.encounter USING gin (jsonb_path_query_array(resource, '$.**?(@."resourceType" == "Patient")."id"'::jsonpath)); -- merge
CREATE INDEX IF NOT EXISTS encounter_identifier_idx_mpi ON public.encounter USING gin ((jsonb_path_query_array(resource, '$."identifier".**."value"')) jsonb_path_ops);
-- Condition indexes for merge/unmerge operations
CREATE INDEX IF NOT EXISTS condition_patient_references_idx_mpi ON public.condition USING gin (jsonb_path_query_array(resource, '$.**?(@."resourceType" == "Patient")."id"'::jsonpath)); -- merge
CREATE INDEX IF NOT EXISTS condition_encounter_references_idx_mpi ON public.condition USING gin (jsonb_path_query_array(resource, '$.**?(@."resourceType" == "Encounter")."id"'::jsonpath)); -- unmerge
-- Media indexes for merge/unmerge operations
CREATE INDEX IF NOT EXISTS media_patient_references_idx_mpi ON public.media USING gin (jsonb_path_query_array(resource, '$.**?(@."resourceType" == "Patient")."id"'::jsonpath)); -- merge
CREATE INDEX IF NOT EXISTS media_encounter_references_idx_mpi ON public.media USING gin (jsonb_path_query_array(resource, '$.**?(@."resourceType" == "Encounter")."id"'::jsonpath)); -- unmerge
-- SourceMessage indexes for merge/unmerge operations
CREATE INDEX IF NOT EXISTS sourcemessage_patient_references_idx_mpi ON public.sourcemessage USING gin (jsonb_path_query_array(resource, '$.**?(@."resourceType" == "Patient")."id"'::jsonpath));
CREATE INDEX IF NOT EXISTS sourcemessage_encounter_references_idx_mpi ON public.sourcemessage USING gin (jsonb_path_query_array(resource, '$.**?(@."resourceType" == "Encounter")."id"'::jsonpath));
Add model to Aidbox
To add a matching model, you need to create a custom resource named AidboxLinkageModel.
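Example of creating an AidboxLinkageModel. The expressions in this example call levenshtein() (from the PostgreSQL fuzzystrmatch extension) and use the <<% and %>> operators (from pg_trgm), in addition to the SQL functions created above: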
POST /fhir/AidboxLinkageModel
content-type: text/yaml
accept: text/yaml

features:
  fn:
    - bf: 0
      expr: ' ( l.resource->''name'' IS NULL OR r.resource->''name'' IS NULL )'
    - bf: 13.336495228175629
      expr: l.#name = r.#name
    - bf: 13.104401641242227
      expr: r.#given = l.#family AND l.#given = r.#family
    - bf: 5.36329167966839
      expr: >-
        r.#family = l.#family AND
        length(l.#given) <= 5 AND
        length(r.#given) <= 5 AND
        levenshtein(l.#given, r.#given) <= 2
    - bf: 9.288385498954133
      expr: levenshtein(l.#name, r.#name) <= 2
    - bf: 10.36329167966839
      expr: >-
        r.#given = l.#given AND string_to_array(l.#family, ' ') &&
        string_to_array(r.#family, ' ')
    - bf: 10.36329167966839
      expr: >-
        r.#family = l.#family AND string_to_array(l.#given, ' ') &&
        string_to_array(r.#given, ' ')
    - bf: 2.402276401131933
      expr: r.#given = l.#given
    - else: -12.37233293924643
  dob:
    - bf: 0
      expr: ' ( l.#dob IS NULL OR r.#dob IS NULL )'
    - bf: 10.59415069916466
      expr: l.#dob = r.#dob
    - bf: 3.9911610470417744
      expr: levenshtein(l.#dob, r.#dob) <= 1
    - bf: 0.5164298695732575
      expr: levenshtein(l.#dob, r.#dob) <= 2
    - else: -10.322063538772698
  telecom:
    - bf: 0
      expr: ' ( l.#telecomArray IS NULL OR r.#telecomArray IS NULL OR array_length(l.#telecomArray, 1) IS NULL OR array_length(r.#telecomArray, 1) IS NULL )'
    - bf: 6.465648574292063
      expr: l.#telecomArray && r.#telecomArray
    - else: -10.517360697819983
  address:
    - bf: 0
      expr: ' ( l.#address IS NULL OR r.#address IS NULL )'
    - bf: 9.236771286242664
      expr: >-
        ((l.#addressLength > r.#addressLength) and (l.#address %>> r.#address))
        or ((l.#addressLength <= r.#addressLength) and (l.#address <<% r.#address))
    - bf: 7.465648574292063
      expr: >-
        (l.#addressLength = r.#addressLength) and (l.#address = r.#address)
    - else: -10.517360697819983
  sex:
    - bf: 0
      expr: ' ( l.#gender IS NULL OR r.#gender IS NULL )'
    - bf: 1.8504082299552485
      expr: ' l.#gender = r.#gender'
    - else: -4.842034404727677
resourceType: AidboxLinkageModel
id: >-
  model
resource: Patient
thresholds:
  auto: 25
  manual: 16
blocks:
  fn:
    var: name
  dob:
    var: dob
  addr:
    sql: (l.#address = r.#address)
vars:
  dob: (#.resource#>>'{birthDate}')
  name: ((#.#family) || ' ' || (#.#given))
  given: (immutable_unaccent_upper(#.resource#>>'{name,0,given,0}'))
  family: (immutable_unaccent_upper(#.resource#>>'{name,0,family}'))
  gender: (#.resource#>>'{gender}')
  address: (immutable_remove_spaces_unaccent_upper(#.resource#>>'{address,0,line,0}'))
  telecomArray: >-
    array(select jsonb_array_elements_text(jsonb_path_query_array( #.resource,
    '$.telecom[*] ? (@.value != "").value')))
  addressLength: (length(#.resource#>>'{address,0,line,0}'))
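To build intuition for how the weights and thresholds in this model might interact, here is a small Python sketch. It rests on assumptions this page does not confirm: that within each feature the first rule whose expression holds contributes its bf value (with else as the fallback), that the feature weights are summed into one score, and that the auto and manual thresholds are inclusive lower bounds. It covers only a subset of the fn, dob, and sex rules and works on already-normalized values. See Matching Model Explanation for the actual logic.

```python
from typing import Callable, Dict, List, Optional, Tuple

Record = Dict[str, Optional[str]]
Rule = Tuple[float, Callable[[Record, Record], bool]]


def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def missing(field: str) -> Callable[[Record, Record], bool]:
    # Corresponds to the "IS NULL" rules, which carry bf: 0.
    return lambda l, r: l.get(field) is None or r.get(field) is None


# Subset of the example model: (ordered rules, else weight) per feature.
FEATURES: Dict[str, Tuple[List[Rule], float]] = {
    "fn": ([
        (0.0, missing("name")),
        (13.336495228175629, lambda l, r: l["name"] == r["name"]),
        (9.288385498954133, lambda l, r: levenshtein(l["name"], r["name"]) <= 2),
    ], -12.37233293924643),
    "dob": ([
        (0.0, missing("dob")),
        (10.59415069916466, lambda l, r: l["dob"] == r["dob"]),
        (3.9911610470417744, lambda l, r: levenshtein(l["dob"], r["dob"]) <= 1),
        (0.5164298695732575, lambda l, r: levenshtein(l["dob"], r["dob"]) <= 2),
    ], -10.322063538772698),
    "sex": ([
        (0.0, missing("gender")),
        (1.8504082299552485, lambda l, r: l["gender"] == r["gender"]),
    ], -4.842034404727677),
}

AUTO_THRESHOLD = 25    # thresholds.auto
MANUAL_THRESHOLD = 16  # thresholds.manual


def score_pair(left: Record, right: Record) -> float:
    """Sum, per feature, the weight of the first matching rule (assumed behavior)."""
    total = 0.0
    for rules, else_weight in FEATURES.values():
        for weight, predicate in rules:
            if predicate(left, right):
                total += weight
                break
        else:  # no rule matched: use the feature's else weight
            total += else_weight
    return total


def classify(score: float) -> str:
    if score >= AUTO_THRESHOLD:
        return "auto"
    if score >= MANUAL_THRESHOLD:
        return "manual"
    return "non-match"


a = {"name": "SMITH JOHN", "dob": "1980-01-02", "gender": "male"}
b = {"name": "SMITH JOHN", "dob": "1980-01-03", "gender": "male"}
c = {"name": "DOE JANE", "dob": "1980-01-02", "gender": "male"}

print(classify(score_pair(a, a)))  # auto
print(classify(score_pair(a, b)))  # manual
print(classify(score_pair(a, c)))  # non-match
```

Under these assumptions, an exact match on name, birth date, and gender scores about 25.8 and clears the auto threshold, while a one-character difference in the birth date lowers the score to about 19.2, which falls into the manual range.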
Matching Model Tuning
The example model is intended for testing and demonstration purposes and may not deliver optimal results out of the box.
For production use and reliable, accurate matching on your data, you should:
- Adapt the model to reflect your data specifics and your definition of a correct match.
- Calibrate feature weights using your real-world data. This step typically involves machine learning and manual expert tuning.
We offer a professional service for model training and expert tuning.
If you need assistance, please contact us.
Performance considerations
For fast and accurate matching, consider the following:
- Database indexes: If you are working with large volumes of patient records, ensure proper database indexes are created to keep matching fast and scalable.
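- Data normalization: Matching quality depends heavily on well-normalized input data. Avoid using placeholders like "UNKNOWN" or "not provided" for names, addresses, or birthdates. In the example model a missing (NULL) value is scored as neutral (bf: 0), whereas two records that carry the same placeholder compare as equal and receive a positive match weight, which distorts the results.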
Configure Audit Events (Optional)
The MDM module can track and export audit events for compliance and monitoring purposes. When enabled, the system generates FHIR AuditEvent resources for operations like:
- Patient merge/unmerge operations
- Patient search and matching
- Marking/unmarking duplicates
- Patient record creation and viewing
Enable Audit Worker
To enable audit event collection and export, configure the following environment variables in your backend service:
# Enable audit worker
MPI_AUDIT_WORKER_ENABLE=true
# URL where audit events will be sent (FHIR Bundle endpoint)
MPI_AUDIT_CONSUMER_URL=http://your-audit-repository:8080/fhir/Bundle
# Polling interval in milliseconds (how often to check for pending events)
MPI_AUDIT_INTERVAL=1000
# Number of events to process per batch
MPI_AUDIT_BATCH_SIZE=10
# PostgreSQL advisory lock ID (prevents concurrent workers)
MPI_AUDIT_LOCK_ID=54321
How it works
1. Event Collection: The system creates FHIR AuditEvent resources for auditable operations and stores them in the mpi.audit_event table with send_status = 'pending'.
2. Worker Processing: The audit worker periodically:
   - Fetches pending audit events (up to batch-size)
   - Bundles them into a FHIR Bundle (type: "collection")
   - POSTs the bundle to the configured audit-repository-url
   - Marks events as delivered on successful response (HTTP 2xx)
3. Event Format: Each audit event includes:
   - Operation type and outcome (success/failure)
   - User information (from Aidbox IAM)
   - Affected resources (patients, related resources)
   - Timestamp and source system details
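The worker cycle described above can be sketched in a few lines of Python. This is an illustration of the described behavior, not the module's implementation: `process_batch` and the injected `post` callable are hypothetical names, and the exact shape of each Bundle entry is an assumption (only the Bundle type "collection" is stated on this page).

```python
from typing import Callable, Dict, List


def process_batch(
    pending: List[Dict],
    post: Callable[[Dict], int],
    batch_size: int = 10,
) -> List[str]:
    """Send up to batch_size pending events; return the ids to mark as delivered.

    `post` sends the bundle to the audit repository and returns the HTTP status.
    """
    batch = pending[:batch_size]
    if not batch:
        return []
    bundle = {
        "resourceType": "Bundle",
        "type": "collection",
        # Assumption: each AuditEvent is wrapped as entry.resource.
        "entry": [{"resource": event} for event in batch],
    }
    status = post(bundle)
    if 200 <= status < 300:
        return [event["id"] for event in batch]
    return []  # events stay pending and are retried on the next poll


# Example run with a fake transport that always answers 201.
events = [{"resourceType": "AuditEvent", "id": str(i)} for i in range(15)]
delivered = process_batch(events, post=lambda bundle: 201, batch_size=10)
print(len(delivered))  # 10
```

Passing the transport in as a function keeps the sketch testable without a running audit repository.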
Audit Repository Requirements
The audit events are sent as FHIR AuditEvent resources following the BALP (Basic Audit Log Patterns) specification. You can use any FHIR-compliant audit repository, but we recommend Auditbox for optimal integration and audit log management.
Recommended: Use Auditbox for comprehensive audit event storage, querying, and compliance reporting with built-in FHIR AuditEvent support.
Your audit consumer endpoint should:
- Accept FHIR Bundle resources via HTTP POST
- Support application/json content type
- Return HTTP 2xx status for successful processing
- Handle Bundle resources with type: "collection" containing AuditEvent entries
- Support FHIR AuditEvent resources (R4 specification)
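If you write your own consumer instead of using an existing repository, the requirements above translate into a small HTTP handler. The sketch below uses only the Python standard library. The port, the function and class names, and the choice of 400/422 for rejected input are assumptions made for illustration, and a production endpoint would also persist the events.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer
from typing import Tuple


def validate_audit_bundle(body: bytes) -> Tuple[int, str]:
    """Return (HTTP status, message) for a posted audit bundle."""
    try:
        bundle = json.loads(body)
    except ValueError:  # covers invalid JSON and undecodable bytes
        return 400, "body is not valid JSON"
    if not isinstance(bundle, dict) or bundle.get("resourceType") != "Bundle":
        return 400, "expected a FHIR Bundle"
    if bundle.get("type") != "collection":
        return 400, "expected Bundle.type = collection"
    for entry in bundle.get("entry") or []:
        resource = entry.get("resource") if isinstance(entry, dict) else None
        if not isinstance(resource, dict) or resource.get("resourceType") != "AuditEvent":
            return 422, "entries must be AuditEvent resources"
    return 200, "ok"


class AuditHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        status, message = validate_audit_bundle(self.rfile.read(length))
        payload = json.dumps({"message": message}).encode()
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(payload)))
        self.end_headers()
        self.wfile.write(payload)


def serve(port: int = 8080) -> None:
    """Start the consumer; blocks until interrupted."""
    HTTPServer(("0.0.0.0", port), AuditHandler).serve_forever()
```

Call `serve()` to start listening, then point MPI_AUDIT_CONSUMER_URL at the host and port where it runs.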
Configure Merge/Unmerge Notifications (Optional)
The notification worker sends real-time alerts when patient merge or unmerge operations occur, allowing external systems to react to patient record changes.
Enable Notification Worker
Configure the following environment variables in your backend service:
# Enable notification worker
MPI_NOTIFICATION_WORKER_ENABLE=true
# URL where notifications will be sent
MPI_NOTIFICATION_CONSUMER_URL=http://your-consumer-service:9876/notifications
# Polling interval in milliseconds
MPI_NOTIFICATION_INTERVAL=1000
# Number of notifications to process per batch
MPI_NOTIFICATION_BATCH_SIZE=10
# PostgreSQL advisory lock ID (prevents concurrent workers)
MPI_NOTIFICATION_LOCK_ID=12345
How it works
1. Event Tracking: When merge/unmerge operations complete, they are marked with notification_status = 'not_delivered' in the database.
2. Worker Processing: The notification worker periodically:
   - Fetches undelivered merge and unmerge operations (up to batch-size)
   - POSTs them to the configured consumer-url
   - Marks them as delivered on successful response (HTTP 2xx)
3. Notification Payload: The worker sends a JSON payload containing:
{
  "merges": [
    {
      "id": "merge-id",
      "target-patient-id": "Patient/123",
      "source-patient-id": "Patient/456",
      "related-resources-refs": ["Observation/789", "Encounter/012"],
      "result-patient": { /* FHIR Patient resource */ }
    }
  ],
  "unmerges": [
    {
      "id": "unmerge-id",
      "merge-id": "original-merge-id",
      "source-patient": { /* Restored Patient resource */ },
      "user-id": "user-123",
      "related-resources": ["Observation/789", "Encounter/012"]
    }
  ]
}
Consumer Endpoint Requirements
Your notification consumer endpoint should:
- Accept HTTP POST requests with Content-Type: application/json
- Process the payload containing merges and unmerges arrays
- Return HTTP 2xx status for successful processing
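As a starting point for the processing step, the following sketch parses a notification body and turns it into log lines. It relies only on the field names shown in the payload example above. The function name `summarize_notification` is hypothetical, and a real consumer would wrap this in an HTTP endpoint and return a 2xx status once processing succeeds.

```python
import json
from typing import List


def summarize_notification(body: bytes) -> List[str]:
    """Turn a merge/unmerge notification payload into human-readable lines."""
    payload = json.loads(body)
    lines = []
    for merge in payload.get("merges", []):
        lines.append(
            f"merge {merge['id']}: "
            f"source={merge['source-patient-id']} "
            f"target={merge['target-patient-id']}"
        )
    for unmerge in payload.get("unmerges", []):
        lines.append(f"unmerge {unmerge['id']}: reverts merge {unmerge['merge-id']}")
    return lines


sample = json.dumps({
    "merges": [{
        "id": "merge-id",
        "target-patient-id": "Patient/123",
        "source-patient-id": "Patient/456",
        "related-resources-refs": ["Observation/789", "Encounter/012"],
        "result-patient": {"resourceType": "Patient", "id": "123"},
    }],
    "unmerges": [],
}).encode()

for line in summarize_notification(sample):
    print(line)  # merge merge-id: source=Patient/456 target=Patient/123
```

Using `payload.get(..., [])` for both arrays keeps the handler tolerant of payloads that contain only merges or only unmerges.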