Aidbox Docs

Indexes

Database indexes are essential for performance. In particular, you need them to speed up search requests.

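Aidbox provides mechanisms to suggest indexes for search parameters and queries, and to report search parameter usage statistics.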

Background

Aidbox uses a PostgreSQL database for storage. Most resource data is stored in the resource column, which has the jsonb type. See Database overview for the full picture of how resources map onto SQL.

Consider a simple example: the active search parameter for the Patient resource.

Try the search query:

GET /fhir/Patient?active=true

Use _explain to see the SQL query generated by this request:

GET /fhir/Patient?active=true&_explain=analyze

A possible response is:

{
  "query": [
    "SELECT \"patient\".* FROM \"patient\" WHERE \"patient\".resource @> ? LIMIT ? OFFSET ?",
    "{\"active\":true}",
    100,
    0
  ],
  "query-inline": [
    "SELECT \"patient\".* FROM \"patient\" WHERE \"patient\".resource @> '{\"active\":true}' LIMIT 100 OFFSET 0"
  ],
  "plan": "Limit  (cost=0.00..1.01 rows=1 width=124) (actual time=0.015..0.015 rows=0 loops=1)\n  ->  Seq Scan on patient  (cost=0.00..1.01 rows=1 width=124) (actual time=0.014..0.014 rows=0 loops=1)\n        Filter: (resource @> '{\"active\": true}'::jsonb)\n        Rows Removed by Filter: 1\n  Planning Time: 0.729 ms\n  Execution Time: 0.050 ms"
}

The corresponding SQL is:

SELECT "patient".*
FROM "patient"
WHERE "patient".resource @> '{"active": true}'::jsonb
LIMIT 100
OFFSET 0

Here @> is the containment operator. It tests whether the jsonb value on the right-hand side is contained in the jsonb value on the left-hand side. Types matter: the boolean true and the string "true" are different values.
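The containment rule is easier to see with a small model. The Python sketch below is a simplified re-implementation of the @> semantics, for illustration only. It is not how PostgreSQL evaluates the operator, and it ignores the special case where an array contains a bare primitive. The function name jsonb_contains and the sample patient are made up for this example.

```python
def jsonb_contains(left, right):
    """Simplified model of `left @> right` for parsed JSON values."""
    if isinstance(right, dict):
        # Every key of the right object must exist on the left,
        # and its value must be contained in the left value.
        return isinstance(left, dict) and all(
            key in left and jsonb_contains(left[key], value)
            for key, value in right.items()
        )
    if isinstance(right, list):
        # Every right element must be contained in some left element;
        # order and duplicates do not matter.
        return isinstance(left, list) and all(
            any(jsonb_contains(item, wanted) for item in left)
            for wanted in right
        )
    # Scalars: booleans only match booleans (Python treats True == 1,
    # JSON does not), and containers never equal a scalar.
    if isinstance(left, bool) or isinstance(right, bool):
        return isinstance(left, bool) and isinstance(right, bool) and left == right
    if isinstance(left, (dict, list)):
        return False
    return left == right


# Hypothetical sample resource, as it would look after JSON parsing.
patient = {
    "resourceType": "Patient",
    "active": True,
    "name": [{"family": "Smith", "given": ["John"]}],
}

print(jsonb_contains(patient, {"active": True}))                 # boolean true
print(jsonb_contains(patient, {"active": "true"}))               # string "true"
print(jsonb_contains(patient, {"name": [{"family": "Smith"}]}))  # nested match
```

The first and third checks succeed, while the second fails because the string "true" is not the boolean true.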

Without an index, Postgres has to check this condition against every Patient resource stored in the database. This is the Seq Scan node in the plan above.

However, a GIN index can speed up this kind of query. A GIN index inverts the jsonb structure into a lookup table of the keys and values it contains, so a containment test (@>) can jump straight to matching rows instead of scanning the whole table.

We can create a GIN index on the resource column:

CREATE INDEX patient_resource_gin_idx
ON Patient
USING GIN (resource)

Now Postgres can use this index to make search much faster.
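To check whether a query actually benefits from an index, re-run it with _explain=analyze and look at the scan nodes in the plan. The following is a minimal sketch, assuming the response carries the plan as a text string under the "plan" key, as in the response shown earlier. The helper name scan_nodes is hypothetical, and the second plan text is an invented illustration rather than real Aidbox output. On a very small table the planner may still prefer a sequential scan even when an index exists.

```python
import re

# Scan node names as they appear in PostgreSQL EXPLAIN text output.
# findall scans left to right without overlaps, so "Bitmap Index Scan"
# is reported once and not additionally as "Index Scan".
_SCAN_RE = re.compile(
    r"Bitmap Index Scan|Bitmap Heap Scan|Index Only Scan|Index Scan|Seq Scan"
)


def scan_nodes(explain_response):
    """Return the scan node names found in an _explain response, in order."""
    return _SCAN_RE.findall(explain_response.get("plan", ""))


# Plan text taken from the _explain response shown earlier (shortened).
before = {
    "plan": "Limit  (cost=0.00..1.01 rows=1 width=124)\n"
            "  ->  Seq Scan on patient  (cost=0.00..1.01 rows=1 width=124)\n"
            "        Filter: (resource @> '{\"active\": true}'::jsonb)"
}

# Invented plan text showing what an index-backed plan could look like.
after = {
    "plan": "Limit\n"
            "  ->  Bitmap Heap Scan on patient\n"
            "        ->  Bitmap Index Scan on patient_resource_gin_idx"
}

print(scan_nodes(before))
print(scan_nodes(after))
```

A plan whose only scan node is Seq Scan means the index was not used for that query.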

Functional indexes

Consider a more complex example: the name search parameter for the Patient resource.

The request

GET /fhir/Patient?name=abc

generates SQL like:

SELECT *
FROM Patient
WHERE
  aidbox_text_search(
    knife_extract_text(
      resource,
      '[["name","family"],["name","given"],["name","middle"],["name","text"],["name","prefix"],["name","suffix"]]'
    )
  ) ILIKE unaccent('% abc%')
LIMIT 100
OFFSET 0

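The PostgreSQL pg_trgm module provides GIN and GiST operator classes that let an index serve LIKE and ILIKE searches.

You can create a functional index to speed up this query. The indexed expression must match the expression in the query for the planner to use it: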

CREATE INDEX patient_name_trgm_idx
ON Patient
USING GIN (
  aidbox_text_search(
    knife_extract_text(
      resource,
      '[["name","family"],["name","given"],["name","middle"],["name","text"],["name","prefix"],["name","suffix"]]'
    )
  ) gin_trgm_ops
)
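For intuition about why a trigram index helps here: pg_trgm breaks text into three-character sequences, and a pattern search can first narrow down to rows whose indexed text shares the pattern's trigrams before rechecking the actual match. The sketch below approximates the trigram extraction described in the pg_trgm documentation (lowercase, with two spaces prefixed and one space suffixed to each word). It only handles ASCII letters and digits, and the function name trigrams is made up for this example.

```python
import re


def trigrams(text):
    """Approximate pg_trgm trigram extraction for ASCII text."""
    grams = set()
    # Simplification: only ASCII letters and digits count as word characters.
    for word in re.findall(r"[a-z0-9]+", text.lower()):
        padded = "  " + word + " "  # two spaces before, one after
        grams.update(padded[i:i + 3] for i in range(len(padded) - 2))
    return grams


print(sorted(trigrams("abc")))
print(trigrams("abc") <= trigrams("Abcde Smith"))  # shared-trigram check
```

Rows whose text shares no trigrams with the search term can be skipped without reading them, which is where the speed-up comes from.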

Which indexes does Aidbox need?

It depends — and that's the point. A short tour of what can vary:

  • Index method. GIN for @> over jsonb, GIN with gin_trgm_ops for fuzzy text (name, _text), btree for ordered access (id, _lastUpdated, date), GiST for spatial (near on Location). The right choice depends on the SearchParameter's type, not its name.
  • Modifiers. :contains and :exact on the same name parameter need different functional indexes; :in / :not-in / :above / :below on token parameters expand into ValueSet lookups; :identifier / :of-type pull from different jsonb paths.
  • Path expressions. Aidbox stores resources as jsonb, so the suggester emits functional indexes over knife_extract_text(...) or jsonb_path_query(...) — one per SP path — rather than indexes on plain columns.
  • Joins. Chained queries (Observation?subject:Patient.name=John) and reverse-chain _has queries translate into SQL joins or subselects; both sides need their own indexes.
  • Full-resource fallback. Token and reference parameters without a dedicated path fall back to a GIN over the whole jsonb. It rescues queries that no functional index covers, but it's larger on disk.

Hand-picking the right combination per parameter is impractical. The next sections cover Aidbox's suggest-index RPCs, which compute the candidates for you, and the usage-statistics RPCs, which tell you which suggestions actually deserve the disk space.

Index suggestion

Aidbox provides two RPCs that suggest indexes for you.

Suggest indexes for parameter

Use the aidbox.index/suggest-index RPC to get index suggestions for a specific search parameter:

POST /rpc
Content-Type: text/yaml
Accept: text/yaml

method: aidbox.index/suggest-index
params:
  resource-type: <resourceType>
  search-param: <searchParameter>

Suggest indexes for query

Use the aidbox.index/suggest-index-query RPC to get index suggestions based on a query:

POST /rpc
Content-Type: text/yaml
Accept: text/yaml

method: aidbox.index/suggest-index-query
params:
  resource-type: Observation
  query: date=gt2022-01-01&_id=myid
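Both suggestion RPCs share the same request shape, so they are easy to script. The sketch below builds (but does not send) the requests using only the Python standard library. The base URL, the helper name rpc_request, and the Patient/name example values are assumptions for illustration. Authentication is omitted and will likely be required on a real server.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8080"  # assumption: replace with your Aidbox URL


def rpc_request(method, params):
    """Build a POST /rpc request with a YAML body like the ones shown above."""
    # json.dumps is used only to quote values: a JSON string is also a
    # valid double-quoted YAML scalar, so special characters stay safe.
    lines = ["method: " + method, "params:"]
    lines += ["  {}: {}".format(key, json.dumps(value)) for key, value in params.items()]
    body = "\n".join(lines) + "\n"
    return urllib.request.Request(
        url=BASE_URL + "/rpc",
        data=body.encode("utf-8"),
        method="POST",
        headers={"Content-Type": "text/yaml", "Accept": "text/yaml"},
    )


by_param = rpc_request(
    "aidbox.index/suggest-index",
    {"resource-type": "Patient", "search-param": "name"},
)
by_query = rpc_request(
    "aidbox.index/suggest-index-query",
    {"resource-type": "Observation", "query": "date=gt2022-01-01&_id=myid"},
)

print(by_query.data.decode("utf-8"))
# urllib.request.urlopen(by_query) would send the request to the server.
```

Building the request separately from sending it makes it possible to inspect the body before anything reaches the server.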

Usage statistics

Aidbox tracks how often each SearchParameter is queried and exposes the numbers via RPCs. Use them to rank "hot" parameters, decide which suggested indexes are worth creating, and confirm a created index is actually being used. Available since Aidbox 2605.
