This functionality is available starting from version 2507 and requires the FHIR Schema validation engine to be enabled.

Overview

The ClickHouse Topic Destination module integrates Aidbox's topic-based subscriptions with the ClickHouse analytical database. It exports FHIR resources from Aidbox to ClickHouse in real time, flattening them with SQL-on-FHIR ViewDefinitions.

This module is designed for analytical workloads where you need to export FHIR data in a structured, queryable format suitable for business intelligence, reporting, and data analysis.

Installation

To use the ClickHouse Topic Destination module, install and configure it in your Aidbox instance:

1. Configure Deployment

Docker Compose

For Docker Compose deployments, first download the module JAR file locally:

curl -O https://storage.googleapis.com/aidbox-modules/topic-destination-clickhouse/topic-destination-clickhouse-2507.jar

Then update your docker-compose.yaml to include the ClickHouse module:

  aidbox:
    extra_hosts:
    # lets the Aidbox container reach a ClickHouse instance running on the host machine
    - "host.docker.internal:host-gateway"
    volumes:
    # module jar to turn on ClickHouse support
    - ./topic-destination-clickhouse-2507.jar:/topic-destination-clickhouse-2507.jar
    # ... other volumes ...
    environment:
      BOX_MODULE_LOAD: io.healthsamurai.topic-destination.clickhouse.core
      BOX_MODULE_JAR: "/topic-destination-clickhouse-2507.jar"
      # Enable FHIR Schema validation (required for ClickHouse module)
      BOX_FHIR_SCHEMA_VALIDATION: "true"
      # ... other environment variables ...

Kubernetes

For Kubernetes deployments, the module is downloaded automatically using an init container. Update your Aidbox deployment manifest:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: aidbox
spec:
  template:
    spec:
      initContainers:
      - name: download-clickhouse-module
        image: debian:bookworm-slim
        imagePullPolicy: Always
        command:
        - sh
        - -c
        - |
          apt-get -y update
          apt-get -y install curl
          curl -L -o /modules/topic-destination-clickhouse-2507.1.jar \
            https://storage.googleapis.com/aidbox-modules/topic-destination-clickhouse/topic-destination-clickhouse-2507.1.jar
          chmod 644 /modules/topic-destination-clickhouse-2507.1.jar
        volumeMounts:
        - mountPath: /modules
          name: modules-volume
      containers:
      - name: aidbox
        image: healthsamurai/aidboxone:edge
        env:
        - name: BOX_MODULE_LOAD
          value: "io.healthsamurai.topic-destination.clickhouse.core"
        - name: BOX_MODULE_JAR
          value: "/modules/topic-destination-clickhouse-2507.1.jar"
        - name: BOX_FHIR_SCHEMA_VALIDATION
          value: "true"
        # ... other environment variables ...
        volumeMounts:
        - name: modules-volume
          mountPath: /modules
        # ... other volume mounts ...
      volumes:
      - name: modules-volume
        emptyDir: {}
      # ... other volumes ...

2. Verify Installation

In Aidbox UI, go to FHIR Packages -> io.healthsamurai.topic and verify that the ClickHouse profiles are present:

  • http://aidbox.app/StructureDefinition/aidboxtopicdestination-clickhouse
  • http://aidbox.app/StructureDefinition/aidboxtopicdestination-clickhouse-at-least-once

Key Features

  • Real-time data export: Automatically exports FHIR resources to ClickHouse as they are created, updated, or deleted
  • Data flattening: Uses ViewDefinitions to transform complex FHIR resources into flat, analytical-friendly tables
  • Two delivery modes: Best effort and at-least-once delivery guarantees
  • Batch processing: Configurable batch sizes and intervals for optimal performance
  • Initial export: Automatically exports existing data when setting up a new destination
  • Monitoring: Built-in metrics and status reporting
  • Flexible configuration: Support for custom ClickHouse configurations and table schemas

Architecture

The module consists of two main components:

1. Best Effort Delivery (clickhouse-best-effort)

  • Immediate delivery of individual messages
  • No persistent queue
  • Suitable for non-critical analytics where some data loss is acceptable
  • Lower latency and resource usage

2. At-Least-Once Delivery (clickhouse-at-least-once)

  • Persistent message queue with guaranteed delivery
  • Batch processing capabilities
  • Configurable retry logic
  • Suitable for critical analytics where data completeness is essential

Configuration

Common Parameters

Both delivery modes support the following parameters:

Parameter         Type    Required  Description
url               string  Yes       ClickHouse server URL
database          string  Yes       ClickHouse database name
user              string  Yes       ClickHouse username
password          string  Yes       ClickHouse password
viewDefinition    string  Yes       Name of the ViewDefinition resource to use for data transformation
destinationTable  string  Yes       Target table name in ClickHouse

At-Least-Once Specific Parameters

Parameter       Type         Required  Description
batchSize       unsignedInt  Yes       Number of messages to batch together (default: 10)
sendIntervalMs  unsignedInt  Yes       Time interval between sends in milliseconds
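
To make the interplay of batchSize and sendIntervalMs concrete, here is a minimal Python sketch of a size-or-interval batcher. It is not the module's implementation: the `Batcher` class, the `send` callback, and the explicit `now_ms` clock are hypothetical, and the rule that a batch is flushed when either threshold is reached is an assumption that this page does not confirm.

```python
from typing import Callable, List


class Batcher:
    """Illustrative size-or-interval batcher (hypothetical, not Aidbox code)."""

    def __init__(self, batch_size: int, send_interval_ms: int,
                 send: Callable[[List[dict]], None]) -> None:
        self.batch_size = batch_size
        self.send_interval_ms = send_interval_ms
        self.send = send
        self.buffer: List[dict] = []
        self.last_flush_ms = 0

    def add(self, row: dict, now_ms: int) -> None:
        """Buffer a row and flush once the buffer reaches batch_size."""
        self.buffer.append(row)
        if len(self.buffer) >= self.batch_size:
            self.flush(now_ms)

    def tick(self, now_ms: int) -> None:
        """Flush a partial batch once send_interval_ms has elapsed."""
        if self.buffer and now_ms - self.last_flush_ms >= self.send_interval_ms:
            self.flush(now_ms)

    def flush(self, now_ms: int) -> None:
        """Hand the buffered rows to send() and start a new batch."""
        batch, self.buffer = self.buffer, []
        self.last_flush_ms = now_ms
        if batch:
            self.send(batch)


# Assumed values: batchSize = 3, sendIntervalMs = 5000.
sent: List[List[dict]] = []
batcher = Batcher(batch_size=3, send_interval_ms=5000, send=sent.append)
for i in range(4):
    batcher.add({"id": str(i)}, now_ms=i * 100)  # third row triggers a flush
batcher.tick(now_ms=6000)                        # interval flushes the fourth row
```

Under these assumptions, a larger batchSize means fewer and bigger inserts, while sendIntervalMs bounds how long a partially filled batch waits.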

Usage Examples

1. Best Effort Configuration

resourceType: AidboxTopicDestination
id: patient-clickhouse-best-effort
kind: clickhouse
parameter:
  - name: url
    valueString: "http://clickhouse:8123"
  - name: database
    valueString: "analytics"
  - name: user
    valueString: "default"
  - name: password
    valueString: "password"
  - name: viewDefinition
    valueString: "patient_flat_view"
  - name: destinationTable
    valueString: "patients"

2. At-Least-Once Configuration

resourceType: AidboxTopicDestination
id: patient-clickhouse-reliable
kind: clickhouse-at-least-once
parameter:
  - name: url
    valueString: "http://clickhouse:8123"
  - name: database
    valueString: "analytics"
  - name: user
    valueString: "default"
  - name: password
    valueString: "password"
  - name: viewDefinition
    valueString: "patient_flat_view"
  - name: destinationTable
    valueString: "patients"
  - name: batchSize
    valueUnsignedInt: 50
  - name: sendIntervalMs
    valueUnsignedInt: 5000
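
If destinations are created from scripts, the same resources can be assembled programmatically. The following Python sketch builds the resource body from a plain dictionary and checks the required parameters listed in the tables above. The helper name `build_destination` is hypothetical, and the sketch mirrors only the fields shown in the examples on this page.

```python
from typing import Dict, Union

REQUIRED_COMMON = ["url", "database", "user", "password",
                   "viewDefinition", "destinationTable"]
REQUIRED_AT_LEAST_ONCE = ["batchSize", "sendIntervalMs"]


def build_destination(dest_id: str, kind: str,
                      params: Dict[str, Union[str, int]]) -> dict:
    """Build an AidboxTopicDestination body (hypothetical helper)."""
    required = list(REQUIRED_COMMON)
    if kind == "clickhouse-at-least-once":
        required += REQUIRED_AT_LEAST_ONCE
    missing = [name for name in required if name not in params]
    if missing:
        raise ValueError(f"missing parameters: {', '.join(missing)}")

    parameter = []
    for name, value in params.items():
        # Integers become valueUnsignedInt, everything else valueString,
        # matching the YAML examples above.
        key = "valueUnsignedInt" if isinstance(value, int) else "valueString"
        parameter.append({"name": name, key: value})

    return {
        "resourceType": "AidboxTopicDestination",
        "id": dest_id,
        "kind": kind,
        "parameter": parameter,
    }


destination = build_destination(
    "patient-clickhouse-reliable",
    "clickhouse-at-least-once",
    {
        "url": "http://clickhouse:8123",
        "database": "analytics",
        "user": "default",
        "password": "password",
        "viewDefinition": "patient_flat_view",
        "destinationTable": "patients",
        "batchSize": 50,
        "sendIntervalMs": 5000,
    },
)
```

Failing fast on a missing parameter in the script is cheaper than debugging a rejected resource afterwards.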

ViewDefinition Integration

The module uses ViewDefinitions to transform FHIR resources into flat, analytical structures.

Example ViewDefinition

resourceType: ViewDefinition
id: patient_flat_view
name: patient_flat_view
resource: Patient
status: active
select:
  - column:
      - name: id
        path: id
        description: Patient ID
      - name: gender
        path: gender
        description: Patient gender
      - name: birth_date
        path: birthDate
        description: Date of birth
  - forEach: "name.where(use = 'official').first()"
    column:
      - name: family_name
        path: family
        description: Family name
      - name: given_name
        path: given.join(' ')
        description: Given names joined

This ViewDefinition will flatten Patient resources into a table with columns: id, gender, birth_date, family_name, given_name.
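
To show what this flattening produces for a single resource, here is a hand-written Python equivalent of patient_flat_view. It is only a sketch: it is not a FHIRPath engine and not how Aidbox evaluates ViewDefinitions, and the function name `flatten_patient` and the sample Patient are made up for illustration. It follows the SQL-on-FHIR rule that a `forEach` expression returning nothing produces no row for that resource.

```python
from typing import List


def flatten_patient(patient: dict) -> List[dict]:
    """Hand-written equivalent of patient_flat_view for one Patient."""
    # forEach: name.where(use = 'official').first()
    official = [n for n in patient.get("name", []) if n.get("use") == "official"]
    if not official:
        # forEach (unlike forEachOrNull) yields no row when nothing matches.
        return []
    name = official[0]
    given = name.get("given", [])
    return [{
        "id": patient.get("id"),
        "gender": patient.get("gender"),
        "birth_date": patient.get("birthDate"),
        "family_name": name.get("family"),
        # given.join(' '); an empty collection gives no value
        "given_name": " ".join(given) if given else None,
    }]


# Sample resource, invented for this example.
patient = {
    "resourceType": "Patient",
    "id": "pt-1",
    "gender": "female",
    "birthDate": "1985-04-12",
    "name": [
        {"use": "nickname", "given": ["Annie"]},
        {"use": "official", "family": "Smith", "given": ["Anna", "Maria"]},
    ],
}
rows = flatten_patient(patient)
```

Note that a Patient without an official name produces no row at all with this ViewDefinition, so it would not appear in the destination table.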

Data Transformation

The module automatically:

  1. Applies ViewDefinition: Transforms FHIR resources using the specified ViewDefinition
  2. Adds deletion flag: Includes is_deleted field (1 for DELETE operations, 0 for CREATE/UPDATE)
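
The deletion flag can be pictured as one extra field appended to each flattened row. The Python sketch below illustrates that rule only; the function name and the way the operation is passed in are hypothetical, not the module's internals.

```python
def with_deletion_flag(row: dict, operation: str) -> dict:
    """Return a copy of a flattened row with the is_deleted field added."""
    flagged = dict(row)
    # 1 for DELETE operations, 0 for CREATE/UPDATE, as described above.
    flagged["is_deleted"] = 1 if operation == "DELETE" else 0
    return flagged


created = with_deletion_flag({"id": "pt-1", "gender": "female"}, "CREATE")
deleted = with_deletion_flag({"id": "pt-1", "gender": "female"}, "DELETE")
```

Because a delete arrives as a row with is_deleted set to 1, analytical queries need to take this column into account when counting current records.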

ClickHouse Table Requirements

Your ClickHouse table should match the structure defined in your ViewDefinition:

CREATE TABLE analytics.patients (
    id String,
    gender String,
    birth_date Date,
    family_name String,
    given_name String,
    is_deleted UInt8
) ENGINE = MergeTree()
ORDER BY id;
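
One way to catch schema mismatches early is to derive the expected column list from the ViewDefinition and compare it with the table definition. The Python sketch below is an illustrative helper, not part of the module: the function names are hypothetical, it handles only `column` and nested `select` clauses (not `unionAll`), and it compares names only, not ClickHouse types.

```python
from typing import List


def column_names(select: List[dict]) -> List[str]:
    """Collect column names from select clauses, including nested ones."""
    names: List[str] = []
    for item in select:
        names += [col["name"] for col in item.get("column", [])]
        names += column_names(item.get("select", []))
    return names


def expected_table_columns(view_definition: dict) -> List[str]:
    """ViewDefinition columns plus the is_deleted flag added by the module."""
    return column_names(view_definition.get("select", [])) + ["is_deleted"]


view_definition = {
    "resourceType": "ViewDefinition",
    "name": "patient_flat_view",
    "resource": "Patient",
    "select": [
        {"column": [{"name": "id", "path": "id"},
                    {"name": "gender", "path": "gender"},
                    {"name": "birth_date", "path": "birthDate"}]},
        {"forEach": "name.where(use = 'official').first()",
         "column": [{"name": "family_name", "path": "family"},
                    {"name": "given_name", "path": "given.join(' ')"}]},
    ],
}

# Column names copied by hand from the CREATE TABLE statement above.
table_columns = ["id", "gender", "birth_date", "family_name", "given_name", "is_deleted"]
missing = sorted(set(expected_table_columns(view_definition)) - set(table_columns))
```

An empty `missing` list means every column the ViewDefinition emits, plus is_deleted, has a counterpart in the table.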

Monitoring and Metrics

Both delivery modes provide:

  • Status reporting: Current state of the destination (active, failed, etc.)
  • Message metrics: Count of sent, failed, and queued messages
  • Initial export status: Progress of historical data export
  • Error tracking: Detailed error messages and stack traces

Status API

GET /AidboxTopicDestination/{id}/$status

Returns current status for the destination.
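
The status endpoint can also be polled from a script. The Python sketch below uses only the standard library. The base URL, the use of HTTP Basic authentication with client credentials, and the assumption that the response body is JSON are all illustrative; adjust them to your Aidbox setup. The helper names are hypothetical.

```python
import base64
import json
import urllib.request
from urllib.parse import quote


def status_url(base_url: str, destination_id: str) -> str:
    """Build the $status URL for a destination."""
    return (f"{base_url.rstrip('/')}/AidboxTopicDestination/"
            f"{quote(destination_id, safe='')}/$status")


def fetch_status(base_url: str, destination_id: str,
                 client_id: str, client_secret: str) -> dict:
    """GET the destination status; assumes Basic auth and a JSON response."""
    token = base64.b64encode(f"{client_id}:{client_secret}".encode()).decode()
    request = urllib.request.Request(
        status_url(base_url, destination_id),
        headers={"Authorization": f"Basic {token}",
                 "Accept": "application/json"},
    )
    with urllib.request.urlopen(request, timeout=10) as response:
        return json.load(response)


# Assumed local base URL, for illustration only.
url = status_url("http://localhost:8080/", "patient-clickhouse-reliable")
```

Calling `fetch_status` requires a running Aidbox instance; the shape of the returned document is not described on this page, so the sketch returns it unmodified.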

Error Handling

Best Effort Mode

  • Errors are logged but don't block processing
  • Failed messages are not retried
  • Suitable for non-critical analytics

At-Least-Once Mode

  • Failed messages remain in queue for retry
  • Configurable retry intervals
  • Detailed error logging and reporting
  • Automatic recovery from transient failures
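
The at-least-once behaviour described above, where failed messages stay queued until a later attempt succeeds, can be sketched in Python as follows. This is a simplified in-memory illustration under our own assumptions; the module's queue is persistent, and the function and variable names here are hypothetical.

```python
from collections import deque
from typing import Callable, Deque, List


def drain(queue: Deque[dict], send: Callable[[List[dict]], None],
          batch_size: int) -> int:
    """Send batches from the head of the queue; stop at the first failure."""
    delivered = 0
    while queue:
        batch = [queue[i] for i in range(min(batch_size, len(queue)))]
        try:
            send(batch)
        except Exception:
            # The batch stays in the queue so that a later call retries it.
            break
        for _ in batch:
            queue.popleft()
        delivered += len(batch)
    return delivered


# Simulated destination that fails on its second call.
calls = {"count": 0}
received: List[dict] = []


def flaky_send(batch: List[dict]) -> None:
    calls["count"] += 1
    if calls["count"] == 2:
        raise ConnectionError("ClickHouse unavailable")
    received.extend(batch)


queue: Deque[dict] = deque({"id": str(i)} for i in range(5))
first_attempt = drain(queue, flaky_send, batch_size=2)   # stops at the failure
second_attempt = drain(queue, flaky_send, batch_size=2)  # retries the rest
```

Messages are removed only after a successful send, so nothing is lost on failure. The trade-off is that a batch may be delivered more than once if a send succeeds but its confirmation is lost, which is why the guarantee is "at least once".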

Initial Export

When a new destination is created, the module automatically:

  1. Exports existing data matching the subscription filter
  2. Applies the ViewDefinition to historical data
  3. Populates the ClickHouse table with existing records
  4. Switches to real-time processing mode

Performance Considerations

Best Effort Mode

  • Lower latency (immediate processing)
  • Lower memory usage
  • Suitable for real-time dashboards

At-Least-Once Mode

  • Higher throughput with batching
  • Configurable batch sizes for optimal performance
  • Better for bulk analytics and reporting

Troubleshooting

Common Issues

  1. Connection failures: Verify ClickHouse URL, credentials, and network connectivity
  2. Table schema mismatches: Ensure ClickHouse table structure matches ViewDefinition output
  3. ViewDefinition errors: Validate ViewDefinition syntax and resource compatibility
  4. Timestamp format issues: The module automatically handles ClickHouse timestamp requirements

Debug Tips

  • Check AidboxTopicDestination status endpoint for detailed error messages
  • Verify ViewDefinition works correctly using Aidbox UI
  • Test ClickHouse connection independently
  • Monitor ClickHouse logs for ingestion errors

Last updated 2025-08-19T16:42:17Z