Deploy Production-ready Aidbox to Kubernetes
Production-ready infrastructure
Key infrastructure elements:
- Cluster configuration — Node pool and tooling
- Database — Cloud or self-managed database
- Aidbox — Aidbox installation
- Logging — Collect application and cluster logs
- Monitoring — Collect, alert, and visualize cluster and application metrics
- Security — Vulnerability scanning and policy management
Cluster configuration and tooling
Recommended Kubernetes cluster configuration:
- Small and medium workloads — 3 nodes × 4 vCPU, 16 GB RAM
- Large workloads — 3 nodes × 8 vCPU, 64 GB RAM
Tooling required for development and deployment:
- AWS, GCP, or Azure — cloud provider CLI and SDK, depending on your cloud provider
- Kubectl - connection and cluster management
- Helm - Kubernetes package manager
- Lens - Kubernetes IDE
Optional development and delivery tooling:
- Terraform - Infrastructure automation tool
- Grafana Tanka - configuration utility for Kubernetes
- Argo CD - GitOps delivery and management
- Flux - set of continuous and progressive delivery solutions for Kubernetes
Database
Managed solution
Aidbox supports all popular managed PostgreSQL databases, version 13 and higher. See more details in this article — Run Aidbox on managed PostgreSQL.
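With a managed database, Aidbox only needs the connection settings. A minimal sketch of the relevant values, assuming a hypothetical AWS RDS endpoint (the same PGHOST/PGPORT/PGDATABASE/PGUSER/PGPASSWORD variables are used in the ConfigMap and Secret later in this guide):

PGHOST: my-aidbox-db.abc123.us-east-1.rds.amazonaws.com # hypothetical managed endpoint
PGPORT: '5432'
PGDATABASE: aidbox
PGUSER: <db_user>
PGPASSWORD: <db_password>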
Self-managed solution
For a self-managed solution in Kubernetes, we recommend using the CloudNativePG operator. It provides high availability, automated failover, backups, and seamless PostgreSQL cluster management.
Install CloudNativePG operator
kubectl apply --server-side -f \
https://raw.githubusercontent.com/cloudnative-pg/cloudnative-pg/main/releases/cnpg-1.28.0.yaml
Create PostgreSQL cluster
apiVersion: v1
kind: Secret
metadata:
  name: postgres
  namespace: prod
stringData:
  password: <your-password>
  username: postgres
type: kubernetes.io/basic-auth
---
apiVersion: postgresql.cnpg.io/v1
kind: Cluster
metadata:
  name: aidbox-db
  namespace: prod
spec:
  instances: 3
  bootstrap:
    initdb:
      database: aidbox
      owner: postgres
      secret:
        name: postgres
  postgresql:
    parameters:
      shared_buffers: '2GB'
      max_wal_size: '4GB'
      pg_stat_statements.max: '500'
      pg_stat_statements.track: 'all'
      shared_preload_libraries: 'pg_stat_statements'
  resources:
    requests:
      memory: 4Gi
      cpu: '2'
    limits:
      memory: 8Gi
  storage:
    size: 100Gi
    storageClass: managed-premium
CloudNativePG automatically creates services for connecting to the database:
- aidbox-db-rw — read-write service (primary)
- aidbox-db-ro — read-only service (replicas)
- aidbox-db-r — any instance
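Aidbox should normally connect to the read-write service. For the example cluster above, the in-cluster connection settings would look like this (adjust the namespace if yours differs):

PGHOST: aidbox-db-rw.prod.svc.cluster.local
PGPORT: '5432'
PGDATABASE: aidbox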
Configure backups
CloudNativePG supports backups to S3, Azure Blob Storage, and Google Cloud Storage. See CloudNativePG backup documentation for details.
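A minimal sketch of an S3 backup configuration, assuming a hypothetical bucket and a credentials Secret; newer CloudNativePG releases may prefer the Barman Cloud plugin, so consult the backup documentation for your operator version and object store:

apiVersion: postgresql.cnpg.io/v1
kind: Cluster
metadata:
  name: aidbox-db
  namespace: prod
spec:
  # ...instances, storage, etc. as above
  backup:
    retentionPolicy: '30d'
    barmanObjectStore:
      destinationPath: s3://my-backup-bucket/aidbox-db # hypothetical bucket
      s3Credentials:
        accessKeyId:
          name: backup-s3-creds # hypothetical Secret with the access keys
          key: ACCESS_KEY_ID
        secretAccessKey:
          name: backup-s3-creds
          key: SECRET_ACCESS_KEY
---
# Nightly scheduled backup (6-field cron: at 00:00 every day)
apiVersion: postgresql.cnpg.io/v1
kind: ScheduledBackup
metadata:
  name: aidbox-db-nightly
  namespace: prod
spec:
  schedule: '0 0 0 * * *'
  cluster:
    name: aidbox-db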
Alternative solutions
- Crunchy Postgres Operator — Production-ready PostgreSQL on Kubernetes.
Aidbox
First, you must get an Aidbox license on the Aidbox user portal.
You might want to use the Helm charts prepared by our DevOps engineers to make the deployment experience smoother.
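As a sketch, installing the chart could look like the following, assuming the charts are published at aidbox.github.io/helm-charts; check the Helm charts repository for the actual chart name and available values:

helm repo add aidbox https://aidbox.github.io/helm-charts
helm repo update
helm install aidbox aidbox/aidbox --namespace prod --create-namespace \
  --values values.yaml # your license, base URL, and database settings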
Create a ConfigMap with the required configuration and database connection
This ConfigMap example uses our default Aidbox Configuration Project Template. It's recommended to clone this template and bind your Aidbox installation to it.
apiVersion: v1
kind: ConfigMap
metadata:
  name: aidbox
  namespace: prod
data:
  AIDBOX_BASE_URL: https://my.box.url
  AIDBOX_FHIR_PACKAGES: 'hl7.fhir.r4.core#4.0.1' # your packages
  AIDBOX_TERMINOLOGY_SERVICE_BASE_URL: 'https://tx.health-samurai.io/fhir'
  AIDBOX_BOX_ID: aidbox
  AIDBOX_PORT: '8080'
  AIDBOX_STDOUT_PRETTY: all
  BOX_INSTANCE_NAME: aidbox
  BOX_METRICS_PORT: '8765'
  PGDATABASE: aidbox
  PGHOST: db.prod.svc.cluster.local # database address
  PGPORT: '5432' # database port
  AIDBOX_FHIR_SCHEMA_VALIDATION: 'true'
  AIDBOX_COMPLIANCE: 'enabled'
  AIDBOX_CORRECT_AIDBOX_FORMAT: 'true'
  AIDBOX_CREATED_AT_URL: 'https://aidbox.app/ex/createdAt'
  BOX_SEARCH_INCLUDE_CONFORMANT: 'true'
  BOX_SEARCH_FHIR__COMPARISONS: 'true'
  BOX_COMPATIBILITY_VALIDATION_JSON__SCHEMA_REGEX: '#{:fhir-datetime}'
---
apiVersion: v1
kind: Secret
metadata:
  name: aidbox
  namespace: prod
data:
  AIDBOX_ADMIN_PASSWORD: <admin_password>
  AIDBOX_CLIENT_SECRET: <root_client_password>
  AIDBOX_LICENSE: <JWT-LICENSE> # JWT license from the Aidbox user portal
  PGUSER: <db_user> # database username
  PGPASSWORD: <db_password> # database password
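Note that values under data must be base64-encoded. One way to avoid encoding them by hand is to let kubectl create the Secret from literals (placeholder values shown):

kubectl create secret generic aidbox --namespace prod \
  --from-literal=AIDBOX_ADMIN_PASSWORD='<admin_password>' \
  --from-literal=AIDBOX_CLIENT_SECRET='<root_client_password>' \
  --from-literal=AIDBOX_LICENSE='<JWT-LICENSE>' \
  --from-literal=PGUSER='<db_user>' \
  --from-literal=PGPASSWORD='<db_password>'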
Aidbox Deployment
apiVersion: apps/v1
kind: Deployment
metadata:
  name: aidbox
  namespace: prod
spec:
  replicas: 2
  selector:
    matchLabels:
      service: aidbox
  template:
    metadata:
      labels:
        service: aidbox
    spec:
      containers:
        - name: main
          image: healthsamurai/aidboxone:latest
          ports:
            - containerPort: 8080
              protocol: TCP
            - containerPort: 8765
              protocol: TCP
          envFrom:
            - configMapRef:
                name: aidbox
            - secretRef:
                name: aidbox
          livenessProbe:
            httpGet:
              path: /health
              port: 8080
              scheme: HTTP
            initialDelaySeconds: 20
            timeoutSeconds: 10
            periodSeconds: 10
            successThreshold: 1
            failureThreshold: 12
          readinessProbe:
            httpGet:
              path: /health
              port: 8080
              scheme: HTTP
            initialDelaySeconds: 20
            timeoutSeconds: 10
            periodSeconds: 10
            successThreshold: 1
            failureThreshold: 6
          startupProbe:
            httpGet:
              path: /health
              port: 8080
              scheme: HTTP
            initialDelaySeconds: 20
            timeoutSeconds: 5
            periodSeconds: 5
            successThreshold: 1
            failureThreshold: 4
When Aidbox starts for the first time, resolving all the dependencies takes longer. If you encounter startupProbe failures, consider increasing initialDelaySeconds and failureThreshold under the startupProbe spec in the config above.
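For example, a more forgiving startupProbe for the first boot might look like this (the numbers are illustrative, not a recommendation; tune them to your environment):

startupProbe:
  httpGet:
    path: /health
    port: 8080
    scheme: HTTP
  initialDelaySeconds: 60 # give the first boot extra time to resolve dependencies
  timeoutSeconds: 5
  periodSeconds: 10
  successThreshold: 1
  failureThreshold: 30 # up to ~5 more minutes before the container is restarted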
For running multiple Aidbox replicas, ensure all instances share the same RSA keys and secrets. See Configure Aidbox for details.
To verify that Aidbox started correctly, check the logs:
kubectl logs -f <aidbox-pod-name>
Create the Aidbox k8s service
apiVersion: v1
kind: Service
metadata:
  name: aidbox
  namespace: prod
spec:
  ports:
    - protocol: TCP
      port: 80
      targetPort: 8080
  selector:
    service: aidbox
Ingress
The cluster must have an ingress controller installed.
Our recommendation is to use the Kubernetes Ingress NGINX Controller. As an alternative, you can use Traefik.
Additional information about Ingress in Kubernetes can be found in the Kubernetes Service Networking documentation.
Ingress NGINX controller
Ingress-nginx is an Ingress controller for Kubernetes that uses NGINX as a reverse proxy and load balancer.
helm upgrade \
--install ingress-nginx ingress-nginx \
--repo https://kubernetes.github.io/ingress-nginx \
--namespace ingress-nginx --create-namespace
CertManager
To provide a secure HTTPS connection, you can use paid SSL certificates issued for your domain or certificates issued by Let's Encrypt. If you use Let's Encrypt, we recommend installing and configuring the cert-manager operator.
helm repo add jetstack https://charts.jetstack.io
helm repo update
helm install \
cert-manager jetstack/cert-manager \
--namespace cert-manager \
--create-namespace \
--set crds.enabled=true
Configure the ClusterIssuer:
apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
  name: letsencrypt
spec:
  acme:
    email: hello@my-domain.com
    preferredChain: ''
    privateKeySecretRef:
      name: issuer-key
    server: https://acme-v02.api.letsencrypt.org/directory
    solvers:
      - http01:
          ingress:
            class: nginx # Ingress class name
If you use the Multibox image and want to use cert-manager, you should configure DNS01 authorization to issue wildcard certificates:
https://letsencrypt.org/docs/challenge-types/#dns-01-challenge
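As an illustration, a ClusterIssuer with a DNS01 solver for Cloudflare might look like this; the API token Secret is hypothetical, and other DNS providers have their own solver sections (see the cert-manager documentation):

apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
  name: letsencrypt-dns
spec:
  acme:
    email: hello@my-domain.com
    privateKeySecretRef:
      name: issuer-dns-key
    server: https://acme-v02.api.letsencrypt.org/directory
    solvers:
      - dns01:
          cloudflare:
            apiTokenSecretRef:
              name: cloudflare-api-token # hypothetical Secret holding a Cloudflare API token
              key: api-token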
Ingress resource
Now you can create the Kubernetes Ingress for the Aidbox deployment:
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: aidbox
  namespace: prod
  annotations:
    acme.cert-manager.io/http01-ingress-class: nginx
    cert-manager.io/cluster-issuer: letsencrypt
    kubernetes.io/ingress.class: nginx
spec:
  tls:
    - hosts:
        - my.box.url
      secretName: aidbox-tls
  rules:
    - host: my.box.url
      http:
        paths:
          - path: /
            pathType: ImplementationSpecific
            backend:
              service:
                name: aidbox
                port:
                  number: 80
Now you can test the Ingress:
curl https://my.box.url
Logging
General logging & audit information can be found in this article — Logging & Audit
Aidbox supports integration with the following systems:
- ElasticSearch — Elastic Logs and Monitoring Integration
- Loki — Grafana Loki Log management integration
- DataDog — Datadog Log management integration
ElasticSearch integration
You can install ECK (Elastic Cloud on Kubernetes) using the official guide.
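A typical ECK installation applies the operator CRDs and manifest; a sketch, with <version> standing in for the current ECK release:

kubectl create -f https://download.elastic.co/downloads/eck/<version>/crds.yaml
kubectl apply -f https://download.elastic.co/downloads/eck/<version>/operator.yaml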
Configure the Aidbox and Elasticsearch integration:
apiVersion: v1
kind: Secret
metadata:
  name: aidbox
  namespace: prod
data:
  ...
  AIDBOX_ES_URL: http://es-service.es-ns.svc.cluster.local
  AIDBOX_ES_AUTH: <user>:<password>
  ...
DataDog integration
apiVersion: v1
kind: Secret
metadata:
  name: aidbox
  namespace: prod
data:
  ...
  AIDBOX_DD_API_KEY: <Datadog API Key>
  ...
Monitoring
For monitoring, our recommendation is to use the Kube Prometheus stack:
helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm repo update
helm install prometheus prometheus-community/kube-prometheus-stack
Create Aidbox metrics service
apiVersion: v1
kind: Service
metadata:
  name: aidbox-metrics
  namespace: prod
  labels:
    operated: prometheus
spec:
  ports:
    - protocol: TCP
      port: 80
      targetPort: 8765
  selector:
    service: aidbox
Create a ServiceMonitor config for scraping metrics data
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  labels:
    app.kubernetes.io/component: metrics
    release: kube-prometheus
    serviceMonitorSelector: aidbox
  name: aidbox
  namespace: kube-prometheus
spec:
  endpoints:
    - honorLabels: true
      interval: 10s
      path: /metrics
      targetPort: 8765
    - honorLabels: true
      interval: 60s
      path: /metrics/minutes
      targetPort: 8765
    - honorLabels: true
      interval: 10m
      path: /metrics/hours
      targetPort: 8765
  namespaceSelector:
    any: true
  selector:
    matchLabels:
      operated: prometheus
Alternatively, you can specify the Prometheus scrape configuration directly:
global:
  external_labels:
    monitor: 'aidbox'
scrape_configs:
  - job_name: aidbox
    scrape_interval: 5s
    metrics_path: /metrics
    static_configs:
      - targets: [ 'aidbox-metrics.prod.svc.cluster.local:8765' ]
  - job_name: aidbox-minutes
    scrape_interval: 30s
    metrics_path: /metrics/minutes
    static_configs:
      - targets: [ 'aidbox-metrics.prod.svc.cluster.local:8765' ]
  - job_name: aidbox-hours
    scrape_interval: 1m
    scrape_timeout: 30s
    metrics_path: /metrics/hours
    static_configs:
      - targets: [ 'aidbox-metrics.prod.svc.cluster.local:8765' ]
Alternative solutions
- VictoriaMetrics — High-Performance Open Source Time Series Database.
- Thanos — highly available Prometheus setup with long-term storage capabilities.
- Grafana Mimir — highly available, multi-tenant, long-term storage for Prometheus.
Additional monitoring
System monitoring:
- node exporter — Prometheus exporter for hardware and OS metrics exposed by *NIX kernels
- kube-state-metrics — a simple service that listens to the Kubernetes API server and generates metrics about the state of objects
- cadvisor — container usage metrics
PostgreSQL monitoring:
- pg_exporter — Prometheus exporter for PostgreSQL server metrics
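If the database runs under CloudNativePG as described above, the operator already ships a PostgreSQL metrics exporter; as an alternative to deploying pg_exporter yourself, you can enable a PodMonitor on the cluster (this requires the Prometheus operator CRDs installed by the Kube Prometheus stack):

apiVersion: postgresql.cnpg.io/v1
kind: Cluster
metadata:
  name: aidbox-db
  namespace: prod
spec:
  # ...rest of the cluster spec as above
  monitoring:
    enablePodMonitor: true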
Alerting
Alerting rules allow you to define alert conditions based on Prometheus expression language expressions and to send notifications about firing alerts to an external service.
Alert rules
Alert for long-running HTTP requests with P99 > 5s over a 5-minute interval:
alert: SlowRequests
for: 5m
expr: histogram_quantile(0.99, sum (rate(aidbox_http_request_duration_seconds_bucket[5m])) by (le, route, instance)) > 5
labels: {severity: ticket}
annotations:
  title: Long HTTP query execution
  metric: '{{ $labels.route }}'
  value: '{{ $value | printf "%.2f" }}'
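With the Kube Prometheus stack, such a rule is usually delivered as a PrometheusRule resource. A sketch, assuming the default ruleSelector of the Helm release installed above (release: prometheus):

apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: aidbox-alerts
  namespace: prod
  labels:
    release: prometheus # must match the ruleSelector of your Prometheus
spec:
  groups:
    - name: aidbox
      rules:
        - alert: SlowRequests
          for: 5m
          expr: histogram_quantile(0.99, sum (rate(aidbox_http_request_duration_seconds_bucket[5m])) by (le, route, instance)) > 5
          labels: {severity: ticket}
          annotations:
            title: Long HTTP query execution
            metric: '{{ $labels.route }}'
            value: '{{ $value | printf "%.2f" }}'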
Alert delivery
Alertmanager configuration template for Telegram:
global:
  resolve_timeout: 5m
  telegram_api_url: 'https://api.telegram.org/'
route:
  group_by: [alertname, instance]
  # Default receiver
  receiver: <my-ops-chat>
  routes:
    # Mute watchdog alert
    - receiver: empty
      match: {alertname: Watchdog}
receivers:
  - name: empty
  - name: <my-ops-chat>
    telegram_configs:
      - chat_id: <chat-id>
        api_url: https://api.telegram.org
        parse_mode: HTML
        message: |-
          <b>[{{ .CommonLabels.instance }}] {{ .CommonLabels.alertname }}</b>
          {{ .CommonAnnotations.title }}
          {{ range .Alerts }}{{ .Annotations.metric }}: {{ .Annotations.value }}
          {{ end }}
        bot_token: <bot-token>
All other integrations can be found on the Alertmanager documentation page.
Additional tools
- Embedded Grafana alerts
- Grafana OnCall
Security
Vulnerability and security scanners:
- Trivy operator — Kubernetes-native security toolkit (see the install sketch after this list).
- Trivy operator Lens extension — UI extension for Lens which provides visibility into Trivy reports
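A minimal install sketch for the Trivy operator, assuming the Aqua Security Helm charts repository:

helm repo add aqua https://aquasecurity.github.io/helm-charts/
helm repo update
helm install trivy-operator aqua/trivy-operator \
  --namespace trivy-system --create-namespace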
Kubernetes Policy Management:
- Kyverno or Gatekeeper — Kubernetes policy management
Advanced:
- Datree — k8s resources linter