# Highly Available Aidbox

{% hint style="info" %}
Running parallel Aidbox replicas is supported starting from version **2208**.
{% endhint %}

### Concept

To provide high availability, run two or more application instances. All incoming traffic is balanced between the running Aidbox instances (in Kubernetes this is typically done by a Service; a minimal sketch is shown at the end of this page). If one of the instances fails, the network layer stops routing traffic to the failed instance and distributes it among the remaining available instances. The orchestration system is responsible for detecting the failure and restarting the instance.

{% hint style="warning" %}
Attention: by default Aidbox generates both the keypair and the secret on every startup. This means that on every start all previously issued JWTs become invalid. To avoid this, pass the RSA keypair and secret as Aidbox parameters; this is required if you run multiple replicas of the same Aidbox/Multibox instance. Check out this section in the docs on how to configure it properly: [Set up RSA private/public keys and secret](../../../reference/all-settings.md#security.auth.keys.public "mention")
{% endhint %}

### Configuration

Let's take a Kubernetes example of a highly available Aidbox configuration (this example can also be applied to Multibox):

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: aidbox
  namespace: production
spec:
  replicas: 2
  selector:
    matchLabels:
      service: aidbox
  template:
    metadata:
      labels:
        service: aidbox
    spec:
      topologySpreadConstraints:
        - maxSkew: 1
          topologyKey: topology.kubernetes.io/zone
          whenUnsatisfiable: ScheduleAnyway
          labelSelector:
            matchLabels:
              service: aidbox
      containers:
        - name: main
          image: healthsamurai/aidboxone:latest
          imagePullPolicy: Always
          ports:
            - containerPort: 8080
              protocol: TCP
          envFrom:
            - configMapRef:
                name: aidbox
            - secretRef:
                name: aidbox
          livenessProbe:
            httpGet:
              path: /health
              port: 8080
              scheme: HTTP
            initialDelaySeconds: 20
            timeoutSeconds: 5
            periodSeconds: 5
            successThreshold: 1
            failureThreshold: 4
          readinessProbe:
            httpGet:
              path: /health
              port: 8080
              scheme: HTTP
            initialDelaySeconds: 20
            timeoutSeconds: 5
            periodSeconds: 5
            successThreshold: 1
            failureThreshold: 2
```

#### Replicas

First of all, specify how many replicas you need:

```yaml
...
spec:
  replicas: 2
...
```

#### Readiness probe

The readiness probe indicates that the application is running and ready to receive traffic.

```yaml
readinessProbe:
  httpGet:
    path: /health
    port: 8080
    scheme: HTTP
  initialDelaySeconds: 20
  timeoutSeconds: 5
  periodSeconds: 5
  successThreshold: 1
  failureThreshold: 2
```

#### Liveness probe

The liveness probe indicates whether the container is running. If the liveness probe fails, the kubelet kills the container, and the container is subjected to its restart policy.

```yaml
livenessProbe:
  httpGet:
    path: /health
    port: 8080
    scheme: HTTP
  initialDelaySeconds: 20
  timeoutSeconds: 5
  periodSeconds: 5
  successThreshold: 1
  failureThreshold: 4
```

#### Startup probe

The startup probe provides a way to defer the execution of liveness and readiness probes until the container indicates it is able to handle them. Kubernetes won't direct the other probe types to a container if it has a startup probe that hasn't yet succeeded.
```yaml
startupProbe:
  httpGet:
    path: /health
    port: 8080
    scheme: HTTP
  initialDelaySeconds: 20
  timeoutSeconds: 5
  periodSeconds: 5
  successThreshold: 1
  failureThreshold: 4
```

#### Pod topology

To improve fault tolerance in case of failure of one or more availability zones, specify [Pod Topology Spread Constraints](https://kubernetes.io/docs/concepts/scheduling-eviction/topology-spread-constraints/):

```yaml
topologySpreadConstraints:
  - maxSkew: 1
    topologyKey: topology.kubernetes.io/zone
    whenUnsatisfiable: ScheduleAnyway
    labelSelector:
      matchLabels:
        service: aidbox
```
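#### Load balancing

The Deployment above only schedules the Aidbox pods; to balance incoming traffic across the replicas, as described in the Concept section, the pods are typically exposed through a Kubernetes Service. The manifest below is not part of the original example; it is a minimal sketch that assumes the `service: aidbox` label and container port `8080` used in the Deployment above.

```yaml
apiVersion: v1
kind: Service
metadata:
  name: aidbox
  namespace: production
spec:
  # Select the pods created by the Deployment above
  selector:
    service: aidbox
  ports:
    - name: http
      port: 80          # port exposed inside the cluster
      targetPort: 8080  # containerPort of the Aidbox pods
      protocol: TCP
```

A Service only routes requests to pods that currently pass their readiness probe, so a failed replica is removed from load balancing automatically and added back once it becomes ready again.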