Scale your Cluster

Scale Hydrolix services with Kubernetes

Overview

Scale a component's resources or replica count in the scale section of your hydrolixcluster.yaml file.

Hydrolix supports both stateful and stateless components, and scaling these requires different considerations.

Use the pre-made Hydrolix scale profiles, which cover a range of throughput levels. For example, scale_profile: prod provides components scaled for a typical 1-4 TB/day workload.

Override a component’s scale profile by setting the scale field in hydrolixcluster.yaml.

Deprecation notice

The stream-head and stream-peer services are deprecated and have been replaced by intake-head.

Stateful components

These components have a data_storage scale key:

Service      Description
postgres     Core
prometheus   Reporting and Control
rabbitmq     RabbitMQ
redpanda     Redpanda
zookeeper    Core

Stateless components

These components all have cpu, memory, and storage scale keys, along with replicas:

Service       Description
alter-peer    Alter Jobs
batch-head    Ingest
batch-peer    Ingest
decay         Data Lifecycle
intake-api    Ingest
kafka-peer    Ingest
keycloak      Core
merge-head    Merge
merge-peer    Merge
query-head    Query
query-peer    Query
reaper        Data Lifecycle
stream-head   Ingest
stream-peer   Ingest
traefik       Core
turbine-api   Query
version       Core
zookeeper     Core

Configure scaling

Edit the hydrolixcluster.yaml file to add or override component scale profiles.

kubectl edit hydrolixcluster --namespace="$HDX_KUBERNETES_NAMESPACE"

🚧

Stateful Persistent Volume changes

Persistent volume storage can only be increased, not decreased.
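Before increasing data_storage, you can check current PVC sizes by listing the persistent volume claims:

kubectl get pvc --namespace="$HDX_KUBERNETES_NAMESPACE"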

Scale pod settings

Use these settings to set the resource values of a pod:

Value          Description                                      Example
cpu            Amount of CPU to use for the pod/container       cpu: 2 or cpu: 2.5
memory         Amount of RAM to use for the pod/container       memory: 500Gi
storage        Amount of ephemeral storage; not stateful        storage: 10Gi
data_storage   Size of the PVC, for pods that support it        data_storage: 1TB

When specified, these values usually set both the request and the limit for the given resource (memory, CPU, storage, data_storage). Use the overcommit or limit_cpu tunables for more flexibility. See HTN tunables for more information.
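For example, a minimal sketch of how a single value maps to both request and limit (the query-peer component and the values here are illustrative):

scale:
  query-peer:
    cpu: 2       # sets both the CPU request and the CPU limit
    memory: 4Gi  # sets both the memory request and the memory limit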

Configure single pods

Modify single pod settings in the hydrolixcluster.yaml file.

For example, this setting modifies the intake-head pods to have two CPUs and 10GiB of RAM allocated:

scale:
  intake-head:
    cpu: 2
    memory: 10Gi

Configure multi-container pods

Some Hydrolix services run as pods with multiple containers. For example, the stream peer service contains both the intake-head and turbine containers. The turbine container is the indexer component that executes transforms and indexes content.

Settings applied to the default intake-head service don't apply to the turbine container.

Use the <component>-indexer name to specify the turbine component in your hydrolixcluster.yaml file. For example, for intake, use intake-indexer.
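For example, a sketch that sizes the intake-head container and its turbine indexer container separately (the resource values here are illustrative, not recommendations):

scale:
  intake-head:
    cpu: 2
    memory: 10Gi
  intake-indexer:
    cpu: 1
    memory: 4Gi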

See Scale Profiles for a list of pods.

Set scale profiles

Specify a scale_profile: key in your hydrolixcluster.yaml file with a value of prod or mega.

  • prod - A fully resilient production deployment: 1-4 TB/day

  • mega - A fully resilient large-scale production deployment: 10-50 TB/day

In this example, the scale_profile is set to prod:

apiVersion: hydrolix.io/v1
kind: HydrolixCluster
metadata:
  name: hdxcli-xxxyyyy
  namespace: hdxcli-xxxyyyy
spec:
  admin_email: [email protected]
  kubernetes_namespace: hdxcli-xxxyyyy
  kubernetes_profile: gcp
  env:
    EMAIL_PASSWORD: 
  hydrolix_url: https://host.hydrolix.net
  db_bucket_region: us-central1
  scale_profile: prod  # the prod profile

Apply the changes to automatically scale the system:

kubectl apply -f hydrolixcluster.yaml

See Scale Profiles to learn more.

Override a scale profile

Use the scale: section of your hydrolixcluster.yaml file to override component scale settings, add more instances of a component, or increase resources.

For batch-peer, the prod scale profile provides 2 GiB of memory and one replica.

This example scales the batch-peer component to five instances and gives it more memory than the scale profile provides.

Edit the hydrolixcluster.yaml to add this override:

.....
spec:
  .....
  scale_profile: prod
  scale:
    batch-peer:
      memory: 5G
      replicas: 5
    .....

Apply your changes:

kubectl apply -f hydrolixcluster.yaml && kubectl -n $HDX_KUBERNETES_NAMESPACE rollout restart deployment operator

📘

Services with PVC Storage

Some of the services Hydrolix uses must maintain state to provide high availability and redundancy. For example, the postgres service uses PVC storage. Specify its size using the data_storage key.

PVC changes can have significant impact. Contact Hydrolix Support if you have any questions.
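For example, a sketch that increases the postgres PVC (the size here is illustrative):

scale:
  postgres:
    data_storage: 100Gi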

Scale to zero

To scale everything in a cluster off except for the operator pod, add this line to the top-level spec:

scale_off: true

Only the operator pod remains running, to allow scaling back up.

...
  kubernetes_profile: gcp
  hydrolix_url: https://host.hydrolix.net
  env:
    EMAIL_PASSWORD: 
  db_bucket_region: us-central1
  scale_off: true  # turn everything off

It takes a few minutes for all components to scale down.
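To watch the scale-down progress, monitor the pods in the namespace:

kubectl get pods --namespace="$HDX_KUBERNETES_NAMESPACE" --watch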

To scale back up, remove scale_off: true from the hydrolixcluster.yaml file and apply the changes.

Custom autoscaling with Prometheus metrics

The hdx-scaler, or autoscaler, provides Kubernetes cluster scaling using Prometheus metrics with low resource overhead.

Use metrics from other services to scale up

The autoscaler uses external metrics to decide when to scale up a scaled-to-zero service. If no separate app metric is specified, the scaler sets the minimum replica count to 1 instead of 0.

Use precision to set the scale ratio

The precision configuration sets the number of digits to round to when calculating the average-to-target ratio. The default is 10. A higher precision value smooths the transitions when scaling up and down.

Precision also determines when the ratio rounds to zero, which drives the desired pod count to zero. For more frequent scaling to zero, set a lower precision; to keep small ratios above zero and scale to zero less often, set a higher precision.

  • A ratio of 0.045 with precision ≤1 rounds to zero, scaling down to zero pods.
  • A ratio of 0.045 with precision ≥2 rounds up and keeps one replica active.

In this example, merge_duty_cycle, a metric exposed by merge-controller, determines when merge-peer scales up. Because precision is set to a low value, small ratios round to zero and merge-peer can scale down to zero.

merge-peer:
  cpu: 4
  memory: 4Gi
  hdxscalers:
  - metric: merge_duty_cycle
    port: 27182
    target_value: 0.5
    cool_down_seconds: 40
    app: merge-controller
    precision: 1
  replicas: 1-5
  scale_profile: I
  service: merge-peer

Scale to minimal

This feature was added in version 5.3.

Use scale to minimal to autoscale most components to zero while leaving the cluster available for API calls and the UI. This feature provides extra cost savings by reducing resources used for idle workloads, while still allowing API interactions and scaling up when needed. This setting is off by default.

Scale to minimal is most effective for less-frequently used components, as there is a delay before scaling back up.

Enable scale to minimal

To enable scale to minimal, edit the hydrolixcluster.yaml file and add scale_min: true to the spec: section.
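For example:

...
spec:
  ...
  scale_min: true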

Active services when scaled to minimal

These pods remain active when setting scale_min: true:

  • traefik
  • ui
  • turbine-api
  • turbine-api-worker
  • zookeeper
  • keycloak
  • prometheus
  • validator
  • version

All other pods scale to zero when idle.