Autoscale with Prometheus

Hydrolix provides its own autoscaling system, the HDX Autoscaler, which scales workloads based on application-specific metrics such as outstanding requests and data ingestion rates.

The HDX Autoscaler is managed by the Hydrolix operator and supports two types of scaling:

Horizontal scaling

Most Hydrolix components scale by adding or removing pod replicas, known as horizontal scaling. This type of scaling is workload-reactive and responds to changes in demand. For example, intake applications scale horizontally based on the number of outstanding requests. The HDX Autoscaler supports both target-based scaling and range-based scaling for horizontal autoscaling.

This page covers horizontal pod autoscaling using Prometheus metrics.

Vertical scaling

Vertical Pod Autoscaling (VPA) monitors resource consumption patterns and adjusts CPU or memory for individual pods. Instead of adding more pods, VPA increases or decreases the resources allocated to each pod based on observed high-water marks. This helps prevent out-of-memory events and optimizes resource allocation without manual tuning.

For vertical scaling configuration, see Vertical Pod Autoscaling.

Configure Autoscaler

The HDX Autoscaler supports two modes: target mode, which is the default, and range mode, which activates when both metric_min and metric_max are set. Range mode normalizes metrics against the configured bounds and applies a sensitizer function to determine scaling aggressiveness.

Target mode scaling

Works with metric + target_value
Ratio = average_value ÷ target_value
Dead-zone applied with tolerances
Desired replicas = ratio × current replicas (throttled)
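The target mode steps above can be sketched as one Python function. This is a hypothetical illustration, not the HDX Autoscaler's code. The function name is invented. Rounding the desired count up with ceil (as Kubernetes HPA does) is an assumption. Letting a throttled step always move at least one pod is also an assumption. Defaults follow the key fields table.

```python
import math


def target_mode_replicas(
    avg_value: float,
    target_value: float,
    current: int,
    min_replicas: int,
    max_replicas: int,
    tolerance_up: float = 0.1,
    tolerance_down: float = 0.1,
    scale_up_throttle: float = 9.0,
    scale_down_throttle: float = 0.2,
) -> int:
    """Hypothetical sketch of one target-mode decision; assumes current >= 1."""
    ratio = avg_value / target_value

    # Dead-zone: hold steady while the ratio stays within the tolerances.
    if 1 - tolerance_down <= ratio <= 1 + tolerance_up:
        return current

    # Assumption: round up, as Kubernetes HPA does.
    desired = math.ceil(ratio * current)

    # Throttles cap a single step (assumption: at least one pod may always move).
    if desired > current:
        max_added = max(1, math.floor(current * scale_up_throttle))
        desired = min(desired, current + max_added)
    else:
        max_removed = max(1, math.floor(current * scale_down_throttle))
        desired = max(desired, current - max_removed)

    # Finally respect the replicas range, for example "1-3".
    return max(min_replicas, min(max_replicas, desired))


# target_value 4 and replicas 1-3, as in the basic target mode example.
print(target_mode_replicas(10, 4, current=1, min_replicas=1, max_replicas=3))  # 3
print(target_mode_replicas(1, 4, current=3, min_replicas=1, max_replicas=3))   # 2
```

In the second call, the unthrottled result would be 1 pod. The 20% down throttle limits the step to removing a single pod.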

Range mode scaling

Activated when metric_min and metric_max are set
Normalizes metrics between 0 and 1
Applies a sensitizer function (exponential by default)
Supports aggressive scale-up and conservative scale-down

Sensitizer function for range mode

The autoscaler applies a sensitizer function in range mode to adjust scaling responsiveness. By default, an exponential curve makes scaling more aggressive when far from the target, and gentler when close to it. This helps prevent overshoot and thrashing.

The default sensitizer exponent (exp) is 1/3.

Sensitizer example

This example reacts quickly to genuine overloads, holds steady near the target, and limits how far any single step can scale.

Sensitizer Example
spec:
  scale:
    intake-head:
      # Keep between 1 and 10 pods
      replicas: 1-10

      hdxscalers:
        - metric: http_source_outstanding_reqs   # Prometheus metric (per pod)
          port: 27182
          # ---- Range mode ----
          metric_min: 0
          metric_max: 200
          # Target (still used in range mode as the "aim point")
          target_value: 80

          # Sensitizer (exponent). < 1 = more aggressive far from target, gentler near target
          exp: 0.33

          # Dead-zone around target to reduce flapping
          tolerance_up: 0.10     # ignore upscales unless >10% above target
          tolerance_down: 0.10   # ignore downscales unless >10% below target

          # Cool-up and cool-down windows to avoid back-to-back changes
          cool_up_seconds: 45
          cool_down_seconds: 60

          # Throttles (safety rails)
          scale_up_throttle: 9.0     # at most +900% in one step
          scale_down_throttle: 0.2   # at most -20% in one step
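The range mode normalization and sensitizer can be sketched as follows, using metric_min: 0, metric_max: 200, and exp: 0.33 from the example above. Clamping to [0, 1] and the plain power curve x ** exp are assumptions based on the description, and the function names are hypothetical. The sketch stops at the sensitized value. How that value becomes a replica count, including anchor_point, isn't shown.

```python
def normalize(value: float, metric_min: float, metric_max: float) -> float:
    """Map a reading onto [0, 1] between metric_min and metric_max, clamped."""
    if metric_max <= metric_min:
        raise ValueError("metric_max must be greater than metric_min")
    x = (value - metric_min) / (metric_max - metric_min)
    return min(1.0, max(0.0, x))


def sensitize(x: float, exp: float = 1 / 3) -> float:
    """Power-curve sensitizer: with exp < 1, x ** exp > x for x in (0, 1)."""
    return x ** exp


# Readings below metric_min or above metric_max clamp to 0 or 1.
for reading in (2, 20, 80, 160, 250):
    x = normalize(reading, metric_min=0, metric_max=200)
    print(f"{reading:>4} -> normalized {x:.3f} -> sensitized {sensitize(x, exp=0.33):.3f}")
```

With exp below 1, the curve is steep near 0 and flattens toward 1. A given change in the raw metric therefore moves the sensitized value more at the low end of the range than at the high end.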

Enable the autoscaler

The autoscaler is disabled by default. To enable it, set its replica count to 1 in the hydrolixcluster.yaml configuration file.

Hydrolix version 5.11 introduced hdx-scaler-go, which is the recommended autoscaler for v5.11 and later. If both hdx-scaler (the Python scaler) and hdx-scaler-go are enabled, hdx-scaler takes precedence.

v5.11 or later:

Enable the Autoscaler
spec:
  scale:
    hdx-scaler-go:
      replicas: 1

v5.10 or earlier:

Enable the Autoscaler
spec:
  scale:
    hdx-scaler:
      replicas: 1

Note

You can run only one autoscaler replica. This pod acts as the controller managing scaling for other services.

Configure autoscaling

Autoscaler settings are defined in the hdxscalers block of the hydrolixcluster.yaml file. You can configure multiple scalers for each service or pool.

Vertical Pod Autoscaling

For information about vertical scaling, see Vertical Pod Autoscaling.

Key fields for autoscaling

Field	Description	Required	Default
`metric`	Prometheus metric to scale on	Yes	–
`port`	Port where metrics are served	Yes	–
`target_value`	Target metric value for scaling	Yes	–
`per_pod`	Whether `target_value` is per-pod or aggregated across all pods (v5.11+)	No	`true`
`metric_min` / `metric_max`	Lower/upper bounds to activate range mode. Both must be set if either is used.	No	–
`exp`	Exponent for sensitizer function in range mode	No	`1/3`
`anchor_point`	Range mode anchor point: `"current"` or `"max"`	No	`"current"`
`tolerance`	Fractional dead-zone for both scale-up and scale-down. Overridden by `tolerance_up` or `tolerance_down` if either is set.	No	–
`tolerance_up` / `tolerance_down`	Fractional dead-zone tolerances for scale-up and scale-down separately	No	`0.1` / `0.1`
`cool_up_seconds`	Minimum seconds to wait after a scale-up before allowing another upscale	No	`15`
`cool_down_seconds`	Minimum seconds to wait after a scale-down before allowing another downscale	No	`15`
`scale_up_throttle`	Max fractional increase per upscale step (`9.0` = +900%)	No	`9.0`
`scale_down_throttle`	Max fraction removable per downscale step (`0.2` = -20%)	No	`0.2`
`app`	Kubernetes app label for pod discovery. Defaults to the service name.	No	Service name
`rate`	Treat metric as a counter and use per-second rate	No	`false`
`op`	Aggregation operation when multiple metric series exist: `sum`, `avg`, `min`, `max`, or `first`	No	`"first"`
`halflife`	EWMA smoothing half-life in seconds	No	`30`
`bias_correction`	Enable EWMA bias correction for faster initial response (v5.11+)	No	`false`
`precision`	Decimal places for ratio rounding	No	`10`
`path`	Metrics endpoint path	No	`"metrics"`
`persistence_interval_sec`	Seconds between state snapshots	No	`60`
`suspended`	Pause scaling without losing accumulated state (v5.11+)	No	`false`
`dry_run`	Log scaling decisions without applying them (v5.11+)	No	`false`
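The halflife and bias_correction fields describe EWMA smoothing of metric readings. The following is a minimal sketch under two assumptions. First, readings arrive at a fixed interval (interval_s is a hypothetical parameter, not a documented field). Second, bias correction works Adam-style, dividing by the weight accumulated so far. The HDX Autoscaler's exact formula may differ. The halflife default of 30 seconds and the bias_correction default of false follow the table.

```python
class HalfLifeEWMA:
    """Hypothetical EWMA smoother parameterized by a half-life in seconds."""

    def __init__(self, halflife: float = 30.0, interval_s: float = 15.0,
                 bias_correction: bool = False):
        # Per-reading decay chosen so an old reading's weight halves every `halflife` seconds.
        self.decay = 0.5 ** (interval_s / halflife)
        self.bias_correction = bias_correction
        self.value = 0.0  # starts at zero, so early averages are pulled toward zero
        self.count = 0

    def update(self, reading: float) -> float:
        """Fold in one reading and return the smoothed value."""
        self.count += 1
        self.value = self.decay * self.value + (1.0 - self.decay) * reading
        if self.bias_correction:
            # Divide by the total weight applied so far to cancel the zero start.
            return self.value / (1.0 - self.decay ** self.count)
        return self.value


plain = HalfLifeEWMA()
corrected = HalfLifeEWMA(bias_correction=True)
for reading in (100.0, 100.0, 100.0):
    print(round(plain.update(reading), 1), round(corrected.update(reading), 1))
```

Given a steady reading of 100, the uncorrected average climbs toward 100 over several readings. The corrected one reports 100 from the first reading, which is the "faster initial response" the table describes.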

Use `precision` to set the scale ratio

The precision setting is the number of decimal places used to round the average-to-target ratio. The default is 10.

A higher precision value smooths transitions when scaling up and down.

For more frequent scaling to zero, set a lower precision. Set a higher precision to keep small ratios above zero, so the service scales to zero less often.

A ratio of 0.045 with precision ≤1 rounds to zero, scaling down to zero pods.
A ratio of 0.045 with precision ≥2 stays above zero, so one replica stays active.
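The 0.045 cases can be reproduced with Python's decimal module. This sketch rests on two assumptions: the ratio is rounded half-up to precision decimal places, and the resulting replica count is rounded up with ceil starting from one current replica. Neither rule is confirmed by the source; they were chosen because they reproduce the documented outcomes. The function names are hypothetical.

```python
import math
from decimal import ROUND_HALF_UP, Decimal


def round_ratio(ratio: float, precision: int) -> Decimal:
    """Round the average-to-target ratio to `precision` decimal places (half-up, assumed)."""
    quantum = Decimal(1).scaleb(-precision)  # precision=2 -> 1E-2
    return Decimal(str(ratio)).quantize(quantum, rounding=ROUND_HALF_UP)


def replicas_from_ratio(ratio: float, current: int, precision: int) -> int:
    """Desired replicas = rounded ratio x current replicas, rounded up (assumed)."""
    return math.ceil(round_ratio(ratio, precision) * current)


for precision in (0, 1, 2, 3, 10):
    rounded = round_ratio(0.045, precision)
    print(f"precision={precision}: ratio {rounded} -> "
          f"{replicas_from_ratio(0.045, 1, precision)} replica(s)")
```

Using Decimal instead of the built-in round() avoids binary floating-point surprises. In binary, 0.045 is stored slightly below 0.045, so round(0.045, 2) returns 0.04 rather than 0.05.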

In this example, merge-peer scales on merge_duty_cycle, a metric from merge-controller, so app is set to merge-controller. Because precision is 1, small duty-cycle ratios round to zero rather than holding merge-peer replicas up.

Precision Configuration Example
merge-peer:
    cpu: 4
    memory: 4Gi
    hdxscalers:
    - metric: merge_duty_cycle
      port: 27182
      target_value: 0.5
      cool_down_seconds: 40
      app: merge-controller
      precision: 1
    replicas: 1-5
    scale_profile: I
    service: merge-peer

Aggregate multiple metric series with `op`

This feature was introduced in Hydrolix version 5.6.2.

By default, when a scrape returns multiple metric series, the autoscaler uses the first value. Use the op field to specify how those series are combined.

Value	Behavior
`first`	Use the first value returned (default, preserves original behavior)
`sum`	Sum all values
`avg`	Average all values
`min`	Use the minimum value
`max`	Use the maximum value

Aggregation Operation Example
hdxscalers:
- metric: http_source_outstanding_reqs
  port: 27182
  target_value: 10
  op: sum
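The way op combines series can be sketched as a lookup table of Python built-ins. The mapping mirrors the values table above. The function name and the handling of empty or unknown input are assumptions.

```python
from statistics import fmean
from typing import Callable, Dict, Sequence

# One entry per documented `op` value.
AGGREGATIONS: Dict[str, Callable[[Sequence[float]], float]] = {
    "first": lambda values: values[0],  # default: keep the first series
    "sum": sum,
    "avg": fmean,
    "min": min,
    "max": max,
}


def aggregate(values: Sequence[float], op: str = "first") -> float:
    """Hypothetical helper: combine the values of every series returned by one scrape."""
    if not values:
        raise ValueError("no metric series returned")
    try:
        combine = AGGREGATIONS[op]
    except KeyError:
        raise ValueError(f"unsupported op: {op!r}") from None
    return float(combine(values))


series = [3.0, 5.0, 10.0]
print(aggregate(series), aggregate(series, "sum"), aggregate(series, "avg"))  # 3.0 18.0 6.0
```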

Use metrics from other services to autoscale from minimum replicas

A service scaled to zero has no pods serving metrics, so the autoscaler relies on a metric from another service, selected with the app field, to decide when to scale it back up. If no separate app is specified, the scaler raises the minimum replica count from 0 to 1.
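This rule can be sketched as a small helper. The function name and its inputs are hypothetical. It restates only the documented behavior: a minimum of 0 is raised to 1 unless app points at a different service. Because app defaults to the service name, leaving it unset counts as "no separate app".

```python
from typing import Optional


def effective_min_replicas(min_replicas: int, service: str, app: Optional[str] = None) -> int:
    """Hypothetical helper: return the minimum replica count the scaler would enforce."""
    # app defaults to the service name, so only a different app means external metrics.
    uses_external_metrics = app is not None and app != service
    if min_replicas == 0 and not uses_external_metrics:
        # Without metrics from another service, nothing could wake the service from zero.
        return 1
    return min_replicas


print(effective_min_replicas(0, "merge-peer"))                          # 1
print(effective_min_replicas(0, "merge-peer", app="merge-controller"))  # 0
```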

Cool-up and cool-down windows

The autoscaler supports configurable wait times between scaling actions.

cool_up_seconds: the minimum time to wait after a scale-up before another scale-up can occur.
cool_down_seconds: the minimum time to wait after a scale-down before another scale-down can occur.

The cool-up window prevents frequent scale-ups. The cool-down window prevents frequent scale-downs.
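As defined above, a scale-up checks only the time since the last scale-up, and a scale-down checks only the time since the last scale-down. The following is a minimal sketch of that gate. The CooldownGate class is hypothetical, timestamps are in seconds, and the defaults follow the documented 15 second values.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CooldownGate:
    """Hypothetical sketch of cool-up and cool-down windows; times are in seconds."""

    cool_up_seconds: float = 15.0
    cool_down_seconds: float = 15.0
    last_up: Optional[float] = None
    last_down: Optional[float] = None

    def allow(self, direction: str, now: float) -> bool:
        """Return True if a scale in `direction` ("up" or "down") may happen at `now`."""
        if direction == "up":
            return self.last_up is None or now - self.last_up >= self.cool_up_seconds
        if direction == "down":
            return self.last_down is None or now - self.last_down >= self.cool_down_seconds
        raise ValueError(f"unknown direction: {direction!r}")

    def record(self, direction: str, now: float) -> None:
        """Remember when a scaling action in `direction` was applied."""
        if direction == "up":
            self.last_up = now
        elif direction == "down":
            self.last_down = now
        else:
            raise ValueError(f"unknown direction: {direction!r}")


gate = CooldownGate(cool_up_seconds=45, cool_down_seconds=60)
gate.record("up", now=0)
print(gate.allow("up", now=30))    # False: still inside the 45 s cool-up window
print(gate.allow("up", now=45))    # True
print(gate.allow("down", now=10))  # True: cool-up does not block scale-down
```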

Tolerance windows (dead-zone)

Use tolerance_up and tolerance_down to define a range around the target value where no scaling occurs. This prevents unnecessary pod changes when metrics fluctuate near the target value.

To apply the same tolerance in both directions, use the single tolerance field. If tolerance_up or tolerance_down is also set, it takes precedence over tolerance for its direction.

Tolerance values are specified as fractions of the target value. For example:

tolerance_up: 0.1 means scale-up is skipped unless the metric is more than 10% above target value.
tolerance_down: 0.1 means scale-down is skipped unless the metric is more than 10% below target value.
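The precedence rule can be sketched as a helper that resolves the effective pair of tolerances. The function name is hypothetical. The fallback to 0.1 comes from the documented defaults for tolerance_up and tolerance_down.

```python
from typing import Optional, Tuple

DEFAULT_TOLERANCE = 0.1  # documented default for tolerance_up and tolerance_down


def resolve_tolerances(
    tolerance: Optional[float] = None,
    tolerance_up: Optional[float] = None,
    tolerance_down: Optional[float] = None,
) -> Tuple[float, float]:
    """Return (up, down): a per-direction field beats `tolerance`, which beats the default."""
    base = DEFAULT_TOLERANCE if tolerance is None else tolerance
    up = base if tolerance_up is None else tolerance_up
    down = base if tolerance_down is None else tolerance_down
    return up, down


print(resolve_tolerances(tolerance=0.2))                     # (0.2, 0.2)
print(resolve_tolerances(tolerance=0.2, tolerance_up=0.05))  # (0.05, 0.2)
print(resolve_tolerances())                                  # (0.1, 0.1)
```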

A tolerance window helps keep the cluster stable and avoids thrashing.

Tolerance window example

Tolerance window configuration
spec:
  scale:
    intake-head:
      replicas: 1-5
      hdxscalers:
      - metric: http_source_outstanding_reqs
        port: 27182
        target_value: 50
        tolerance_up: 0.1     # Scale up only if >10% above target_value
        tolerance_down: 0.2   # Scale down only if >20% below target_value

If the average number of outstanding requests per pod is ≤55 (50 + 10% of 50), the autoscaler holds steady and doesn't scale up.
If requests exceed 55, it triggers a scale-up.
If requests fall below 40 (50 - 20% of 50), it triggers a scale-down.
Between 40 and 55 inclusive, no scaling occurs, which keeps the system stable and avoids thrashing.
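The dead-zone check for this example can be sketched as a pure function of the average, the target, and the two tolerances. The function name is hypothetical, and the sketch ignores cool-down windows and throttles.

```python
def scaling_direction(avg: float, target: float, tolerance_up: float, tolerance_down: float) -> str:
    """Return "up", "down", or "hold" for a per-pod average compared with target_value."""
    if avg > target * (1 + tolerance_up):
        return "up"    # more than tolerance_up above target
    if avg < target * (1 - tolerance_down):
        return "down"  # more than tolerance_down below target
    return "hold"      # inside the dead-zone


# target_value 50, tolerance_up 0.1, tolerance_down 0.2, as in the example above.
for avg in (38, 45, 54, 60):
    print(avg, scaling_direction(avg, target=50, tolerance_up=0.1, tolerance_down=0.2))
# 38 down, 45 hold, 54 hold, 60 up
```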

The tolerance dead-zone concept is similar to the Kubernetes HPA tolerance parameter. For more detail on the scaling algorithm, see the HPA documentation.

Basic target mode example

This example scales intake-head between 1 and 3 replicas in target mode, aiming for an average of 4 outstanding requests per pod.

Target Mode Example
spec:
  scale:
    intake-head:
      replicas: 1-3
      hdxscalers:
      - metric: http_source_outstanding_reqs
        port: 27182
        target_value: 4

Advanced range mode example

This example combines range mode with cool-up and cool-down windows, throttles, and tolerance windows.

Range Mode Example
spec:
  scale:
    intake-head:
      replicas: 1-5
      hdxscalers:
      - metric: http_source_outstanding_reqs
        port: 27182
        metric_min: 0
        metric_max: 100
        target_value: 50
        exp: 0.33
        cool_up_seconds: 45
        cool_down_seconds: 60
        scale_up_throttle: 9.0
        scale_down_throttle: 0.2
        tolerance_up: 0.1
        tolerance_down: 0.1

Pause and test scaling

The suspended and dry_run fields provide operational control over scaling behavior without removing configuration.

Suspend scaling

Set suspended: true to pause a scaler without losing its accumulated state (EWMA history, cooldown timers). When resumed, the scaler continues from where it left off.

Suspended Scaler Example
hdxscalers:
- metric: http_source_outstanding_reqs
  port: 27182
  target_value: 10
  suspended: true

Use suspended during maintenance windows, incident response, or debugging.

Dry run mode

Set dry_run: true to run a specific scaler configuration in observation mode. The hdx-scaler pod computes and logs decisions, but doesn't apply them to the workload.

Dry Run Example
hdxscalers:
- metric: http_source_outstanding_reqs
  port: 27182
  target_value: 10
  dry_run: true

Use dry_run to validate a new configuration before enabling it, or to understand how the scaler would behave on a service.
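The difference between suspended and dry_run can be sketched as one reconcile pass. Everything here is hypothetical: the reconcile function, the apply callback, and the log list. The sketch encodes only the documented behavior. suspended changes nothing and keeps state, while dry_run computes and logs a decision without applying it. The source doesn't say whether a suspended scaler keeps updating its EWMA, so the sketch simply skips the pass.

```python
from typing import Callable, List


def reconcile(
    current: int,
    desired: int,
    *,
    suspended: bool,
    dry_run: bool,
    apply: Callable[[int], None],
    log: List[str],
) -> int:
    """Hypothetical single pass: return the replica count in effect afterwards."""
    if suspended:
        # Paused: change nothing. Smoothing and cooldown state live elsewhere and are kept.
        log.append("suspended: no action")
        return current
    if desired == current:
        log.append(f"steady at {current}")
        return current
    if dry_run:
        # Observation mode: record the decision without touching the workload.
        log.append(f"dry_run: would scale {current} -> {desired}")
        return current
    apply(desired)
    log.append(f"scaled {current} -> {desired}")
    return desired


applied: List[int] = []
messages: List[str] = []
reconcile(2, 4, suspended=False, dry_run=True, apply=applied.append, log=messages)
print(applied, messages)  # [] ['dry_run: would scale 2 -> 4']
```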

Observability

Logs show scaling actions and reasons. Metrics are exported to Prometheus for dashboard visualization.

Use the current scaler state to verify a configuration change took effect, or to troubleshoot unexpected scaling behavior.

To inspect current scaler state: