# Autoscale with Prometheus

## Overview
Hydrolix can autoscale cluster components based on Prometheus metrics.
The HDX Autoscaler is managed by the Hydrolix operator and supports both target-based scaling and range-based scaling.
Improvements to the HDX Autoscaler were added across the v5.3-v5.6 releases.

## Configuration
The HDX Autoscaler runs in target mode by default and switches to range mode when `metric_min` and `metric_max` are set.
Range mode normalizes metrics against the configured bounds and applies a sensitizer function to determine scaling aggressiveness.
### Target mode scaling

- Works with `metric` + `target_value`
- Ratio = `average_value ÷ target_value`
- Dead-zone applied with tolerances
- Desired replicas = ratio × current replicas (throttled)
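As a worked illustration of the arithmetic above (the metric values and replica count here are invented; throttling and cool windows would still apply on top):

$$
\text{ratio} = \frac{\text{average\_value}}{\text{target\_value}} = \frac{80}{40} = 2.0,
\qquad
\text{desired replicas} = 2.0 \times 3 = 6
$$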
### Range mode scaling

- Activated when `metric_min` and `metric_max` are set
- Normalizes metrics between `0` and `1`
- Applies a sensitizer function (exponential by default)
- Supports aggressive scale-up and conservative scale-down
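A sketch of the normalization step, assuming a standard min-max mapping clamped to the configured bounds (the exact internal formula isn't published here):

$$
n = \min\left(1,\ \max\left(0,\ \frac{v - \text{metric\_min}}{\text{metric\_max} - \text{metric\_min}}\right)\right)
$$

where $v$ is the observed metric value.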
### Sensitizer function for range mode

The autoscaler applies a sensitizer function in range mode to adjust scaling responsiveness. By default, an exponential curve makes scaling more aggressive when far from the target, and gentler when close to it. This helps prevent overshoot and thrashing.

The default sensitizer is exponential with an exponent (`exp`) of `1/3`.
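If the sensitizer is applied as a simple power curve over the normalized value $n \in [0,1]$ (an assumption for illustration; the exact internal form isn't documented here), the default exponent gives:

$$
s(n) = n^{1/3}, \qquad s(0.125) = 0.5, \qquad s(1) = 1
$$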
### Sensitizer example
This autoscaler example reacts quickly to real overloads, stays calm near the target, and avoids extreme scaling.
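A hedged sketch of such a configuration, assuming the `hdxscalers` layout described under Configure autoscaling below (the service name, metric, and values are illustrative):

```yaml
spec:
  hdxscalers:
    query-peer:                         # illustrative service
      metric: http_requests_in_flight   # illustrative Prometheus metric
      port: 9090
      target_value: 50
      metric_min: 0                     # bounds activate range mode
      metric_max: 200
      exp: 0.33                         # near the default 1/3 curve
      tolerance_up: 0.1                 # stay calm near the target
      scale_up_throttle: 3.0            # avoid extreme scale-ups
```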
## Enable the autoscaler

The autoscaler (`hdx-scaler`) feature is disabled by default. To enable it, set a replica count in the `hydrolixcluster.yaml` configuration file:
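A minimal sketch, assuming per-service replica counts live under `spec.scale` in `hydrolixcluster.yaml` (verify the exact path against your cluster spec):

```yaml
apiVersion: hydrolix.io/v1
kind: HydrolixCluster
metadata:
  name: hdx              # illustrative cluster name
spec:
  scale:
    hdx-scaler:
      replicas: 1        # enable the autoscaler controller
```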
You can run only one autoscaler replica. This pod acts as the controller managing scaling for other services.
## Configure autoscaling

Autoscaler settings are defined in the `hdxscalers` block of the `hydrolixcluster.yaml` file.

You can configure multiple scalers per service or pool.
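A skeletal sketch of the block, assuming scalers are keyed by the name of the service or pool they manage (angle-bracket values are placeholders):

```yaml
spec:
  hdxscalers:
    <service-or-pool-name>:
      metric: <prometheus_metric_name>   # required
      port: 9091                         # required: where metrics are served
      target_value: 10                   # required
```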
### Key fields for autoscaling

| Field | Description | Required | Default |
|---|---|---|---|
| `metric` | Prometheus metric to scale on | Yes | – |
| `port` | Port where metrics are served | Yes | – |
| `target_value` | Target metric value for scaling | Yes | – |
| `metric_min` / `metric_max` | Lower/upper bounds that activate range mode | No | – |
| `exp` | Exponent for the sensitizer function in range mode | No | `1/3` |
| `tolerance_up` / `tolerance_down` | Fractional dead-zone tolerances | No | `0.1` / `0.1` |
| `cool_up_seconds` | Minimum time (in seconds) to wait after a scale-up event before allowing another scale-up | No | `15` |
| `cool_down_seconds` | Minimum time (in seconds) to wait after a scale-down event before allowing another scale-down | No | `15` |
| `scale_up_throttle` | Maximum factor per scale-up (`9.0` = +900%) | No | `9.0` |
| `scale_down_throttle` | Maximum fraction per scale-down (`0.2` = -20%) | No | `0.2` |
| `app` | Use metrics from another service | No | – |
| `rate` | Use rate of change instead of absolute value | No | `false` |
| `halflife` | EWMA decay time in seconds | No | `30` |
| `precision` | Digits of rounding when computing the ratio | No | `10` |
| `path` | Metrics endpoint path | No | `metrics` |
## Use precision to set the scale ratio

The `precision` setting controls the number of digits to round to when calculating the average-to-target ratio. The default is `10`.

A higher precision smooths the transitions when scaling up and down.

For more frequent scaling to zero, set a lower precision. Set a higher precision to keep small ratios above zero and scale to zero less often.
- A ratio of `0.045` with precision ≤1 rounds to zero, scaling down to zero pods.
- A ratio of `0.045` with precision ≥2 rounds up and keeps one replica active.
In the example below, `merge_duty_cycle`, a metric exposed by `merge-controller`, determines when `merge-peer` scales up when `precision` is set to a low or zero value.
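A hedged sketch of such a scaler, assuming the `app` field points the scaler at another service's metrics as described in the next section (the port and target values are illustrative):

```yaml
spec:
  hdxscalers:
    merge-peer:
      metric: merge_duty_cycle
      app: merge-controller   # read the metric from merge-controller
      port: 9091              # illustrative metrics port
      target_value: 0.9       # illustrative duty-cycle target
      precision: 1            # low precision: small ratios round to zero
```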
## Use metrics from other services to autoscale from minimum replicas

The autoscaler can use another service's metrics (the `app` field, as in the `merge-peer` example above) to decide when to scale up a scaled-to-zero service. If no separate `app` metric is specified, the scaler sets the minimum replica count to 1 instead of 0.
## Cool-up and cool-down windows

The autoscaler supports configurable wait times between scaling actions.

- `cool_up_seconds` - the minimum time to wait after a scale-up before another scale-up can occur
- `cool_down_seconds` - the minimum time to wait after a scale-down before another scale-down can occur
The cool-up window prevents frequent scale-ups. The cool-down window prevents frequent scale-downs.
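For example, a sketch that widens both windows (other required fields omitted for brevity; the values are illustrative):

```yaml
spec:
  hdxscalers:
    query-peer:
      # ...metric, port, and target_value as usual...
      cool_up_seconds: 30      # at most one scale-up every 30 s
      cool_down_seconds: 120   # at most one scale-down every 2 min
```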
## Tolerance windows (dead-zone)

Use `tolerance_up` and `tolerance_down` to define a range around the target where no scaling occurs.

This prevents unnecessary pod changes when metrics fluctuate near the target value.

For example:

- `tolerance_up: 0.1` means scale-up is skipped unless the metric is more than 10% above target.
- `tolerance_down: 0.1` means scale-down is skipped unless the metric is more than 10% below target.
A tolerance window helps keep the cluster stable and avoids thrashing.
### Tolerance window example
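A hedged sketch matching the numbers below (the metric name and port are illustrative):

```yaml
spec:
  hdxscalers:
    query-head:                  # illustrative service
      metric: requests_per_pod   # illustrative metric
      port: 9090
      target_value: 50
      tolerance_up: 0.2          # dead-zone extends up to 60
      tolerance_down: 0.2        # dead-zone extends down to 40
```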
In this example:
- If the average requests per pod are ≤60 (50 + 20%), the autoscaler holds steady and doesn't scale up.
- If requests exceed 60, it triggers a scale-up.
- If requests fall below 40 (50 - 20%), it triggers a scale-down.
- Between 40 and 60, no scaling occurs, keeping the system stable and avoiding thrash.
## Basic target mode example
This example shows a basic way to use target mode to autoscale.
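A minimal sketch, assuming the `hdxscalers` layout used throughout this page (the service, metric, and values are illustrative):

```yaml
spec:
  hdxscalers:
    query-peer:
      metric: http_requests_in_flight
      port: 9090
      target_value: 50   # hold ~50 in-flight requests per pod
```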
## Advanced range mode example

This example sets explicit cool-up and cool-down windows, throttles, and tolerance ranges.
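A hedged sketch combining those settings (same illustrative service and metric as the basic example above):

```yaml
spec:
  hdxscalers:
    query-peer:
      metric: http_requests_in_flight
      port: 9090
      target_value: 50
      metric_min: 0             # bounds activate range mode
      metric_max: 200
      exp: 0.33                 # sensitizer exponent (default 1/3)
      tolerance_up: 0.1         # skip scale-up below +10%
      tolerance_down: 0.2       # skip scale-down above -20%
      cool_up_seconds: 30
      cool_down_seconds: 120
      scale_up_throttle: 3.0    # cap any single scale-up at 3x
      scale_down_throttle: 0.2  # shed at most 20% per scale-down
```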
## Observability

- Run `hdxscaler state` inside the autoscaler pod to inspect current settings
- Logs show scaling actions and their reasons
- Metrics are exported to Prometheus for dashboard visualization