Autoscale with Prometheus
Hydrolix provides its own autoscaling system, HDX Autoscaler, to scale based on application-specific metrics like outstanding requests and data ingestion rates.
The HDX Autoscaler is managed by the Hydrolix operator and supports two types of scaling:
Horizontal scaling⚓︎
Most Hydrolix components scale by adding or removing pod replicas, known as horizontal scaling. This type of scaling is workload-reactive and responds to changes in demand. For example, intake applications scale horizontally based on the number of outstanding requests. The HDX Autoscaler supports both target-based scaling and range-based scaling for horizontal autoscaling.
This page covers horizontal pod autoscaling using Prometheus metrics.
Vertical scaling⚓︎
Vertical Pod Autoscaling (VPA) monitors resource consumption patterns and adjusts CPU or memory for individual pods. Instead of adding more pods, VPA increases or decreases the resources allocated to each pod based on observed high-water marks. This helps prevent out-of-memory events and optimizes resource allocation without manual tuning.
For vertical scaling configuration, see Vertical Pod Autoscaling.
Configure Autoscaler⚓︎
The HDX Autoscaler runs in target mode by default and switches to range mode when both metric_min and metric_max are set.
Range mode normalizes metrics against the configured bounds and applies a sensitizer function to determine scaling aggressiveness.
Target mode scaling⚓︎
- Works with metric + target_value
- Ratio = average_value ÷ target_value
- Dead-zone applied with tolerances
- Desired replicas = ratio × current replicas (throttled)
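The target mode steps above can be sketched in a few lines of Python. This is an illustrative sketch, not the autoscaler's implementation: the function name desired_replicas, the round-up of the product, and the way the throttles are turned into step limits (always allowing a step of at least one pod) are assumptions. The default values come from the field table on this page.

```python
import math


def desired_replicas(average_value, target_value, current_replicas,
                     tolerance_up=0.1, tolerance_down=0.1,
                     scale_up_throttle=9.0, scale_down_throttle=0.2):
    """Hypothetical target-mode step: ratio, dead-zone, then throttle."""
    ratio = average_value / target_value

    # Dead-zone: hold steady while the ratio stays within the tolerances.
    if 1 - tolerance_down <= ratio <= 1 + tolerance_up:
        return current_replicas

    # Desired replicas = ratio x current replicas (rounding up is an assumption).
    desired = math.ceil(ratio * current_replicas)

    # Throttle: cap how far a single step may move in either direction.
    # Allowing at least one pod per step is an assumption of this sketch.
    max_step_up = max(1, math.floor(current_replicas * scale_up_throttle))
    max_step_down = max(1, math.floor(current_replicas * scale_down_throttle))
    upper = current_replicas + max_step_up
    lower = current_replicas - max_step_down
    return max(lower, min(desired, upper))
```

For example, with 10 replicas and a ratio of 0.1, the unthrottled result would be 1 replica, but a scale_down_throttle of 0.2 limits the step to 8 replicas.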
Range mode scaling⚓︎
- Activated when metric_min and metric_max are set
- Normalizes metrics between 0 and 1
- Applies a sensitizer function (exponential by default)
- Supports aggressive scale-up and conservative scale-down
Sensitizer function for range mode⚓︎
The autoscaler applies a sensitizer function in range mode to adjust scaling responsiveness. By default, an exponential curve makes scaling more aggressive when far from the target, and gentler when close to it. This helps prevent overshoot and thrashing.
The default sensitizer exponent (exp) is 1/3.
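To make the normalization and the exponent concrete, here is a minimal Python sketch. It assumes the metric is clamped linearly between metric_min and metric_max and that the sensitizer raises the normalized value to the power exp. Both function names are hypothetical, and this page doesn't specify how the sensitized value is mapped to a replica count, so that step is left out.

```python
def normalize(value, metric_min, metric_max):
    """Map a metric value onto the range 0..1, clamping at the bounds."""
    if metric_max <= metric_min:
        raise ValueError("metric_max must be greater than metric_min")
    fraction = (value - metric_min) / (metric_max - metric_min)
    return min(1.0, max(0.0, fraction))


def sensitize(normalized, exp=1 / 3):
    """Assumed sensitizer shape: a power curve with the configured exponent."""
    return normalized ** exp


# A metric at 12.5% of the configured range maps to roughly 0.5
# under the default exponent of 1/3.
print(sensitize(normalize(12.5, 0, 100)))
```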
Sensitizer example⚓︎
This autoscaler example reacts quickly to real overloads, stays calm near the target, and avoids extreme scaling.
Enable the autoscaler⚓︎
The autoscaler feature is disabled by default. To enable it, set a replica count in the hydrolixcluster.yaml configuration file.
Hydrolix version 5.11 introduced hdx-scaler-go. For v5.11 and later, hdx-scaler-go is the recommended autoscaler. If both hdx-scaler and hdx-scaler-go are enabled, the Python scaler (hdx-scaler) takes precedence.
Note
You can run only one autoscaler replica. This pod acts as the controller managing scaling for other services.
Configure autoscaling⚓︎
Autoscaler settings are defined in the hdxscalers block of the hydrolixcluster.yaml file.
You can configure multiple scalers for each service or pool.
Vertical Pod Autoscaling⚓︎
For information about vertical scaling, see Vertical Pod Autoscaling.
Key fields for autoscaling⚓︎
| Field | Description | Required | Default |
|---|---|---|---|
| metric | Prometheus metric to scale on | Yes | – |
| port | Port where metrics are served | Yes | – |
| target_value | Target metric value for scaling | Yes | – |
| per_pod | Whether target_value is per-pod or aggregated across all pods (v5.11+) | No | true |
| metric_min / metric_max | Lower/upper bounds to activate range mode. Both must be set if either is used. | No | – |
| exp | Exponent for sensitizer function in range mode | No | 1/3 |
| anchor_point | Range mode anchor point: "current" or "max" | No | "current" |
| tolerance | Fractional dead-zone for both scale-up and scale-down. Overridden by tolerance_up or tolerance_down if either is set. | No | – |
| tolerance_up / tolerance_down | Fractional dead-zone tolerances for scale-up and scale-down separately | No | 0.1 / 0.1 |
| cool_up_seconds | Minimum seconds to wait after a scale-up before allowing another upscale | No | 15 |
| cool_down_seconds | Minimum seconds to wait after a scale-down before allowing another downscale | No | 15 |
| scale_up_throttle | Max factor per upscale step (9.0 = +900%) | No | 9.0 |
| scale_down_throttle | Max fraction removable per downscale step (0.2 = -20%) | No | 0.2 |
| app | Kubernetes app label for pod discovery. Defaults to the service name. | No | Service name |
| rate | Treat metric as a counter and use per-second rate | No | false |
| op | Aggregation operation when multiple metric series exist: sum, avg, min, max, or first | No | "first" |
| halflife | EWMA smoothing half-life in seconds | No | 30 |
| bias_correction | Enable EWMA bias correction for faster initial response (v5.11+) | No | false |
| precision | Decimal places for ratio rounding | No | 10 |
| path | Metrics endpoint path | No | "metrics" |
| persistence_interval_sec | Seconds between state snapshots | No | 60 |
| suspended | Pause scaling without losing accumulated state (v5.11+) | No | false |
| dry_run | Log scaling decisions without applying them (v5.11+) | No | false |
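The halflife and bias_correction fields describe EWMA smoothing of the scraped metric. The Python sketch below shows one common way to build such a filter. It assumes evenly spaced samples with a positive interval, a zero-initialized average, and an Adam-style bias correction. The function name ewma and these details are assumptions, not the autoscaler's confirmed behavior.

```python
def ewma(samples, interval_seconds, halflife=30.0, bias_correction=False):
    """Exponentially weighted moving average with a half-life in seconds."""
    # After `halflife` seconds, the weight of an old sample has dropped by half.
    decay = 0.5 ** (interval_seconds / halflife)

    smoothed = 0.0  # zero start is an assumption; it causes the initial lag
    for sample in samples:
        smoothed = decay * smoothed + (1.0 - decay) * sample

    # Bias correction divides out the weight still held by the zero start,
    # so early readings aren't dragged toward zero.
    if bias_correction and samples:
        smoothed /= 1.0 - decay ** len(samples)
    return smoothed
```

With one sample of 10 taken 30 seconds apart and a 30-second half-life, the uncorrected average is 5.0, while the corrected average is 10.0, which is the faster initial response that bias correction is meant to provide.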
Use precision to set the scale ratio⚓︎
The precision configuration sets the number of digits to round to when calculating the average-to-target ratio. The default is 10.
A higher precision number smooths the transitions when scaling up and down.
For more frequent scaling to zero, set a lower precision. Set a higher precision value to keep small ratios above zero and have less frequent scaling to zero.
- A ratio of 0.045 with precision ≤1 rounds to zero, scaling down to zero pods.
- A ratio of 0.045 with precision ≥2 rounds up and keeps one replica active.
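The effect of precision on a small ratio can be checked with a short Python sketch. The function name rounded_ratio is hypothetical, and Python's built-in round stands in for whatever rounding the autoscaler applies, so the exact rounded digits may differ. The point is only whether the result is zero.

```python
def rounded_ratio(average_value, target_value, precision=10):
    """Round the average-to-target ratio to `precision` decimal places."""
    return round(average_value / target_value, precision)


# A ratio of 0.045 at different precision settings.
for precision in (0, 1, 2, 10):
    ratio = rounded_ratio(4.5, 100, precision)
    print(precision, ratio, "scales to zero" if ratio == 0 else "stays above zero")
```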
In this example, merge-peer is configured with a low or zero precision value, and merge_duty_cycle, a metric from merge-controller, determines when merge-peer scales up.
| Precision Configuration Example | |
|---|---|
Aggregate multiple metric series with op⚓︎
This feature was introduced in Hydrolix version 5.6.2.
By default, when a scrape returns multiple metric series, the autoscaler uses the first value. Use the op field to specify how those series are combined.
| Value | Behavior |
|---|---|
| first | Use the first value returned (default, preserves original behavior) |
| sum | Sum all values |
| avg | Average all values |
| min | Use the minimum value |
| max | Use the maximum value |
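The table above maps directly onto a small aggregation helper. The following Python sketch shows how a list of scraped series values could be combined for each op value. The function name aggregate and the error handling are illustrative assumptions.

```python
def aggregate(values, op="first"):
    """Combine the values of multiple metric series according to `op`."""
    if not values:
        raise ValueError("no metric series returned")
    if op == "first":
        return values[0]
    if op == "sum":
        return sum(values)
    if op == "avg":
        return sum(values) / len(values)
    if op == "min":
        return min(values)
    if op == "max":
        return max(values)
    raise ValueError(f"unsupported op: {op}")


series = [2, 4, 9]
print({op: aggregate(series, op) for op in ("first", "sum", "avg", "min", "max")})
```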
| Aggregation Operation Example | |
|---|---|
Use metrics from other services to autoscale from minimum replicas⚓︎
A scaled-to-zero service has no pods to scrape, so the autoscaler relies on metrics from another service to decide when to scale it up. If no separate app metric is specified, the scaler sets the minimum replica count to 1 instead of 0.
Cool-up and cool-down windows⚓︎
The autoscaler supports configurable wait times between scaling actions.
- cool_up_seconds: the minimum time to wait after a scale-up before another scale-up can occur
- cool_down_seconds: the minimum time to wait after a scale-down before another scale-down can occur
The cool-up window prevents frequent scale-ups. The cool-down window prevents frequent scale-downs.
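The two windows can be pictured as independent timers, one per direction. The Python sketch below assumes each window only blocks a repeat action in the same direction, as described above. The function name may_scale and the timestamp parameters are hypothetical.

```python
def may_scale(direction, now, last_scale_up=None, last_scale_down=None,
              cool_up_seconds=15, cool_down_seconds=15):
    """Return True if a scaling action in `direction` is outside its window."""
    if direction == "up":
        last, window = last_scale_up, cool_up_seconds
    elif direction == "down":
        last, window = last_scale_down, cool_down_seconds
    else:
        raise ValueError("direction must be 'up' or 'down'")
    # No previous action in this direction means nothing to wait for.
    return last is None or now - last >= window
```

For instance, a scale-up at t=90 blocks another scale-up at t=100 with the default 15-second window, but doesn't block a scale-down at t=100 in this sketch.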
Tolerance windows (dead-zone)⚓︎
Use tolerance_up and tolerance_down to define a range around the target value where no scaling occurs.
This prevents unnecessary pod changes when metrics fluctuate near the target value.
To apply the same tolerance to both directions, use the single tolerance field. If tolerance_up or tolerance_down is also set, it takes precedence over tolerance for its direction.
Tolerance values are specified as fractions of the target value. For example:
- tolerance_up: 0.1 means scale-up is skipped unless the metric is more than 10% above the target value.
- tolerance_down: 0.1 means scale-down is skipped unless the metric is more than 10% below the target value.
A tolerance window helps keep the cluster stable and avoids thrashing.
Tolerance window example⚓︎
| Tolerance window configuration | |
|---|---|
With a target value of 50 requests per pod, tolerance_up: 0.1, and tolerance_down: 0.2:
- If the average requests per pod are ≤55 (50 + 10% of 50), the autoscaler holds steady and doesn't scale up.
- If requests exceed 55, it triggers a scale-up.
- If requests fall below 40 (50 - 20% of 50), it triggers a scale-down.
- Between 40 and 55, no scaling occurs, keeping the system stable and avoiding thrash.
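The arithmetic in this example, along with the precedence between tolerance, tolerance_up, and tolerance_down, can be sketched in Python. The function names are hypothetical, and the 0.1 fallback reflects the defaults listed in the field table.

```python
def resolve_tolerances(tolerance=None, tolerance_up=None, tolerance_down=None,
                       default=0.1):
    """Direction-specific settings win over the shared tolerance field."""
    shared = default if tolerance is None else tolerance
    up = shared if tolerance_up is None else tolerance_up
    down = shared if tolerance_down is None else tolerance_down
    return up, down


def scaling_decision(average_value, target_value, tolerance_up, tolerance_down):
    """Classify a metric reading against the dead-zone around the target."""
    upper = target_value + tolerance_up * target_value
    lower = target_value - tolerance_down * target_value
    if average_value > upper:
        return "scale-up"
    if average_value < lower:
        return "scale-down"
    return "hold"


up, down = resolve_tolerances(tolerance_up=0.1, tolerance_down=0.2)
for requests in (39, 40, 55, 56):
    print(requests, scaling_decision(requests, 50, up, down))
```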
The tolerance dead-zone concept is similar to the Kubernetes HPA tolerance parameter. For more detail on the scaling algorithm, see the HPA documentation.
Basic target mode example⚓︎
This example shows a basic way to use target mode to autoscale.
| Target Mode Example | |
|---|---|
Advanced range mode example⚓︎
This example sets specific values using cool-up and cool-down windows, throttle, and tolerance windows.
| Range Mode Example | |
|---|---|
Pause and test scaling⚓︎
The suspended and dry_run fields provide operational control over scaling behavior without removing configuration.
Suspend scaling⚓︎
Set suspended: true to pause a scaler without losing its accumulated state (EWMA history, cooldown timers). When resumed, the scaler continues from where it left off.
| Suspended Scaler Example | |
|---|---|
Use suspended during maintenance windows, incident response, or debugging.
Dry run mode⚓︎
Set dry_run: true to run a specific scaler configuration in observation mode. The hdx-scaler pod computes and logs decisions, but doesn't apply them to the workload.
| Dry Run Example | |
|---|---|
Use dry_run to validate a new configuration before enabling it, or to understand how the scaler would behave on a service.
Observability⚓︎
Logs show scaling actions and reasons. Metrics are exported to Prometheus for dashboard visualization.
Use the current scaler state to verify a configuration change took effect, or to troubleshoot unexpected scaling behavior.
To inspect the current scaler state, run hdxscaler state inside the autoscaler pod:
| Inspect Autoscaler State | |
|---|---|
Prometheus metrics are also available at :27183/metrics on the hdx-scaler-go pod.