09 Jun 2025 - v5.3.0
22 days ago by Lyn Landon
Auto-scaler scale pods to minimal, independent Prometheus operator
New features in 5.3.0
Scale pods to minimal with autoscaler
Services can now dynamically shrink to zero pods, cutting idle costs.
- Use precision to choose how many decimals to keep when rounding the average-to-target ratio. Smaller numbers round to zero sooner, making scale-to-zero trigger more often.
- Replica counts now use the deployment name/alias, not the app label, to fix cross-service scaling.
- The cool-down period is respected after configuration changes or scaler restarts, preventing sudden swings.
- Logic to grow back from zero means a service can rise above zero pods when the load returns.
- See Scale Your Cluster for more details.
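The effect of precision on scale-to-zero can be illustrated with a minimal sketch. It assumes an HPA-style calculation (desired pods = ceil(current pods × rounded ratio)); the function name, its parameters, and the grow-back rule are illustrative assumptions, not the Hydrolix autoscaler's actual code.

```python
import math


def desired_replicas(current: int, average: float, target: float, precision: int) -> int:
    """Hypothetical sketch of an HPA-style replica calculation.

    Assumption: the scaler rounds average/target to `precision` decimals
    before multiplying by the current replica count.
    """
    if target <= 0:
        raise ValueError("target must be positive")

    ratio = round(average / target, precision)
    if ratio == 0:
        # The rounded ratio reached zero, so the service can shrink to zero pods.
        return 0

    # Grow back from zero: treat an empty service as one pod so that a
    # non-zero ratio can lift it above zero when load returns.
    base = max(current, 1)
    return math.ceil(base * ratio)


# A nearly idle service: the raw ratio is 0.004.
print(desired_replicas(current=3, average=0.04, target=10, precision=2))  # 0
print(desired_replicas(current=3, average=0.04, target=10, precision=3))  # 1
# Load returns to a service that was scaled to zero.
print(desired_replicas(current=0, average=5, target=10, precision=2))     # 1
```

With two decimals the ratio 0.004 rounds to zero and the service scales to zero; with three decimals it does not, which is why a smaller precision makes scale-to-zero trigger more often.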
Enable an independent Prometheus operator in Hydrolix
- Added support for ServiceMonitor to enable an independent Prometheus operator. The Hydrolix Prometheus integration can also be disabled as needed.
- New tunables control this feature. Defaults to off.
- See Enable an Independent Prometheus Operator for more information.
GKE
kubectl apply -f "https://www.hydrolix.io/operator/v5.3.0/operator-resources?namespace=${HDX_KUBERNETES_NAMESPACE}&gcp-storage-sa=${GCP_STORAGE_SA}"
EKS
kubectl apply -f "https://www.hydrolix.io/operator/v5.3.0/operator-resources?namespace=${HDX_KUBERNETES_NAMESPACE}&aws-storage-role=${AWS_STORAGE_ROLE}"
LKE
kubectl apply -f "https://www.hydrolix.io/operator/v5.3.0/operator-resources?namespace=${HDX_KUBERNETES_NAMESPACE}"
Changelog
Updates
These changes include version upgrades and internal dependency bumps.
- Adapted new packages to address a vulnerability: CVE-2025-27789.
- Made changes to the following packages:
  - Removed next-pwa
  - axios v1.7.7 -> v1.8.2, avoid possible SSRF and credential leakage using absolute URL
  - re2 v1.21.4 -> re2js v1.1.0
  - jest-worker v26.2.1 -> v27.4.5
  - babel v7.25.x -> v7.27.1 (multiple sub-packages)
  - babel/compat-data v7.25.7 -> v7.27.2
  - babel/parser v7.25.6 -> v7.27.2
  - react-virtualized v9.22.5 -> v9.22.6
  - recharts v2.12.7 -> v2.15.3
  - eslint-plugin-jsx-a11y v6.7.1 -> v6.10.2
  - react-select v5.6.1 -> v5.10.1
  - eslint-config-next v13.5.6 -> v13.5.11
- Upgraded the ring Rust library to address a possible denial of service vulnerability. See CVE-2025-4432.
- Updated the following libraries to address a vulnerability where strncopy functions didn't properly handle null-terminated strings. See snprintf(3) for more details.
  - catch2 from 2.13.8 to 3.8.1
  - google-cloud-cpp from 1.30.1 to 2.17.0
  - protobuf from 3.17.1 to 6.30.2
Improvements
These changes improve behavior, resilience, or usability across components.
API
- Expanded the list of names forbidden for user projects to include Hydrolix and ClickHouse internal project names.
Intake
- Introduced merge-peer graceful shutdown. Added tracking of disappearing merge-peer as well as graceful shutdown to the merge-controller.
- Introduced adaptive memory coefficient computation into merge-controller. This is a step toward obviating the table-level setting memory_coefficient.
- Added missing configuration support for auto values under intake-rs, including table_revision and transform_id. These fields now correctly populate when specifying auto in the table configuration. This update ensures that automatic values like table_revision and transform_id work as expected, simplifying configuration and reducing setup errors.
- Simplified and removed summary metrics in intake-head to reduce memory pressure on Prometheus.
  - Removed hdx_sink_partition_rows_summary.
  - Removed hdx_upload_obj_store_duration_ns.
  - Use only base pod labels on hdx_sink_bucket_maint_duration_ns for cardinality reduction.
  - Use only base pod labels on hdx_upload_process_write_result_duration_ns for cardinality reduction.
- Introduced support for pool-level memory limit settings in merge-controller. Resource limits can be set globally, per-project, and per-table.
- Added bucket_duration metric to track merge bucket closure timing. The new bucket_duration metric in merge-controller tracks how long buckets remain open before closing, with a basis label indicating whether the closure was due to:
  - full
  - idle_ttl
  - age_ttl
  - segment_ttl

  The older expired_buckets metric was removed.
- Added counters to merge-controller tracking work progress. New metrics are partitions_dispatched and candidates_dispatched.
- Added histograms to merge-controller capturing the count of partitions for each merge candidate (partition_per_candidate) and the estimated memory size of dispatched candidates (candidate_mem_size).
- Added support for reading an optional configuration file in merge-controller managed by the operator. This enables operational reconfiguration of the merge-controller with per-cluster settings.
- Removed the limit derived from available CPU count for MAX_OUTSTANDING_REQUESTS. Also removed constraints on ACCEPT_DATA_TIMEOUT for all ingestion services. These variables are now controlled solely by tunables.
- Introduced a merge feature allowing download of candidate partitions for local execution. The feature must be enabled with the new tunable merge_download_partitions_enabled.
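The pool-level memory limits mentioned above can be set globally, per-project, and per-table. One plausible way to combine the three scopes is "most specific wins," sketched below. The precedence order, the function name, and the limit values are assumptions made for illustration; the release notes don't state how merge-controller resolves overlapping settings.

```python
def resolve_memory_limit(
    global_limit: int,
    project_limits: dict[str, int],
    table_limits: dict[tuple[str, str], int],
    project: str,
    table: str,
) -> int:
    """Return the most specific limit: table, then project, then global.

    Assumption: a narrower scope overrides a broader one.
    """
    if (project, table) in table_limits:
        return table_limits[(project, table)]
    if project in project_limits:
        return project_limits[project]
    return global_limit


# Hypothetical limits in bytes.
GLOBAL = 8 * 1024**3
PROJECTS = {"logs": 4 * 1024**3}
TABLES = {("logs", "nginx"): 2 * 1024**3}

print(resolve_memory_limit(GLOBAL, PROJECTS, TABLES, "logs", "nginx"))   # table-level limit
print(resolve_memory_limit(GLOBAL, PROJECTS, TABLES, "logs", "syslog"))  # project-level limit
print(resolve_memory_limit(GLOBAL, PROJECTS, TABLES, "metrics", "cpu"))  # global limit
```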
Core
- Improved resilience when encountering corrupt partitions during query. The new behavior skips corrupted blocks, dropping any rows read from the affected blocks, and resynchronizes on block boundaries. Errors will continue to be returned to merge systems when encountering corrupt blocks.
- You can now set custom column and row delimiters when using ClickHouse dictionaries. The following formats are supported:
  - CustomSeparated
  - CustomSeparatedWithNames
  - CustomSeparatedWithNamesAndTypes
- See Custom Dictionaries for more information.
- Fixed a segfault that occurred when a cancellation thread released telem::SpanScope while a query thread was still running. The query thread now releases the resource.
- Deduplicated different declarations of the HdxQueryInfo struct. Earlier, the declaration used would be determined during the linking phase of the build.
- Fixed incorrect handling of summary table SQL expressions using AS to create column aliases. By converting computed alias columns to ClickHouse identifiers, summary table construction occurs correctly. This fixes errors like Code: 47. DB::Exception: Missing columns:.
- Corrected a build-time SHA mismatch by always using exactly 8 characters of the SHA. Earlier, a spurious mismatch was suffixing the unnecessarily frightening string -dirty to a version output returned under some error conditions.
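The skip-and-resynchronize behavior for corrupt partitions described above can be sketched in miniature. The layout below (fixed-size blocks, each ending in a CRC32 checksum) is a hypothetical stand-in chosen to keep the example short; it is not the Hydrolix partition format, and the function names are invented for illustration.

```python
import struct
import zlib

BLOCK_PAYLOAD = 16              # hypothetical payload bytes per block
BLOCK_SIZE = BLOCK_PAYLOAD + 4  # payload plus a 4-byte CRC32 trailer


def write_blocks(payloads: list[bytes]) -> bytes:
    """Pack each payload into a fixed-size block followed by its CRC32."""
    out = bytearray()
    for payload in payloads:
        if len(payload) > BLOCK_PAYLOAD:
            raise ValueError("payload too large for one block")
        padded = payload.ljust(BLOCK_PAYLOAD, b"\0")
        out += padded + struct.pack("<I", zlib.crc32(padded))
    return bytes(out)


def read_valid_blocks(data: bytes) -> tuple[list[bytes], int]:
    """Return payloads of intact blocks and the number of blocks skipped."""
    good: list[bytes] = []
    skipped = 0
    # Stepping by BLOCK_SIZE resynchronizes on the next block boundary,
    # regardless of what the damaged block contained.
    for offset in range(0, len(data) - BLOCK_SIZE + 1, BLOCK_SIZE):
        payload = data[offset:offset + BLOCK_PAYLOAD]
        (crc,) = struct.unpack("<I", data[offset + BLOCK_PAYLOAD:offset + BLOCK_SIZE])
        if zlib.crc32(payload) != crc:
            skipped += 1  # drop every row from the damaged block
            continue
        good.append(payload.rstrip(b"\0"))
    return good, skipped


data = bytearray(write_blocks([b"row-a", b"row-b", b"row-c"]))
data[BLOCK_SIZE + 2] ^= 0xFF  # flip one byte inside the second block
rows, skipped = read_valid_blocks(bytes(data))
print(rows, skipped)
```

Here the reader returns the rows from the first and third blocks and reports one skipped block, instead of failing the whole read.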
Operator
- Added an anomaly detection tenant, which defaults to off but supports configuration. Introduced the new tunable hdx_anomaly_detection.
- Fixed a startup looping bug in the in-cluster Superset visualization tool.
- Introduced the database connection pooling tool pgbouncer at version 1.24.1 into the cluster.
- Corrected a logic error that disabled basic authentication on certain endpoints when unified authentication was disabled. Setting unified_auth: false no longer allows unauthenticated access to these endpoints.
- Disabled the MySQL listener on tcp/9004 by default. Individual clusters may still support the plaintext MySQL query interface by setting tunables.
- Added fallback logic for incoming connections to the service IP on tcp/9444. If chproxy is unavailable, queries are passed directly to the query-head. This means that Grafana installations can use the same public URL without change.
- Provided a Hydrolix cluster resource object validator for use with Kubernetes tooling. This allows detection of incorrectly spelled tunables and other misconfigurations.
- Disabled unified auth for in-cluster Grafana to work around a constant 401 response interaction with Google OAuth. Adjusted the development scale profile to allow developers to suppress the TLS requirement by respecting the existing pg_ssl_mode tunable.
- Set the intake_head_raw_data_spill_config.enabled tunable to act as a string and a boolean type to better support its use in all versions of Hydrolix. API resources like tables and transforms can be modified and queried. Defaults to off. See Scale Your Cluster for more details.
- Ensured that cool_down_seconds is transmitted to the hdxscaler k8s ConfigMap. Corrects a condition in which the cluster was scaling down more rapidly than expected.
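The kind of check the cluster resource validator performs, detecting incorrectly spelled tunables, can be approximated with a short sketch. The known-tunable set below contains only names that appear in these release notes; a real validator would load the full schema, which isn't shown here. The function name and the example spec values are hypothetical.

```python
import difflib

# Tunable names taken from these release notes only (not the full schema).
KNOWN_TUNABLES = {
    "hdx_anomaly_detection",
    "merge_download_partitions_enabled",
    "pg_ssl_mode",
    "unified_auth",
}


def find_unknown_tunables(spec: dict) -> dict[str, list[str]]:
    """Map each unrecognized key to its closest known tunable, if any."""
    problems: dict[str, list[str]] = {}
    for key in spec:
        if key not in KNOWN_TUNABLES:
            # get_close_matches returns an empty list when nothing is similar.
            problems[key] = difflib.get_close_matches(key, sorted(KNOWN_TUNABLES), n=1)
    return problems


# Hypothetical spec with one misspelled key.
spec = {"unified_auht": False, "pg_ssl_mode": "disable"}
for key, suggestions in find_unknown_tunables(spec).items():
    hint = f" (did you mean {suggestions[0]}?)" if suggestions else ""
    print(f"unknown tunable: {key}{hint}")
```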
UI
- Improved Search on the Data page to match against both project and table names, which makes tables easier to find on clusters with many projects. Previously, only table names were searched.
- Introduced safety measure to prevent deletion of the default transform. If a user accidentally tries to delete the current default transform on a table, a modal dialog will prevent this, advising to set another transform as a default before deleting.
- Improved management of query options and switched to project- and table-level query options API calls. Now, it's possible to remove a query option from a project or table in the UI.
- Added a field displaying the stream ingestion URL into the Table > stream settings sidebar.
- Added nine new Linode entries to the Regions selector on the New bucket page. New regions: de-fra-1, us-ord-10, us-sea-9, us-iad-10, in-bom-1, jp-tyo-1, sg-sin-1, gb-lon-1, and au-mel-1.
- Added a preview of output columns on the New transform page. This allows users to visualize output structure before finalizing transformation.
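The Data page search change listed above (matching on project names as well as table names) amounts to widening a filter predicate. The sketch below shows the idea with a hypothetical in-memory catalog; the function name and data are invented for illustration and do not reflect the UI's actual implementation.

```python
def search_tables(catalog: dict[str, list[str]], query: str) -> list[tuple[str, str]]:
    """Return (project, table) pairs where the query matches either name."""
    needle = query.strip().lower()
    if not needle:
        return []
    return [
        (project, table)
        for project, tables in catalog.items()
        for table in tables
        if needle in project.lower() or needle in table.lower()
    ]


# Hypothetical catalog: project name -> table names.
catalog = {
    "cdn_logs": ["requests", "errors"],
    "billing": ["cdn_usage", "invoices"],
}

# "cdn" matches every table in the cdn_logs project plus billing.cdn_usage.
print(search_tables(catalog, "cdn"))
```

Matching on table names alone would have returned only billing.cdn_usage for this query.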