09 Jun 2025 - v5.3.0
By Lyn Landon
Autoscaler scales pods to the minimum; independent Prometheus operator
New features in 5.3.0
Scale pods to the minimum with the autoscaler
Services can now dynamically shrink to zero pods, cutting idle costs.
- Use `precision` to choose how many decimal places to keep when rounding the average-to-target ratio. Lower values (fewer decimal places) round to zero sooner, so scale-to-zero triggers more often.
- Replica counts now use the deployment name/alias, not the `app` label, which fixes cross-service scaling.
- The cool-down period is respected after configuration changes or scaler restarts, preventing sudden swings.
- New scale-up logic lets a service grow back above zero pods when load returns.
- See Scale Your Cluster for more details.
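As a rough illustration of why a lower `precision` makes scale-to-zero trigger more often, the sketch below rounds a ratio to a given number of decimal places and reports whether the result is zero. It assumes the ratio is the average metric divided by the target and that ordinary `printf` rounding applies; the function names `round_ratio` and `rounds_to_zero` are hypothetical, and this is not the autoscaler's actual code.

```sh
#!/bin/sh
# Hypothetical sketch, not Hydrolix autoscaler code. ASSUMPTION: the value
# being rounded is (average metric / target), rounded like printf does.

# round_ratio RATIO PRECISION: print RATIO rounded to PRECISION decimal places
round_ratio() {
  local ratio="$1" precision="$2"
  # LC_ALL=C keeps "." as the decimal separator regardless of locale
  LC_ALL=C printf "%.${precision}f\n" "$ratio"
}

# rounds_to_zero RATIO PRECISION: succeed (exit 0) if the rounded ratio is zero
rounds_to_zero() {
  case $(round_ratio "$1" "$2") in
    *[1-9]*) return 1 ;;  # any nonzero digit means the ratio is not zero
    *) return 0 ;;        # only zeros left, e.g. "0", "0.0", "0.00"
  esac
}

# Average load at 4% of the target:
round_ratio 0.04 2   # prints 0.04 (not zero)
round_ratio 0.04 1   # prints 0.0 (rounds to zero)
```

With two decimal places the small ratio survives rounding, while with one it collapses to zero, which is the behavior the `precision` bullet describes.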
Enable an independent Prometheus operator in Hydrolix
- Added support for `ServiceMonitor` resources so that an independent Prometheus operator can be used. The built-in Hydrolix Prometheus integration can also be disabled as needed.
  - New tunables control this feature. It defaults to `off`.
  - See Enable an Independent Prometheus Operator for more information.
To upgrade, apply the v5.3.0 operator resources for your Kubernetes platform.
GKE
kubectl apply -f "https://www.hydrolix.io/operator/v5.3.0/operator-resources?namespace=${HDX_KUBERNETES_NAMESPACE}&gcp-storage-sa=${GCP_STORAGE_SA}"
EKS
kubectl apply -f "https://www.hydrolix.io/operator/v5.3.0/operator-resources?namespace=${HDX_KUBERNETES_NAMESPACE}&aws-storage-role=${AWS_STORAGE_ROLE}"
LKE
kubectl apply -f "https://www.hydrolix.io/operator/v5.3.0/operator-resources?namespace=${HDX_KUBERNETES_NAMESPACE}"
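The three platform commands above differ only in their query string. The sketch below composes the same URLs in one place so the platform-specific parameter is less likely to be mixed up. The function name `operator_url` and the `gke`/`eks`/`lke` keywords are hypothetical; the URL shapes and environment variables are copied from the commands above.

```sh
#!/bin/sh
# Hypothetical helper: build the v5.3.0 operator-resources URL per platform.
# URL layout and variable names are taken from the release-note commands.

operator_url() {
  local platform="$1"
  local base="https://www.hydrolix.io/operator/v5.3.0/operator-resources"
  # ${VAR:?msg} aborts (the subshell, when called via $(...)) if VAR is unset
  local query="namespace=${HDX_KUBERNETES_NAMESPACE:?HDX_KUBERNETES_NAMESPACE must be set}"
  case $platform in
    gke) query="$query&gcp-storage-sa=${GCP_STORAGE_SA:?GCP_STORAGE_SA must be set}" ;;
    eks) query="$query&aws-storage-role=${AWS_STORAGE_ROLE:?AWS_STORAGE_ROLE must be set}" ;;
    lke) ;;  # LKE needs only the namespace
    *) echo "unknown platform: $platform" >&2; return 1 ;;
  esac
  printf '%s?%s\n' "$base" "$query"
}

# Example (requires cluster access):
#   kubectl apply -f "$(operator_url eks)"
```

Like the original commands, the values are inserted without URL encoding.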
Changelog
Updates
These changes include version upgrades and internal dependency bumps.
- Adopted new package versions to address CVE-2025-27789.
- Made changes to the following packages:
  - Removed `next-pwa`
  - `axios` v1.7.7 -> v1.8.2, to avoid possible SSRF and credential leakage via absolute URLs
  - `re2` v1.21.4 -> `re2js` v1.1.0
  - `jest-worker` v26.2.1 -> v27.4.5
  - `babel` v7.25.x -> v7.27.1 (multiple sub-packages)
  - `babel/compat-data` v7.25.7 -> v7.27.2
  - `babel/parser` v7.25.6 -> v7.27.2
  - `react-virtualized` v9.22.5 -> v9.22.6
  - `recharts` v2.12.7 -> v2.15.3
  - `eslint-plugin-jsx-a11y` v6.7.1 -> v6.10.2
  - `react-select` v5.6.1 -> v5.10.1
  - `eslint-config-next` v13.5.6 -> v13.5.11
- Upgraded the `ring` Rust library to address a possible denial-of-service vulnerability. See CVE-2025-4432.
- Updated the following libraries to address a vulnerability where `strncpy`-style functions didn't properly handle null-terminated strings. See snprintf(3) for more details.
  - `catch2` from 2.13.8 to 3.8.1
  - `google-cloud-cpp` from 1.30.1 to 2.17.0
  - `protobuf` from 3.17.1 to 6.30.2
Improvements
These changes improve behavior, resilience, or usability across components.
API
- Expanded the list of names forbidden for user projects to include Hydrolix and ClickHouse internal project names.
Intake
- Introduced graceful shutdown for `merge-peer`. Added tracking of disappearing `merge-peer` instances, as well as graceful shutdown, to the `merge-controller`.
- Introduced adaptive memory coefficient computation into `merge-controller`. This is a step toward eliminating the need for the table-level `memory_coefficient` setting.
- Added missing configuration support for `auto` values under `intake-rs`, including `table_revision` and `transform_id`. These fields now populate correctly when you specify `auto` in the table configuration, which simplifies configuration and reduces setup errors.
- Simplified and removed summary metrics in `intake-head` to reduce memory pressure on Prometheus.
  - Removed `hdx_sink_partition_rows_summary`.
  - Removed `hdx_upload_obj_store_duration_ns`.
  - Reduced the cardinality of `hdx_sink_bucket_maint_duration_ns` by using only base pod labels.
  - Reduced the cardinality of `hdx_upload_process_write_result_duration_ns` by using only base pod labels.
- Introduced support for pool-level memory limit settings in `merge-controller`. Resource limits can be set globally, per project, and per table.
- Added a `bucket_duration` metric to `merge-controller` that tracks how long merge buckets remain open before closing. Its `basis` label indicates why a bucket closed: `full`, `idle_ttl`, `age_ttl`, or `segment_ttl`. The older `expired_buckets` metric was removed.
- Added counters to `merge-controller` that track work progress. The new metrics are `partitions_dispatched` and `candidates_dispatched`.
- Added histograms to `merge-controller`: `partition_per_candidate` captures the number of partitions in each merge candidate, and `candidate_mem_size` captures the estimated memory size of dispatched candidates.
- Added support for reading an optional, operator-managed configuration file in `merge-controller`. This enables operational reconfiguration of `merge-controller` with per-cluster settings.
- Removed the limit on `MAX_OUTSTANDING_REQUESTS` that was derived from the available CPU count, and removed the constraints on `ACCEPT_DATA_TIMEOUT` for all ingestion services. These variables are now controlled solely by tunables.
- Introduced a merge feature that downloads candidate partitions for local execution. Enable it with the new tunable `merge_download_partitions_enabled`.
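The pool-level memory limit item says limits can be set globally, per project, and per table, but not how they combine. A common pattern, and the loudly stated assumption of this hypothetical sketch, is that the most specific non-empty setting wins: table over project over global. The function name `effective_limit` and the limit values are invented placeholders; check the merge-controller documentation for the actual precedence and units.

```sh
#!/bin/sh
# Hypothetical sketch, NOT merge-controller code. ASSUMPTION: the most
# specific non-empty limit wins (table > project > global).

# effective_limit GLOBAL [PROJECT] [TABLE]: print the limit that applies
effective_limit() {
  local global="$1" project="${2:-}" table="${3:-}"
  # ${a:-b} falls back to b when a is unset or empty
  printf '%s\n' "${table:-${project:-$global}}"
}

effective_limit 8Gi            # prints 8Gi (global only)
effective_limit 8Gi 4Gi        # prints 4Gi (project overrides global)
effective_limit 8Gi 4Gi 2Gi    # prints 2Gi (table overrides project)
effective_limit 8Gi "" 2Gi     # prints 2Gi (table set, project unset)
```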
Core
- Improved resilience when queries encounter corrupt partitions. Queries now skip corrupted blocks, dropping the rows read from affected blocks, and resynchronize on block boundaries. Merge systems still receive errors when they encounter corrupt blocks.
- You can now set custom column and row delimiters when using ClickHouse dictionaries. The following formats are supported:
  - `CustomSeparated`
  - `CustomSeparatedWithNames`
  - `CustomSeparatedWithNamesAndTypes`
  - See Custom Dictionaries for more information.
- Fixed a segfault that occurred when a cancellation thread released `telem::SpanScope` while a query thread was still running. The query thread now releases the resource.
- Deduplicated the differing declarations of the `HdxQueryInfo` struct. Previously, which declaration was used was determined during the linking phase of the build.
- Fixed incorrect handling of summary table SQL expressions that use `AS` to create column aliases. Computed alias columns are now converted to ClickHouse identifiers, so summary tables are constructed correctly. This fixes errors like `Code: 47. DB::Exception: Missing columns:`.
- Corrected a build-time SHA mismatch by always using exactly 8 characters of the SHA. Previously, under some error conditions, a spurious mismatch appended the unnecessarily alarming string `-dirty` to the reported version.
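The SHA fix relies on a fixed-length abbreviation. One reason to truncate explicitly rather than rely on `git rev-parse --short=8` is that Git treats that length as a minimum and may print more characters when an 8-character prefix is ambiguous. The sketch below takes exactly 8 characters; `short_sha` is a hypothetical name, and this is not the Hydrolix build script.

```sh
#!/bin/sh
# Hypothetical sketch, not the Hydrolix build script: always report exactly
# 8 characters of a commit SHA so version strings compare consistently.

# short_sha FULL_SHA: print the first 8 characters, or fail if too short
short_sha() {
  local sha="$1"
  if [ "${#sha}" -lt 8 ]; then
    echo "SHA too short: '$sha'" >&2
    return 1
  fi
  # %.8s prints at most 8 characters of the argument
  printf '%.8s\n' "$sha"
}

# In a Git checkout:
#   short_sha "$(git rev-parse HEAD)"
short_sha 1f3c9a7e2b4d6f8091a2b3c4d5e6f708192a3b4c   # prints 1f3c9a7e
```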
Operator
- Added an anomaly detection tenant, which defaults to off but can be configured. Introduced the new tunable `hdx_anomaly_detection`.
- Fixed a startup looping bug in the in-cluster Superset visualization tool.
- Introduced the database connection pooling tool `pgbouncer`, version 1.24.1, into the cluster.
- Corrected a logic error that disabled basic authentication on certain endpoints when unified authentication was disabled. Setting `unified_auth: false` no longer allows unauthenticated access to these endpoints.
- Disabled the MySQL listener on tcp/9004 by default. Individual clusters can still enable the plaintext MySQL query interface by setting tunables.
- Added fallback logic for incoming connections to the service IP on tcp/9444. If `chproxy` is unavailable, queries are passed directly to the `query-head`, so Grafana installations can keep using the same public URL without changes.
- Provided a Hydrolix cluster resource object validator for use with Kubernetes tooling. It detects misspelled tunables and other misconfigurations.
- Disabled unified auth for in-cluster Grafana to work around constant 401 responses when interacting with Google OAuth. Also adjusted the development scale profile and allowed developers to suppress the TLS requirement by respecting the existing `pg_ssl_mode` tunable.
- Set the `intake_head_raw_data_spill_config.enabled` tunable to accept both string and boolean values, to better support its use across all Hydrolix versions.
API resources like tables and transforms can be modified and queried. Defaults to `off`. See Scale Your Cluster for more details.
- Ensured that `cool_down_seconds` is transmitted to the `hdxscaler` Kubernetes ConfigMap. This corrects a condition in which the cluster scaled down more rapidly than expected.
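The cluster resource validator is described as catching misspelled tunables. As a rough, hypothetical illustration of that idea (not the validator itself, which is meant for use with Kubernetes tooling), the sketch below compares tunable names against an allowlist. `KNOWN_TUNABLES` and `unknown_tunables` are invented names, and the allowlist holds only tunables named in these release notes, so it is deliberately incomplete.

```sh
#!/bin/sh
# Hypothetical sketch of tunable-name checking; NOT the Hydrolix validator.
# Illustrative, incomplete allowlist built from tunables named in v5.3.0 notes.
KNOWN_TUNABLES="hdx_anomaly_detection intake_head_raw_data_spill_config merge_download_partitions_enabled pg_ssl_mode"

# unknown_tunables NAME...: print each NAME not in KNOWN_TUNABLES;
# return 1 if any name was unknown, 0 otherwise
unknown_tunables() {
  local name status=0
  for name in "$@"; do
    case " $KNOWN_TUNABLES " in
      *" $name "*) ;;                        # known name: nothing to report
      *) printf '%s\n' "$name"; status=1 ;;  # unknown, possibly misspelled
    esac
  done
  return "$status"
}

# Example:
#   unknown_tunables pg_ssl_mode pg_sll_mode   # prints pg_sll_mode, returns 1
```

A real validator would also check value types and nesting; this sketch only shows the name-matching step.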
UI
- Improved search on the Data page to match both project and table names, making tables easier to find on clusters with many projects. Previously, only table names were searched.
- Introduced a safety measure to prevent deletion of the default transform. If a user accidentally tries to delete a table's current default transform, a modal dialog blocks the deletion and advises setting another transform as the default first.
- Improved management of query options by switching to the project- and table-level query options API calls. You can now remove a query option from a project or table in the UI.
- Added a field displaying the stream ingestion URL to the Table > Stream settings sidebar.
- Added nine new Linode entries to the Regions selector on the New bucket page. New regions: `de-fra-1`, `us-ord-10`, `us-sea-9`, `us-iad-10`, `in-bom-1`, `jp-tyo-1`, `sg-sin-1`, `gb-lon-1`, and `au-mel-1`.
- Added a preview of output columns on the New transform page, so users can see the output structure before finalizing a transform.