29 July 2025 - v5.4.0
Column aliasing, service accounts, pagination in Config API, database connection pooling support
Notable new features
Column aliasing
Column aliasing allows the definition of columns as the result of a calculation from other columns or queries. The underlying partition doesn't contain the data. To use a column alias, at least one view must refer to the defined alias column.
Service accounts
Service accounts provide variable-lifetime, revocable, long-lived access tokens for programmatic use. This feature facilitates authentication for automated workflows. Service accounts, tokens, and associated roles can be created, modified, and deleted in the UI under Security > Service Accounts. Tokens issued to service accounts created before v5.4 no longer work, and those accounts lose their presentational names. See Breaking changes.
Pagination in Config API
The Config API implements fully-featured, consistent pagination and a different response object to support pagination details. Client applications will need to adapt to the pagination schemes for most endpoints. All services inside a Hydrolix cluster are already migrated to the new pagination styles. See Breaking changes.
Option for database connection pooling
Introduced pgbouncer, a lightweight database connection pooling application, into the cluster, along with tunables to control it. It isn't enabled by default.
Breaking changes
Pagination in Config API
Pagination is now required for interacting with most Config API endpoints. Affected endpoints include:
- catalog endpoints, which switched to cursor pagination
- endpoints for viewing authentication logs
- endpoints for managing storages, projects, tables, jobs, credentials, and dictionaries
- endpoints for listing users, roles, tasks, activity, transform templates, and invites
All internal cluster activity now uses the paginated API calls.
For a detailed description see Pagination Change in v5.4.
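As a rough illustration of what adapting a client to cursor pagination can look like, the sketch below walks a paginated listing until no further cursor is returned. The response field names results and next_cursor and the fetch_page callable are hypothetical placeholders chosen for this example; these notes don't specify the response shape, so check Pagination Change in v5.4 for the actual field names and parameters.

```python
from typing import Any, Callable, Dict, Iterator, Optional

Page = Dict[str, Any]


def iter_items(fetch_page: Callable[[Optional[str]], Page]) -> Iterator[Any]:
    """Yield every item from a cursor-paginated listing.

    fetch_page(cursor) is a caller-supplied function that performs one
    request and returns the decoded response. The keys "results" and
    "next_cursor" are assumptions made for this sketch only.
    """
    cursor: Optional[str] = None
    while True:
        page = fetch_page(cursor)
        # Hand back the items on this page before asking for the next one.
        yield from page.get("results", [])
        cursor = page.get("next_cursor")
        if not cursor:
            # No cursor means the listing is exhausted.
            break
```

Keeping the HTTP call behind fetch_page means the loop itself doesn't depend on any particular client library or endpoint.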
Earlier service account tokens invalidated
Prior releases included early support for service accounts and tokens. Tokens issued to those accounts no longer work after upgrading. These older tokens had a static lifetime (TTL) and were not revocable, so they are incompatible with the variable-lifetime, revocable service account tokens.
Service accounts created prior to the v5.4 release will also lose their presentational name.
Upgrade instructions
GKE
kubectl apply -f "https://www.hydrolix.io/operator/v5.4.0/operator-resources?namespace=${HDX_KUBERNETES_NAMESPACE}&gcp-storage-sa=${GCP_STORAGE_SA}"
EKS
kubectl apply -f "https://www.hydrolix.io/operator/v5.4.0/operator-resources?namespace=${HDX_KUBERNETES_NAMESPACE}&aws-storage-role=${AWS_STORAGE_ROLE}"
LKE
kubectl apply -f "https://www.hydrolix.io/operator/v5.4.0/operator-resources?namespace=${HDX_KUBERNETES_NAMESPACE}"
Rollback instructions
See rollback instructions in Upgrade to v5.4.x.
Changelog
Updates
These changes include version upgrades and internal dependency bumps.
Intake
- Upgraded the rust-openssl Rust library from v0.10.68 to v0.10.72 to address a use-after-free vulnerability. See RUSTSEC-2025-0022 and CVE-2025-24898.
Cluster Operations
- Upgraded traefik reverse proxy from v2.11 to v3.4.
- Upgraded Quesma from v1.0.2 to v1.1.12.
- Upgraded kopf from v1.37.1 to v1.38.0.
Improvements
Core
- Added support for TLS termination on the MySQL interface. The MySQL server implementation requires TLS termination to occur on the query head, not in the traefik reverse proxy.
- Removed build information from log messages, and changed from strncpy() to memcpy() to address security concerns.
API
- Added no-cache, no-store, and must-revalidate headers to responses containing authentication cookies. This addresses a potential security issue by advising intermediate proxies to avoid caching.
- The /config/v1/orgs/{org_id}/summary/ API endpoint now supports a show_summary_tables query parameter that can be set to true or false. If set to false, only non-summary tables will be returned.
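To make the caching change concrete, here is a minimal sketch of the general technique: attach a Cache-Control header carrying those three directives whenever a response sets a cookie. The function name and the plain-dict header representation are assumptions for illustration, and treating any Set-Cookie header as an authentication cookie is a simplification; this is not the Config API's actual code.

```python
from typing import Dict

NO_CACHE_VALUE = "no-cache, no-store, must-revalidate"


def with_no_cache(headers: Dict[str, str]) -> Dict[str, str]:
    """Return a copy of the headers, marked uncacheable if a cookie is set.

    Simplification for this sketch: any Set-Cookie header is treated as
    an authentication cookie.
    """
    out = dict(headers)
    if any(name.lower() == "set-cookie" for name in out):
        # Advise browsers and intermediate proxies not to store the response.
        out["Cache-Control"] = NO_CACHE_VALUE
    return out
```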
Intake
- The intake-head now implements a graceful backoff and more tolerant retry strategy for catalog inserts.
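These notes don't describe the intake-head's exact retry policy. As a generic illustration of the idea, the sketch below retries an operation with capped exponential backoff and random jitter. The function name, parameters, and defaults are all hypothetical and aren't taken from the Hydrolix implementation.

```python
import random
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def retry_with_backoff(
    op: Callable[[], T],
    max_attempts: int = 5,
    base_delay: float = 0.1,
    max_delay: float = 5.0,
    retry_on: Tuple[Type[BaseException], ...] = (Exception,),
    sleep: Callable[[float], None] = time.sleep,
    jitter: Callable[[], float] = random.random,
) -> T:
    """Call op() until it succeeds or max_attempts is reached."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        try:
            return op()
        except retry_on:
            if attempt == max_attempts - 1:
                # Out of attempts: surface the last error to the caller.
                raise
            # The wait ceiling doubles on each failure, up to max_delay.
            ceiling = min(max_delay, base_delay * 2 ** attempt)
            # Jitter spreads retries out so clients don't retry in lockstep.
            sleep(jitter() * ceiling)
    raise AssertionError("unreachable")
```

Injecting sleep and jitter keeps the function easy to exercise without real waiting.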
Merge
- The merge-controller now searches for merge candidates in a much more efficient manner, dramatically reducing CPU use in certain cases.
- The merge-controller exposes some metrics, such as table efficiency and partition memory distribution, that require expensive queries applied to the catalog database. To reduce load on the catalog, those queries now happen less frequently than before.
- Improved configurability of merge-controller by allowing tunables to specify maximum partitions to be combined and maximum candidates. Introduces a new tunable merge_max_partitions.
Cluster operation
- Hydrolix-managed Keycloak users can now authenticate via SSO. When visiting https://{myhost}.hydrolix.live/grafana/login, use the new "Sign in with Hydrolix" button.
- For Kibana access, Kibana-configured users can now be used instead of Hydrolix users. Set the new kibana_security_enabled tunable to enable this feature, and use the hdx-elastic-user Kubernetes secret for the admin user. Once logged in, create new users with Stack Management -> Security -> Users, and make sure those users have the monitoring_user role assigned.
- Introduced production scaling properties to support the highly-available chproxy HTTP proxy.
- Changed chproxy support to allow control over its version through the http_proxy tunable. Also extracted chproxy to an external repository.
- chproxy can now use Redis as a cache back-end with the mode: redis and other http_proxy configuration options. The Redis server is not shipped with the Hydrolix cluster. More information can be found in the Tunables List.
- Implemented the "power of two random choices" algorithm for the stream_load_balancer_algorithm tunable. Set it to p2c to enable this alternative to the default round-robin (rr) option. With p2c, two servers are selected at random and the current request is assigned to the one with fewer active connections. This could distribute traffic more evenly and reduce intake-head resource needs.
- A new terminate_tls_at_lb tunable allows TLS termination at an Amazon Network Load Balancer, rather than at the Hydrolix internal Traefik instance. More documentation can be found in External TLS on AWS NLB.
- The overcommit tunable now accepts two settings in addition to the pre-existing true and false. Setting it to requests sets all requested resources to 0 while keeping the limits. Setting it to limits keeps requests intact but removes the limits.
- The Hydrolix configuration validator can now be configured with the option to warn upon validation failures, rather than exiting.
- Cluster configuration information can now be sent to a central fleet database on an hourly basis. The argus_fleet_url, argus_fleet_table, and argus_fleet_transform tunables have been added to support this.
- Introduced intake pool routing, allowing the traefik reverse proxy to route to pools based on HTTP header values or query parameters. This is useful for cluster configuration of intake pool selection, without reconfiguring the sending software.
- Added tunable vector_extra_namespaces so that logs can be collected from namespaces other than the Hydrolix cluster namespace.
- Added support for time-based and scheduled changes to the Hydrolix spec. The new tunable overrides allows both cron-like and one-time scaling expressions.
- hdx-scaler can now filter metrics by attributes of the service that produced them.
- Improved hdxscaler user interface by showing tabbed output of only the configured hdxscalers, handling scrolling better, and displaying relevant config values.
- Added log_level tunable to control HDX scaler logging level.
- Allowed external Kibana instances to reach the in-cluster Quesma. This requires the reverse proxy to be configured with a wildcard TLS certificate of the form *.{myhost}.hydrolix.live.
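The "power of two random choices" selection mentioned in the list above can be sketched in a few lines. This is a generic illustration of the algorithm, not the load balancer's actual code; the function name and the mapping of server names to active connection counts are assumptions made for the example.

```python
import random
from typing import Mapping, Optional


def pick_p2c(active: Mapping[str, int], rng: Optional[random.Random] = None) -> str:
    """Choose a server using the power-of-two-random-choices rule.

    active maps each server name to its current number of active
    connections (a hypothetical representation for this sketch).
    """
    if not active:
        raise ValueError("no servers available")
    rng = rng or random.Random()
    servers = list(active)
    if len(servers) == 1:
        return servers[0]
    # Sample two distinct servers uniformly at random...
    first, second = rng.sample(servers, 2)
    # ...and keep whichever currently has fewer active connections.
    return first if active[first] <= active[second] else second
```

Unlike round-robin, this rule reacts to uneven load: the busiest server can never win a comparison against a less busy one, while only two counters are inspected per request.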
UI
- Introduced consistent pagination to many endpoints, related to the introduction of pagination in the Config API in this release.
- Improved friendliness of input fields capable of accepting multiple values. A user can now enter a single value without also having to enter a delimiter.
- In the batch jobs UI, added an editor (disabled) to the 'view' sidebar for viewing the full JSON job object, along with a copy button.
Bug fixes
API
- Fixed a bug that allowed the intake_head_url for a table to point to a deleted transform.
- Addressed a potential denial of service attack against storage configurations and batch jobs by tightening up hostname and storage path validation functions.
- Corrected consistency of conversion for boolean data types when constructing views. Now correctly handles elements in a complex data type.
- Closed race conditions on token validity when users' accounts are disabled or deleted. Earlier, a token would incorrectly remain valid.
Cluster operation
- Added a liveness check to the version service. This should prevent an occasional freezing issue with watch operations in the Python Kubernetes client.
- Allowed the .well-known/acme-challenge path to be served over TCP/80 to match HTTP validation expectations across all ACME providers. Plaintext HTTP is used by both Let's Encrypt and Buypass. Earlier, Buypass wasn't following redirects to HTTPS.
- Disabled visibility of traefik dashboards by default.
Core
- Fixed a bug and improved performance when both sides of a filter in a SQL statement are non-constants.
- Fixed broken catalog database retry logic. This prevents CatalogErrors due to lost database connections and overlapping transactions.
- Tightened response to any detected corruption in a partition by skipping all subsequent blocks. The turbine and turbine_summary table functions occasionally display different responses to corruption.
Intake and merge
- Switched to using a concurrency-friendly map for metric data storage and retrieval. This avoids a rare crash scenario.
- Avoided panic on out-of-bounds error when Amazon Data Firehose response error message is shorter than maximum length allowed.
- Fixed a bug where the merge-controller wouldn't start if there was another merge-controller or a merge-head in a completed state.
- Fixed a bug where the merge-controller would become "stuck" due to an internal blocking issue.
- Fixed a bug where the merge-controller would stop creating partitions after receiving an error from the Postgres catalog database.
- Fixed a bug where a project's rate_limit was being ignored.