29 July 2025 - v5.4.0
Column aliasing, service accounts, pagination in Config API, database connection pooling support
Notable new features
Column aliasing
Column aliasing allows the definition of columns as the result of a calculation from other columns or queries. The underlying partition doesn't contain the data. To use a column alias, at least one view must refer to the defined alias column.
Service accounts
Service accounts provide variable-lifetime, revocable, long-lived access tokens for programmatic use. This feature facilitates authentication for automated workflows. Service accounts, tokens, and associated roles can be created, modified, and deleted in the UI under Security > Service Accounts. Tokens issued to service accounts created prior to release v5.4 will stop working, and those accounts will lose their presentational names. See Breaking changes.
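As a rough illustration of programmatic use, the sketch below attaches a service account token to an API request. The host name, the token value, and the assumption that the token travels in an Authorization: Bearer header are illustrative only and are not confirmed by these notes; consult the API documentation for the actual scheme.

```python
import urllib.request


def build_authenticated_request(url: str, token: str) -> urllib.request.Request:
    """Return a GET request carrying the token as a bearer credential.

    Assumption: the API accepts "Authorization: Bearer <token>".
    """
    request = urllib.request.Request(url, method="GET")
    request.add_header("Authorization", f"Bearer {token}")
    request.add_header("Accept", "application/json")
    return request


# Hypothetical host, path, and token, for illustration only.
example = build_authenticated_request(
    "https://myhost.hydrolix.live/config/v1/orgs/", "example-token"
)
# urllib.request.urlopen(example) would send it; omitted so the sketch runs offline.
```

Because the tokens are revocable, an automated workflow should treat an authentication failure as a signal to fetch a new token rather than retrying with the old one.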
Pagination in Config API
The Config API now implements fully featured, consistent pagination and returns a different response object that carries pagination details. Client applications will need to adapt to the pagination schemes for most endpoints. All services inside a Hydrolix cluster have already been migrated to the new pagination styles. See Breaking changes.
Option for database connection pooling
Introduced pgbouncer, a lightweight database connection pooling application, into the cluster along with controlling tunables. It isn't enabled by default.
Breaking changes
Pagination in Config API
Pagination is now required for interacting with most Config API endpoints. Affected endpoints include those for:
- the catalog, which switched to cursor pagination
- viewing authentication logs
- managing storages, projects, tables, jobs, credentials, and dictionaries
- listing users, roles, tasks, activity, transform templates, and invites
All internal cluster activity now uses the paginated API calls.
For a detailed description see Pagination Change in v5.4.
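For client authors, a cursor-style listing loop might look like the sketch below. The response field names ("results" and "next") and the fetch function are assumptions made for illustration; the actual response object is described in Pagination Change in v5.4.

```python
from typing import Any, Callable, Dict, Iterator, Optional

Page = Dict[str, Any]


def iter_all_items(fetch_page: Callable[[Optional[str]], Page]) -> Iterator[Any]:
    """Yield every item across pages, following the cursor until it runs out.

    Assumption: each page is a dict with a "results" list and a "next"
    cursor that is None on the last page. These field names are hypothetical.
    """
    cursor: Optional[str] = None
    while True:
        page = fetch_page(cursor)
        yield from page.get("results", [])
        cursor = page.get("next")
        if not cursor:
            break


# In-memory stand-in for an HTTP call, so the sketch runs without a cluster.
_FAKE_PAGES: Dict[Optional[str], Page] = {
    None: {"results": ["table_a", "table_b"], "next": "cursor-1"},
    "cursor-1": {"results": ["table_c"], "next": None},
}


def fake_fetch(cursor: Optional[str]) -> Page:
    return _FAKE_PAGES[cursor]


all_tables = list(iter_all_items(fake_fetch))
```

Passing the fetch step in as a function keeps the loop independent of the HTTP client and of whichever pagination scheme a given endpoint uses.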
Earlier service account tokens invalidated
Prior releases included early support for service accounts and tokens. Tokens issued to those accounts will stop working after the upgrade. These older tokens had a static lifetime (TTL) and were not revocable, so they are incompatible with the variable-lifetime, revocable service account tokens introduced in this release.
Service accounts created prior to the v5.4 release will also lose their presentational name.
Upgrade instructions
GKE
kubectl apply -f "https://www.hydrolix.io/operator/v5.4.0/operator-resources?namespace=${HDX_KUBERNETES_NAMESPACE}&gcp-storage-sa=${GCP_STORAGE_SA}"
EKS
kubectl apply -f "https://www.hydrolix.io/operator/v5.4.0/operator-resources?namespace=${HDX_KUBERNETES_NAMESPACE}&aws-storage-role=${AWS_STORAGE_ROLE}"
LKE
kubectl apply -f "https://www.hydrolix.io/operator/v5.4.0/operator-resources?namespace=${HDX_KUBERNETES_NAMESPACE}"
Rollback instructions
See rollback instructions in Upgrade to v5.4.x.
Changelog
Updates
These changes include version upgrades and internal dependency bumps.
Intake
- Upgraded the rust-openssl Rust library from v0.10.68 to v0.10.72 to address a use-after-free vulnerability. See RUSTSEC-2025-0022 and CVE-2025-24898.
Cluster Operations
- Upgraded traefik reverse proxy from v2.11 to v3.4.
- Upgraded Quesma from v1.0.2 to v1.1.12.
- Upgraded kopf from v1.37.1 to v1.38.0.
Improvements
Core
- Supported TLS termination for the MySQL interface. The MySQL server implementation requires TLS termination to occur on the query head, not in the traefik reverse proxy.
- Removed build information from log messages, and changed from strncpy() to memcpy() to address security concerns.
API
- Added no-cache, no-store, and must-revalidate headers to responses containing authentication cookies. This addresses a potential security issue by advising intermediate proxies to avoid caching.
- The /config/v1/orgs/{org_id}/summary/ API endpoint now supports a show_summary_tables query parameter that can be set to true or false. If set to false, only non-summary tables will be returned.
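The new query parameter can be attached to the summary endpoint as sketched below. The base URL and organization ID are placeholders, and the helper function is hypothetical; only the path and the parameter name come from the note above.

```python
from urllib.parse import urlencode


def summary_url(base_url: str, org_id: str, show_summary_tables: bool) -> str:
    """Build the summary endpoint URL with the show_summary_tables parameter."""
    # The notes give the accepted values as lowercase true and false.
    query = urlencode({"show_summary_tables": "true" if show_summary_tables else "false"})
    return f"{base_url}/config/v1/orgs/{org_id}/summary/?{query}"


# Placeholder host and organization ID, for illustration only.
url = summary_url("https://myhost.hydrolix.live", "my-org-id", show_summary_tables=False)
```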
Intake
- The intake-head now implements a graceful backoff and more tolerant retry strategy for catalog inserts.
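These notes don't describe the exact strategy used by intake-head. One common shape for this kind of behavior is exponential backoff with jitter, sketched below in Python. All names and default values here are hypothetical, and this is not the intake-head implementation.

```python
import random
import time
from typing import Callable, List, TypeVar

T = TypeVar("T")


def retry_with_backoff(
    operation: Callable[[], T],
    max_attempts: int = 5,
    base_delay: float = 0.1,
    max_delay: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call operation until it succeeds or max_attempts is reached.

    After failed attempt n (counting from 0), wait a random time in
    [0, min(max_delay, base_delay * 2**n)] so that many clients retrying
    at once don't hit the database in lockstep.
    """
    for attempt in range(max_attempts):
        try:
            return operation()
        except Exception:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the last error
            ceiling = min(max_delay, base_delay * (2 ** attempt))
            sleep(random.uniform(0.0, ceiling))
    raise RuntimeError("max_attempts must be at least 1")


# Demo: a hypothetical insert that fails twice, then succeeds.
calls = {"count": 0}
waits: List[float] = []


def flaky_insert() -> str:
    calls["count"] += 1
    if calls["count"] < 3:
        raise ConnectionError("catalog unavailable")
    return "inserted"


# Record the waits instead of sleeping so the demo finishes instantly.
result = retry_with_backoff(flaky_insert, sleep=waits.append)
```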
Merge
- The merge-controller now searches for merge candidates in a much more efficient manner, dramatically reducing CPU use in certain cases.
- The merge-controller exposes some metrics, such as table efficiency and partition memory distribution, that require expensive queries applied to the catalog database. To reduce load on the catalog, those queries now happen less frequently than before.
- Improved configurability of merge-controller by allowing tunables to specify maximum partitions to be combined and maximum candidates. Introduces a new tunable merge_max_partitions.
Cluster operation
- Hydrolix-managed Keycloak users can now authenticate via SSO. When visiting https://{myhost}.hydrolix.live/grafana/login, use the new "Sign in with Hydrolix" button.
- For Kibana access, Kibana-configured users can now be used instead of Hydrolix users. Set the new kibana_security_enabled tunable to enable this feature, and use the hdx-elastic-user Kubernetes secret for the admin user. Once logged in, create new users with Stack Management -> Security -> Users, and make sure those users have the monitoring_user role assigned.
- Introduced production scaling properties to support the highly-available chproxy HTTP proxy.
- Changed support for chproxy to allow control over its version in the http_proxy tunable. Also extracted chproxy to an external repository.
- chproxy can now use Redis as a cache back-end with the mode: redis and other http_proxy configuration options. The Redis server is not shipped with the Hydrolix cluster. More information can be found in the Tunables List.
- Implemented the "power of two random choices" algorithm for the stream_load_balancer_algorithm tunable. Set to p2c to enable this alternative to the default round-robin (rr) option. With p2c, two servers are selected at random, and the current request is assigned to the one with fewer active connections. This could more evenly distribute traffic and reduce intake-head resource needs.
- A new terminate_tls_at_lb tunable allows TLS termination at an Amazon Network Load Balancer, rather than at the Hydrolix internal Traefik instance. More documentation can be found in External TLS on AWS NLB.
- The overcommit tunable now allows two settings other than the pre-existing true and false. Setting it to requests will set all requested resources to 0, while keeping the limits. Setting it to limits will keep requests intact, but remove the limits.
- The Hydrolix configuration validator can now be configured with the option to warn upon validation failures, rather than exiting.
- Cluster configuration information can now be sent to a central fleet database on an hourly basis. The argus_fleet_url, argus_fleet_table, and argus_fleet_transform tunables have been added to support this.
- Introduced intake pool routing, allowing the traefik reverse proxy to route to pools based on HTTP header values or query parameters. This is useful for cluster configuration of intake pool selection, without reconfiguring the sending software.
- Added tunable vector_extra_namespaces so that logs can be collected from namespaces other than the Hydrolix cluster namespace.
- Added support for time-based and scheduled changes to the Hydrolix spec. The new tunable overrides allows both cron-like and one-time scaling expressions.
- hdx-scaler can now filter metrics by attributes of the service that produced them.
- Improved hdxscaler user interface by showing tabbed output of only the configured hdxscalers, handling scrolling better and displaying relevant config values.
- Added log_level tunable to control HDX scaler logging level.
- Allowed external Kibana instances to reach the in-cluster Quesma. This requires the reverse proxy to be configured with a wildcard TLS certificate of the form *.{myhost}.hydrolix.live.
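The "power of two random choices" selection described in the list above can be sketched as follows. This illustrates the general algorithm only, not the traefik or Hydrolix implementation; the function name, server names, and connection counts are hypothetical.

```python
import random
from typing import Dict, Optional


def pick_server_p2c(
    active_connections: Dict[str, int],
    rng: Optional[random.Random] = None,
) -> str:
    """Pick a server using the "power of two random choices" rule.

    Two distinct servers are sampled uniformly at random, and the one with
    fewer active connections receives the request.
    """
    if rng is None:
        rng = random.Random()
    servers = list(active_connections)
    if not servers:
        raise ValueError("no servers available")
    if len(servers) == 1:
        return servers[0]
    first, second = rng.sample(servers, 2)
    # Ties go to the first sampled server.
    if active_connections[first] <= active_connections[second]:
        return first
    return second


# Hypothetical pool of intake heads with their current connection counts.
connections = {"intake-head-0": 12, "intake-head-1": 3, "intake-head-2": 7}
chosen = pick_server_p2c(connections, random.Random(42))
connections[chosen] += 1  # the chosen server now holds one more connection
```

Comparing only two candidates keeps each routing decision cheap while still steering requests away from the busiest server. In this example the most loaded head can never win a comparison.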
UI
- Introduced consistent pagination to many endpoints, in line with the pagination added to the Config API in this release.
- Improved friendliness of input fields capable of accepting multiple values. A user can now enter a single value without also having to enter a delimiter.
- In the batch jobs UI, added a disabled editor to the ‘view’ sidebar for viewing the full JSON job object, along with a copy button.
Bug fixes
API
- Fixed a bug that allowed the intake_head_url for a table to point to a deleted transform.
- Addressed a potential denial of service attack against storage configurations and batch jobs by tightening up hostname and storage path validation functions.
- Corrected consistency of conversion for boolean data types when constructing views. Now correctly handles elements in a complex data type.
- Closed race conditions on token validity when users' accounts are disabled or deleted. Earlier, a token would incorrectly remain valid.
Cluster operation
- Added a liveness check to the version service. This should prevent an occasional freezing issue with watch operations in the Python Kubernetes client.
- Allowed the .well-known/acme-challenge path to be served over TCP/80 to match HTTP validation expectations across all ACME providers. Plaintext HTTP is used by both Let's Encrypt and Buypass. Earlier, Buypass wasn't following redirects to HTTPS.
- Disabled visibility of traefik dashboards by default.
Core
- Fixed a bug and improved performance when both sides of a filter in a SQL statement are non-constants.
- Fixed broken catalog database retry logic. This prevents CatalogErrors due to lost database connections and overlapping transactions.
- Tightened response to any detected corruption in a partition by skipping all subsequent blocks. The turbine and turbine_summary table functions occasionally display different responses to corruption.
Intake and merge
- Switched to using a concurrency-friendly map for metric data storage and retrieval. This avoids a rare crash scenario.
- Avoided panic on out-of-bounds error when Amazon Data Firehose response error message is shorter than maximum length allowed.
- Fixed a bug where the merge-controller wouldn't start if there was another merge-controller or a merge-head in a completed state.
- Fixed a bug where the merge-controller would become "stuck" due to an internal blocking issue.
- Fixed a bug where the merge-controller would stop creating partitions after receiving an error from the Postgres catalog database.
- Fixed a bug where a project's rate_limit was being ignored.