29 July 2025 - v5.4.0

Column aliasing, service accounts, pagination in Config API, database connection pooling support


Notable new features

Column aliasing

Column aliasing allows the definition of columns as the result of a calculation from other columns or queries. The underlying partition doesn't contain the data. To use a column alias, at least one view must refer to the defined alias column.

Service accounts

Service accounts provide variable-lifetime, revocable, long-lived access tokens for programmatic use, which simplifies authentication for automated workflows. Service accounts, tokens, and associated roles can be created, modified, and deleted in the UI under Security > Service Accounts. Tokens issued to service accounts created before v5.4 stop working, and those accounts lose their names. See Breaking changes.

Pagination in Config API

The Config API implements fully-featured, consistent pagination and a different response object to support pagination details. Client applications will need to adapt to the pagination schemes for most endpoints. All services inside a Hydrolix cluster are already migrated to the new pagination styles. See Breaking changes.

Option for database connection pooling

Introduced pgbouncer, a lightweight database connection pooling application, into the cluster along with controlling tunables. It isn't enabled by default.


Breaking changes

Pagination in Config API

Pagination is now required for interacting with most Config API endpoints. Affected endpoints include those for:

  • the catalog, which switched to cursor pagination
  • viewing authentication logs
  • managing storages, projects, tables, jobs, credentials, and dictionaries
  • listing users, roles, tasks, activity, transform templates, and invites

All internal cluster activity now uses the paginated API calls.

For a detailed description see Pagination Change in v5.4.
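
For client applications that previously read a whole collection from a single response, adapting usually amounts to looping until the API reports that no further page exists. The Python sketch below illustrates that loop only; it is not the Config API's actual contract. The `results` and `next` field names and the `fetch_page` callable are assumptions made for the example, so check Pagination Change in v5.4 for the real response object.

```python
from typing import Any, Callable, Dict, Iterator, Optional

# Hypothetical page shape: {"results": [...], "next": <url or None>}.
# These field names are assumptions, not taken from the Config API reference.
Page = Dict[str, Any]


def iter_items(fetch_page: Callable[[str], Page], first_url: str) -> Iterator[Any]:
    """Yield every item from a paginated collection, one page at a time."""
    url: Optional[str] = first_url
    while url:
        page = fetch_page(url)
        yield from page.get("results", [])
        url = page.get("next")  # None (or a missing key) ends the loop


# Usage with an in-memory stand-in for the HTTP call.
fake_pages = {
    "page-1": {"results": ["table_a", "table_b"], "next": "page-2"},
    "page-2": {"results": ["table_c"], "next": None},
}
all_items = list(iter_items(fake_pages.__getitem__, "page-1"))
```

Passing the fetch function in as a parameter keeps the loop independent of any HTTP library and makes it easy to test against canned pages.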

Earlier service account tokens invalidated

Prior releases included early support for service accounts and tokens. Tokens issued to those accounts no longer work in v5.4. These older tokens had a static lifetime (TTL) and were not revocable, so they are incompatible with the variable-lifetime, revocable service account tokens.

Service accounts created prior to the v5.4 release will also lose their presentational name.


Upgrade instructions

GKE

kubectl apply -f "https://www.hydrolix.io/operator/v5.4.0/operator-resources?namespace=${HDX_KUBERNETES_NAMESPACE}&gcp-storage-sa=${GCP_STORAGE_SA}"

EKS

kubectl apply -f "https://www.hydrolix.io/operator/v5.4.0/operator-resources?namespace=${HDX_KUBERNETES_NAMESPACE}&aws-storage-role=${AWS_STORAGE_ROLE}"

LKE

kubectl apply -f "https://www.hydrolix.io/operator/v5.4.0/operator-resources?namespace=${HDX_KUBERNETES_NAMESPACE}"

Rollback instructions

See rollback instructions in Upgrade to v5.4.x.

Changelog

Updates

These changes include version upgrades and internal dependency bumps.

Intake

Cluster operation

  • Upgraded traefik reverse proxy from v2.11 to v3.4.
  • Upgraded Quesma from v1.0.2 to v1.1.12.
  • Upgraded kopf from v1.37.1 to v1.38.0.

Improvements

Core

  • Added support for TLS termination for the MySQL interface. The MySQL server implementation requires TLS termination to occur on the query head, not in the traefik reverse proxy.
  • Removed build information from log messages, and changed from strncpy() to memcpy() to address security concerns.

API

  • Added no-cache, no-store, and must-revalidate headers to responses containing authentication cookies. This addresses a potential security issue by advising intermediate proxies to avoid caching.
  • The /config/v1/orgs/{org_id}/summary/ API endpoint now supports a show_summary_tables query parameter that can be set to true or false. If set to false, only non-summary tables will be returned.
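
As a minimal sketch of how a client might build a request URL that uses the show_summary_tables parameter: the host name and organization ID below are placeholders, and passing the value as a lowercase true/false literal is an assumption based on the description above.

```python
from urllib.parse import urlencode


def summary_url(base_url: str, org_id: str, show_summary_tables: bool) -> str:
    """Build the org summary URL with the show_summary_tables parameter."""
    # Assumption: the API expects the literal strings "true" / "false".
    query = urlencode({"show_summary_tables": str(show_summary_tables).lower()})
    return f"{base_url.rstrip('/')}/config/v1/orgs/{org_id}/summary/?{query}"


# Placeholder host and org ID, for illustration only.
url = summary_url("https://myhost.hydrolix.live", "my-org-id", False)
```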

Intake

  • The intake-head now implements a graceful backoff and more tolerant retry strategy for catalog inserts.
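
The release notes don't describe the intake-head's retry strategy in detail. The Python sketch below shows one common way to implement a backoff of this kind, capped exponential delay with random jitter. It illustrates the general technique under that assumption and is not the intake-head's actual code; the function name, defaults, and the simulated failure are all hypothetical.

```python
import random
import time
from typing import Callable, List, TypeVar

T = TypeVar("T")


def retry_with_backoff(
    operation: Callable[[], T],
    max_attempts: int = 5,
    base_delay: float = 0.1,
    max_delay: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call operation until it succeeds or max_attempts is exhausted."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    attempt = 0
    while True:
        try:
            return operation()
        except Exception:
            attempt += 1
            if attempt >= max_attempts:
                raise  # out of attempts: surface the last error
            # Delay ceiling doubles after each failure, capped at max_delay;
            # the random draw (jitter) spreads out retries from many clients.
            ceiling = min(max_delay, base_delay * 2 ** (attempt - 1))
            sleep(random.uniform(0, ceiling))


# Usage: a simulated insert that fails twice before succeeding.
calls = {"count": 0}


def flaky_insert() -> str:
    calls["count"] += 1
    if calls["count"] < 3:
        raise ConnectionError("catalog unavailable")
    return "inserted"


delays: List[float] = []
result = retry_with_backoff(flaky_insert, sleep=delays.append)
```

Injecting `sleep` lets the example record the delays instead of actually waiting.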

Merge

  • The merge-controller now searches for merge candidates in a much more efficient manner, dramatically reducing CPU use in certain cases.
  • The merge-controller exposes some metrics, such as table efficiency and partition memory distribution, that require expensive queries applied to the catalog database. To reduce load on the catalog, those queries now happen less frequently than before.
  • Improved configurability of the merge-controller by allowing tunables to specify the maximum number of partitions to combine and the maximum number of candidates. Introduced a new tunable, merge_max_partitions.

Cluster operation

  • Hydrolix-managed Keycloak users can now authenticate via SSO. When visiting https://{myhost}.hydrolix.live/grafana/login, use the new "Sign in with Hydrolix" button.

  • For Kibana access, Kibana-configured users can now be used instead of Hydrolix users. Set the new kibana_security_enabled tunable to enable this feature, and use the hdx-elastic-user Kubernetes secret for the admin user. Once logged in, create new users with Stack Management -> Security -> Users, and make sure those users have the monitoring_user role assigned.

  • Introduced production scaling properties to support the highly-available chproxy HTTP proxy.

  • Changed support for chproxy to allow control over its version through the http_proxy tunable. Also extracted chproxy to an external repository.

  • chproxy can now use Redis as a cache back-end with the mode: redis and other http_proxy configuration options. The Redis server is not shipped with the Hydrolix cluster. More information can be found in the Tunables List.

  • Implemented the "power of two random choices" algorithm for the stream_load_balancer_algorithm tunable. Set it to p2c to enable this alternative to the default round-robin (rr) option. With p2c, two servers are selected at random and the current request is assigned to whichever of the two has fewer active connections. This could distribute traffic more evenly and reduce intake-head resource needs.

  • A new terminate_tls_at_lb tunable allows TLS termination at an Amazon Network Load Balancer, rather than at the Hydrolix internal Traefik instance. More documentation can be found in External TLS on AWS NLB.

  • The overcommit tunable now allows two settings other than the pre-existing true and false. Setting it to requests will set all requested resources to 0, while keeping the limits. Setting it to limits will keep requests intact, but remove the limits.

  • The Hydrolix configuration validator can now be configured with the option to warn upon validation failures, rather than exiting.

  • Cluster configuration information can now be sent to a central fleet database on an hourly basis. argus_fleet_url, argus_fleet_table, and argus_fleet_transform tunables have been added to support this.

  • Introduced intake pool routing, allowing traefik reverse proxy to route to pools based on HTTP header values or query parameters. This is useful for cluster configuration of intake pool selection, without reconfiguring the sending software.

  • Added tunable vector_extra_namespaces so that logs can be collected from namespaces other than the Hydrolix cluster namespace.

  • Added support for time-based and scheduled changes to the Hydrolix spec. The new overrides tunable allows both cron-like and one-time scaling expressions.

  • hdx-scaler can now filter metrics by attributes of the service that produced them.

  • Improved hdxscaler user interface by showing tabbed output of only the configured hdxscalers, handling scrolling better and displaying relevant config values.

  • Added the log_level tunable to control the HDX scaler logging level.

  • Allowed external Kibana instances to reach the in-cluster Quesma. This requires the reverse proxy to be configured with a wildcard TLS certificate of the form *.{myhost}.hydrolix.live.
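
The p2c option for stream_load_balancer_algorithm, described above, can be illustrated with a short Python sketch. This is a generic rendering of the "power of two random choices" technique, not the load balancer's implementation; the server names and connection counts are made up for the example.

```python
import random
from typing import Dict, Optional


def pick_p2c(active: Dict[str, int], rng: Optional[random.Random] = None) -> str:
    """Pick a server: sample two at random, keep the less loaded one.

    active maps each server name to its current active connection count.
    """
    rng = rng or random.Random()
    servers = list(active)
    if not servers:
        raise ValueError("no servers available")
    if len(servers) == 1:
        return servers[0]
    first, second = rng.sample(servers, 2)  # two distinct candidates
    return first if active[first] <= active[second] else second


# Hypothetical intake-head pool with differing loads.
pool = {"intake-head-0": 12, "intake-head-1": 3, "intake-head-2": 7}
chosen = pick_p2c(pool, random.Random(0))
```

Because only two servers are compared per request, the balancer avoids scanning the whole pool, yet the most heavily loaded server can never win a comparison against a less loaded one.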

UI

  • Introduced consistent pagination to many endpoints. This relates to the introduction of pagination in the Config API in this release.
  • Improved friendliness of input fields capable of accepting multiple values. A user can now enter a single value without also having to enter a delimiter.
  • In the batch jobs UI, added an editor (disabled) to the 'view' sidebar for viewing the full JSON job object, along with a copy button.

Bug fixes

API

  • Fixed a bug that allowed the intake_head_url for a table to point to a deleted transform.
  • Addressed a potential denial of service attack against storage configurations and batch jobs by tightening up hostname and storage path validation functions.
  • Corrected consistency of conversion for boolean data types when constructing views. Now correctly handles elements in a complex data type.
  • Closed race conditions on token validity when users' accounts are disabled or deleted. Earlier, a token would incorrectly remain valid.

Cluster operation

  • Added a liveness check to the version service. This should prevent an occasional freezing issue with watch operations in the Python Kubernetes client.
  • Allowed the .well-known/acme-challenge path to be served over TCP/80 to match HTTP validation expectations across all ACME providers. Plaintext HTTP is used by both Let's Encrypt and Buypass. Earlier, Buypass wasn't following redirects to HTTPS.
  • Disabled visibility of traefik dashboards by default.

Core

  • Fixed a bug and improved performance when both sides of a filter in a SQL statement are non-constants.
  • Fixed broken catalog database retry logic. This prevents CatalogErrors due to lost database connections and overlapping transactions.
  • Tightened response to any detected corruption in a partition by skipping all subsequent blocks. The turbine and turbine_summary table functions occasionally display different responses to corruption.

Intake and merge

  • Switched to using a concurrency-friendly map for metric data storage and retrieval. This avoids a rare crash scenario.
  • Avoided panic on out-of-bounds error when Amazon Data Firehose response error message is shorter than maximum length allowed.
  • Fixed a bug where the merge-controller wouldn't start if there was another merge-controller or a merge-head in a completed state.
  • Fixed a bug where the merge-controller would become "stuck" due to an internal blocking issue.
  • Fixed a bug where the merge-controller would stop creating partitions after receiving an error from the Postgres catalog database.
  • Fixed a bug where a project's rate_limit was being ignored.