28 October 2025 - v5.7.4

Route traffic from one intake pool to multiple tables; override container images to specific versions

Notable new features

Intake routing from one pool to multiple tables

  • Using the dynamic ingest routing feature, traffic can now be routed to multiple tables from a single ingest pool.

Per-component container image overrides

  • Users can now specify custom images and tags for various components (Deployments, StatefulSets, Jobs, DaemonSets).

Breaking changes

Hydrolix previously used Buypass to handle automated certificate challenges when a customer specified an ip_allowlist.

Buypass ceased issuing TLS certificates on October 15, 2025. Certificates issued prior to that date will continue to work correctly until expiration. Clusters depending on Buypass certificates will need attention to avoid service disruption.

This change affects all cloud environments with acme_enabled and ip_allowlist configurations. Temporary workarounds:

  • Provide your own certificate in the traefik-tls Kubernetes secret and set acme_enabled to false
  • Use DNS-based challenges by setting issue_wildcard_cert to true (requires extra credentials)
  • Temporarily set the allowlist to 0.0.0.0/0 during certificate requests or renewals
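
The first workaround can be sketched with a standard kubectl command. This is a minimal sketch, not a Hydrolix-documented procedure: it assumes traefik-tls is an ordinary Kubernetes TLS secret in the cluster's namespace, and the certificate and key paths are placeholders supplied by the caller. The KUBECTL variable is a hypothetical hook that lets you preview the command by pointing it at echo.

```shell
# Sketch: store a self-provided certificate in the traefik-tls secret.
# Assumptions (not stated in these release notes): the secret is a standard
# kubernetes.io/tls secret, and the caller supplies PEM cert and key files.
KUBECTL="${KUBECTL:-kubectl}"

install_own_cert() {
  ns="$1"    # Kubernetes namespace of the Hydrolix cluster
  cert="$2"  # path to the PEM-encoded certificate (placeholder)
  key="$3"   # path to the PEM-encoded private key (placeholder)
  # Note: "create" fails if the secret already exists; delete or replace
  # the old secret first in that case.
  "$KUBECTL" create secret tls traefik-tls \
    --namespace="$ns" --cert="$cert" --key="$key"
}
```

For example, `install_own_cert "$HDX_KUBERNETES_NAMESPACE" tls.crt tls.key` creates the secret. Setting acme_enabled to false in the cluster configuration is still required and is not covered by this sketch.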

Upgrade instructions

Upgrade and downgrade restrictions

Hydrolix clusters initially installed in the v5.7 series cannot be downgraded to a version earlier than v5.7.

Clusters originally installed at versions prior to the v5.7 series can be upgraded and downgraded. See Upgrade to v5.7.4.

ℹ️

Automatic Keycloak upgrade included

A fully-automated Keycloak upgrade occurs during the cluster upgrade. The Hydrolix operator runs multiple Keycloak instances during the upgrade.

If your cluster runs on a single node, see special instructions in Upgrade to v5.7.4.

Normal production cluster deployments should use the standard upgrade instructions on this page.

Apply the new Hydrolix operator

If you have a self-managed installation, apply the new operator directly with the kubectl command examples below. If you're using Hydrolix-supplied tools to manage your installation, follow the procedure prescribed by those tools.

GKE

kubectl apply -f "https://www.hydrolix.io/operator/v5.7.4/operator-resources?namespace=${HDX_KUBERNETES_NAMESPACE}&gcp-storage-sa=${GCP_STORAGE_SA}"

EKS

kubectl apply -f "https://www.hydrolix.io/operator/v5.7.4/operator-resources?namespace=${HDX_KUBERNETES_NAMESPACE}&aws-storage-role=${AWS_STORAGE_ROLE}"

LKE and AKS

kubectl apply -f "https://www.hydrolix.io/operator/v5.7.4/operator-resources?namespace=${HDX_KUBERNETES_NAMESPACE}"
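
The three commands above differ only in the query parameters appended to the operator URL. As a rough sketch, a small helper can build the URL per platform. The function name and the platform labels (gke, eks, lke, aks) are illustrative and not part of any Hydrolix tooling; the environment variables are the ones used in the commands above.

```shell
# Sketch: build the v5.7.4 operator-resources URL for a given platform.
# The platform labels passed as $1 are illustrative, not official names.
operator_url() {
  platform="$1"
  base="https://www.hydrolix.io/operator/v5.7.4/operator-resources?namespace=${HDX_KUBERNETES_NAMESPACE}"
  case "$platform" in
    gke)     printf '%s&gcp-storage-sa=%s\n' "$base" "$GCP_STORAGE_SA" ;;
    eks)     printf '%s&aws-storage-role=%s\n' "$base" "$AWS_STORAGE_ROLE" ;;
    lke|aks) printf '%s\n' "$base" ;;
    *)       echo "unknown platform: $platform" >&2; return 1 ;;
  esac
}
```

It could then be used as `kubectl apply -f "$(operator_url eks)"`.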

Monitor the upgrade process

Kubernetes jobs named init-cluster and init-turbine-api will automatically run to upgrade your entire installation to match the new operator's version number. This takes a few minutes, during which you can watch your pods restart with your Kubernetes monitoring tool.

Ensure both the init-cluster and init-turbine-api jobs have completed successfully and that the turbine-api pod has restarted without errors. After that, view the UI and use the API of your new installation as a final check.

If the turbine-api pod doesn't restart successfully, or other functionality is missing, check the logs of the init-cluster and init-turbine-api jobs for details about failures. This can be done using the k9s utility or with the kubectl command:

kubectl logs -l app=init-cluster
kubectl logs -l app=init-turbine-api
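
The completion check and the log lookup can be combined in a short script. The following is a sketch under stated assumptions: the default timeout of 600s is a guess based on "a few minutes", the namespace is passed by the caller, and the KUBECTL variable is a hypothetical hook for previewing the commands. Note that kubectl wait with condition=complete keeps waiting until the timeout if a job fails rather than completes.

```shell
# Sketch: wait for both init jobs, printing recent logs if one does not finish.
# The job names and the app= label selector come from the text above; the
# default timeout and the --tail value are assumptions.
KUBECTL="${KUBECTL:-kubectl}"

wait_for_init_jobs() {
  ns="$1"
  timeout="${2:-600s}"
  for job in init-cluster init-turbine-api; do
    if ! "$KUBECTL" wait --for=condition=complete "job/$job" \
         --namespace="$ns" --timeout="$timeout"; then
      echo "job $job did not complete; recent logs:" >&2
      "$KUBECTL" logs -l "app=$job" --namespace="$ns" --tail=50 >&2
      return 1
    fi
  done
}
```

For example, `wait_for_init_jobs "$HDX_KUBERNETES_NAMESPACE"` returns non-zero if either job has not completed within the timeout.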

If you need help, contact Hydrolix Support.

Changelog

Updates

These changes include version upgrades and dependency bumps.

Config API updates

  • Upgraded Keycloak to 26.3.0.
  • Pinned google-cloud-storage dependency to version 2.3.0.

Cluster operations updates

  • Updated Chproxy in the operator to 0.5.0.

Intake updates

  • Upgraded tracing-subscriber crate to mitigate ANSI escape sequence injection (GHSA-xwfj-jgwm-7wp5).

UI updates

  • Upgraded Axios to 1.12.2 (and wait-on to 7.2.0) to resolve a high-severity DoS vulnerability (CVE-2025-58754 / GHSA-4hjh-wcwx-xvwj) where large data: URIs could exhaust memory and crash Node.js processes.
  • Upgraded Next.js to 14.2.32.

Improvements

These changes improve behavior, resilience, or usability.

Config API improvements

  • Updated the Config API to reference Storage by its unique, case-insensitive name instead of a cluster-specific ID.
    • Simplifies GitOps workflows by allowing storage and dependent table objects to be created in a single commit without knowing storage UUIDs in advance.
    • storage_map table settings now accept names for default_storage_id, column_value_mapping, and spread_list. Internally, names are resolved to UUIDs.
  • Prevented deletion of a table when a child summary table exists.
  • Auth logs endpoint now accepts a timestamp instead of a date and returns an id for each object. Init now waits for Keycloak to load.
  • Exposed a content hash (etag) of the cluster dictionary file to enable comparison before upload.
  • Improved Gunicorn configuration: reduced timeout to 30s, switched to gevent workers, and made worker count dynamic by CPU.
  • Validated that credentials set on credential_id are compatible with the attached resource.

Cluster operations improvements

  • Removed Buypass certificate support when ACME is enabled.
    Buypass stopped issuing TLS certificates on October 15, 2025.
    Hydrolix previously used Buypass to handle certificate challenges when a customer specified an ip_allowlist. This release removes Buypass support.
    Impact: Affects all cloud environments with acme_enabled and ip_allowlist configurations.
    Temporary workarounds:

    • Provide your own certificate in the traefik-tls Kubernetes secret and set acme_enabled to false
    • Use DNS-based challenges by setting issue_wildcard_cert to true (requires extra credentials)
    • Temporarily set the allowlist to 0.0.0.0/0 during certificate requests or renewals
  • Fixed cleanup of partitions from deleted projects that were previously skipped.

    SRE note: This change may trigger cleanup of many old partitions on deployment, increasing Reaper and RabbitMQ load temporarily.

  • Changed in-cluster Grafana log format to JSON.

  • Added support for vertical autoscaling in HDX Scaler. This is an optional beta feature and is subject to change.
    Ranges for CPU, memory, and ephemeral storage resources are now configurable in the Hydrolix custom resource. When enabled, HDX Scaler adjusts pod limits based on observed usage, reducing the need for manual resource tuning and minimizing risk of OOM events.

  • Added the vector_custom_fields tunable, allowing users to specify custom key/value pairs that are automatically added to all Vector logs. These fields appear in the catchall column in hydro.logs unless they match a non-null field already defined in the transform.
    Vector reloads its configuration automatically when this tunable changes; no restart is required.

Core improvements

  • Added HTTP endpoint for memory tracker and jemalloc metrics: http://{myhost}.hydrolix.live/query/memory-tracker.
  • Improved time filter performance by using index information.
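
To query the memory tracker endpoint listed above, a minimal sketch might look like the following. The helper names are illustrative, the host argument stands in for {myhost}, and any authentication your cluster requires is not shown because these notes do not describe it. The response format is likewise not specified here.

```shell
# Sketch: build and fetch the memory tracker URL for a cluster host.
# The URL pattern comes from the release note; helper names are illustrative.
memory_tracker_url() {
  host="$1"
  printf 'http://%s.hydrolix.live/query/memory-tracker\n' "$host"
}

fetch_memory_tracker() {
  # --fail makes curl return non-zero on HTTP error responses.
  curl --silent --show-error --fail "$(memory_tracker_url "$1")"
}
```

Calling `fetch_memory_tracker myhost` prints whatever the endpoint returns.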

Intake improvements

  • Exposed read_timeout and read_header_timeout for HTTP requests as tunables.

Operator improvements

  • Added support for routing to multiple tables from an ingest pool.

  • Added support for overriding container images for any Hydrolix workload.

    • Users can now specify custom images and tags for components (Deployments, StatefulSets, Jobs, DaemonSets).
    • If the image key doesn't include a registry path, it defaults to us-docker.pkg.dev/hdx-art/t.
    • Supports decoupling operator version upgrades from component images by letting users override individual workloads as needed.
    • Added Prometheus annotation support to track image overrides with hydrolix.io/image-override.
  • Improved monitoring for ingestion by adding an intake-pool field to hydro.monitor packets and allowing configuration of which intake pools receive them.

  • Introduced the monitor_ingest_pool_exemptions tunable to exclude specific pools from monitoring.
    Also excluded the default /ingest/event endpoint to reduce noise and improve visibility into real ingest activity.

  • Added lockpart, an SRE CLI for locking and unlocking problematic catalog partitions during remediation. Supports detection using a hydro.logs query or an explicit -p partition list, dry-run mode, safety caps (--max-changes), and unlock/reactivate flows. Requires a kube context and namespace, plus port-forward access to catalog-rw.

    Usage highlights

    • Detect and lock active partitions from logs (last day): lockpart -C <ctx> -N <ns>
    • Dry run with limits: lockpart -C <ctx> -N <ns> -D 7 -M 25 -n
    • Unlock/reactivate by lock ID: lockpart -C <ctx> -N <ns> -U -R -L <id>
    • List active locks: lockpart -C <ctx> -N <ns> -l

    Notes

    • Lock sets active=false in catalog; verify targets in --dry-run before applying
    • Lock ID 1 is reserved by merge and is rejected
  • Kibana Gateway (formerly Quesma) config now supports multiple tables (additional_tables) instead of a single table.

    Defaults clarified (project hydro, fallback to logs, duplicates removed). Updated Kibana Gateway to v1.1.20 (license check disabled).

  • Introduced the extra_loadbalancers tunable to create or remove additional Traefik load balancer services.
    For example:

    • extra_loadbalancers: 2 → creates traefik, traefik-extra-1, traefik-extra-2
    • Reducing to extra_loadbalancers: 1 removes the highest numbered service, leaving traefik and traefik-extra-1.
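
The naming scheme in the extra_loadbalancers example can be sketched as a small helper that lists the services expected for a given value. The function name is illustrative and not part of the operator; the sketch simply follows the traefik, traefik-extra-N pattern shown above.

```shell
# Sketch: list the Traefik service names expected for a given
# extra_loadbalancers value, following the naming shown in the example.
traefik_service_names() {
  count="$1"
  echo traefik
  i=1
  while [ "$i" -le "$count" ]; do
    echo "traefik-extra-$i"
    i=$((i + 1))
  done
}
```

The output of `traefik_service_names 2` can be compared against the services reported by `kubectl get svc` in the cluster's namespace.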

UI improvements

  • Improved searchability in Security > Auth Logs by adding more context and data to the log field in the audit log. Also added search field for IP address and made the date range optional.
  • Added conditional region and endpoint fields in batch job creation. For AWS and Linode, only one of region or endpoint must be specified. For GCP and Azure, region must not be specified.
  • Propagated the turbine_api_require_table_default_storage cluster tunable into the UI environment and exposed several related environment variables: ENABLE_PASSWORD_COMPLEXITY_POLICY, PASSWORD_EXPIRATION_POLICY, REQUIRE_TABLE_DEFAULT_STORAGE. These now influence UI behavior directly, without requiring an API call.

Bug fixes

Config API fixes

  • Fixed overzealous audit log pruning. If no purge date is set, nothing is purged.
  • Corrected incomplete deletion of credential secrets. Previously, credential secrets remained in the Kubernetes Secret user-credentials after deletion.

Core fixes

  • Fixed double counting in memory tracker.

Intake fixes

  • Fixed nil pointer in autoingest when a user deleted an Azure resource.
  • Fixed Kafka client crash with incompletely specified data source. Now, it refuses to use a Kafka data source without a credential ID.
  • Corrected the environment variable name for injecting deployment IDs for usagemeter and periodic service. The environment variable is now DEPLOYMENT_ID.
  • Decreased row lock contention when the catalog is constrained by adjusting SQL operations for merge cleanup operations to minimize UPSERT commands.

Operator fixes

  • Fixed several computational estimation bugs affecting HDX Scaler's ability to hit a target value. Now, the computations account better for replica count changes and scrape interval jitter.
  • Excluded cloudsqladmin, rdsadmin and a few template PostgreSQL databases during initialization. Services other than the Hydrolix operator use databases in the PostgreSQL cluster.
  • Reduced CPU consumption when processing Prometheus metrics in the Horizontal Pod Autoscaler (HPA). HDX Scaler now receives only metrics and labels of interest.
  • Fixed insufficient default permissions on Azure Database for PostgreSQL Flexible Server by granting CREATE and USAGE on the public schema to all database users during cluster initialization.

UI fixes

  • Fixed password change form: error no longer persists after correcting confirmation mismatch, and password inputs clear after successful submission.
  • Fixed an issue where the Kinesis form didn't show an error message when replicas weren't set.
  • Fixed column search filtering for table views to avoid false matches from scattered character sequences. For example, searching “ALL” no longer matches rows like user_admin.
    Updated filtering to use strict substring search (rankings.CONTAINS) instead of loose character matching.
  • Removed the deprecated summary-peer option from the pool creation workflow. This option is no longer supported and was cleaned up to simplify pool setup.