v6.0.8

Notable new features

Syslog ingest

Added an optional syslog ingest service for receiving syslog messages into Hydrolix tables over mutual TLS (mTLS). Currently, this is only for TrafficPeak customers. The service accepts RFC 5424 and RFC 3164 syslog, authenticates mTLS clients against a customer-supplied CA, and routes each connection to a destination table based on the client certificate's Common Name. Configuration lives in the HydrolixCluster spec under spec.syslog_ingest, and routing can be updated without a pod restart. Disabled by default.
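As a rough client-side illustration, the sketch below formats an RFC 5424 message and builds a TLS context that presents a client certificate. The file paths, host name, and application name are hypothetical, and the release notes don't state the service's port or message framing, so the sketch stops short of opening a connection.

```python
import ssl
from datetime import datetime, timezone


def format_rfc5424(facility: int, severity: int, hostname: str,
                   app_name: str, message: str, when: datetime) -> str:
    """Build a minimal RFC 5424 line with nil PROCID, MSGID, and structured data."""
    pri = facility * 8 + severity  # PRI value as defined by RFC 5424
    timestamp = when.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    return f"<{pri}>1 {timestamp} {hostname} {app_name} - - - {message}"


def mtls_client_context(server_ca_file: str, cert_file: str, key_file: str) -> ssl.SSLContext:
    """Create a TLS context that verifies the server and presents a client certificate.

    All three paths are hypothetical. The client certificate must be issued by
    the CA configured for the ingest service; per the notes, its Common Name
    selects the destination table.
    """
    context = ssl.create_default_context(ssl.Purpose.SERVER_AUTH, cafile=server_ca_file)
    context.load_cert_chain(certfile=cert_file, keyfile=key_file)
    return context


line = format_rfc5424(16, 6, "edge01", "nginx", "hello",
                      datetime(2025, 1, 1, tzinfo=timezone.utc))
print(line)
```

Facility 16 (local0) and severity 6 (informational) are arbitrary choices for the example.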

Breaking changes

  • The legacy is_superuser flag has been removed from all account objects. If your cluster contains user accounts that predate the introduction of RBAC in v4.6 (early 2024) and still depend on the is_superuser flag, ensure that those accounts are granted appropriate permissions. To assign permissions identical to those of the removed flag, add the super_admin role to the account.

  • The legacy merge-head daemon has been removed. Clusters that still explicitly configure merge-head services or pools must migrate to merge-controller before upgrading to v6.0. merge-controller, originally introduced in v5.3, has been the default for all clusters since v5.10, when merge-head was deprecated. The merge_head_batch_size tunable has been removed along with the daemon.

Upgrade instructions

Don't skip minor versions when upgrading or downgrading

Skipping versions when upgrading or downgrading Hydrolix can result in database schema inconsistencies and cluster instability. Always upgrade or downgrade sequentially through each minor version.

Example:
Upgrade from 5.9.7 → 5.10.9 → 5.11.8, not 5.9.7 → 5.11.8.
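The rule can be sketched as a small planning helper that, given a list of known releases, returns every hop between two versions without skipping a minor version. The release list passed in is hypothetical input (here, just the versions from the example); this is an illustration, not a Hydrolix tool.

```python
def parse(version: str) -> tuple[int, ...]:
    """Turn '5.10.9' or 'v5.10.9' into (5, 10, 9)."""
    return tuple(int(part) for part in version.lstrip("v").split("."))


def sequential_path(current: str, target: str, releases: list[str]) -> list[str]:
    """Return the hops from current to target, stopping at each minor version between."""
    cur, tgt = parse(current), parse(target)

    # Keep the newest known patch release for each (major, minor) pair.
    latest: dict[tuple[int, ...], str] = {}
    for release in releases:
        key = parse(release)[:2]
        if key not in latest or parse(release) > parse(latest[key]):
            latest[key] = release

    if tgt >= cur:  # upgrade: ascending minor versions
        between = sorted(k for k in latest if cur[:2] < k < tgt[:2])
    else:  # downgrade: descending minor versions
        between = sorted((k for k in latest if tgt[:2] < k < cur[:2]), reverse=True)

    return [current] + [latest[k] for k in between] + [target]


known = ["5.9.7", "5.10.9", "5.11.8"]  # hypothetical list of available releases
print(sequential_path("5.9.7", "5.11.8", known))
```

The same helper covers downgrades, since the ordering is simply reversed.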

Apply the new Hydrolix operator

If you have a self-managed installation, apply the new operator directly with the kubectl command examples below. If you're using Hydrolix-supplied tools to manage your installation, follow the procedure prescribed by those tools.

GKE

Apply Operator on GKE
kubectl apply -f "https://www.hydrolix.io/operator/v6.0.8/operator-resources?namespace=${HDX_KUBERNETES_NAMESPACE}&gcp-storage-sa=${GCP_STORAGE_SA}"

EKS

Apply Operator on EKS
kubectl apply -f "https://www.hydrolix.io/operator/v6.0.8/operator-resources?namespace=${HDX_KUBERNETES_NAMESPACE}&aws-storage-role=${AWS_STORAGE_ROLE}"

LKE and AKS

Apply Operator on LKE and AKS
kubectl apply -f "https://www.hydrolix.io/operator/v6.0.8/operator-resources?namespace=${HDX_KUBERNETES_NAMESPACE}"

Monitor the upgrade process

Kubernetes jobs named init-cluster and init-turbine-api run automatically to upgrade your entire installation to match the new operator's version number. This takes a few minutes, during which you can observe your pods restarting with your Kubernetes monitoring tool.

Ensure both the init-cluster and init-turbine-api jobs have completed successfully and that the turbine-api pod has restarted without errors. After that, view the UI and use the API of your new installation as a final check.

If the turbine-api pod doesn't restart successfully, or other functionality is missing, check the logs of the init-cluster and init-turbine-api jobs for details about failures. This can be done using the k9s utility or with the kubectl command:

% kubectl logs -l app=init-cluster
% kubectl logs -l app=init-turbine-api
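To block until both jobs finish instead of polling by hand, a script can wrap `kubectl wait`. The sketch below assumes kubectl is on the PATH and already points at the right cluster; the namespace value and the 600-second timeout are placeholders, not Hydrolix recommendations.

```python
import subprocess

INIT_JOBS = ("init-cluster", "init-turbine-api")  # job names from the text above


def wait_command(job: str, namespace: str, timeout_seconds: int = 600) -> list[str]:
    """Build a kubectl command that blocks until the named job reports completion."""
    return [
        "kubectl", "wait", "--for=condition=complete", f"job/{job}",
        f"--timeout={timeout_seconds}s", "--namespace", namespace,
    ]


def wait_for_init_jobs(namespace: str) -> None:
    """Wait for each init job in turn; raises CalledProcessError on failure or timeout."""
    for job in INIT_JOBS:
        subprocess.run(wait_command(job, namespace), check=True)


print(" ".join(wait_command("init-cluster", "hdx")))
```

Calling `wait_for_init_jobs("hdx")` would run the commands against a cluster, with `hdx` standing in for your namespace.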

If you still need help, contact Hydrolix support.

Downgrade restrictions

If you create service accounts with names longer than 64 characters in v6.0, you must remove or rename those service accounts before downgrading to v5.11.

Service account names are limited to 64 characters in v5.11, so the downgrade migration will fail if any longer names exist.
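A pre-downgrade check for this restriction can be as simple as filtering names by length. The sketch assumes you have already collected the service account names by whatever means you use to list accounts; the sample names are invented.

```python
V5_11_NAME_LIMIT = 64  # maximum service account name length in v5.11, per the text above


def names_blocking_downgrade(names: list[str], limit: int = V5_11_NAME_LIMIT) -> list[str]:
    """Return the service account names that are too long for v5.11, sorted."""
    return sorted(name for name in names if len(name) > limit)


# Hypothetical names: one short, one exactly at the limit, one over it.
sample = ["ingest-bot", "a" * 64, "b" * 65]
blocking = names_blocking_downgrade(sample)
print(f"{len(blocking)} service account(s) must be renamed or removed")
```

An empty result means no service account name stands in the way of the downgrade migration.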

If you need to downgrade from v6.0 to v5.11, run the schema rollback before applying the v5.11 operator:

  1. Connect to the Kubernetes cluster using k9s and select the namespace holding the Hydrolix cluster.
  2. Select the turbine-api pod and turbine-api container.
  3. Invoke a shell by entering s.
  4. Run the following command, which writes its output to STDERR in the terminal and must exit cleanly:

    ./manage.py release_5_11
    
  5. Exit k9s by pressing Ctrl-C.

Then apply the v5.11 operator using the standard kubectl apply procedure shown above, substituting the v5.11 operator URL.

Changelog

Updates

Cluster operations updates

  • Updated default HTTP proxy (Chproxy) version to 0.6.3.

  • Updated Python dependencies across operator-managed services to address security findings.

  • Updated the following dependencies across operator-managed services:

    • lego v4.31.0 → v4.34.0 (Go)
    • OpenTelemetry SDK v1.39.0 → v1.43.0 (Go)
    • matplotlib 3.9.4 → 3.10.9 (Python)
    • pillow 11.3.0 → 12.2.0 (Python)
    • pytest 8.4.2 → 9.0.3 (Python)
    • pygments 2.18.0–2.19.2 → 2.20.0 (Python)
    • openssl 0.10.72 → 0.10.78 (Rust)
    • once_cell 1.20.2–1.21.3 → 1.21.4 (Rust)
    • rand 0.8.5 → 0.8.6 (Rust)
    • Dockerfiles now validate sha512 checksums for certain downloaded packages.
  • Bumped the Go version from 1.24 to 1.25 for hdx-scaler-go and hdx-traefik-auth. Go 1.24 no longer receives security updates from the Go team.

  • Hardened operator-managed container images. Dockerfiles now use non-root users where possible, validate sha512 checksums for downloaded artifacts, and pin base images to specific tags from the Hydrolix CI base-image registry instead of the latest tag.

  • Bumped the operator-managed Kibana Gateway from v1.1.23 to v2.0.0. Operator-managed clusters upgrade automatically with the v6.0 operator. Kibana Gateway deployments now use the multi-project projects field (default ["hydro"]) where they previously used the single-project project / table / additional_tables fields, and the operator-generated deployment switched from QUESMA_* to KGW_* environment variable names internally.

Config API updates

  • Updated third-party Python dependencies for the Config API.

Improvements

Core improvements

  • Tables with multiple JSON columns benefit from improved compression at ingest, since schema-based sorting now operates across all JSON columns in a block, not only the first.

  • Queries on deeply nested paths run faster and use less memory, since path-based subgrouping now recurses through multiple nesting levels and reads only the relevant subgroup instead of materializing the whole column. Earlier, only root-level JSON columns were subdivided.

  • Implemented path syntax translation for array-of-JSON columns. ClickHouse subcolumn syntax is now normalized to the Hydrolix [*] convention. JSON column datatype access now works correctly for matching, subgroup selectivity, and index lookup.

  • Added the hdx_allow_experimental_analyzer query option (default off) to opt into ClickHouse's new query analyzer. This preview has known limitations: column-level privileges may not be enforced on alias-expanded columns, and queries against summary tables or with cross-table IN subqueries automatically fall back to the legacy analyzer.

UI improvements

  • Introduced sortable display columns in the Jobs section of the Hydrolix UI. Alter and batch jobs are now sortable on creation time or modification time.

  • Added a Transform payload panel to the transform editor page, allowing users to switch between raw JSON editing and form input modes. Users can now edit or paste a JSON payload definition of a transform before publishing from the Hydrolix UI.

  • Added searchability to the Spread List Storages text entry box in the Table - bucket settings left flyout. This improves discoverability for storage locations in larger clusters. Also, corrected several dialogs to use the standard display label "File name" instead of "Filename."

Config API improvements

  • Introduced an endpoint to allow applications to check permissions for an authenticated user. The endpoint /config/v1/users/check_perm/ requires a permission code name (for example, add_table) to check and allows optional scope parameters for flexibility. The boolean response indicates True when the user has the permission and False otherwise. This is an informational endpoint accessible only to authenticated accounts.

  • Removed the legacy is_superuser property from all account objects. The feature predated and allowed circumvention of the RBAC system.

  • Added a new /config/v1/orgs/{org_id}/config_blob_anomaly/ endpoint that returns a JSON summary of the published anomaly detector configuration for an org. The endpoint mirrors the existing /config/v1/orgs/{org_id}/config_blob/ endpoint and uses the same org-level RBAC permissions.

  • Increased the maximum length of service account names from 64 to 256 characters. Service accounts with names longer than 64 characters block automatic downgrade to v5.11; remove any long-named service accounts before downgrading.

  • Added support for column descriptions in tables. Column descriptions are stored alongside other column metadata and appear on the autoview when present.

  • Added service account token tracking to the Config API. New endpoints list (GET /service_account_tokens/, GET /service_accounts/{uuid}/tokens/) and revoke (DELETE /service_account_tokens/{uuid}/, DELETE /service_accounts/{uuid}/tokens/) tokens, and each service account response now includes an untracked_token_count for tokens issued before the upgrade. The renamed permission tokens_serviceaccount → issue_serviceaccounttoken is migrated automatically on upgrade.

  • The Kinesis checkpointer configuration now accepts GCP Datastore URLs. The API validates and stores Datastore-style checkpoint URLs alongside the existing options.

  • Primary columns can now be marked as virtual. Previously, the API didn't allow marking a primary column as virtual.

  • Enabled health checks on turbine-api and worker PostgreSQL connections. The API now detects and recovers from stale database connections without requiring an application restart.

Intake improvements

  • Removed the legacy merge-head daemon, which merge-controller has replaced. merge-controller was introduced in v5.3 and became the default in v5.10. Retiring merge-head removes obstacles to improving the summary-table system with backfill support.

  • Added HTTP/2 and TCP keepalive to merge-controller to detect merge-peer disappearance and unreachability.

Cluster operations improvements

  • Avoided pgbouncer pod restarts when database users are added or changed. PgBouncer now authenticates users through a lookup function in a dedicated pgbouncer_auth database, rather than reading individual passwords from the pg-users Kubernetes secret. The new pgbouncer_auth_type tunable selects the authentication type to match the database server's configuration; valid values are scram-sha-256 (default), md5, and plain.

  • Bumped the operator-managed Grafana default image version from grafana/grafana-enterprise:12.3.1 to grafana/grafana-enterprise:12.4.2. Includes Grafana Labs security updates announced in the v12.4.0+security-01 release.

  • Added a per-service engine field (python, shadow, or go; default python) to the hdx-scaler spec to support rolling migration from the Python hdx-scaler to the Go hdx-scaler-go. The shadow mode runs both scalers in parallel with the Go scaler in dry-run, so its decisions can be compared against the live Python scaler without affecting workloads.

  • Enabled Prometheus metrics on the operator-managed MCP server by default, exposing /metrics on the MCP service's port (8000). Disable with metrics_enabled: false in the mcp_hydrolix spec; multi-worker deployments use a shared PROMETHEUS_MULTIPROC_DIR volume to aggregate metrics across gunicorn workers.

  • Enabled Prometheus metrics on the operator-managed Anomaly Detection service by default, exposing /metrics on port 9090 with an automatic Service and ServiceMonitor for in-cluster scraping. Disable with metrics_enabled: false (or change the port with metrics_port) in the hdx_anomaly_detection spec.

  • Added the prometheus_extra_args tunable, which accepts a list of CLI flags to pass to the Prometheus binary. This lets operators configure flags such as --storage.tsdb.max-block-duration without rebuilding images or manually editing manifests. User-supplied flags that match wrapper-set defaults take precedence, preventing duplicate-flag startup errors.
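The precedence rule for prometheus_extra_args can be pictured as a merge keyed on flag name. The sketch below illustrates that rule only and is not the operator's implementation. The default flag list is hypothetical, and it assumes flags are written in `--name=value` form.

```python
def flag_name(flag: str) -> str:
    """Return the part of a CLI flag before '=', e.g. '--web.enable-lifecycle'."""
    return flag.split("=", 1)[0]


def merge_flags(defaults: list[str], extra_args: list[str]) -> list[str]:
    """Combine wrapper defaults with user flags; a user flag replaces a default of the same name."""
    overridden = {flag_name(flag) for flag in extra_args}
    kept = [flag for flag in defaults if flag_name(flag) not in overridden]
    return kept + list(extra_args)


# Hypothetical wrapper defaults; the real defaults are not listed in these notes.
wrapper_defaults = ["--storage.tsdb.max-block-duration=2h", "--web.enable-lifecycle"]
user_flags = ["--storage.tsdb.max-block-duration=1h"]
print(merge_flags(wrapper_defaults, user_flags))
```

Because the overridden default is dropped rather than appended alongside the user's value, each flag name appears once on the resulting command line.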

Security improvements

  • Added SQL parameter sanitization across multiple UI query builders to prevent potential SQL injection through string-based query concatenation. Includes new assertSafeSqlIdentifier, assertSafeInteger, and assertSafeSqlExpression validators applied to alias test SQL, summary table analysis, column analysis, and project health summary queries.
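The general idea behind such validators is to reject any value that isn't a plain identifier or integer before it is concatenated into SQL. The Python sketch below shows that technique under assumed rules. The function names mirror the ones above for readability only, and the actual UI validators and their accepted patterns aren't described in these notes.

```python
import re

# Assumed rule: letters, digits, and underscores, not starting with a digit.
_IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def assert_safe_sql_identifier(value: str) -> str:
    """Return value unchanged if it is a plain identifier, else raise ValueError."""
    if not isinstance(value, str) or not _IDENTIFIER.fullmatch(value):
        raise ValueError(f"unsafe SQL identifier: {value!r}")
    return value


def assert_safe_integer(value: int) -> int:
    """Return value unchanged if it is a real integer (bool excluded), else raise ValueError."""
    if isinstance(value, bool) or not isinstance(value, int):
        raise ValueError(f"unsafe integer: {value!r}")
    return value


column = assert_safe_sql_identifier("request_count")
limit = assert_safe_integer(100)
print(f"SELECT {column} FROM example_table LIMIT {limit}")
```

A value such as `x; DROP TABLE t` fails the identifier check and never reaches the query string.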

Bug fixes

Core fixes

  • Blocked all non-deterministic SQL functions from use anywhere in summary table SQL. The table function hdx_summary_verify now inspects every SQL clause, including GROUP BY and SELECT, for non-deterministic functions; earlier, only the WHERE clause was checked. Previously, a summary table SQL definition containing certain non-deterministic functions could cause crashes.

  • Fixed a race during storage configuration updates that could cause transient errors during credential rotations or storage reconfigurations.

  • Fixed an intermittent segfault when reading JSON subpath queries. The fix re-orders mmap() results to match the original request order.

  • Fixed a race condition in dictionary loading that caused CANNOT_PARSE_INPUT_ASSERTION_FAILED and Attempt to read after eof errors during intake-head startup.

  • Fixed WHERE filters on date columns returning empty results when compared against expressions such as toDate(now()), date_add(...), or makeDate(...). Earlier, both equality and BETWEEN comparisons against these expressions failed to match any rows.

  • Prevented queries that specify a single primary timestamp value from producing an empty time filter. When hdx_query_timerange_required was enabled, these queries were rejected because the timestamp-range check returned zero for equality comparisons; the check now uses the effective range.

  • Improved HTTP error messages to include the full request URL while redacting presigned query parameters, making storage errors easier to diagnose without leaking credentials.
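Redacting presigned parameters while keeping the rest of the URL can be sketched with the standard library. Which parameters Hydrolix redacts isn't stated here. The sketch assumes the AWS and GCS signing prefixes (`X-Amz-`, `X-Goog-`) plus the Azure SAS `sig` parameter, and the example URL is invented.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

SENSITIVE_PREFIXES = ("x-amz-", "x-goog-")  # assumed: AWS and GCS presigning parameters
SENSITIVE_KEYS = {"sig"}                    # assumed: Azure SAS signature parameter


def is_sensitive(key: str) -> bool:
    """Decide whether a query parameter name carries signing material."""
    lowered = key.lower()
    return lowered in SENSITIVE_KEYS or lowered.startswith(SENSITIVE_PREFIXES)


def redact_presigned(url: str) -> str:
    """Replace the values of signing parameters with REDACTED, leaving the rest of the URL."""
    parts = urlsplit(url)
    pairs = parse_qsl(parts.query, keep_blank_values=True)
    redacted = [(k, "REDACTED" if is_sensitive(k) else v) for k, v in pairs]
    return urlunsplit(parts._replace(query=urlencode(redacted)))


example = "https://example-bucket.s3.amazonaws.com/db/part.hdx?X-Amz-Signature=abc123&partNumber=2"
print(redact_presigned(example))
```

The host, path, and non-sensitive parameters survive, which is what makes the resulting error message useful for diagnosis.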

Intake fixes

  • Added missing support for decompression and Base64 decoding for JSON payloads to the validator endpoint used by the Hydrolix UI. Now, the behavior matches the intake-head. Also accepts bzip as a synonym for bzip2 in the Content-Encoding header.

  • Fixed a bug that prevented the reaper and partition-cleaner services from deleting inactive partitions containing many additional files. This situation arises when the UPLOAD_ALL debugging environment variable is set, which uploads all contributing data files to partitions. Affected partitions are now deleted, and any remaining unknown filenames produce a warning in the log rather than being silently skipped.

  • Fixed a bug where the reaper would nack a message that repeatedly failed processing (a "poison pill"), causing it to be re-queued and reprocessed indefinitely. The reaper now logs a warning for these messages instead.
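For the validator fix above, the Content-Encoding handling can be pictured as a lookup from header value to decompressor, with bzip treated as a synonym for bzip2. This sketch covers only that part. The set of supported encodings beyond bzip and bzip2 is an assumption, and Base64 decoding is left out because the notes don't say at which stage it is applied.

```python
import bz2
import gzip

# Map Content-Encoding values to decompressors; "bzip" is accepted as a synonym for "bzip2".
DECOMPRESSORS = {
    "gzip": gzip.decompress,  # assumed to be supported; not stated in the notes
    "bzip2": bz2.decompress,
    "bzip": bz2.decompress,
}


def decode_body(body: bytes, content_encoding: str = "") -> bytes:
    """Decompress a request body according to its Content-Encoding header value."""
    encoding = content_encoding.strip().lower()
    if not encoding or encoding == "identity":
        return body
    try:
        decompress = DECOMPRESSORS[encoding]
    except KeyError:
        raise ValueError(f"unsupported Content-Encoding: {content_encoding!r}") from None
    return decompress(body)


payload = b'{"status": 200}'
print(decode_body(bz2.compress(payload), "bzip"))
```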

Config API fixes

  • Tightened protections on accounts used for cluster-internal communications. Blocked deletion of these internal user accounts and prevented modification of their roles. Affected account names are recognizable by the string prefix internal.

  • Corrected the refresh interval and grace period for internal service account tokens. Tokens now expire one week and 12 hours after issuance, matching the intended rotation cadence. Earlier, two related bugs left tokens valid for roughly twice as long: the grace period (intended for brief old/new token overlap during refresh) was set to a week plus 12 hours instead of 12 hours, and the refresh task used that grace value as its scheduling interval.

  • Corrected the API to resolve and return the service account username in the created_by_user field for jobs created with a service account token. Previously, the username value was returned as null.

  • Fixed a bug where database migrations could run out of order. RBAC migrations are now applied using multiple sequential operations to ensure correct ordering.

Anomaly Detection fixes

  • Tightened Anomaly Detection job config validation to require that column names appear as standalone words in the query_template, not as substrings. Previously, configs like metric_keys=["cdn"] would silently pass validation if the query referenced an unrelated column such as cdn_extra.
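The difference between substring and standalone-word matching is easy to see with a word-boundary regular expression. The sketch below is one way to express the check, not the service's actual validator, and the query text is invented.

```python
import re


def missing_columns(query_template: str, column_names: list[str]) -> list[str]:
    """Return the column names that never appear as standalone words in the query."""
    missing = []
    for name in column_names:
        # \b treats letters, digits, and underscores as word characters,
        # so "cdn" does not match inside "cdn_extra".
        pattern = r"\b" + re.escape(name) + r"\b"
        if not re.search(pattern, query_template):
            missing.append(name)
    return missing


print(missing_columns("SELECT cdn_extra, count() FROM logs GROUP BY cdn_extra", ["cdn"]))
print(missing_columns("SELECT cdn, count() FROM logs GROUP BY cdn", ["cdn"]))
```

A plain `"cdn" in query` test would accept both queries, which is the behavior the fix removes.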