6 October 2025 - v5.6.2

Operator Prometheus metrics, JSON subtype chaining

Notable new features

Prometheus metrics endpoint and status metrics

  • The operator now exposes a Prometheus metrics endpoint with initial health and status metrics, such as o6r_up and o6r_hdx_ready.
    This enables richer dashboards and alerting to monitor Hydrolix clusters.
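    As one sketch of what this enables, a minimal Prometheus alerting rule could be built on the new operator metrics. Only the o6r_up metric name comes from this release; the rule name, duration, and labels below are illustrative:

    ```yaml
    groups:
      - name: hydrolix-operator
        rules:
          - alert: HydrolixOperatorDown
            # o6r_up is exposed by the operator's new metrics endpoint
            expr: o6r_up == 0
            for: 5m
            labels:
              severity: critical
            annotations:
              summary: "Hydrolix operator is not reporting as up"
    ```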

Chain JSON subtypes with pretransforms

  • Added the ability to chain JSON subtypes with a new pretransforms transform property. This allows processing Amazon CloudWatch logs delivered over Amazon Data Firehose without the aid of a preprocessing Lambda.

    For example, using "pretransforms": [ "firehose/gzip", "cloudwatch" ] in the format_details of a JSON transform will allow processing of Amazon CloudWatch messages as produced by Amazon Data Firehose.

    Valid combinations must end with cloudwatch or mPulse as the final subtype.
    Supported subtypes are:

    • firehose
    • firehose/gzip
    • cloudwatch
    • mPulse
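    Putting this together, the format_details of such a transform might look like the following sketch; only the pretransforms property is new, and any other format_details properties are omitted here:

    ```json
    {
      "format_details": {
        "pretransforms": ["firehose/gzip", "cloudwatch"]
      }
    }
    ```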

Upgrade instructions

  1. Apply the new Hydrolix operator

    If you have a self-managed installation, apply the new operator directly with the kubectl command examples below. If you're using Hydrolix-supplied tools to manage your installation, follow the procedure prescribed by those tools.

    GKE

    kubectl apply -f "https://www.hydrolix.io/operator/v5.6.2/operator-resources?namespace=${HDX_KUBERNETES_NAMESPACE}&gcp-storage-sa=${GCP_STORAGE_SA}"
    

    EKS

    kubectl apply -f "https://www.hydrolix.io/operator/v5.6.2/operator-resources?namespace=${HDX_KUBERNETES_NAMESPACE}&aws-storage-role=${AWS_STORAGE_ROLE}"
    

    LKE and AKS

    kubectl apply -f "https://www.hydrolix.io/operator/v5.6.2/operator-resources?namespace=${HDX_KUBERNETES_NAMESPACE}"
    
  2. Monitor the upgrade process

    Kubernetes jobs named init-cluster and init-turbine-api will automatically run to upgrade your entire installation to match the new operator's version number. This will take a few minutes, during which time you can observe your pods' restarts with your preferred Kubernetes monitoring tool.

    Ensure both the init-cluster and init-turbine-api jobs have completed successfully and that the turbine-api pod has restarted without errors. After that, view the UI and use the API of your new installation as a final check.

    If the turbine-api pod doesn't restart properly or other functionality is missing, check the logs of the init-cluster and init-turbine-api jobs for details about failures. This can be done using the k9s utility or with the kubectl command:

    % kubectl logs -l app=init-cluster
    % kubectl logs -l app=init-turbine-api
    

    If you still need help, contact Hydrolix support.

Rollback considerations

If you need to roll back to the previously released version, roll back to version v5.4.4 and perform database migrations. Any version released before v5.4.4 will cause errors after rollback.

For more information, follow our detailed rollback instructions.

Changelog

Updates

Cluster operations

  • Upgraded the tokio library used by hdx-node from v1.42.0 to v1.43.1. The fix avoids possible memory corruption for objects passed over the broadcast channel that do not implement both Send and Sync. Addresses RUSTSEC-2025-0023.

UI

  • Upgraded axios and related libraries to fix an HTTP parameter pollution vulnerability. Addresses CVE-2025-54371.
    • axios v1.8.2 → v1.11.0
    • form-data v4.0.0 → v4.0.4
    • follow-redirects v1.15.9 → v1.15.11
  • Upgraded cypress and related libraries to fix an arbitrary temporary file directory write through a symbolic link vulnerability. Addresses CVE-2025-54798.
    • cypress v3.0.8 → v3.0.9
    • jsonfile v6.1.0 → v6.2.0
    • tmp v0.2.3 → v0.2.5
    • @types/node v24.1.0 → v24.3.0
    • undici-types v7.8.0 → v7.10.0

Improvements

Config API

  • Introduced more validation of bucket access credentials and permissions during storage and batch job creation. These checks reduce the risk of data loss by preventing the definition of resources with inadequate permissions or incorrect credentials.
  • Updated the list of SQL reserved words forbidden in project and table names. The additions were ASC, BEGIN, CASE, CAST, COMMIT, CREATE, CROSS, DESC, END, FOREIGN, IF, INTERSECT, LIMIT, OFFSET, PRIMARY, REVOKE, ROLLBACK, SAVEPOINT, THEN, TRANSACTION, TRIGGER, and WHEN.
  • Updated the default maximum time difference between the newest and oldest primary rows of a partition: the value of hot_data_max_minutes_per_partition was 5 and is now 55.
  • Added header Cache-Control: max-age=0, no-cache, no-store, must-revalidate to all Config API responses. This forbids HTTP proxies from caching all responses, some of which could contain sensitive data.
  • Changed initialization logic for turbine-api to avoid retries and surface errors more obviously. Before this change, the retry and restart logic occasionally masked problems like Keycloak initialization failures or database migration failures.
  • Allowed clients to omit nullable fields for PUT endpoints in the Config API. If a client omits fields, nullable fields are now stored as NULL rather than left untouched.
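    As a hypothetical illustration (the field name here is invented, not taken from the Config API reference), a PUT body that omits a nullable description field:

    ```json
    {
      "name": "my_project"
    }
    ```

    After this change, the stored description becomes NULL; previously, the existing value would have been left untouched.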
  • Moved authentication audit logging from a PostgreSQL table to a Hydrolix table named hydro.audit_logs. This change improves performance of audit log reporting for large clusters and decreases contention on the PostgreSQL database system.
  • The hydrolix_url tunable may now be an IP address rather than a full URL with hostname, allowing easier testing and deployment without DNS available.
  • Tightened access privileges to the /auth_logs endpoint so that only users with view_auth_logs_user or User.all permissions can view this content.
  • Updated the default transform for the hydro.monitor table to include the field monitor_request_timestamp sent from the monitor-ingest service.
  • The Config API now enforces stricter validation on resource names to prevent conflicts in Git-based configuration-as-code workflows, especially on case-insensitive filesystems like macOS or Windows.

Cluster Operations

  • Introduced a user-defined metric definition tool called hdx-pg-monitor. Arbitrary queries run at a user-specified frequency and emit metrics into the Prometheus system. This feature allows flexible observability of PostgreSQL state, especially catalog state.
  • Enhanced the hdx-scaler command to accept configurable --metric_labels and --path options.
  • Improved the rules for hdx-scaler attributes in the Hydrolix Spec Configuration Validator. Fields identifying service names must match existing services, ranges must be valid, and required fields can no longer be omitted.
  • Added support in Traefik for using a service account JWT to authenticate to Amazon Data Firehose.
  • Introduced status fields on custom resources and health check logic within the Operator to extend visibility into the readiness state of a cluster.
  • Enforced exit and error logging for RPC services that fail to start properly, making gRPC startup issues easier to troubleshoot.
  • The hdx-scaler now exports configuration and decision metrics through Prometheus. This enables Grafana dashboards that mirror the scaler UI and supports monitoring multiple scaler pods from a single view.
    New metrics include:
    • hdxscaler_decision_total (with reason)
    • hdxscaler_scrape_errors_total
    • hdxscaler_scrape_latency_seconds
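    For example, a Grafana panel could chart scaling decisions broken down by reason with a PromQL query along these lines (the 5-minute window is arbitrary):

    ```promql
    sum by (reason) (rate(hdxscaler_decision_total[5m]))
    ```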
  • Added validating webhook checks for spec.overrides. Each override must include:
    • A timezone
    • A patch
    • Exactly one schedule (cron, weekly, or window) with valid fields and formats.
      The webhook also validates the patch against the HDX spec and returns clear warnings and errors for bad fields or values, preventing invalid configs from being applied.
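      A conforming override might look like the following sketch. The field names under patch and the exact schedule syntax are assumptions for illustration, not the authoritative schema; only the timezone/patch/single-schedule requirements come from this release:

      ```yaml
      spec:
        overrides:
          - timezone: "America/New_York"
            # exactly one schedule type: cron, weekly, or window
            cron: "0 22 * * 1-5"
            # the patch is validated against the HDX spec
            patch:
              scale:
                query-peer:
                  replicas: 2
      ```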
  • Added support for aggregating multiple metrics in hdx-scaler. A new op field controls aggregation (sum, avg, min, max).
    The TUI and webhook validation now support this field, and a metric matching bug has been fixed.
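    A multi-metric aggregation might be configured along these lines; only the op field and its values (sum, avg, min, max) come from this release, and the surrounding layout and metric names are guesses for illustration:

    ```yaml
    metrics:
      - name: ingest_queue_depth_a
      - name: ingest_queue_depth_b
    op: avg  # aggregate the listed metrics: sum, avg, min, or max
    ```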

Core

  • Introduced automatic data type promotion for numeric SQL functions returning uint16 or int16 types. Since Hydrolix doesn't support these types, they are promoted to uint32 or int32 instead. This is particularly important to support summary table aggregation calculations.
  • Added a method to invalidate cached tokens when an auth error occurs.
  • You can now remove columns from a summary table without deleting and recreating it. This helps avoid costly table rebuilds and backfills for large deployments.
    • Set a column to NULL in the summary SQL, and it will drop from the summary schema going forward.
    • Existing aggregates aren’t recomputed, but removed columns can be reintroduced later. Historical rows will return NULL.
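    For example, to retire a hypothetical errors_total column, replace its aggregation with NULL in the summary SQL (table and column names are invented for illustration):

    ```sql
    SELECT
        toStartOfMinute(timestamp) AS minute,
        sum(bytes) AS bytes_total,
        NULL AS errors_total  -- drops errors_total from the summary schema going forward
    FROM my_project.my_raw_table
    GROUP BY minute
    ```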
  • Added built-in, read-only users for Grafana, Quesma, and Superset datasources. This removes the need to create the users manually.
    • Credentials are now available in the general secret, similar to the super-admin user.

Intake and Merge

  • The routine partition-cleaner job now runs weekly instead of daily, since daily listings are unnecessary even on large clusters. We also added the bulk_delete_bytes metric.
  • Improved the datatype model and testing in the Rust intake system, particularly around indexable data types.
  • Added the pretransforms field to transforms, which supports listing multiple JSON subtypes. These subtypes are then applied in order in the intake pipeline, allowing messages from different sources to be processed in the same pipeline. See Notable new features for more information.

UI

  • Introduced a search bar in the +Add New menu that filters the items in response to user input. There are many objects to manage in a Hydrolix cluster, and this improves responsiveness and usability, especially on small screens.
  • The UI now displays the cloud provider name for each entry on the Security > Credentials page. We also corrected the detail page title to View credential instead of the incorrect Edit credential.
  • robots.txt now blocks AdsBot-Google, AdsBot-Google-Mobile, and AdIdxBot. Hydrolix clusters do not serve content intended for web crawler consumption.
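    The resulting entries follow the standard robots.txt shape (the exact file served by a cluster may contain additional rules):

    ```
    User-agent: AdsBot-Google
    Disallow: /

    User-agent: AdsBot-Google-Mobile
    Disallow: /

    User-agent: AdIdxBot
    Disallow: /
    ```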

Bug Fixes

Cluster operations

  • Switched to a more secure temporary file library function to avoid a possible symlink attack in the Traefik reverse proxy configuration setup.
  • Fixed pgMonitor role creation logic. pgMonitor now has select and view permissions on all schemas by default.
  • Correctly installs ip_allowlist values into Linode Kubernetes Engine (LKE) load balancers by using the service.beta.kubernetes.io/linode-loadbalancer-firewall-acl annotation rather than spec.loadBalancerSourceRanges.
    Earlier, specifying an ip_allowlist on LKE did not apply any firewall rules.
    Note: When acme_enabled is set, certificate issuance can fail if access is restricted by ip_allowlist because ACME HTTP challenge providers do not publish fixed IP ranges.
    A temporary workaround is to allow 0.0.0.0/0 during certificate issuance. This is not a long-term solution; improvements are planned for a future release.

Config API

  • Removed deprecated attribute KeycloakAuthLogPermissions from user permissions class.
  • Fixed a bug where deleting a project could fail with a 500 error if a source’s secret had already been removed. Project deletion now succeeds even when related secrets are missing.
  • Fixed a 500 error when creating or updating transform templates that include a shadow transform. POST /v1/transform_templates and PUT/PATCH /v1/transform_templates/:id now validate and save shadow transform settings correctly.

Core

  • Fixed an initialization ordering bug that prevented use of custom functions with summary tables. Earlier, the summary table definition would fail with Unknown function.

Intake

  • Fixed an issue where a cluster with a Kafka source without a related credential caused the kafka-peer to emit an error to the log:

    {"error":"failed to retrieve credential : credential  not found","file":"kafka_source.go:407","level":"error","message":"Unable to authenticate Kafka connection.","timestamp":"2025-09-22T16:57:52.409+00:00"}
    

    The behavior now uses the cluster's credentials if other credentials are omitted, and no error is logged.

UI

  • The UI now handles custom column and row delimiters correctly in all situations. Display labels have been refined to use English-language names for these common delimiters: Comma, Semicolon, Tabulation, Pipe, Newline, Return, and Return and Newline.
  • Summary tables have been excluded from the table selection list for new Table Transform and Transform Template workflows. This avoids presenting the user with an invalid option.