6 October 2025 - v5.6.2
Operator Prometheus metrics, JSON subtype chaining
Notable new features
Prometheus metrics endpoint and status metrics
- The operator now exposes a Prometheus metrics endpoint with initial health and status metrics, such as `o6r_up` and `o6r_hdx_ready`. This enables richer dashboards and alerting to monitor Hydrolix clusters.
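As a rough illustration, the new endpoint serves standard Prometheus exposition text. The metric names `o6r_up` and `o6r_hdx_ready` come from these release notes, but the sample payload, gauge values, and the parser below are hypothetical:

```python
# Minimal sketch: reading the operator's new health gauges from Prometheus
# exposition text. The sample payload and values are hypothetical.
sample = """\
# HELP o6r_up Whether the operator is running.
# TYPE o6r_up gauge
o6r_up 1
# HELP o6r_hdx_ready Whether the Hydrolix cluster is ready.
# TYPE o6r_hdx_ready gauge
o6r_hdx_ready 1
"""

def parse_gauges(text: str) -> dict:
    """Parse simple name/value pairs from Prometheus exposition format."""
    gauges = {}
    for line in text.splitlines():
        if line.startswith("#") or not line.strip():
            continue  # skip HELP/TYPE comments and blank lines
        name, value = line.rsplit(" ", 1)
        gauges[name] = float(value)
    return gauges

metrics = parse_gauges(sample)
print(metrics["o6r_up"], metrics["o6r_hdx_ready"])  # 1.0 1.0
```

In practice you would point Prometheus itself at the endpoint rather than polling it by hand; this only shows the shape of the data.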
Chain JSON subtypes with pretransforms
- Added the ability to chain JSON subtypes with a new `pretransforms` transform property. This allows processing Amazon CloudWatch logs delivered over Amazon Data Firehose without aid from a preprocessing Lambda. For example, using `"pretransforms": [ "firehose/gzip", "cloudwatch" ]` in the `format_details` of a JSON transform allows processing of Amazon CloudWatch messages as produced by Amazon Data Firehose. Valid combinations must end with `cloudwatch` or `mPulse` as the final subtype. Supported subtypes are:
  - `firehose`
  - `firehose/gzip`
  - `cloudwatch`
  - `mPulse`
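A transform using the example from above might look roughly like this. Only `format_details` and `pretransforms` come from these release notes; the surrounding keys (`name`, `type`, `settings`) are assumptions based on the usual shape of a Hydrolix JSON transform:

```json
{
  "name": "cloudwatch_via_firehose",
  "type": "json",
  "settings": {
    "format_details": {
      "pretransforms": [ "firehose/gzip", "cloudwatch" ]
    }
  }
}
```

The subtypes are applied in order, so `firehose/gzip` first unwraps and decompresses the Firehose envelope, then `cloudwatch` parses the CloudWatch log messages inside it.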
Upgrade instructions
- Apply the new Hydrolix operator

  If you have a self-managed installation, apply the new operator directly with the `kubectl` command examples below. If you're using Hydrolix-supplied tools to manage your installation, follow the procedure prescribed by those tools.

  GKE:

  kubectl apply -f "https://www.hydrolix.io/operator/v5.6.2/operator-resources?namespace=${HDX_KUBERNETES_NAMESPACE}&gcp-storage-sa=${GCP_STORAGE_SA}"

  EKS:

  kubectl apply -f "https://www.hydrolix.io/operator/v5.6.2/operator-resources?namespace=${HDX_KUBERNETES_NAMESPACE}&aws-storage-role=${AWS_STORAGE_ROLE}"

  LKE and AKS:

  kubectl apply -f "https://www.hydrolix.io/operator/v5.6.2/operator-resources?namespace=${HDX_KUBERNETES_NAMESPACE}"
- Monitor the upgrade process

  Kubernetes jobs named `init-cluster` and `init-turbine-api` will automatically run to upgrade your entire installation to match the new operator's version number. This will take a few minutes, during which time you can observe your pods' restarts with your preferred Kubernetes monitoring tool.

  Ensure both the `init-cluster` and `init-turbine-api` jobs have completed successfully and that the `turbine-api` pod has restarted without errors. After that, view the UI and use the API of your new installation as a final check.

  If the `turbine-api` pod doesn't restart properly or other functionality is missing, check the logs of the `init-cluster` and `init-turbine-api` jobs for details about failures. This can be done using the `k9s` utility or with the `kubectl` command:

  % kubectl logs -l app=init-cluster
  % kubectl logs -l app=init-turbine-api

  If you still need help, contact Hydrolix support.
Rollback considerations
If you need to roll back to the previously released version, roll back to version v5.4.4 and perform database migrations. Any version released before v5.4.4 will cause errors after rollback.
For more information, follow our detailed rollback instructions.
Changelog
Updates
Cluster operations
- Upgraded tokio in `hdx-node` from v1.42.0 to v1.43.1. The fix avoids possible memory corruption for objects passed over the broadcast channel which do not implement both `Send` and `Sync`. Addresses RUSTSEC-2025-0023.
UI
- Upgraded axios and related libraries to fix an HTTP parameter pollution vulnerability. Addresses CVE-2025-54371.
  - `axios` v1.8.2 → v1.11.0
  - `form-data` v4.0.0 → v4.0.4
  - `follow-redirects` v1.15.9 → v1.15.11
- Upgraded cypress and related libraries to fix an arbitrary temporary file directory write through a symbolic link vulnerability. Addresses CVE-2025-54798.
  - `cypress` v3.0.8 → v3.0.9
  - `json-file` v6.1.0 → v6.2.0
  - `tmp` v0.2.3 → v0.2.5
  - `types/node` v24.1.0 → v24.3.0
  - `undici-types` v7.8.0 → v7.10.0
Improvements
Config API
- Introduced more validation of bucket access credentials and permissions during storage and batch job creation. These steps reduce the risk of data loss by preventing definition of resources with inadequate permissions or incorrect credentials.
- Updated the list of SQL reserved words forbidden in project and table names. The additions were `ASC`, `BEGIN`, `CASE`, `CAST`, `COMMIT`, `CREATE`, `CROSS`, `DESC`, `END`, `FOREIGN`, `IF`, `INTERSECT`, `LIMIT`, `OFFSET`, `PRIMARY`, `REVOKE`, `ROLLBACK`, `SAVEPOINT`, `THEN`, `TRANSACTION`, `TRIGGER`, and `WHEN`.
- Updated the default maximum time difference between the newest and oldest primary rows of a partition. The value `hot_data_max_minutes_per_partition` was 5, and is now 55.
- Added the header `Cache-Control: max-age=0, no-cache, no-store, must-revalidate` to all Config API responses. This prevents HTTP proxies from caching any responses, some of which could contain sensitive data.
- Changed initialization logic for `turbine-api` to avoid retries and surface errors more obviously. Before this change, the retry and restart logic occasionally masked problems like Keycloak initialization failures or database migration failures.
- Allowed clients to omit nullable fields for PUT endpoints in the Config API. If a client omits fields, nullable fields are now stored as NULL rather than left untouched.
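To make the new PUT semantics concrete, here is a conceptual sketch. The helper, resource shape, and field names are hypothetical, not Hydrolix code; only the behavior (omitted nullable fields become NULL) comes from the release notes:

```python
# Conceptual sketch of the Config API's new PUT behavior: any nullable
# field omitted from the PUT body is stored as NULL instead of keeping
# its previous value. The schema below is hypothetical.
NULLABLE_FIELDS = {"description", "ttl_minutes"}  # hypothetical schema

def apply_put(stored: dict, payload: dict) -> dict:
    """Replace the stored resource with the PUT payload, nulling omitted nullable fields."""
    result = dict(payload)
    for field in NULLABLE_FIELDS:
        if field not in payload:
            result[field] = None  # previously: the old value was left untouched
    return result

before = {"name": "web", "description": "old text", "ttl_minutes": 90}
after = apply_put(before, {"name": "web"})
# description and ttl_minutes are now None rather than their old values
print(after)
```

The practical consequence: clients that relied on partial PUT bodies preserving unspecified fields should switch to PATCH or send the full resource.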
- Moved authentication audit logging from a PostgreSQL table to a Hydrolix table named `hydro.audit_logs`. This change improves performance of audit log reporting for large clusters and decreases contention on the PostgreSQL database system.
- The `hydrolix_url` tunable may now be an IP address rather than a full URL with hostname, allowing easier testing and deployment without DNS available.
- Tightened access privileges to the `/auth_logs` endpoint so that only users with `view_auth_logs_user` or `User.all` permissions can view this content.
- Updated the default transform for the `hydro.monitor` table to include the field `monitor_request_timestamp` sent from the `monitor-ingest` service.
- The Config API now enforces stricter validation on resource names to prevent conflicts in Git-based configuration-as-code workflows, especially on case-insensitive filesystems like macOS or Windows.
Cluster operations
- Introduced a user-defined metric definition tool called `hdx-pg-monitor`. Arbitrary queries run on a user-specified frequency to emit metrics into the Prometheus system. This feature allows flexible observability of PostgreSQL settings, and especially catalog state.
- Enhanced the `hdx-scaler` command to accept configurable `--metric_labels` and `--path` options.
- Improved the rules for `hdx-scaler` attributes in the Hydrolix Spec Configuration Validator. Fields identifying service names must match existing services, ranges must be valid, and required fields can no longer be omitted.
- Added support in Traefik for the use of a service account JWT to authenticate to Amazon Data Firehose.
- Introduced status fields on custom resources and health check logic within the Operator to extend visibility into the readiness state of a cluster.
- Enforced exit and error logging for RPC services failing to start up properly, to ease troubleshooting of gRPC issues at startup.
- The `hdx-scaler` now exports configuration and decision metrics through Prometheus. This enables Grafana dashboards that mirror the scaler UI and supports monitoring multiple scaler pods from a single view. New metrics include:
  - `hdxscaler_decision_total` (with `reason` label)
  - `hdxscaler_scrape_errors_total`
  - `hdxscaler_scrape_latency_seconds`
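Example PromQL over the new scaler metrics. The metric names and the `reason` label come from these release notes; the metric types and any other labels are assumptions, so adjust the queries to what the endpoint actually exposes:

```promql
# Scaling decisions per reason over the last hour
sum by (reason) (increase(hdxscaler_decision_total[1h]))

# Scrape error rate over 5 minutes
rate(hdxscaler_scrape_errors_total[5m])

# p95 scrape latency, assuming the latency metric is a histogram with
# _bucket series; use the plain series instead if it is a gauge
histogram_quantile(0.95, sum by (le) (rate(hdxscaler_scrape_latency_seconds_bucket[5m])))
```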
- Added validating webhook checks for `spec.overrides`. Each override must include:
  - A timezone
  - A patch
  - Exactly one schedule (`cron`, `weekly`, or `window`) with valid fields and formats.

  The webhook also validates the patch against the HDX spec and returns clear warnings and errors for bad fields or values, preventing invalid configs from being applied.
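An override satisfying these checks might look like the sketch below. Only `spec.overrides`, the timezone/patch requirements, and the `cron`, `weekly`, and `window` schedule types come from the release notes; the patch contents and key names inside it are illustrative assumptions:

```yaml
# Hypothetical spec.overrides entry: a timezone, a patch, and exactly
# one schedule type (cron here). Patch fields are illustrative only.
spec:
  overrides:
    - timezone: "America/New_York"
      cron: "0 8 * * 1-5"   # weekdays at 08:00 local time
      patch:
        scale:
          query-peer:
            replicas: 6
```

Specifying two schedule types (for example both `cron` and `window`) on one override would be rejected by the webhook.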
- Added support for aggregating multiple metrics in `hdx-scaler`. A new `op` field controls aggregation (`sum`, `avg`, `min`, `max`). The TUI and webhook validation now support this field, and a metric matching bug has been fixed.
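A rough sketch of the new field in a scaler metric definition. Only `op` and its values (`sum`, `avg`, `min`, `max`) come from the release notes; the surrounding keys are hypothetical:

```yaml
# Hypothetical hdx-scaler metric block: several matched series are
# combined into one scaling signal using the new `op` field.
metrics:
  - match: "kafka_consumergroup_lag{group=~'hdx-.*'}"
    op: sum          # also: avg, min, max
    target: 10000
```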
Core
- Introduced automatic data type promotion for numeric SQL functions returning `uint16` or `int16` types. Since Hydrolix doesn't support these types, they are promoted to `uint32` or `int32` instead. This is particularly important to support summary table aggregation calculations.
- Added a method to invalidate cached tokens when an auth error occurs.
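The promotion rule itself is simple; this conceptual sketch (not Hydrolix source code) shows the mapping applied to function return types:

```python
# Conceptual sketch of the numeric type promotion rule: functions that
# would return 16-bit integer types are promoted to the 32-bit
# equivalents, since Hydrolix has no 16-bit column types.
PROMOTIONS = {"uint16": "uint32", "int16": "int32"}

def promote(return_type: str) -> str:
    """Promote unsupported 16-bit numeric types; pass others through."""
    return PROMOTIONS.get(return_type, return_type)

print(promote("uint16"))  # uint32
print(promote("int64"))   # int64
```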
- You can now remove columns from a summary table without deleting and recreating it. This helps avoid costly table rebuilds and backfills for large deployments.
  - Set a column to `NULL` in the summary SQL, and it will drop from the summary schema going forward.
  - Existing aggregates aren't recomputed, but removed columns can be reintroduced later. Historical rows will return `NULL`.
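For example, a summary SQL edit along these lines would drop a column. The table and column names here are purely illustrative:

```sql
-- Hypothetical summary SQL: setting user_agent to NULL drops it from
-- the summary schema going forward; historical rows return NULL for it.
SELECT
    toStartOfMinute(timestamp) AS minute,
    count() AS requests,
    NULL AS user_agent        -- previously: any(user_agent)
FROM web.logs
GROUP BY minute
```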
- Added built-in, read-only users for Grafana, Quesma, and Superset datasources. This removes the need to create the users manually.
- Credentials are now available in the general secret, similar to the super-admin user.
Intake and Merge
- The routine job `partition-cleaner` now runs weekly instead of daily, since daily listings even on large clusters are unnecessary. We also added the `bulk_delete_bytes` metric.
- Improved the datatype model and testing in the Rust intake system, particularly around indexable data types.
- Added the field `pretransforms` to transforms, which supports listing multiple JSON subtypes. These subtypes are then applied in order in the intake pipeline, allowing the processing of messages from different sources in the same pipeline. See Notable new features for more information.
UI
- Introduced a search bar to the +Add New menu which limits the items in response to user input. There are many objects to manage in a Hydrolix cluster, and this improves responsiveness and usability, especially on small screens.
- The UI now displays the cloud provider name for each entry in the Security > Credentials page. The detail page title was also corrected to "View credential" from the incorrect "Edit credential".
- `robots.txt` now blocks `AdsBot-Google`, `AdsBot-Google-Mobile`, and `AdIdxBot`. Hydrolix clusters do not serve content intended for web crawler consumption.
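The resulting entries would look roughly like this. The bot names come from the release notes, but the exact file contents are an assumption:

```text
# Sketch of the implied robots.txt entries
User-agent: AdsBot-Google
Disallow: /

User-agent: AdsBot-Google-Mobile
Disallow: /

User-agent: AdIdxBot
Disallow: /
```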
Bug Fixes
Cluster operations
- Switched to a more secure temporary file library function to avoid a possible symlink attack in the Traefik reverse proxy configuration setup.
- `ip_allowlist` values are now correctly installed into Linode Kubernetes Engine (LKE) load balancers using the `service.beta.kubernetes.io/linode-loadbalancer-firewall-acl` annotation rather than `spec.loadBalancerSourceRanges`. Earlier, specifying an `ip_allowlist` on LKE did not apply any firewall rules; this corrects a missing firewall automation feature for the LKE platform.

  Note: When `acme_enabled` is set, certificate issuance can fail if access is restricted by `ip_allowlist`, because ACME HTTP challenge providers do not publish fixed IP ranges. A temporary workaround is to allow 0.0.0.0/0 during certificate issuance. This is not a long-term solution; improvements are planned for a future release.
- Fixed `pgMonitor` role creation logic. `pgMonitor` now has select and view permissions on all schemas by default.
Config API
- Removed the deprecated attribute `KeycloakAuthLogPermissions` from the user permissions class.
- Fixed a bug where deleting a project could fail with a `500` error if a source's secret had already been removed. Project deletion now succeeds even when related secrets are missing.
- Fixed a `500` error when creating or updating transform templates that include a shadow transform. `POST /v1/transform_templates` and `PUT/PATCH /v1/transform_templates/:id` now validate and save shadow transform settings correctly.
Core
- Fixed an initialization ordering bug that prevented use of custom functions with summary tables. Earlier, the summary table definition would fail with an `Unknown function` error.
Intake
- Fixed an issue where a cluster with a Kafka source without a related credential caused the `kafka-peer` to emit an error to the log:

  {"error":"failed to retrieve credential : credential not found","file":"kafka_source.go:407","level":"error","message":"Unable to authenticate Kafka connection.","timestamp":"2025-09-22T16:57:52.409+00:00"}

  The behavior now uses the cluster's credentials if other credentials are omitted, and no error is logged.
UI
- The UI now handles custom column and row delimiters correctly in all situations. Display labels have been refined to use English-language names for these common delimiters: `Comma`, `Semicolon`, `Tabulation`, `Pipe`, `Newline`, `Return`, and `Return and Newline`.
- Summary tables have been excluded from the table selection list for new Table Transform and Transform Template workflows. This avoids presenting the user with an invalid option.