# v5.6.2

Operator Prometheus metrics, JSON subtype chaining
## Notable new features
### Prometheus metrics endpoint and status metrics

- The operator now exposes a Prometheus metrics endpoint with initial health and status metrics, such as `o6r_up` and `o6r_hdx_ready`. Visit Prometheus Operator Metrics for the list of available metrics. This enables richer dashboards and alerting to monitor Hydrolix clusters.
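As an illustration, a minimal Prometheus alerting rule built on one of the new status metrics might look like the following sketch. Only the metric name `o6r_hdx_ready` comes from this release; the alert name, duration, and labels are placeholders:

```yaml
# Sketch of an alerting rule using the operator's new status metric.
# Alert name, "for" duration, and labels are illustrative placeholders.
groups:
  - name: hydrolix-operator
    rules:
      - alert: HydrolixClusterNotReady
        expr: o6r_hdx_ready == 0
        for: 10m
        labels:
          severity: warning
        annotations:
          summary: "Hydrolix cluster has not reported ready for 10 minutes"
```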
### Chain JSON subtypes with pretransforms

- Added the ability to chain JSON subtypes with a new `pretransforms` transform property. This allows processing Amazon CloudWatch logs delivered over Amazon Data Firehose without aid from a preprocessing Lambda.

For example, using `"pretransforms": [ "firehose/gzip", "cloudwatch" ]` in the `format_details` of a JSON transform will allow processing of Amazon CloudWatch messages as produced by Amazon Data Firehose.

Valid combinations must end with `cloudwatch` or `mPulse` as the final subtype.

Supported subtypes are:

- `firehose`
- `firehose/gzip`
- `cloudwatch`
- `mPulse`
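Putting the pieces together, a JSON transform chaining these subtypes might look roughly like this. Only the `pretransforms` line comes from this release; the transform name and the surrounding structure are abbreviated and illustrative:

```json
{
  "name": "cloudwatch_firehose_transform",
  "type": "json",
  "settings": {
    "format_details": {
      "pretransforms": ["firehose/gzip", "cloudwatch"]
    }
  }
}
```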
## Upgrade instructions
⚠️ Do not skip release versions.
Skipping versions during upgrades may result in system instability. You must upgrade sequentially through each release version.
Example: Upgrade from 5.4 → 5.5 → 5.6, not 5.4 → 5.6
- Apply the new Hydrolix operator
If you have a self-managed installation, apply the new operator directly with the kubectl command examples below. If you're using Hydrolix-supplied tools to manage your installation, follow the procedure prescribed by those tools.
### GKE
### EKS
### LKE and AKS
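The platform-specific commands are not reproduced here. As a rough sketch, applying a new operator manifest generally follows the pattern below; the manifest URL and namespace are placeholders, not actual Hydrolix endpoints, so substitute the values from the Hydrolix documentation for your platform:

```shell
# Illustrative only -- substitute the operator manifest URL for your platform
# (GKE, EKS, LKE, or AKS) and your cluster's namespace from the Hydrolix docs.
export HDX_KUBERNETES_NAMESPACE=hydrolix   # placeholder namespace
kubectl apply -f "https://example.com/operator/v5.6.2/operator.yaml" \
  --namespace "$HDX_KUBERNETES_NAMESPACE"
```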
- Monitor the upgrade process
Kubernetes jobs named `init-cluster` and `init-turbine-api` will automatically run to upgrade your entire installation to match the new operator's version number. This will take a few minutes, during which time you can observe your pods' restarts with your preferred Kubernetes monitoring tool.
Ensure both the `init-cluster` and `init-turbine-api` jobs have completed successfully and that the `turbine-api` pod has restarted without errors. After that, view the UI and use the API of your new installation as a final check.
If the `turbine-api` pod doesn't restart properly or other functionality is missing, check the logs of the `init-cluster` and `init-turbine-api` jobs for details about failures. You can do this with the `k9s` utility or with the `kubectl logs` command.
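For example, the job logs can be inspected with commands along these lines. The job names come from this page; the namespace variable is a placeholder for your installation's namespace:

```shell
# Replace $HDX_KUBERNETES_NAMESPACE with your installation's namespace.
kubectl logs --namespace "$HDX_KUBERNETES_NAMESPACE" job/init-cluster
kubectl logs --namespace "$HDX_KUBERNETES_NAMESPACE" job/init-turbine-api

# Confirm both jobs report completion
kubectl get jobs --namespace "$HDX_KUBERNETES_NAMESPACE" init-cluster init-turbine-api
```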
If you still need help, contact Hydrolix support.
## Rollback considerations
If you need to roll back to the previously released version, roll back to version v5.4.4 and perform database migrations. Any version released before v5.4.4 will cause errors after rollback.
For more information, follow our detailed rollback instructions.
## Changelog

### Updates

#### Cluster operations
- Upgraded the tokio library for `hdx-node` from v1.42.0 to v1.43.1. The fix avoids possible memory corruption for objects passed over the broadcast channel which do not implement both `Send` and `Sync`. Addresses RUSTSEC-2025-0023.
#### UI

- Upgraded axios and related libraries to fix an HTTP parameter pollution vulnerability. Addresses CVE-2025-54371.
    - `axios` v1.8.2 → v1.11.0
    - `form-data` v4.0.0 → v4.0.4
    - `follow-redirects` v1.15.9 → v1.15.11
- Upgraded cypress and related libraries to fix an arbitrary temporary file directory write through a symbolic link vulnerability. Addresses CVE-2025-54798.
    - `cypress` v3.0.8 → v3.0.9
    - `json-file` v6.1.0 → v6.2.0
    - `tmp` v0.2.3 → v0.2.5
    - `types/node` v24.1.0 → v24.3.0
    - `undici-types` v7.8.0 → v7.10.0
### Improvements

#### Config API

- Introduced more validation of bucket access credentials and permissions during storage and batch job creation. These steps reduce the risk of data loss by preventing definition of resources with inadequate permissions or incorrect credentials.
- Updated the list of SQL reserved words forbidden in project and table names. The additions were `ASC`, `BEGIN`, `CASE`, `CAST`, `COMMIT`, `CREATE`, `CROSS`, `DESC`, `END`, `FOREIGN`, `IF`, `INTERSECT`, `LIMIT`, `OFFSET`, `PRIMARY`, `REVOKE`, `ROLLBACK`, `SAVEPOINT`, `THEN`, `TRANSACTION`, `TRIGGER`, and `WHEN`.
- Updated the default maximum time difference between the newest and oldest primary rows of a partition. The value `hot_data_max_minutes_per_partition` was 5, and is now 55.
- Added header `Cache-Control: max-age=0, no-cache, no-store, must-revalidate` to all Config API responses. This forbids HTTP proxies from caching responses, some of which could contain sensitive data.
- Changed initialization logic for `turbine-api` to avoid retries and surface errors more obviously. Before this change, the retry and restart logic occasionally masked problems like Keycloak initialization failures or database migration failures.
- Allowed clients to omit nullable fields for PUT endpoints in the Config API. If a client omits fields, nullable fields are now stored as NULL rather than left untouched.
- Moved authentication audit logging from a PostgreSQL table to a Hydrolix table named `hydro.audit_logs`. This change improves performance of audit log reporting for large clusters and decreases contention on the PostgreSQL database system.
- The `hydrolix_url` tunable may now be an IP address rather than a full URL with hostname, allowing easier testing and deployment without DNS available.
- Tightened access privileges to the `/auth_logs` endpoint so that only users with `view_auth_logs_user` or `User.all` permissions can view this content.
- Updated the default transform for the `hydro.monitor` table to include the field `monitor_request_timestamp` sent from the `monitor-ingest` service.
- The Config API now enforces stricter validation on resource names to prevent conflicts in Git-based configuration-as-code workflows, especially on case-insensitive filesystems like macOS or Windows.
#### Cluster operations

- Introduced a user-defined metric definition tool called `hdx-pg-monitor`. Arbitrary queries run on a user-specified frequency to emit metrics into the Prometheus system. This feature allows flexible observability for PostgreSQL settings, and especially catalog state.
- Enhanced the `hdx-scaler` command to accept configurable `--metric_labels` and `--path` options.
- Improved the rules for `hdx-scaler` attributes in the Hydrolix Spec Configuration Validator. Fields identifying service names must match existing services, ranges must be valid, and required fields can no longer be omitted.
- Added support in Traefik for the use of a service account JWT to authenticate to Amazon Data Firehose.
- Introduced status fields on custom resources and health check logic within the Operator to extend visibility into the readiness state of a cluster.
- RPC services that fail to start properly now exit and log the error, making gRPC startup issues easier to troubleshoot.
- `hdx-scaler` now exports configuration and decision metrics through Prometheus. This enables Grafana dashboards that mirror the scaler UI and supports monitoring multiple scaler pods from a single view. New metrics include:
    - `hdxscaler_decision_total` (with `reason`)
    - `hdxscaler_scrape_errors_total`
    - `hdxscaler_scrape_latency_seconds`
- Added validating webhook checks for `spec.overrides`. Each override must include:
    - A timezone
    - A patch
    - Exactly one schedule (`cron`, `weekly`, or `window`) with valid fields and formats.

  The webhook also validates the patch against the HDX spec and returns clear warnings and errors for bad fields or values, preventing invalid configs from being applied.
- Added support for aggregating multiple metrics in `hdx-scaler`. A new `op` field controls aggregation (`sum`, `avg`, `min`, `max`). The TUI and webhook validation now support this field, and a metric matching bug has been fixed.
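A sketch of an override that would pass these webhook checks. Only the field names `timezone` and `patch` and the three schedule kinds (`cron`, `weekly`, `window`) come from this release; the patch contents, service name, and values are invented for illustration:

```yaml
# Hypothetical spec.overrides entry -- patch contents and values are invented.
spec:
  overrides:
    - timezone: "UTC"
      cron: "0 22 * * 1-5"   # exactly one schedule: cron, weekly, or window
      patch:                 # validated against the HDX spec by the webhook
        scale:
          query-peer:        # illustrative service name
            replicas: 2
```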
#### Core

- Introduced automatic data type promotion for numeric SQL functions returning `uint16` or `int16` types. Since Hydrolix doesn't support these types, they are promoted to `uint32` or `int32` instead. This is particularly important to support summary table aggregation calculations.
- Added a method to invalidate cached tokens when an auth error occurs.
- You can now remove columns from a summary table without deleting and recreating it. This helps avoid costly table rebuilds and backfills for large deployments.
    - Set a column to `NULL` in the summary SQL, and it will drop from the summary schema going forward.
    - Existing aggregates aren't recomputed, but removed columns can be reintroduced later. Historical rows will return `NULL`.
- Added built-in, read-only users for Grafana, Quesma, and Superset datasources. This removes the need to create the users manually.
    - Credentials are now available in the general secret, similar to the super-admin user.
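The summary-column removal above can be sketched as follows, using an invented summary SQL with columns `hostname` and `total_bytes` (all table and column names here are hypothetical):

```sql
-- Before: the summary aggregates by hostname
SELECT timestamp, hostname, sum(bytes) AS total_bytes
FROM raw GROUP BY timestamp, hostname;

-- After: setting hostname to NULL drops it from the summary schema
-- going forward; existing aggregates are not recomputed.
SELECT timestamp, NULL AS hostname, sum(bytes) AS total_bytes
FROM raw GROUP BY timestamp;
```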
#### Intake and Merge

- Routine job `partition-cleaner` now runs weekly instead of daily, since daily listings are unnecessary even on large clusters. We also added the `bulk_delete_bytes` metric.
- Improved the datatype model and testing in the Rust intake system, particularly around indexable data types.
- Added field `pretransform` to transforms, which supports listing multiple JSON subtypes. These subtypes are then applied in order in the intake pipeline, allowing the processing of messages from different sources in the same pipeline. See Notable new features for more information.
#### UI

- Introduced a search bar to the +Add New menu which limits the items in response to user input. There are many objects to manage in a Hydrolix cluster, and this improves responsiveness and usability, especially on small screens.
- The UI now displays the cloud provider name for each entry on the Security > Credentials page. We corrected the detail page title to View credential instead of the incorrect Edit credential.
- `robots.txt` now blocks `AdsBot-Google`, `AdsBot-Google-Mobile`, and `AdIdxBot`. Hydrolix clusters do not serve content intended for web crawler consumption.
### Bug Fixes

#### Cluster operations

- Switched to a more secure temporary file library function to avoid a possible symlink attack in the Traefik reverse proxy configuration setup.
- Fixed `pgMonitor` role creation logic. `pgMonitor` now has select and view permissions on all schemas by default.
- `ip_allowlist` values are now correctly installed into Linode Kubernetes Engine (LKE) load balancers using the `service.beta.kubernetes.io/linode-loadbalancer-firewall-acl` annotation rather than `spec.loadBalancerSourceRanges`. This corrects a missing firewall automation feature for the LKE platform: previously, specifying an `ip_allowlist` on LKE did not apply any firewall rules.

  Note: When `acme_enabled` is set, certificate issuance can fail if access is restricted by `ip_allowlist`, because ACME HTTP challenge providers do not publish fixed IP ranges. A temporary workaround is to allow 0.0.0.0/0 during certificate issuance. This is not a long-term solution; improvements are planned for a future release.
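For reference, the resulting LKE service annotation takes roughly this shape. The annotation key is the one named above; the service name and the ACL value are invented examples:

```yaml
# Sketch of the annotation the operator applies on LKE (values are invented).
apiVersion: v1
kind: Service
metadata:
  name: traefik   # illustrative service name
  annotations:
    service.beta.kubernetes.io/linode-loadbalancer-firewall-acl: |
      {"allowList": {"ipv4": ["203.0.113.0/24"]}}
```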
#### Config API

- Removed deprecated attribute `KeycloakAuthLogPermissions` from the user permissions class.
- Fixed a bug where deleting a project could fail with a `500` error if a source's secret had already been removed. Project deletion now succeeds even when related secrets are missing.
- Fixed a `500` error when creating or updating transform templates that include a shadow transform. `POST /v1/transform_templates` and `PUT/PATCH /v1/transform_templates/:id` now validate and save shadow transform settings correctly.
#### Core

- Fixed an initialization ordering bug that prevented use of custom functions with summary tables. Earlier, the summary table definition would fail with `Unknown function`.
#### Intake

- Fixed an issue where a cluster with a Kafka source without a related credential caused the `kafka-peer` service to emit an error to the log. The behavior now uses the cluster's credentials if other credentials are omitted, and no error is logged.
#### UI

- The UI now handles custom column and row delimiters correctly in all situations. Display labels have been refined to use English-language names for these common delimiters: Comma, Semicolon, Tabulation, Pipe, Newline, Return, and Return and Newline.
- Summary tables have been excluded from the table selection list for new Table Transform and Transform Template workflows. This avoids presenting the user with an invalid option.