v5.9.5
Notable new features
Column-level access control
Added column-level access control to the RBAC system. Data administrators can restrict access to a table by constructing lists of blocked columns. These column policies can be attached to Hydrolix roles. All access policies are enforced at query execution time in the query system, enabling fine-grained data security for compliance and data governance requirements.
RBAC-enabled ingest and service endpoints
Enable RBAC authorization on ingest and service endpoints using the tunable enable_traefik_authorization, which is false by default. This covers the /ingest, /prometheus, /version, /grafana, /superset, and /kibana endpoints.
Parquet support
Added Parquet format support to the HTTP streaming, Kafka, Kinesis, and TCP ingestion systems, with transform configuration "format": "parquet" and HTTP header Content-Type: application/vnd.apache.parquet. Ingestion supports flattening, pointers, and pretransforms.
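As a rough illustration of the HTTP streaming case, the sketch below builds (but does not send) a POST request that carries Parquet bytes with the Content-Type header named above. The URL, the x-hdx-table and x-hdx-transform header names and their values, and the function name are assumptions made for this example; check them against your own cluster's ingest configuration.

```python
import urllib.request

PARQUET_CONTENT_TYPE = "application/vnd.apache.parquet"


def build_parquet_ingest_request(url, table, transform, payload):
    """Build a POST request that carries Parquet bytes.

    The x-hdx-* header names are assumptions for this sketch.
    """
    headers = {
        "Content-Type": PARQUET_CONTENT_TYPE,
        "x-hdx-table": table,          # assumed form: "project.table"
        "x-hdx-transform": transform,  # assumed: a transform using "format": "parquet"
    }
    return urllib.request.Request(url, data=payload, headers=headers, method="POST")


# Hypothetical values; the request is only constructed here, not sent.
request = build_parquet_ingest_request(
    "https://example.hydrolix.live/ingest/event",
    "my_project.my_table",
    "my_parquet_transform",
    b"PAR1",  # placeholder bytes, not a valid Parquet file
)
```

Passing the resulting object to urllib.request.urlopen would send it; any authentication headers required when enable_traefik_authorization is on are left out of this sketch.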
Intelligent pod scheduling during low resources
Added Kubernetes PriorityClass for all workloads to enable intelligent pod scheduling during resource constraints. Critical workloads like intake-head can now preempt lower-priority workloads during traffic spikes, preventing data loss while new nodes provision. Includes new priority_classes tunable for overriding default priority assignments.
Enhanced cluster health monitoring
Enhanced operator cluster health monitoring to automate post-upgrade checks and provide detailed status reporting. The HydrolixCluster status now includes clusterStatus (Ready/Not Ready/Upgrading/Scaled Off), categorized issues (critical vs non-critical), and health checks for all managed resources. Includes new tunables for configuring which resources to ignore during health evaluation.
Breaking changes
Renamed cluster_logs endpoint and added RBAC
Renamed cluster_logs Config API endpoint to cluster_spec with JSON response format and added RBAC permissions. Accounts with view_clusterspec, user_admin, or super_admin permission can access the endpoint.
Quesma has been renamed to Kibana Gateway
Service name
Occurrences of the name quesma have been renamed to kibana_gateway (or kibana-gateway, depending on context).
Tunable name
The quesma_config tunable has been renamed to kibana_gateway_config.
Tunable schema
The tunable schema remains unchanged except for one new optional key. The version key was added to support custom image tag specification. For example:
Kibana Gateway upgrade instructions
- Update the HydrolixCluster/hdx spec tunable name from quesma_config to kibana_gateway_config.
- In the HydrolixCluster/hdx object, if custom scaling is defined using the spec.scale key, change the scaling key from quesma to kibana-gateway.
- If external access for Quesma is enabled, change the subdomain from quesma.${HDX_HOSTNAME}.hydrolix.live to kibana-gateway.${HDX_HOSTNAME}.hydrolix.live.
- After making these changes, confirm that the Quesma pod has been terminated automatically and that the Kibana Gateway pod is up and running. If you need help, contact Hydrolix support.
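The two key renames in the steps above can be sketched as a small helper that rewrites a HydrolixCluster object loaded into a Python dictionary. This is only an illustration: the function name and the sample values under quesma_config and scale are hypothetical, and the subdomain change is not covered here.

```python
import copy


def migrate_quesma_keys(hdx_object):
    """Return a copy of a HydrolixCluster object with Quesma keys renamed."""
    migrated = copy.deepcopy(hdx_object)
    spec = migrated.get("spec", {})

    # Tunable rename: quesma_config -> kibana_gateway_config
    if "quesma_config" in spec:
        spec["kibana_gateway_config"] = spec.pop("quesma_config")

    # Scale key rename: quesma -> kibana-gateway (only if custom scaling is defined)
    scale = spec.get("scale")
    if isinstance(scale, dict) and "quesma" in scale:
        scale["kibana-gateway"] = scale.pop("quesma")

    return migrated


# Hypothetical input; real specs contain many more keys.
old = {
    "spec": {
        "quesma_config": {"version": "example-tag"},
        "scale": {"quesma": {"replicas": 1}},
    }
}
new = migrate_quesma_keys(old)
```

The helper copies its input, so the original object is left untouched and can be compared against the result before anything is applied to the cluster.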
Upgrade instructions
Upgrade and downgrade restrictions
Do not skip minor versions when upgrading or downgrading
Skipping versions when upgrading or downgrading Hydrolix can result in database schema inconsistencies and cluster instability. Always upgrade or downgrade sequentially through each minor version.
Example:
Upgrade from 5.7.9 → 5.8.6 → 5.9.5, not 5.7.9 → 5.9.5.
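The rule above can be checked mechanically before an upgrade or downgrade is planned. The following is a minimal sketch, not a Hydrolix tool: the function names are hypothetical, and it only compares versions within the same major release because the text above does not describe major-version transitions.

```python
def parse_version(version):
    """Split a version such as "5.9.5" or "v5.9.5" into integer parts."""
    major, minor, patch = (int(part) for part in version.lstrip("v").split("."))
    return major, minor, patch


def is_sequential_step(current, target):
    """True if moving from current to target crosses at most one minor version.

    Steps across major versions are reported as non-sequential because
    they are outside the scope of this sketch.
    """
    cur_major, cur_minor, _ = parse_version(current)
    tgt_major, tgt_minor, _ = parse_version(target)
    if cur_major != tgt_major:
        return False
    return abs(tgt_minor - cur_minor) <= 1


def is_sequential_path(versions):
    """True if every consecutive pair of versions is a sequential step."""
    return all(is_sequential_step(a, b) for a, b in zip(versions, versions[1:]))
```

With the example above, is_sequential_path(["5.7.9", "5.8.6", "5.9.5"]) is true, while is_sequential_path(["5.7.9", "5.9.5"]) is false. Because the check uses the absolute difference between minor versions, it applies to downgrades as well.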
Apply the new Hydrolix operator
If you have a self-managed installation, apply the new operator directly with the kubectl command examples below. If you're using Hydrolix-supplied tools to manage your installation, follow the procedure prescribed by those tools.
GKE
EKS
LKE and AKS
Monitor the upgrade process
Kubernetes jobs named init-cluster and init-turbine-api will automatically run to upgrade your entire installation to match the new operator's version number. This will take a few minutes, during which time you can observe your pods restarting with your Kubernetes monitoring tool.
Ensure both the init-cluster and init-turbine-api jobs have completed successfully and that the turbine-api pod has restarted without errors. After that, view the UI and use the API of your new installation as a final check.
If the turbine-api pod doesn't restart successfully, or other functionality is missing, check the logs of the init-cluster and init-turbine-api jobs for details about failures. You can do this with the k9s utility or with the kubectl logs command.
If you still need help, contact Hydrolix support.
Changelog
Updates
API updates
- Updated Python dependencies to include security fixes for CVE-2024-39330 (path traversal in django-storages) and CVE-2025-50181 (SSRF vulnerability in urllib3 via boto3).
  - django: 5.0.14 → 5.2.8
  - django-storages: 1.14.3 → 1.14.6
  - boto3: 1.34.50 → 1.35.99
- Gunicorn upgrade from 20.1.0 to 23.0.0.
Intake updates
- Upgraded Rust environment from 1.90.0 to 1.91.0 to ensure cargo fmt is available during build.
Cluster Operations updates
- Updated default HTTP proxy (Chproxy) version from 0.5.1 to 0.6.1.
Improvements
Config API improvements
- Added Column-Level Access Control (CLAC) to the RBAC system, allowing per-table access policies.
- Updated dictionary file upload endpoint help text to clarify that file must be a local path rather than a URI. The filename field is now required.
- Added query parameter filters to the /columns endpoint to support UI pagination for separate alias and additional names tables.
- Email validation now ensures user email addresses meet Keycloak username requirements, preventing account creation failures.
- Prevented deletion of dictionary files that are currently in use by dictionaries. This avoids query-head "dictionary file syncing failed" errors.
- Added ability to reference credentials by name in addition to ID across all Config API endpoints. Supports both credential and credential_id fields on Storage, Table autoingest settings, and other credential references, enabling portable Configuration-as-Code workflows across clusters.
- Added endpoint column_value_mapping for each table for assigning column values to storage locations. Using PUT and PATCH to manage these mappings avoids the risks of using PUT on a full table.
- Renamed cluster_logs endpoint to cluster_spec with JSON response format and added RBAC permissions (super_admins, user_admins, and users with view_clusterspec permission can access).
- Updated API documentation for credentials endpoint with type-specific schemas and added missing tags to endpoints with unlabeled responses.
- Enabled service accounts to create other service accounts and issue tokens when they have appropriate RBAC permissions. This change removes previous restrictions that prevented service accounts from performing these operations.
- Improved the sqlperms endpoint performance.
- Updated default flush settings for new tables to optimize partition sizing and data retention. Hot data partition width reduced from 55 to 5 minutes for better query performance, and cold data max age reduced from 10 years to 1 year. Existing tables retain their configured settings.
Cluster Operations improvements
- Enhanced operator cluster health monitoring to automate post-upgrade checks and provide detailed status reporting. Cluster status and health checks are available for all managed resources.
- Improved ZooKeeper connection handling with connection and command timeouts to increase responsiveness in the face of bad nodes or unresolvable paths. Also added more detail in the INFO-level logs for troubleshooting.
- Added conditional Keycloak podAntiAffinity that applies only when replicas > 1, preventing scheduling issues in single-replica deployments.
- Scale settings for intake-indexer sidecar in intake-head pools can now be overridden, allowing customization through the scale_profile configuration. For example:
- Added debugging utilities to the tooling pod image, including networking tools (nc, telnet, traceroute, tcpdump), process debugging utilities (lsof, strace, htop), JSON tooling (jq), and Kubernetes interaction tools (kubectl, k9s).
- Increased default memory allocation for turbine-api to 2Gi across all scale profiles, addressing increased memory requirements from v5.7 configuration changes.
- Added host-based Traefik routing for Grafana using the <hostname>-grafana.<domain> URL pattern to facilitate smooth migration by the operator.
- Added preferred pod anti-affinity rules and logic for ZooKeeper, RabbitMQ, Redpanda, and Traefik. This improves reliability by distributing pods across nodes while preserving custom node affinity configurations.
- Added Kubernetes PriorityClasses for all workloads to enable intelligent pod scheduling during resource constraints. Critical workloads like intake-head can now preempt lower-priority workloads during traffic spikes, preventing data loss while new nodes provision. Includes new priority_classes tunable for overriding default priority assignments.
- Enhanced kibana_gateway_config tunable to support multiple Hydrolix projects.
- Added Subject Alternative Names (SANs) support for ACME-generated SSL certificates through new alt_names tunable. Enables a single certificate to cover multiple domain names, eliminating SSL errors when accessing services through different hostnames.
- A new hdx-vpa-metrics service can offload VPA metric collection from HDX Scaler to a dedicated service, improving performance. Controlled by new hdx_vpa_metrics tunable with sub-keys for enabled, poll_interval, filter_monitored_pods, and metrics_port.
- Enhanced HDX Scaler cooldown logic to only apply cooldowns when scaling actions are actually taken, preventing unnecessary delays between scale operations. Previously, cooldowns were applied even when no scaling occurred, which could delay subsequent scaling decisions.
- Added events API permissions to the hdx-scaler role. Previously, the hdx-scaler service account lacked the necessary RBAC permissions to watch Kubernetes events, resulting in continuous 403 Forbidden errors being logged to CloudWatch audit logs. This increased CloudWatch costs.
- Fixed the hdxscaler exponentially weighted moving average (EWMA) calculation producing incorrect, negative values under CPU saturation.
- Changing hdxscaler settings no longer requires a restart.
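The HDX Scaler cooldown change listed above (cooldowns start only when a scaling action is actually taken) can be illustrated with a small sketch. This is not the scaler's implementation: the class, its method names, and the replica-count interface are hypothetical and exist only to show the behavior described.

```python
import time


class CooldownGate:
    """Tracks a cooldown that starts only when a scaling action is taken."""

    def __init__(self, cooldown_seconds, clock=time.monotonic):
        self.cooldown_seconds = cooldown_seconds
        self.clock = clock
        self.last_action_at = None

    def in_cooldown(self):
        """True while the most recent scaling action is still cooling down."""
        if self.last_action_at is None:
            return False
        return self.clock() - self.last_action_at < self.cooldown_seconds

    def evaluate(self, current_replicas, desired_replicas):
        """Return the replica count to apply, or None if nothing should change."""
        if self.in_cooldown():
            return None
        if desired_replicas == current_replicas:
            # No scaling occurred, so the cooldown timer is left untouched.
            return None
        self.last_action_at = self.clock()
        return desired_replicas
```

An evaluation that results in no change leaves last_action_at alone, so the next evaluation that does need to scale is not delayed. The clock is injectable so the behavior can be exercised without waiting in real time.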
Security improvements
- New RBAC authorization can be added to the /ingest, /prometheus, /version, /grafana, /superset, and /kibana endpoints. Enable this authorization using the tunable enable_traefik_authorization, which is False by default.
- Added runAsNonRoot security context to all Kubernetes workloads at both container and pod levels.
- Updated hdx-pg-monitor and hdx-scaler containers to run as a non-root turbine user.
- Updated hdx-pod-metrics container to run as a non-root user.
- Fixed security vulnerability where Traefik metrics endpoints were publicly accessible when ip_allowlist was set to 0.0.0.0/0. Metrics ports are now exposed through a separate internal traefik-metrics service, preventing unauthorized access while preserving Prometheus scraping capability.
- Converted indexer sidecar to init-container in intake-head pods to ensure proper termination order during scale-down events. This prevents data loss by ensuring the indexer remains available until the stream container terminates.
Core improvements
- Added catalog_resp_time_ms to query_detail_runtime_stats for summary table queries, providing catalog read timing metrics previously only available for turbine storage queries.
- Lowered socket receive_timeout in ClickHouse settings from 1,000 seconds to 20 seconds to reduce delays in cancel responses.
- Improved dictionary loading when dictionary files cannot be fetched. Turbine now logs errors and continues operating, allowing operations to work for tables that don't require the unavailable dictionary.
- Added Phase 1 support for ClickHouse JSON data type. Includes read/write operations for JSON partitions and fixes for summary table compatibility.
- Added observability columns to hdx.active_queries table including query_id, initial_query_id, memory_usage, peak_memory_usage, and host_addr. These additions enable query-level memory tracking and correlation of distributed query execution across query-head and query-peer nodes.
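To show how the new hdx.active_queries columns might be used for correlation, the sketch below groups already-fetched rows by initial_query_id and sums memory_usage across hosts. It assumes that rows reported by query-head and query-peer nodes for the same distributed query share an initial_query_id; the function name and sample rows are hypothetical.

```python
from collections import defaultdict


def memory_by_initial_query(rows):
    """Sum memory_usage per initial_query_id across all reporting hosts."""
    totals = defaultdict(lambda: {"memory_usage": 0, "hosts": set()})
    for row in rows:
        entry = totals[row["initial_query_id"]]
        entry["memory_usage"] += row["memory_usage"]
        entry["hosts"].add(row["host_addr"])
    return dict(totals)


# Hypothetical rows, shaped like the columns listed above.
rows = [
    {"query_id": "q1", "initial_query_id": "q1", "memory_usage": 100,
     "peak_memory_usage": 150, "host_addr": "10.0.0.1"},
    {"query_id": "q1-a", "initial_query_id": "q1", "memory_usage": 40,
     "peak_memory_usage": 60, "host_addr": "10.0.0.2"},
    {"query_id": "q2", "initial_query_id": "q2", "memory_usage": 7,
     "peak_memory_usage": 9, "host_addr": "10.0.0.1"},
]
summary = memory_by_initial_query(rows)
```

Here summary["q1"] aggregates the two rows that belong to the same distributed query, while q2 stays separate.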
Intake improvements
- Enhanced cloud storage transfer error handling with retry logic for router communication failures and "target busy" responses, preventing data loss during transient failures.
- intake-peers now clean up abandoned transfers from unexpectedly terminated senders.
- Added initial Parquet format support to intake with transform configuration "format": "parquet" and HTTP header Content-Type: application/vnd.apache.parquet. Supports JSON gadgets (flattening, pointers, pretransforms).
- Extended Parquet format support to Kafka, Kinesis, and TCP ingestion mechanisms. Also fixed issue reading compressed data from Kafka.
- Optimized query performance on the catalog table by adding a unique key and composite index.
UI improvements
- Added API and Docs links to sidebar on pages where they were previously missing.
- Updated Query Options UI to support the new unified hdx_query_max_before_external_group_by configuration option.
- Added complete UI support for column policies management, enabling administrators to modify Column-Level Access Control (CLAC) policies.
Bug Fixes
Config API fixes
- Fixed catalog_urls endpoint to return 400 Bad Request instead of 500 Internal Server Error when date parameters are incorrectly formatted.
- Fixed Internal Server Error when deleting projects with orphaned jobs.
- Table columns and row policies are now deleted when their parent table is deleted.
- Added validation to prevent duplicate row policy names within the same table, returning 400 errors if duplicate naming is attempted.
- Fixed password complexity error handling to return proper 400 errors instead of 500 Internal Server Error when passwords don't satisfy password complexity checks.
- Fixed 500 Internal Server Error when updating query options on summary tables via the /query_options endpoint.
- Fixed credential migration logic to ensure unique credential names, regardless of case.
- Updated guardian migration from 0002 to 0003 in release file.
- Fixed a bug that occurred after a catalog (PostgreSQL) failover or restart: turbine-api pods did not release stale database connections, causing connection pool exhaustion.
Cluster Operations fixes
- Removed REPLICATION permission from hdx-pg-monitor role, preventing permission-related errors during PostgreSQL monitoring operations.
- Fixed a race condition where concurrent authentication requests could cause invalid token errors.
- hdx_auth metrics are now disabled when unified_auth is disabled in Traefik plugin mode, preventing unnecessary metric collection.
- Added PostgreSQL init-container to fix persistent volume mount permissions for non-root PostgreSQL containers.
- Fixed key_prefix string formatting to correctly handle braces when using Python f-string format, preventing logs from being written to incorrect directories.
- Fixed HDX Scaler to properly terminate orphaned scaler tasks when configuration is reloaded. Previously, removed scaler configurations would continue running, causing unexpected scaling behavior.
- Fixed HDX Scaler VPA to no longer fail with "duplicate container name" errors when attempting to scale deployments using initContainers.
- Fixed HDX Scaler Kubernetes watcher to properly reconnect when the connection to the Kubernetes API server is lost, preventing scaler failures and ensuring continuous monitoring of cluster resources.
- ACME certificate renewal job now supports alternative names for SSL certificates. The acme-renewal job can now perform certificate renewal for multi-domain configurations using this format in the Hydrolix spec:
- Fixed TLS configuration for MySQL and Thanos routes by disabling TLS passthrough mode, which was not functioning correctly. MySQL (port 9004) and Thanos-sidecar (port 19091) routes now use disable_tls instead of passthrough_tls, resolving connectivity issues to facilitate Tableau integration.
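The key_prefix fix listed above concerns literal braces in Python format strings. The affected code is not shown in these notes, so the following is only a generic reminder of how brace escaping behaves in f-strings; the variable names and path layout are hypothetical.

```python
cluster = "hdx"

# Doubled braces produce literal braces; single braces interpolate a value.
literal = f"logs/{{cluster}}/"      # -> "logs/{cluster}/"
interpolated = f"logs/{cluster}/"   # -> "logs/hdx/"

# A template that must keep a placeholder for a later .format() call
# needs that placeholder escaped inside the f-string.
template = f"logs/{cluster}/{{date}}/"          # -> "logs/hdx/{date}/"
resolved = template.format(date="2025-01-01")   # -> "logs/hdx/2025-01-01/"
```

Mixing up the two forms silently yields a different path rather than an error, which is consistent with the symptom of output landing in an unexpected directory.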
Core fixes
- Disallowed ORDER BY clauses in summary table SQL definitions, preventing configuration errors that silently caused missing columns during summary execution.
- Fixed segmentation fault during query server shutdown.
- Fixed a deadlock occurring when query-peer restarts while many concurrent queries are running. QUERY_WAS_CANCELLED exceptions are now thrown so that queries terminate properly.
- Fixed query-peer crashes when operating with low OS file descriptor limits. Queries now fail gracefully with informative error messages.
Intake fixes
- Table name resolution now considers the project when resolving a table name to table ID, preventing incorrect table selection when table names are not unique across projects.
- Fixed merge target overrides to persist on config reload, preventing overrides from being reset when configuration is refreshed.
- Periodic file deletion now uses the filename path directly instead of converting to a string, ensuring deletion of log files.
- Fixed batch job path construction in turbine-api to use correct base URL for all batch operations. This ensures batch job operations (commit, retry, cancel, status, errors) work correctly with legacy batch deployments.
- Fixed a bug preventing the job_purge periodic task from deleting jobs with a NULL value for the updated_at field.
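The table name resolution fix listed above comes down to keying lookups on the project as well as the table name. The sketch below illustrates that idea only; the data shapes, identifiers, and function names are hypothetical and do not reflect the intake code.

```python
def build_table_index(tables):
    """Index table IDs by (project, table name) instead of table name alone."""
    return {(t["project"], t["name"]): t["id"] for t in tables}


def resolve_table_id(index, project, name):
    """Look up a table ID, failing loudly when the pair is unknown."""
    try:
        return index[(project, name)]
    except KeyError:
        raise LookupError(f"no table {name!r} in project {project!r}") from None


# Two hypothetical projects that both contain a table called "logs".
tables = [
    {"project": "web", "name": "logs", "id": "t-1"},
    {"project": "cdn", "name": "logs", "id": "t-2"},
]
index = build_table_index(tables)
```

With an index keyed on the name alone, one of the two "logs" entries would overwrite the other; the composite key keeps them distinct.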
Operator fixes
- Set the default priorityClass for http-head, intake-peer, and intake-router to match that of intake-head.
- Fixed hdx-scaler not reloading configuration related to metrics aggregations.
UI fixes
- Fixed Alter Job page table rendering to wait for data load completion, preventing null values from sometimes appearing in table columns after page reload.
- Fixed delete modal text overflow when resource names are too long. The "delete" modal titles no longer overlap the close button.
- Fixed transform validation page to eliminate excess API requests for sources, dictionaries, and functions endpoints by fetching only project-specific data. This mitigates "disappearing SQL" events in the UI.
- Added validation to ensure the Parent Table field matches the table name in the SQL query when creating summary tables, displaying an error message for mismatches.
- Added validation to prevent invalid dash character entry in number input fields across all UI forms. This prevents strings like "1-5" from being entered.
- Transforms may now be used with Shadow Tables.
- The bulk user invite form now correctly displays error messages when inviting duplicate users.
- Fixed summary table edit form to clear non-field validation errors on submit, preventing previous error messages from blocking form resubmission.