11 March 2025 - v5.0.0
ClickHouse upgrade, new Shadow Tables feature, Native MySQL client support, better API documentation, easier Kibana integration via Quesma, and Hydrolix Tunable Names
Notable New Features
- ClickHouse Upgrade
  - The ClickHouse library has been upgraded to version 24.8.6.70. In addition to performance improvements and bugfixes, the following functions are now available:
    - Top K: approx_top_k, topKWeighted, and approx_top_sum
    - String related: base64URLEncode, base64URLDecode, tryBase64URLDecode, and groupConcat
    - Time related: toMillisecond
    - Set handling: groupArrayIntersect
    - Windowing: percent_rank
    - and more...
- Native MySQL client support
  - Client applications can now connect to and query a Hydrolix cluster with MySQL clients. The MySQL server listens on tcp/9004. This opens up more integration possibilities.
- Better API Documentation
  - The API documentation available from the /config/schema/ endpoint has been reorganized and improved, making way for more complete API documentation in future releases.
- Simplified Kibana, Quesma, and Elasticsearch Integration
  - Deployment of Kibana, Quesma, and Elasticsearch is now provided automatically within Hydrolix.
- Hydrolix Tunable Names (HTN)
  - Using a structured naming pattern, all tunables can be applied to multiple services, pools and containers. When defined, an HTN htn:<service>:<pool>:<container>: <value> takes precedence over traditional key-value tunables.
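To illustrate the native MySQL client support listed above, here is a minimal sketch of a connection using the stock mysql command-line client. The host name and user are hypothetical placeholders, and username/password authentication is an assumption; only port 9004 comes from these notes. Depending on how the cluster is configured, TLS-related client options may also be needed.

```shell
# Hypothetical connection details; replace them with your cluster's values.
HDX_HOST="hydrolix.example.com"
HDX_USER="analyst@example.com"
HDX_MYSQL_PORT=9004   # the port named in this release note

# Assemble the client invocation so it can be reviewed before running.
# --password with no value makes the mysql client prompt for the password.
HDX_MYSQL_CMD="mysql --host=${HDX_HOST} --port=${HDX_MYSQL_PORT} --user=${HDX_USER} --password"
echo "$HDX_MYSQL_CMD"

# To open a session, run the printed command in a terminal.
```

Printing the command first keeps the example safe to run without a live cluster.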
Breaking Changes
GET table_query_options has been removed
The /table_query_options API path has been removed. The same functionality is now available at the more flexible /query_options path, which works on both tables and projects. See the API documentation for more information.
Upgrade
Upgrade on GKE
kubectl apply -f "https://www.hydrolix.io/operator/v5.0.0/operator-resources?namespace=${HDX_KUBERNETES_NAMESPACE}&gcp-storage-sa=${GCP_STORAGE_SA}"
Upgrade on EKS
kubectl apply -f "https://www.hydrolix.io/operator/v5.0.0/operator-resources?namespace=${HDX_KUBERNETES_NAMESPACE}&aws-storage-role=${AWS_STORAGE_ROLE}"
Upgrade on LKE
kubectl apply -f "https://www.hydrolix.io/operator/v5.0.0/operator-resources?namespace=$HDX_KUBERNETES_NAMESPACE"
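The upgrade commands above expect environment variables to be set beforehand. A minimal sketch for the LKE form follows. The namespace value is a hypothetical placeholder, and the kubectl call is left commented out so the URL can be checked first.

```shell
# Hypothetical namespace; use the namespace your cluster actually runs in.
export HDX_KUBERNETES_NAMESPACE="hdx-example"

# Build the v5.0.0 operator manifest URL (LKE form, no cloud-specific parameters).
OPERATOR_URL="https://www.hydrolix.io/operator/v5.0.0/operator-resources?namespace=${HDX_KUBERNETES_NAMESPACE}"
echo "$OPERATOR_URL"

# Once the URL looks right, apply it:
# kubectl apply -f "$OPERATOR_URL"
```

For GKE or EKS, the same approach applies with GCP_STORAGE_SA or AWS_STORAGE_ROLE exported as well and appended to the URL as shown in the commands above.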
Changelog
General
- Hydrolix Engine
  - Updated ClickHouse from version 23.8.10.43 to 24.8.6.70. We also updated the following libraries:
    - openssl 1.1.1n -> 3.3.2
    - absl 20211102.0 -> 20240722.0
    - fmt 8.1.1 -> 9.1.0
    - libpq 16.4 -> 16.5
- Data Transformation
  - Introduced UI support for creating, copying, viewing, and customizing transform templates.
  - Allowed UI download of summary table transforms.
  - Introduced support for displaying and editing WURFL settings while managing transforms.
- Ingest
  - Added Shadow Tables feature. Shadow Tables only receive a certain percentage of randomly-sampled data from the input source.
  - Amazon Data Firehose can now decode more than one data object from a single data segment.
  - Improved the SaaS metering functionality ("Usagemeter") by adding better housekeeping and cleaning out reported data older than the number of days specified in the usagemeter_preserve tunable.
  - Updated the idna library from version 0.5.0 to 1.0.3 to address a security vulnerability.
- Query
  - Added TCP port 9004 listener for incoming queries using the MySQL dialect. This enables MySQL-compatible query tools. Queries are proxied to the ClickHouse query engine.
- Cluster Operations
  - Created a new tunable, turbine_api_require_table_default_storage, requiring explicit storage maps at table creation. This is especially useful for clusters using storage mapping.
  - Removed partition-vacuum. It was replaced with a lightweight partition-cleaner.
  - Switched partition-cleaner to a continuously running service instead of a scheduled job. Removed the tunables partition_cleaner_enabled and partition_cleaner_schedule, decreased memory usage, and introduced bulk_delete_* metrics.
  - Converted hard-coded query peer liveness parameters into two new tunables, query_peer_liveness_initial_delay and query_peer_liveness_probe_timeout.
  - Added a tunable, prometheus_curated_configmap, to offer cluster operators dynamic control over which metrics are exported to Prometheus.
  - Added the tunables disable_traefik_mysql_port, mysql_port, and mysql_port_disable_tls, used by the traefik server for MySQL configuration.
  - Introduced Hydrolix Tunable Names (HTN), a flexible, hierarchical system for configuring tunables across an entire cluster.
  - Introduced two new tunables to improve control over the behavior of rolling updates: rollout_strategy_max_surge and rollout_strategy_max_unavailable. These are especially useful for deployments running near maximum capacity.
  - Provided automatic deployment of Kibana, Quesma, and Elasticsearch within Hydrolix using the data_visualization_tools and quesma_config tunables.
- Integrations
  - Removed unnecessary references to shard keys throughout the Spark connector integration.
- API
  - The API documentation available from the /config/schema/ endpoint has been reorganized and improved, making way for more complete API documentation in future releases.
- UI
  - Improved Advanced Options rows in tables, which can now be clicked to open an editor in the sidebar for better visibility.
  - Introduced a multi-selection dropdown on the System Health page that includes a new All option for logs.
  - Added a refresh interval of three minutes for the table health widget.
  - Added a Delete Project page to improve the deletion process. The page shows statistics about ingest latency and size, and warns before deletion. Your session must have is_superuser=true to be allowed to delete projects.
  - Added a Project Health page to show relevant information, such as statistics about ingest latency and size per table.
  - Upgraded Next.js from 14.2.20 to 14.2.22 to address security issues.
Bug Fixes
- API
  - Handled null storage IDs correctly in the API for the presigned URL feature.
  - Allowed users with single project or table permissions to see the projects and tables to which they have access. Formerly, they saw nothing.
  - Allowed users with all permissions to a project to use the /projects endpoint for listing projects. Earlier, this endpoint was forbidden.
  - Ensured only a single use for any invitation. This fixes an issue allowing a second claim on the same invitation.
  - Introduced a script that automatically runs on upgrade to patch up existing views that were affected by a previously fixed bug involving datetime and datetime64.
  - Required force_operation to delete storage if it was used in a column_value_mapping.
  - Accepted blank entries in the Rust URL pre-signer for endpoint_url.
  - Fixed a bug involving burst and limit when creating projects, tables, and transforms.
  - Allowed the ?force_operation=true query string parameter to bypass verification when updating summary table SQL. This allows recovery from poor configuration.
- Authentication and Permissions
  - Fixed a breaking interaction between unified auth and query string auth.
  - Removed visibility of deleted projects for the super_admin role. Users with this role in Hydrolix will no longer see projects that have been marked as deleted.
- Configuration and Control
  - Accepted convertible types for core tunables via the query_options endpoint. For example, hdx_query_unlimited_cnf is a boolean but can be "0" (false) or "1", "99", etc. (true).
  - Ensured that the Hydrolix operator monitors changes to curated and applies them immediately to avoid k8s and operator desynchronization. This also removes the need to restart the operator.
  - Added a liveness check to traefik-cfg so it will terminate instead of hanging while waiting for timeouts.
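The convertible-type rule for boolean tunables described above can be sketched as a small shell function. This is only an illustration of the stated examples ("0" is false, "1" and "99" are true). The function name and the rejection of non-integer input are assumptions and do not reflect the actual API implementation.

```shell
# Illustrative only: mirrors the examples in the note above.
# tunable_to_bool is a hypothetical helper, not part of Hydrolix.
tunable_to_bool() {
  case "$1" in
    ''|*[!0-9]*)
      # Assumption: anything that is not a plain integer string is rejected.
      echo "not an integer string: $1" >&2
      return 1
      ;;
    0) echo false ;;   # "0" converts to false
    *) echo true ;;    # any other integer string, such as "1" or "99", converts to true
  esac
}

tunable_to_bool 0    # prints "false"
tunable_to_bool 99   # prints "true"
```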
- Core and Query
  - Corrected the Spark connector integration to return an empty ClickHouseRecord to the client, indicating the absence of a result. Earlier, it would return null, provoking a null pointer exception.
  - Filtered storage IDs requiring presigning to fix an Internal Server Error from /catalog_urls.
  - Improved resilience when a disconnected peer is canceling a query, so that a sensitive routine no longer crashes under cancellation conditions.
  - Corrected a rare thread safety issue in the ClickHouse 24.8.6.70 upgrade.
  - Reserved hdx_query_output_file_enabled=1 as a query-level setting only, not for use on a table, project, or org. Improved consistency when hdx_query_output_filename is empty by generating timestamped filenames.
  - Fixed a rare, load-induced segfault crash in query-head by wrapping cancellation detection in a mutex.
  - Replaced std::string with a fixed-length char array to avoid deallocation failures under heavy memory pressure. This should reduce or eliminate exits with code 139 that lack stack trace output.
  - Prevented a full table scan when SELECT queries with LIMIT find no data in the specified time range.
  - Fixed an issue where a segmentation fault occurred when running a catalog query if the turbine service was started before the postgresql service.
  - Fixed a segfault triggered when using the empty() SQL function with map columns.
  - Detected cases of INSERT INTO an invalid hdx_storage_id. Now exits instead of displaying a long error message.
  - Fixed a memory leak involving the AWS API. Aws::ShutdownAPI() is now called when Hydrolix is finished using it.
- Merge and Data Lifecycle
  - Included the shard key along with time range when determining eligibility for partition merging.
- Ingest
  - Avoided a possible segfault in partition handling during parse error checking. This was visible in both intake-head and merge-peer.
  - Corrected a reference counting error triggered in error handling during batch ingestion with summary tables enabled. The symptom was an infinite loop of "Outstanding partition status."
  - Stopped leaking file descriptors inside connection pool management during batch, alter, and merge by always closing response bodies.
  - Improved cleanup of orphaned files when upload to storage fails. Previously, a batch-peer might hang on shutdown.
  - Corrected a soft-delete interaction with Azure in which files set as 'permanent delete' remained. They will now be deleted.
  - Introduced better reporting of replicas when using a GET pool endpoint. The response includes a count of current_replicas and a range for replicas, matching the setting in the hydrolixcluster.yaml file.
  - Allowed multiple autoingest sources to use the same transform. This reduces resource costs during ingestion.
  - Updated autoingest to normalize URI filtering across Azure, GCP, and AWS. The overall behavior remains the same.
  - Added richer error reporting during batch autoingest, including bucket name, error message, and other fields. Previously, only a short text message was returned.
  - Corrected the escaping of JSON-encoded storages for the batch-head to read. This prevents data loss with autoingest.
  - Updated golang.org/x/net from v0.28 to v0.33 to address a security vulnerability that could allow Denial of Service (DoS) attacks.
- UI
  - Avoided losing user input in the Table Health UI widget sidebar during re-rendering or when the user switches to other tabs or windows.
  - Omitted browser locale information to avoid causing a bad SSO redirect.
  - Removed the duplicated text entry block for schema definition when editing dictionaries.
  - Displayed WURFL fields only if WURFL is enabled.
  - Fixed multiple display issues in sidebars when surfacing validation results from the API to the user. Pages affected were data lifecycle management and also stream, merge, flush, bucket, and rate limit settings.
  - Upgraded the octokit library, which includes a fix for a regular expression backtracking denial of service vulnerability, CVE-2025-25289.
  - Displayed the replicas correctly in the scale page by using the current replicas count.
  - Updated the serialize-javascript library from 6.0.1 to 6.0.2, as well as dependencies, to address CVE-2024-11831.
  - Fixed the sidebar behavior when adding or removing a policy while editing a role.