ClickHouse upgrade, new Shadow Tables feature, Native MySQL client support, better API documentation, easier Kibana integration via Quesma, and Hydrolix Tunable Names
Notable New Features
- ClickHouse Upgrade
  - The ClickHouse library has been upgraded to version 24.8.6.70. In addition to performance improvements and bug fixes, the following functions are now available:
    - Top K: `approx_top_k`, `topKWeighted`, and `approx_top_sum`
    - String related: `base64URLEncode`, `base64URLDecode`, `tryBase64URLDecode`, and `groupConcat`
    - Time related: `toMillisecond`
    - Set handling: `groupArrayIntersect`
    - Windowing: `percent_rank`
    - and more...
- Native MySQL client support
  - Client applications can now connect to and query a Hydrolix cluster with MySQL clients. The MySQL server listens on tcp/9004. This opens up more integration possibilities.
- Better API Documentation
  - The API documentation available from the `/config/schema/` endpoint has been reorganized and improved, making way for more complete API documentation in future releases.
- Simplified Kibana, Quesma, and Elasticsearch Integration
  - Deployment of Kibana, Quesma, and Elasticsearch is now provided automatically within Hydrolix.
- Hydrolix Tunable Names (HTN)
  - Using a structured naming pattern, all tunables can be applied to multiple services, pools, and containers. When defined, an HTN `htn:<service>:<pool>:<container>: <value>` takes precedence over traditional key-value tunables.
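
The HTN precedence rule can be illustrated with a minimal Python sketch. It assumes the HTN entries for a single tunable have already been collected into a dictionary and that an entry applies only when service, pool, and container all match exactly. The `parse_htn` and `resolve` helpers and the sample service, pool, and container names are hypothetical, and this is not Hydrolix's implementation.

```python
from typing import Dict, NamedTuple, Optional


class HtnKey(NamedTuple):
    service: str
    pool: str
    container: str


def parse_htn(key: str) -> HtnKey:
    """Split 'htn:<service>:<pool>:<container>' into its three parts."""
    parts = key.split(":")
    if len(parts) != 4 or parts[0] != "htn":
        raise ValueError(f"not an HTN key: {key!r}")
    return HtnKey(*parts[1:])


def resolve(traditional: Optional[str], htn_entries: Dict[str, str],
            service: str, pool: str, container: str) -> Optional[str]:
    """Return the HTN value defined for this target, else the traditional value."""
    target = HtnKey(service, pool, container)
    for key, value in htn_entries.items():
        if parse_htn(key) == target:
            return value  # a defined HTN takes precedence
    return traditional  # no matching HTN: fall back to the key-value tunable


# Hypothetical names, for illustration only.
entries = {"htn:intake-head:pool-a:turbine": "8"}
```

With these entries, `resolve("4", entries, "intake-head", "pool-a", "turbine")` picks the HTN value, while any other target falls back to the traditional value `"4"`.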
Breaking Changes
🚧 GET table_query_options has been removed
The `/table_query_options` API path has been removed. The same functionality is now available at the more flexible `/query_options` path, which works on both tables and projects. See the API documentation for more information.
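
For clients that still call the removed path, one possible migration is to rewrite the final path segment. The Python sketch below assumes the new `/query_options` path sits under the same parent as the old one, which the note above does not state. The `migrate_query_options_url` helper and the example URL are hypothetical, so check the API documentation for the real paths.

```python
from urllib.parse import urlsplit, urlunsplit


def migrate_query_options_url(url: str) -> str:
    """Rewrite a URL whose last path segment is table_query_options to query_options."""
    parts = urlsplit(url)
    segments = parts.path.split("/")
    # Tolerate a trailing slash by locating the last non-empty segment.
    idx = len(segments) - 1
    while idx >= 0 and segments[idx] == "":
        idx -= 1
    if idx >= 0 and segments[idx] == "table_query_options":
        segments[idx] = "query_options"
    # Scheme, host, query string, and fragment are preserved unchanged.
    return urlunsplit(parts._replace(path="/".join(segments)))


# Hypothetical URL, for illustration only.
old_url = "https://hdx.example.com/api/tables/t1/table_query_options/"
new_url = migrate_query_options_url(old_url)
```

URLs that do not end in `table_query_options` are returned unchanged, so the helper is safe to apply to every request a client builds.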
Upgrade
⚠️ Do not skip release versions.
Skipping versions during upgrades may result in system instability. You must upgrade sequentially through each release version.
Example: Upgrade from 5.4 → 5.5 → 5.6, not 5.4 → 5.6
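
The sequential rule can be expressed as a small Python sketch. It assumes the caller supplies an ordered list of release versions. The `upgrade_path` helper is hypothetical, and the version list simply mirrors the example above.

```python
from typing import List


def upgrade_path(releases: List[str], current: str, target: str) -> List[str]:
    """Return every release to install, in order, to move from current to target."""
    start = releases.index(current)  # raises ValueError for an unknown release
    end = releases.index(target)
    if end < start:
        raise ValueError("target release is older than the current release")
    # Every intermediate release is included so that none is skipped.
    return releases[start + 1 : end + 1]


releases = ["5.4", "5.5", "5.6"]
steps = upgrade_path(releases, "5.4", "5.6")
```

Here `steps` holds both 5.5 and 5.6, so the intermediate release is installed before the target.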
Upgrade on GKE
Upgrade on EKS
Upgrade on LKE
Changelog
General
- Hydrolix Engine
  - Updated ClickHouse from version 23.8.10.43 to 24.8.6.70. We also updated the following libraries:
    - openssl 1.1.1n -> 3.3.2
    - absl 20211102.0 -> 20240722.0
    - fmt 8.1.1 -> 9.1.0
    - libpq 16.4 -> 16.5
- Data Transformation
  - Introduced UI support for creating, copying, viewing, and customizing transform templates.
  - Allowed UI download of summary table transforms.
  - Introduced support for displaying and editing WURFL settings while managing transforms.
- Ingest
  - Added the Shadow Tables feature. A shadow table receives only a specified percentage of randomly sampled data from the input source.
  - Amazon Data Firehose can now decode more than one data object from a single data segment.
  - Improved the SaaS metering functionality ("Usagemeter") with better housekeeping: reported data older than the number of days specified in the `usagemeter_preserve` tunable is now cleaned out.
  - Updated the idna library from version 0.5.0 to 1.0.3 to address a security vulnerability.
- Query
  - Added a listener on TCP port 9004 for incoming queries using the MySQL dialect. This enables MySQL-compatible query tools. Queries are proxied to the ClickHouse query engine.
- Cluster Operations
  - Created a new tunable, `turbine_api_require_table_default_storage`, that requires explicit storage maps at table creation. This is especially useful for clusters using storage mapping.
  - Removed `partition-vacuum`. It was replaced with a lightweight `partition-cleaner`.
  - Switched `partition-cleaner` to a continuously running service instead of a scheduled job. This change removes the tunables `partition_cleaner_enabled` and `partition_cleaner_schedule`, decreases memory usage, and introduces `bulk_delete_*` metrics.
  - Converted hard-coded query peer liveness parameters into two new tunables, `query_peer_liveness_initial_delay` and `query_peer_liveness_probe_timeout`.
  - Added a tunable, `prometheus_curated_configmap`, to offer cluster operators dynamic control over which metrics are exported to Prometheus.
  - Added tunables `disable_traefik_mysql_port`, `mysql_port`, and `mysql_port_disable_tls`, used by the `traefik` server for MySQL configuration.
  - Introduced Hydrolix Tunable Names (HTN), a flexible, hierarchical system for configuring tunables across an entire cluster.
  - Introduced two new tunables to improve control over the behavior of rolling updates: `rollout_strategy_max_surge` and `rollout_strategy_max_unavailable`. These are especially useful for deployments running near maximum capacity.
  - Provided automatic deployment of Kibana, Quesma, and Elasticsearch within Hydrolix using the `data_visualization_tools` and `quesma_config` tunables.
- Integrations
  - Removed unnecessary references to shard keys throughout the Spark connector integration.
- API
  - The API documentation available from the `/config/schema/` endpoint has been reorganized and improved, making way for more complete API documentation in future releases.
- UI
  - Improved Advanced Options rows in tables: clicking a row now opens an editor in the sidebar for better visibility.
  - Introduced a multi-selection dropdown on the System Health page that includes a new All option for logs.
  - Added a refresh interval of three minutes for the table health widget.
  - Added a Delete Project page to improve the deletion process. The page shows statistics about ingest latency and size, and warns before deletion. Your session must have `is_superuser=true` to be allowed to delete projects.
  - Added a Project Health page to show relevant information, such as statistics about ingest latency and size per table.
  - Upgraded Next.js from 14.2.20 to 14.2.22 to address security issues.
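
The Shadow Tables entry under Ingest describes random percentage sampling. The Python sketch below shows one generic way such sampling can work, assuming each incoming record is sampled independently. The `route_to_shadow` function and its `sample_percent` parameter are hypothetical and do not reflect Hydrolix's implementation or configuration names.

```python
import random


def route_to_shadow(sample_percent: float, rng: random.Random) -> bool:
    """Decide whether one incoming record is also sent to the shadow table."""
    if not 0.0 <= sample_percent <= 100.0:
        raise ValueError("sample_percent must be between 0 and 100")
    # rng.random() is uniform on [0, 1), so this is true about sample_percent% of the time.
    return rng.random() * 100.0 < sample_percent


rng = random.Random(42)  # fixed seed so the sketch is repeatable
records = range(10_000)
shadow = [r for r in records if route_to_shadow(5.0, rng)]
```

With a 5 percent setting, `shadow` ends up holding roughly one record in twenty. A setting of 0 sends nothing to the shadow table and 100 sends everything.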
Bug Fixes
- API
  - Handled null storage IDs correctly in the API for the presigned URL feature.
  - Allowed users with single project or table permissions to see the projects and tables to which they have access. Formerly, they saw nothing.
  - Allowed users with all permissions to a project to use the `/projects` endpoint for listing projects. Earlier, this endpoint was forbidden.
  - Ensured only a single use for any invitation. This fixes an issue allowing a second claim on the same invitation.
  - Introduced a script that automatically runs on upgrade to patch existing views that were affected by a previously fixed bug involving `datetime` and `datetime64`.
  - Required `force_operation` to delete storage if it was used in a `column_value_mapping`.
  - Accepted blank entries in the Rust URL pre-signer for `endpoint_url`.
  - Fixed a bug involving `burst` and `limit` when creating projects, tables, and transforms.
  - Allowed the `?force_operation=true` query string parameter to bypass verification when updating summary table SQL. This allows recovery from poor configuration.
- Authentication and Permissions
  - Fixed a breaking interaction between unified auth and query string auth.
  - Users with the `super_admin` role in Hydrolix no longer see projects that have been marked as deleted.
- Configuration and Control
  - Accepted convertible types for core tunables via the `query_options` endpoint. For example, `hdx_query_unlimited_cnf` is a boolean but can be "0" (false) or "1", "99", etc. (true).
  - Ensured that the Hydrolix operator monitors changes to `curated` and applies them immediately to avoid k8s and operator desynchronization. This also obviates the need to restart the operator.
  - Added a liveness check to `traefik-cfg` so it terminates instead of hanging while waiting for timeouts.
- Core and Query
  - Corrected the Spark connector integration to return an empty ClickHouseRecord to the client, indicating the absence of a result. Earlier, it would return null, provoking a null pointer exception.
  - Filtered storage IDs requiring presigning to fix an Internal Server Error from `/catalog_urls`.
  - Improved resilience when a disconnected peer is canceling a query, so that a sensitive routine no longer crashes under cancellation conditions.
  - Corrected a rare thread safety issue in the ClickHouse 24.8.6.70 upgrade.
  - Reserved use of `hdx_query_output_file_enabled=1` as a query-level setting, not on table, project, or org. Improved consistency when `hdx_query_output_filename` is empty by generating timestamped filenames.
  - Fixed a rare, load-induced segfault crash in `query-head` by wrapping cancellation detection in a mutex.
  - Replaced std::string with a fixed-length char array to avoid deallocation failures under heavy memory pressure. This should reduce or eliminate exits with code 139 that lack stack trace output.
  - Prevented a full table scan when SELECT queries with LIMIT find no data in the specified time range.
  - Fixed an issue where a segmentation fault occurred when running a catalog query if the `turbine` service was started before the `postgresql` service.
  - Fixed a segfault triggered when using the `empty()` SQL function with map columns.
  - Detected cases of INSERT INTO an invalid `hdx_storage_id`. Now exits instead of displaying a long error message.
  - Fixed a memory leak involving the AWS API. Aws::ShutdownAPI() is now called when Hydrolix is finished using it.
- Merge and Data Lifecycle
  - Included the shard key along with time range when determining eligibility for partition merging.
- Ingest
  - Avoided a possible segfault in partition handling during parse error checking. This was visible in both `intake-head` and `merge-peer`.
  - Corrected a reference counting error triggered in error handling during batch ingestion with summary tables enabled. The symptom was an infinite loop of "Outstanding partition status."
  - Stopped leaking file descriptors inside connection pool management during batch, alter, and merge, by always closing response bodies.
  - Improved cleanup of orphaned files when upload to storage fails. Previously, a `batch-peer` might hang on shutdown.
  - Corrected a soft-delete interaction with Azure in which files set as 'permanent delete' remained. They will now be deleted.
  - Introduced better reporting of replicas when using a `GET` pool endpoint. The response includes a count of `current_replicas` and a range for `replicas`, matching the setting in the `hydrolixcluster.yaml` file.
  - Allowed multiple autoingest sources to use the same transform. This reduces resource costs during ingestion.
  - Updated autoingest to normalize URI filtering across Azure, GCP, and AWS. The overall behavior remains the same.
  - Added richer error reporting during batch autoingest, including bucket name, error message, and other fields. Previously, only a short text message was returned.
  - Corrected escaping of JSON-encoded storages for the batch-head to read. This prevents data loss with autoingest.
  - Updated golang.org/x/net from v0.28 to v0.33 to address a security vulnerability that could allow Denial of Service (DoS) attacks.
- UI
  - Avoided losing user input in the Table Health UI widget sidebar during re-rendering or when the user switches to other tabs or windows.
  - Omitted browser locale information to avoid causing a bad SSO redirect.
  - Removed the duplicated text entry block for schema definition when editing dictionaries.
  - Displayed WURFL fields only if WURFL is enabled.
  - Fixed multiple display issues in sidebars when surfacing validation results from the API to the user. Pages affected were data lifecycle management and also stream, merge, flush, bucket, and rate limit settings.
  - Upgraded the octokit library, which includes a fix for a regular expression backtracking denial of service vulnerability, CVE-2025-25289.
  - Displayed the replicas correctly on the scale page by using the current replicas count.
  - Updated the serialize-javascript library from 6.0.1 to 6.0.2, as well as dependencies, to address CVE-2024-11831.
  - Fixed the role editing sidebar so that it behaves appropriately when adding or removing a policy.
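
The Configuration and Control fix above says that a boolean tunable such as `hdx_query_unlimited_cnf` accepts convertible values, with "0" read as false and "1" or "99" read as true. The Python sketch below shows one way such coercion can be written. The `coerce_bool_tunable` helper is hypothetical, and rejecting non-numeric strings is an assumption, since the note does not say how values like "true" are handled.

```python
def coerce_bool_tunable(value) -> bool:
    """Interpret a convertible value as a boolean: zero is false, nonzero is true."""
    if isinstance(value, bool):
        return value
    if isinstance(value, (int, float)):
        return value != 0
    if isinstance(value, str):
        # Assumption: only integer-like strings are accepted; others raise ValueError.
        return int(value.strip()) != 0
    raise TypeError(f"unsupported tunable type: {type(value).__name__}")
```

Under these assumptions, `coerce_bool_tunable("0")` is false, while `coerce_bool_tunable("1")` and `coerce_bool_tunable("99")` are both true.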