11 March 2025 - v5.0.0

ClickHouse upgrade, new Shadow Tables feature, native MySQL client support, better API documentation, easier Kibana integration via Quesma, and Hydrolix Tunable Names

Notable New Features

  • ClickHouse Upgrade
    • The ClickHouse library has been upgraded to version 24.8.6.70. In addition to performance improvements and bugfixes, the following functions are now available.
      • Top K: approx_top_k, topKWeighted, and approx_top_sum
      • String related: base64URLEncode, base64URLDecode, tryBase64URLDecode, and groupConcat
      • Time related: toMillisecond
      • Set handling: groupArrayIntersect
      • Windowing: percent_rank
      • and more...
  • Native MySQL client support
    • Client applications can now connect to and query a Hydrolix cluster with MySQL clients. The MySQL server listens on tcp/9004. This opens up more integration possibilities.
  • Better API Documentation
    • The API documentation available from the /config/schema/ endpoint has been reorganized and improved, making way for more complete API documentation in future releases.
  • Simplified Kibana, Quesma, and Elasticsearch Integration
    • Deployment of Kibana, Quesma, and Elasticsearch is now provided automatically within Hydrolix.
  • Hydrolix Tunable Names (HTN)
    • Using a structured naming pattern, all tunables can be applied to multiple services, pools and containers. When defined, an HTN htn:<service>:<pool>:<container>: <value> takes precedence over traditional key-value tunables.
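
To make the native MySQL client support above concrete, here is a minimal sketch of a wrapper around the standard mysql command-line client. The helper name hdx_mysql and the variables HDX_HOST and HDX_USER are assumptions for illustration, not Hydrolix tooling. Only the port (tcp/9004) comes from these notes. Authentication and TLS options depend on your cluster configuration and on the client you use.

```shell
# hdx_mysql QUERY: run one query through the MySQL listener on tcp/9004.
# HDX_HOST and HDX_USER are assumed variable names; adjust to your environment.
# The body runs in a subshell, so a missing variable aborts only this call.
hdx_mysql() (
  host="${HDX_HOST:?HDX_HOST must be set}"
  user="${HDX_USER:?HDX_USER must be set}"
  query="${1:?usage: hdx_mysql QUERY}"
  # --password with no value makes the client prompt for the secret
  # instead of exposing it in the process argument list.
  mysql --protocol=TCP --host="$host" --port=9004 --user="$user" \
        --password --execute="$query"
)
```

With HDX_HOST and HDX_USER set, a call such as hdx_mysql 'SELECT 1' is a quick way to check that the listener is reachable.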

Breaking Changes

🚧

GET table_query_options has been removed

The /table_query_options API path has been removed. The same functionality is now available at the more flexible /query_options path, which works on both tables and projects. See the API documentation for more information:

Table Query Options
Project Query Options
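
Before upgrading, it may help to check your own scripts and client code for calls to the removed path. The following is a small sketch using grep. The helper name find_stale_query_options is hypothetical, and the search only locates references. How each call should be rewritten is described in the API documentation.

```shell
# find_stale_query_options [DIR]: list every line under DIR (default: the
# current directory) that still mentions the removed table_query_options path.
# Output format is path:line:content. The exit status is non-zero when
# nothing is found.
find_stale_query_options() {
  grep -rnF -- 'table_query_options' "${1:-.}"
}
```

Run it from the root of a repository that holds automation against the Hydrolix API. An empty result with a non-zero exit status means no references remain.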

Upgrade

Upgrade on GKE

kubectl apply -f "https://www.hydrolix.io/operator/v5.0.0/operator-resources?namespace=${HDX_KUBERNETES_NAMESPACE}&gcp-storage-sa=${GCP_STORAGE_SA}"

Upgrade on EKS

kubectl apply -f "https://www.hydrolix.io/operator/v5.0.0/operator-resources?namespace=${HDX_KUBERNETES_NAMESPACE}&aws-storage-role=${AWS_STORAGE_ROLE}"

Upgrade on LKE

kubectl apply -f "https://www.hydrolix.io/operator/v5.0.0/operator-resources?namespace=$HDX_KUBERNETES_NAMESPACE"
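
The upgrade commands above interpolate environment variables directly into the URL, so an unset variable silently produces a malformed request. As a hedged sketch, the helper below refuses to build the URL when the namespace is missing. The name operator_url is an assumption for illustration. The URL and variable names are the ones used in the commands above.

```shell
# operator_url [EXTRA_PARAM]: print the v5.0.0 operator-resources URL.
# The body runs in a subshell, so a failed check aborts only this call.
operator_url() (
  # ${VAR:?msg} fails with msg when VAR is unset or empty.
  ns="${HDX_KUBERNETES_NAMESPACE:?HDX_KUBERNETES_NAMESPACE must be set}"
  url="https://www.hydrolix.io/operator/v5.0.0/operator-resources?namespace=${ns}"
  # Optional first argument: one extra query parameter, for example
  # "gcp-storage-sa=${GCP_STORAGE_SA}" on GKE.
  if [ -n "${1:-}" ]; then
    url="${url}&${1}"
  fi
  printf '%s\n' "$url"
)

# Usage (LKE):
#   url="$(operator_url)" && kubectl apply -f "$url"
```

Because the assignment fails when the check fails, the && in the usage line keeps kubectl from running with an incomplete URL.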

Changelog

General

  • Hydrolix Engine

    • Updated ClickHouse from version 23.8.10.43 to 24.8.6.70. We also updated the following libraries:
      • openssl 1.1.1n -> 3.3.2
      • absl 20211102.0 -> 20240722.0
      • fmt 8.1.1 -> 9.1.0
      • libpq 16.4 -> 16.5
  • Data Transformation

    • Introduced UI support for creating, copying, viewing and customizing transform templates.
    • Allowed UI download of summary table transforms.
    • Introduced support for display and edit of WURFL settings while managing transforms.
  • Ingest

    • Added Shadow Tables feature. Shadow Tables only receive a certain percentage of randomly-sampled data from the input source.
    • Amazon Data Firehose can now decode more than one data object from a single data segment.
    • Improved the SaaS metering functionality ("Usagemeter") by adding better housekeeping and cleaning out reported data older than the number of days specified in the usagemeter_preserve tunable.
    • Updated the idna library from version 0.5.0 to 1.0.3 to address a security vulnerability.
  • Query

    • Added TCP port 9004 listener for incoming queries using the MySQL dialect. This enables MySQL-compatible query tools. Queries are proxied to the ClickHouse query engine.
  • Cluster Operations

    • Created a new tunable turbine_api_require_table_default_storage requiring explicit storage maps at table creation. This is especially useful for clusters using storage mapping.
    • Removed partition-vacuum. It was replaced with a lightweight partition-cleaner.
    • Switched partition-cleaner to a continuously running service instead of a scheduled job. This change removes the partition_cleaner_enabled and partition_cleaner_schedule tunables, decreases memory usage, and introduces bulk_delete_* metrics.
    • Converted hard-coded query peer liveness parameters into two new tunables, query_peer_liveness_initial_delay and query_peer_liveness_probe_timeout.
    • Added a tunable prometheus_curated_configmap to offer cluster operators dynamic control over which metrics are exported to Prometheus.
    • Added tunables disable_traefik_mysql_port, mysql_port, and mysql_port_disable_tls used by the traefik server for MySQL configuration.
    • Introduced Hydrolix Tunable Names (HTN), a flexible, hierarchical system for configuring tunables across an entire cluster.
    • Introduced two new tunables to improve control over the behavior of rolling updates: rollout_strategy_max_surge and rollout_strategy_max_unavailable. These are especially useful for deployments running near maximum capacity.
    • Provided automatic deployment of Kibana, Quesma, and Elasticsearch within Hydrolix using the data_visualization_tools and quesma_config tunables.
  • Integrations

    • Removed unnecessary references to shard keys throughout the Spark connector integration.
  • API

    • The API documentation available from the /config/schema/ endpoint has been reorganized and improved, making way for more complete API documentation in future releases.
  • UI

    • Improved Advanced Options rows in tables, which can now be clicked, bringing up an editor in the sidebar for better visibility.
    • Introduced a multi-selection dropdown on the System Health page that includes a new All option for logs.
    • Added a refresh interval of three minutes for the table health widget.
    • Added a Delete Project page to improve the deletion process. The page shows statistics about ingest latency and size, and warns before deletion. Your session must have is_superuser=true to be allowed to delete projects.
    • Added a Project Health page to show relevant information, such as statistics about ingest latency and size per table.
    • Upgraded Next.js from 14.2.20 to 14.2.22 to address security issues.

Bug Fixes

  • API

    • Handled null storage IDs correctly in the API for the presigned URL feature.
    • Allowed users with single project or table permissions to see projects and tables to which they have access. Formerly, they saw nothing.
    • Allowed users with all permissions to a project to use the /projects endpoint for listing projects. Earlier, this endpoint was forbidden.
    • Ensured only a single use for any invitation. This fixes an issue allowing a second claim on the same invitation.
    • Introduced a script that automatically runs on upgrade to patch up existing views that were affected by a previously fixed bug involving datetime and datetime64.
    • Required force_operation to delete storage if it was used in a column_value_mapping.
    • Accepted blank entries in the Rust URL pre-signer for endpoint_url.
    • Fixed a bug involving burst and limit when creating projects, tables, and transforms.
    • Allowed the ?force_operation=true query string parameter to bypass verification when updating summary table SQL. This allows recovery from poor configuration.
  • Authentication and Permissions

    • Fixed a breaking interaction between unified auth and query string auth.
    • Removed visibility of deleted projects for the super_admin role. Users with the super_admin role in Hydrolix no longer see projects that have been marked as deleted.
  • Configuration and Control

    • Accepted convertible types for core tunables via the query_options endpoint. For example, hdx_query_unlimited_cnf is a boolean but can be set to "0" (false) or to "1", "99", and so on (true).
    • Ensured that the Hydrolix operator monitors changes to curated and applies them immediately to avoid k8s and operator desynchronization. This also obviates the need to restart the operator.
    • Added a liveness check to traefik-cfg so that it terminates instead of hanging while waiting for timeouts.
  • Core and Query

    • Corrected the Spark connector integration to return an empty ClickHouseRecord to the client, indicating the absence of a result. Earlier, it would return null, provoking a null pointer exception.
    • Filtered storage IDs requiring presigning to fix Internal Server Error from /catalog_urls.
    • Improved resilience when a disconnected peer is canceling a query. A sensitive routine no longer crashes under cancellation conditions.
    • Corrected a rare thread safety issue in the ClickHouse 24.8.6.70 upgrade.
    • Reserved hdx_query_output_file_enabled=1 for use as a query-level setting, not a table, project, or org setting. Improved consistency when hdx_query_output_filename is empty by generating timestamped filenames.
    • Fixed a rare, load-induced segfault crash in query-head by wrapping cancellation detection in a mutex.
    • Replaced std::string with a fixed-length char array to avoid deallocation failures under heavy memory pressure. This should reduce or eliminate exits with code 139 that lack stack trace output.
    • Prevented full table scan when SELECT queries with LIMIT find no data in the specified time range.
    • Fixed an issue where a segmentation fault occurred when running a catalog query if the turbine service was started before the postgresql service.
    • Fixed a segfault triggered when using the empty() SQL function with map columns.
    • Detected cases of INSERT INTO an invalid hdx_storage_id. Now exits instead of displaying a long error message.
    • Fixed a memory leak involving the AWS API. Aws::ShutdownAPI() is now called when Hydrolix is finished using it.
  • Merge and Data Lifecycle

    • Included the shard key along with time range when determining eligibility for partition merging.
  • Ingest

    • Avoided a possible segfault in partition handling during parse error checking. This was visible in both intake-head and merge-peer.
    • Corrected a reference counting error triggered in error handling during batch ingestion with summary tables enabled. The symptom was an infinite loop of "Outstanding partition status."
    • Stopped leaking file descriptors inside connection pool management during batch, alter and merge, by always closing response bodies.
    • Improved cleanup of orphaned files when upload to storage fails. Previously, a batch-peer might hang on shutdown.
    • Corrected a soft-delete interaction with Azure in which files set as 'permanent delete' remained. They will now be deleted.
    • Introduced better reporting of replicas when using a GET pool endpoint. The response includes a count of current_replicas and a range for replicas, matching the setting in the hydrolixcluster.yaml file.
    • Allowed multiple autoingest sources to use the same transform. This reduces resource costs during ingestion.
    • Updated autoingest to normalize URI filtering across Azure, GCP, and AWS. The overall behavior remains the same.
    • Added richer error reporting during batch autoingest, including bucket name, error message, and other fields. Previously, only a short text message was returned.
    • Corrected the escaping of JSON-encoded storages so that batch-head can read them. This prevents data loss with autoingest.
    • Updated golang.org/x/net from v0.28 to v0.33 to address a security vulnerability that could allow Denial of Service (DoS) attacks.
  • UI

    • Avoided losing user input in the Table Health UI widget sidebar during re-rendering or when the user switches to other tabs or windows.
    • Omitted browser locale information to avoid causing a bad SSO redirect.
    • Removed the duplicated text entry block for schema definition when editing dictionaries.
    • Displayed WURFL fields only if WURFL is enabled.
    • Fixed multiple display issues in sidebars when surfacing validation results from the API to the user. Pages affected were data lifecycle management and also stream, merge, flush, bucket, and rate limit settings.
    • Upgraded the octokit library to include a fix for a regular expression backtracking denial of service vulnerability, CVE-2025-25289.
    • Displayed the replicas correctly on the scale page by using the current replicas count.
    • Updated the serialize-javascript library from 6.0.1 to 6.0.2, along with its dependencies, to address CVE-2024-11831.
    • When editing a role, the sidebar now behaves appropriately when adding or removing a policy.
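
The exit code 139 mentioned in the Core and Query fixes follows the common convention of 128 plus the signal number, which on Linux makes it signal 11 (SIGSEGV). The sketch below decodes such codes when triaging container restarts. The helper name signal_from_exit is hypothetical and not part of Hydrolix tooling.

```shell
# signal_from_exit CODE: print the signal name for exit codes above 128,
# or "none" when the code does not indicate death by signal.
signal_from_exit() {
  if [ "$1" -gt 128 ]; then
    # kill -l with a number prints the signal name without the SIG prefix.
    kill -l "$(( $1 - 128 ))"
  else
    echo "none"
  fi
}
```

For example, signal_from_exit 139 reports SEGV on Linux, while an ordinary failure such as exit code 1 reports none.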