31 October 2024 - v4.20.1

Array indexing, Amazon Data Firehose support, regex dictionaries

Notable New Features

  • Array Indexing
    • Arrays can now be indexed. This feature must be enabled manually. Exceptions: arrays of doubles, arrays of maps, and arrays of arrays are not indexed. New metrics have been added.
  • Amazon Data Firehose Support
    • Amazon Data Firehose can send data to the Hydrolix streaming HTTP endpoint using the X-Amz-Firehose-Access-Key authorization header and a new response format. A new JSON subtype for transforms handles Amazon Data Firehose's specific data encoding.
  • ClickHouse Regex Tree Dictionary Support
    • Regular expression tree dictionaries are a special type of dictionary that represents the mapping from key to attributes using a tree of regular expressions. Some use cases, such as parsing user agent strings, can be expressed elegantly with regexp tree dictionaries.
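
To illustrate the Firehose data encoding mentioned above: Amazon Data Firehose's HTTP endpoint delivery sends each record as base64 text in the data field of a records array. The sketch below decodes one such payload locally. The function name decode_firehose_record and the sample payload are hypothetical, and the sketch assumes a base64 utility that accepts -d. It does not show how Hydrolix itself processes the data.

```shell
# Hypothetical helper: decode the base64 text found in one Firehose
# record's "data" field and write the original bytes to stdout.
# Assumes a base64 utility that supports the -d (decode) flag.
decode_firehose_record() {
  printf '%s' "$1" | base64 -d
}

# Sample payload (made up for illustration): base64 of {"level":"info"}
decode_firehose_record 'eyJsZXZlbCI6ImluZm8ifQ=='
```

A transform using the new JSON subtype would presumably receive payloads wrapped this way, so decoding a sample by hand can help when checking what the source is actually sending.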

Upgrade

Upgrade on GKE:

kubectl apply -f "https://www.hydrolix.io/operator/v4.20.1/operator-resources?namespace=${HDX_KUBERNETES_NAMESPACE}&gcp-storage-sa=${GCP_STORAGE_SA}"

Upgrade on EKS:

kubectl apply -f "https://www.hydrolix.io/operator/v4.20.1/operator-resources?namespace=${HDX_KUBERNETES_NAMESPACE}&aws-storage-role=${AWS_STORAGE_ROLE}"

Upgrade on LKE:

kubectl apply -f "https://www.hydrolix.io/operator/v4.20.1/operator-resources?namespace=${HDX_KUBERNETES_NAMESPACE}"
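
The three upgrade commands differ only in the query parameters appended to the operator URL. A minimal sketch of a helper that assembles that URL is shown below. The function name hdx_operator_url and its argument order are hypothetical and not part of any Hydrolix tooling; the URL layout is copied from the commands above.

```shell
# Hypothetical helper: build the operator-resources URL used by the
# upgrade commands. Runs in a subshell so its variables stay local.
#   $1 = version (for example v4.20.1)
#   $2 = Kubernetes namespace
#   $3 = optional extra query parameter, for example
#        "gcp-storage-sa=..." on GKE or "aws-storage-role=..." on EKS
hdx_operator_url() (
  version="$1"
  namespace="$2"
  extra="${3:-}"
  url="https://www.hydrolix.io/operator/${version}/operator-resources?namespace=${namespace}"
  if [ -n "$extra" ]; then
    url="${url}&${extra}"
  fi
  printf '%s\n' "$url"
)

# Example use (commented out so the sketch has no side effects):
# kubectl apply -f "$(hdx_operator_url v4.20.1 "$HDX_KUBERNETES_NAMESPACE")"
```

The same helper can produce the URL for an older version during a rollback by passing that version as the first argument.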

Rollback Instructions

IMPORTANT: The following steps are required if you need to roll back after upgrading to 4.20.1.

To roll back from v4.20.1 to a previous version, first shell into the turbine-api pod and run a command to modify the catalog:

  1. Shell into the turbine-api pod. This can be done using kubectl. Determine the pod name by running:

    kubectl get pods -n ${HDX_KUBERNETES_NAMESPACE}
    

    Find the pod named turbine-api-{id}.
    Then, shell into the pod with:

    kubectl exec -n $HDX_KUBERNETES_NAMESPACE --stdin --tty turbine-api-{id} -- /bin/bash
    
  2. Run the following command:

    ./manage.py migrate orgs 0010
    

    If you skip these steps before rolling back to an earlier version,
    the init-turbine-pod will emit errors about a missing column in the config_api database.

  3. Perform the rollback with commands similar to the "Upgrade on GKE/EKS/LKE" commands above, using the old version number instead of 4.20.1.

  4. If any arrays in your transforms have index:true, set them all to index:false so that data ingestion continues without issues.
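
Steps 1 and 2 above can also be scripted. The sketch below is one possible approach, not an official procedure: the function name find_turbine_api_pod is hypothetical, and the commented non-interactive kubectl exec call assumes that ./manage.py resolves from the container's default working directory, as it does in the interactive shell.

```shell
# Hypothetical helper: read `kubectl get pods` output on stdin and print
# the name of the first pod whose name starts with "turbine-api-".
find_turbine_api_pod() {
  awk '$1 ~ /^turbine-api-/ { print $1; exit }'
}

# Example use (commented out because it changes the catalog):
# pod="$(kubectl get pods -n "${HDX_KUBERNETES_NAMESPACE}" | find_turbine_api_pod)"
# kubectl exec -n "${HDX_KUBERNETES_NAMESPACE}" "$pod" -- ./manage.py migrate orgs 0010
```

If the helper prints nothing, no matching pod was found and the migration command should not be run.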

Changelog

General

  • API

    • Only active jobs can be cancelled.
    • Added support for ClickHouse regexp dictionaries.
    • Fixed a bug in handling transforms containing escaped backslashes (\\) within settings.format_details. Also corrected validation of the quote, escape, comment, and delimiter fields within a transform's settings.format_details. Previously, any arbitrary string was accepted for these fields.
    • Added a check to disallow the modification of UUIDs for transforms and views.
    • Transform UUIDs are now checked during transform updates, and an update is rejected if its UUID doesn't match the transform.
    • Added a cluster_logs API endpoint that returns pod logs and cluster configuration in a cluster-logs.zip file. This is to be used by an upcoming version of the hdxcli command-line tool.
  • Control

    • Added two new tunables: max_concurrent_queries (int) and max_server_memory_usage_perc (int).
    • To better support fleet management, logs can now be sent to an external Hydrolix cluster in addition to the default internal Kafka-based logging.
    • Fixed reconciliation errors that occurred when silence_linode_alerts: true was set.
    • Added support for Amazon Data Firehose authorization via the x-amz-firehose-access-key header.
  • Core

    • Added verbose stacktrace for segfaults.

    • Fixed indexing for arrays.

    • Added new metrics in query detail stats: hdx_blocks_read, hdx_blocks_skipped. These metrics were already captured at the system level and are now also available at the query level.

    • Added support for ClickHouse regexp dictionaries.

    • Upgraded AWS libraries:

      • aws-c-common -> 0.9.12
      • aws-c-cal -> 0.6.10
      • aws-c-io -> 0.14.1
      • aws-c-compression -> 0.2.17
      • aws-checksums -> 0.1.17
      • aws-c-event-stream -> 0.4.0
      • aws-c-http -> 0.8.0
      • aws-c-sdkutils -> 0.1.13
      • aws-c-auth -> 0.7.10
      • aws-c-mqtt -> 0.10.1
      • aws-c-s3 -> 0.4.9
      • aws-crt-cpp -> 0.26.2
      • aws-sdk-cpp -> 1.11.285
    • A backoff duration tunable for Akamai SIEM requests has been introduced to eliminate unnecessary calls to the SIEM API.

  • Data

    • Upgraded Go to 1.23.
    • Added a new JSON subtype and response format for handling Amazon Data Firehose data.
    • Added a simplified usagemeter that sends aggregated rows to a central cluster.
    • Partition lineage can now be analyzed using new log messages from the merge service. The logs include the paths of incoming partitions to be merged and the paths of the new partitions resulting from the merge.
    • When ingest rejects rows, the logged messages are now more informative. There should be fewer "produced no events from request" messages, and up to 10 of the last errors will be logged.
    • Eliminated deadlocks in intake on clusters with very high summary activity.
  • UI

    • Added support for ClickHouse regexp dictionaries.
    • Require user confirmation to truncate a table, delete a table, or delete a project.
    • Omit sample data from request object on create or edit transform if there is nothing in the sample_data field.
    • Display an error row in the Users tab of the Security page in the event that the audit field is returned with a user object. This can happen when the user lists in Keycloak and PostgreSQL are out of sync.
    • Added a new organization settings page which contains query options for the organization.
    • Added access to update query options in the summary table UI.
    • Only display valid pool service types in resource pool list.
    • Fixed 503 response ("invalid URL") when trying to edit a transform with force_operation=true or when trying to delete a credential.
    • Alphabetized left navigation menu.
    • Added a new Credential ID field to the credential editing form.
    • Transforms: added Akamai mPulse, Firehose, and CloudWatch to format options sidebar (JSON only).
    • The Transform Output Columns editor now has the column view on the left, establishing it as the default.
    • Replaced purple login Hydrolix logo with new blue Hydrolix logo.
    • Updated alter jobs form: changed SQL field to a text editor, added some autocomplete items.

Bug Fixes

  • API

    • When creating or editing a Kinesis source with kcl=false, the API now validates stream_name and checkpointer.
    • Added auditing to various points of the API including the truncate and populate table catalog functionalities.
    • Added Login component to OpenAPI schema definition.
    • Fixed erroneous conflict failure when updating a view which does not have a conflict.
    • Fixed a bug that prevented the JSON list type from being parsed correctly, which also fixes an issue with parsing sample data.
    • Alter jobs are no longer created with the created_by_user field set to null.
    • 500 errors are no longer returned when PATCHing rate limits with empty request bodies.
    • Fixed a bug causing duplicate table IDs to show up in the deleted_tables entry of an organization's published configuration.
  • Control

    • Fixed missing user credentials on jobs, which allows partition vacuum to operate on multi-bucket setups.
    • When using the older basic_auth authentication, browser authentication pop-ups no longer appear when logging into the UI.
  • Core

    • Disallowed filtering by non-deterministic functions (now(), yesterday(), etc.) in summary SQL.
    • Removed synchronization of legacy *_globals.json files, dramatically reducing the number of HTTP 400 responses from cloud storage.
  • Data

    • Fixed a bug preventing load-sample-project from operating when using Azure as the batch input source.
  • UI

    • Audit trail: Added parsing of additional request parameters to the axios interceptor.
    • Fixed a bug preventing users from executing INSERT queries via the query page by appending FORMAT JSON to these queries.
    • Tables in the summary form and custom views no longer jump to the top after actions like drag and drop are performed.
    • Upgraded Next.js to v14.2.15 to prevent cache poisoning.
    • Fixed a bug in which if the user created a table, configured batch auto-ingest, then added a transform, the UI would not display the configured batch ingest.
    • Upgraded the rollup package from 2.79.1 to 2.79.2 to resolve a DOM clobbering vulnerability.
    • Added protections against server-side request forgery (SSRF).
    • Fixed a bug in which the error message indicating a type mismatch between a csv-type transform and json-type sample_data would disappear despite the mismatch persisting.
    • Fixed a 503 response from a Prometheus endpoint that resulted in blank charts on the dashboard page.
    • Changed the form to edit SIEM sources: added copy buttons and made it easier to update client secret fields.