31 October 2024 - v4.20.1
Array indexing, Amazon Data Firehose support, regex dictionaries
Notable New Features
- Array Indexing
- Array columns can now be indexed. This feature must be enabled manually. Exceptions: arrays of doubles, maps, or arrays are not indexed. New metrics have been added.
- Amazon Data Firehose Support
- Amazon Data Firehose can send data to the Hydrolix streaming HTTP endpoint using the X-Amz-Firehose-Access-Key authorization header and a new response format. There's also a new JSON subtype for transforms to allow Amazon Data Firehose's specific data encoding.
- Added support for ClickHouse Regex Tree Dictionary
- Regular expression tree dictionaries are a special type of dictionary which represent the mapping from key to attributes using a tree of regular expressions. There are some use cases, e.g. parsing of user agent strings, which can be expressed elegantly with regexp tree dictionaries.
Upgrade
Upgrade on GKE:
kubectl apply -f "https://www.hydrolix.io/operator/v4.20.1/operator-resources?namespace=${HDX_KUBERNETES_NAMESPACE}&gcp-storage-sa=${GCP_STORAGE_SA}"
Upgrade on EKS:
kubectl apply -f "https://www.hydrolix.io/operator/v4.20.1/operator-resources?namespace=${HDX_KUBERNETES_NAMESPACE}&aws-storage-role=${AWS_STORAGE_ROLE}"
Upgrade on LKE:
kubectl apply -f "https://www.hydrolix.io/operator/v4.20.1/operator-resources?namespace=$HDX_KUBERNETES_NAMESPACE"
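The three upgrade commands differ only in the query parameters of the operator URL. As a minimal sketch, the helper below builds that URL for a given platform and version. The function name hdx_operator_url and the gke/eks/lke platform labels are hypothetical conveniences, not part of any Hydrolix tooling; the URL layout and environment variable names are taken from the commands above.

```shell
# Hypothetical helper: print the operator-resources URL for a platform/version.
# Reads HDX_KUBERNETES_NAMESPACE, GCP_STORAGE_SA and AWS_STORAGE_ROLE from the
# environment, exactly as the upgrade commands above do.
hdx_operator_url() {
  local platform="$1"
  local version="$2"
  local base="https://www.hydrolix.io/operator/${version}/operator-resources"
  local query="namespace=${HDX_KUBERNETES_NAMESPACE}"

  case "$platform" in
    gke) query="${query}&gcp-storage-sa=${GCP_STORAGE_SA}" ;;
    eks) query="${query}&aws-storage-role=${AWS_STORAGE_ROLE}" ;;
    lke) ;;  # LKE needs only the namespace parameter
    *)   echo "unknown platform: ${platform}" >&2; return 1 ;;
  esac

  printf '%s?%s\n' "$base" "$query"
}

# Usage: inspect the URL first, then hand it to kubectl.
#   hdx_operator_url eks v4.20.1
#   kubectl apply -f "$(hdx_operator_url eks v4.20.1)"
```

Printing the URL before applying it makes it easy to confirm the version number, which is the only part that changes when the same commands are reused for a rollback.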
Rollback Instructions
IMPORTANT: The following is required if you need to roll back after upgrading to 4.20.1.
To roll back from v4.20.1 to a previous version, first shell into the turbine-api pod and run a command to modify the catalog. You can do so by following these steps:
- Shell into the turbine-api pod. This can be done using kubectl. Determine the pod name by running:
kubectl get pods -n ${HDX_KUBERNETES_NAMESPACE}
Find the pod named turbine-api-{id}. Then, shell into the pod with:
kubectl exec -n $HDX_KUBERNETES_NAMESPACE --stdin --tty turbine-api-{id} -- /bin/bash
- Run the following command:
./manage.py migrate orgs 0010
If you do not perform these steps prior to performing a version rollback, the init-turbine-pod will emit errors related to a missing column from the config_api database.
- Perform the rollback with commands similar to the "Upgrade on GKE/EKS/LKE" commands above, using the old version number instead of 4.20.1.
- If you have any arrays in transforms with index:true, set them all to index:false to ensure data ingestion without any issues.
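The pod lookup in the first rollback step can be scripted. The sketch below is one possible approach, not an official procedure: find_turbine_api_pod is a hypothetical helper that reads kubectl get pods output on stdin and prints the first pod whose name starts with turbine-api-. The commented usage assumes the pod's default working directory is the one containing manage.py, as the interactive steps imply.

```shell
# Hypothetical helper: read `kubectl get pods` output on stdin and print the
# name of the first pod that starts with "turbine-api-".
find_turbine_api_pod() {
  awk '$1 ~ /^turbine-api-/ { print $1; exit }'
}

# Usage, assuming kubectl is already configured for the cluster:
#   pod="$(kubectl get pods -n "$HDX_KUBERNETES_NAMESPACE" | find_turbine_api_pod)"
#   kubectl exec -n "$HDX_KUBERNETES_NAMESPACE" "$pod" -- ./manage.py migrate orgs 0010
```

If the helper prints nothing, no matching pod was found, and the migration should not be attempted until the pod name is confirmed by hand.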
Changelog
General
- API
- Only active jobs can be cancelled.
- Added support for ClickHouse regexp dictionaries.
- Fixed a bug in handling transforms containing escaped backslashes (\\) within settings.format_details. Also corrected validation of the quote, escape, comment, and delimiter fields within a transform's settings.format_details. Previously, any arbitrary string was able to be provided for these fields.
- Added a check to disallow the modification of UUIDs for transforms and views.
- Transform UUIDs are now checked during transform updates and rejected if they don't match the transform.
- Added a cluster_logs API endpoint that returns pod logs and cluster configuration in a cluster-logs.zip file. This is to be used by an upcoming version of the hdxcli command-line tool.
- Control
- Two new tunables: max_concurrent_queries (int) and max_server_memory_usage_perc (int).
- To better support fleet management, logs can now be sent to an external Hydrolix cluster as well as the default internal Kafka-based logging.
- Reconciliation errors have been fixed in the case when silence_linode_alerts: true.
- Added support for Amazon Data Firehose authorization via the x-amz-firehose-access-key header.
- Core
- Added verbose stacktrace for segfaults.
- Fixed indexing for arrays.
- Added new metrics in query detail stats: hdx_blocks_read, hdx_blocks_skipped. These metrics were already captured at the system level and are now also available at the query level.
- Added support for ClickHouse regexp dictionaries.
- Upgraded AWS libraries:
- aws-c-common -> 0.9.12
- aws-c-cal -> 0.6.10
- aws-c-io -> 0.14.1
- aws-c-compression -> 0.2.17
- aws-checksums -> 0.1.17
- aws-c-event-stream -> 0.4.0
- aws-c-http -> 0.8.0
- aws-c-sdkutils -> 0.1.13
- aws-c-auth -> 0.7.10
- aws-c-mqtt -> 0.10.1
- aws-c-s3 -> 0.4.9
- aws-crt-cpp -> 0.26.2
- aws-sdk-cpp -> 1.11.285
- A backoff duration tunable for Akamai SIEM requests has been introduced to eliminate unnecessary calls to the SIEM API.
- Data
- Upgraded Go to 1.23.
- Added a new JSON subtype and response format for handling Amazon Data Firehose data.
- Added a simplified usagemeter that sends aggregated rows to a central cluster.
- Partition lineage may now be analyzed due to new log messages from the merge service. Logs include paths of incoming partitions to be merged and paths of new partitions resulting from merge.
- When ingest rejects rows, messages are more informative. There should be fewer "produced no events from request" messages, and up to 10 of the last errors will be logged.
- We eliminated deadlocks in intake on clusters with very high summary activity.
- UI
- Added support for ClickHouse regexp dictionaries.
- Require user confirmation to truncate a table, delete a table, or delete a project.
- Omit sample data from request object on create or edit transform if there is nothing in the sample_data field.
- Display an error row in the Users tab of the Security page in the event that the audit field is returned with a user object. This can happen in the event that user lists between Keycloak and PostgreSQL are out of sync.
- Added a new organization settings page which contains query options for the organization.
- Added access to update query options in the summary table UI.
- Only display valid pool service types in resource pool list.
- Fixed 503 response ("invalid URL") when trying to edit a transform with force_operation=true or when trying to delete a credential.
- Alphabetized left navigation menu.
- Added a new Credential ID field to the credential editing form.
- Transforms: added Akamai mPulse, Firehose, and CloudWatch to format options sidebar (JSON only).
- The Transform Output Columns editor now has the column view on the left, establishing it as the default.
- Replaced purple login Hydrolix logo with new blue Hydrolix logo.
- Updated alter jobs form: changed SQL field to a text editor, added some autocomplete items.
Bug Fixes
- API
- When creating or editing a Kinesis source with kcl=false, the API now validates stream_name and checkpointer.
- Added auditing to various points of the API including the truncate and populate table catalog functionalities.
- Added Login component to OpenAPI schema definition.
- Fixed erroneous conflict failure when updating a view which does not have a conflict.
- Fixed bug preventing JSON list type from being correctly parsed thus fixing an issue parsing the sample data.
- Alter jobs are no longer created with the created_by_user field set to null.
- 500 errors are no longer returned when PATCHing rate limits with empty request bodies.
- Fixed bug causing duplicate table IDs to show up in the deleted_tables entry of an organization's published configuration.
- Control
- Fixed missing user credentials on jobs, thereby allowing partition vacuum to operate on multi-bucket.
- When using the older basic_auth authentication, browser authentication pop-ups no longer appear when logging into the UI.
- Core
- Disallowed filtering by non-deterministic functions (now(), yesterday(), etc.) in summary SQL.
- Removed synchronization of legacy *_globals.json files, dramatically reducing the number of HTTP 400 responses from cloud storage.
- Data
- Fixed bug preventing load-sample-project from operating using Azure as the batch input source.
- UI
- Audit trail: Added parsing of additional request parameters to the axios interceptor.
- Fixed a bug preventing users from executing INSERT queries via the query page by appending FORMAT JSON to these queries.
- Tables in summary form and custom views don't jump to the top after actions like drag and drop are performed.
- Upgraded Next.js to v14.2.15 to prevent cache poisoning.
- Fixed a bug in which if the user created a table, configured batch auto-ingest, then added a transform, the UI would not display the configured batch ingest.
- Upgraded the rollup package from 2.79.1 to 2.79.2 to resolve a DOM clobbering vulnerability.
- Added protections against server-side request forgery (SSRF).
- Fixed a bug in which the error message indicating a type mismatch between a csv-type transform and json-type sample_data would disappear despite the mismatch persisting.
- Fixed a 503 response code from a prometheus endpoint resulting in blank charts on the dashboard page.
- Changed the form to edit SIEM sources: added copy buttons and made it easier to update client secret fields.