14 January 2024 - v4.22.1
8 days ago by Katherine Maack
Spread list, many smaller features and fixes
Notable New Feature
- Spread List
  - Added spread_list feature to Storage Mapping section of the UI. This feature randomly distributes data among multiple storages to improve performance and help circumvent cloud storage throttling. Learn more about our Spread List feature.
Upgrade
Upgrade on GKE
kubectl apply -f "https://www.hydrolix.io/operator/v4.22.1/operator-resources?namespace=${HDX_KUBERNETES_NAMESPACE}&gcp-storage-sa=${GCP_STORAGE_SA}"
Upgrade on EKS
kubectl apply -f "https://www.hydrolix.io/operator/v4.22.1/operator-resources?namespace=${HDX_KUBERNETES_NAMESPACE}&aws-storage-role=${AWS_STORAGE_ROLE}"
Upgrade on LKE
kubectl apply -f "https://www.hydrolix.io/operator/v4.22.1/operator-resources?namespace=$HDX_KUBERNETES_NAMESPACE"
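The three upgrade commands share the same base URL and differ only in their cloud-specific query parameters. As a minimal sketch, the base URL can be built by a small helper that refuses to proceed when the version or namespace is empty, which avoids applying a malformed URL if HDX_KUBERNETES_NAMESPACE is unset. The function name operator_url is hypothetical and not part of any Hydrolix tooling.

```shell
# operator_url is a hypothetical helper: it prints the operator-resources URL
# for a given operator version and Kubernetes namespace.
operator_url() {
  version="$1"
  namespace="$2"
  # Fail early if either value is empty rather than emitting a malformed URL.
  if [ -z "$version" ] || [ -z "$namespace" ]; then
    echo "usage: operator_url VERSION NAMESPACE" >&2
    return 1
  fi
  echo "https://www.hydrolix.io/operator/${version}/operator-resources?namespace=${namespace}"
}

# Example (LKE form, which takes no cloud-specific parameters):
# kubectl apply -f "$(operator_url v4.22.1 "$HDX_KUBERNETES_NAMESPACE")"
```

For GKE or EKS, the gcp-storage-sa or aws-storage-role parameter shown above would still need to be appended to the printed URL.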
Changelog
General
- API
  - When configuring a source, the transform field is now optional when a default transform exists.
  - Pools can be specified as a dictionary within a Hydrolix Cluster Resource. Previously they could only be specified as a list.
  - The API can now generate presigned URLs to partitions upon request. This is to support a new version of the Hydrolix Spark Connector.
  - Jobs are no longer Kubernetes cron jobs, but are handled by the procrastinate library to consolidate API processes. This change also decreased the memory usage of the refresh_job_statuses config API command, preventing the process from being OOM killed on clusters with many jobs.
  - Hydrolix's list of ClickHouse input and output formats has been updated, providing better support for dictionaries and output formats via hdx_query_output_format.
  - Validation is now provided for transform subtypes. Currently, the subtype must be one of firehose, akamai_mpulse, or cloudwatch.
- Core
  - Added setting hdx_summary_override_indexes, which disables indexing within summary tables for the specified columns.
- Control
  - Added traefik_service_annotations tunable, which allows a user to supply custom Traefik service annotations.
  - Preparation work to facilitate tunable overrides by entity (service, pool, container). Refactors the hkt service to remove pods.py and splits resources based on the stage in which they are created.
  - Implemented annotations on Kubernetes objects to instruct the operator not to overwrite specified manual user modifications.
- Data
  - Updated rejection index file paths to contain task type and partition ID.
  - Introduced the autoingest_unique_file_paths tunable, which ensures Hydrolix ignores duplicate paths when ingesting data into a table via autoingest.
  - The Partition Cleaner has a new tunable, partition_cleaner_grace_period, designating the minimum age of a partition before it is considered for deactivation or deletion.
  - Deactivated catalog entries now have a recorded reason for deactivation: merged, decay, orphan, deleted, or corrupt.
  - S3 autoingest now supports more types of S3 notifications when read by SNS and sent to SQS. This allows customers to de-select "raw message delivery" when configuring those services and still import data.
  - Added the ability to decode more than one data object from a single data segment in firehose. This ensures proper handling of NDJSON data segments.
- UI
  - Added spread_list feature to Storage Mapping section. Details here.
  - Pools form updates: Hydrolix now only shows valid service options for pools and makes service type read-only when editing a pool.
  - Upgraded libraries to address security concerns:
    - eslint -> 8.57.1
    - eslint-import-resolver-typescript -> 3.5.2
    - cypress -> 3.0.6
    - next -> 14.2.20
    - css-loader -> 6.11.0
    - nanoid -> 5.0.9
  - Rate limits are now configurable for tables and transforms.
  - Transforms may now be cloned across projects and tables.
  - The batch auto-ingest form now supports cloud provider region.
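The transform subtype validation listed under API means a transform with an unrecognized subtype is now rejected. A client-side pre-check along the same lines might look like the sketch below. The function name validate_subtype is hypothetical, and the accepted values are simply the three named in this release; the API remains the authority on what is valid.

```shell
# validate_subtype is a hypothetical pre-check, not part of the Hydrolix API.
# It accepts only the subtypes named in these release notes.
validate_subtype() {
  case "$1" in
    firehose|akamai_mpulse|cloudwatch)
      return 0
      ;;
    *)
      echo "invalid transform subtype: $1" >&2
      return 1
      ;;
  esac
}

# Example: validate_subtype firehose && echo "subtype accepted"
```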
Bug Fixes
- API
  - When inviting a user, Hydrolix now allows inviting an existing user again if the invitation was not saved on the first try.
  - Removed summary-peer as a valid pool type.
  - Fixed a race condition when retrieving the current configuration. Hydrolix now retrieves the most recent version from cloud storage rather than first checking the database for the most current version.
  - The refresh_job_statuses API command now uses much less memory, preventing out-of-memory scenarios on systems with many jobs.
- Core
  - Turbine containers now respect the log_level settings in the Hydrolix YAML configuration.
  - Hydrolix reports a greater variety of values for the http_status_code histogram.
  - Added tunables to control logging for the query executor, catalog directory, ZooKeeper, and filesystem cache.
  - Fixed comparators for Array/Map/Tuple data types.
- Control
  - Fixed Traefik data loss on restart when unified_auth: true.
  - The otel_endpoint tunable can now be set: we've restored a missing ConfigMap volume for the turbine-api pod.
  - The postgresql-client version in the tooling pod has been updated to be compatible with the version of PostgreSQL server, currently 15.4.
- Data
  - Fixed a bug in which batch-head could not read from Linode and write to AWS without the S3_ENDPOINT and AWS_REGION environment variables set.
  - Addressed a vulnerability related to a misuse of ServerConfig.PublicKeyCallback which could cause an authorization bypass.
  - Hydrolix clusters now retry failed configuration data loading up to three times with 500ms pauses between attempts.
  - The transform validator now works, following a fix for a small nil reference bug in OOM handling.
  - Kinesis logs now have the component and type fields populated.
  - Autoingest jobs now correctly handle special characters. This change allows autoingest to create batch jobs successfully.
- UI
  - Fixed a bug causing the UI to make requests to the invalid path /projects/null/.
  - The Table Health UI no longer rounds compression percentages to the nearest 10%.
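The configuration-loading retry noted under Data in Bug Fixes follows a common bounded-retry pattern. The sketch below illustrates that pattern only and is not Hydrolix's implementation. It assumes "up to three times" means one initial attempt followed by at most three retries. The names retry_load and RETRY_PAUSE are hypothetical.

```shell
# retry_load is a hypothetical helper: it runs the given command once, then
# retries up to three more times, pausing between attempts.
# RETRY_PAUSE is in seconds and defaults to 0.5 (500 ms). Fractional sleep is
# supported by GNU and BSD sleep but is not guaranteed by POSIX.
retry_load() {
  max_retries=3
  pause="${RETRY_PAUSE:-0.5}"
  retries_done=0
  while :; do
    # Run the command passed as arguments; stop as soon as it succeeds.
    if "$@"; then
      return 0
    fi
    # Give up once the retry budget is exhausted.
    if [ "$retries_done" -ge "$max_retries" ]; then
      echo "giving up after $max_retries retries" >&2
      return 1
    fi
    retries_done=$((retries_done + 1))
    sleep "$pause"
  done
}

# Example: retry_load some_command_that_loads_configuration
```

A fixed pause keeps the worst-case delay small and predictable (about 1.5 seconds here). Exponential backoff is a common alternative when the failing dependency may need longer to recover.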