30 August 2024 - v4.18.2

This release adds new API endpoints, a feature to support our upcoming observability product on AWS called “Cascade,” and configurable worker pools for the ingest head services.

Notable New Features

New API for Background Tasks
- Configuration changes are now published asynchronously, and four new endpoints allow monitoring of asynchronous tasks:
  - GET /tasks/ - list all scheduled tasks
  - GET /tasks/:id - get details on a scheduled task, including any related events
  - GET /tasks/events - list all events
  - GET /tasks/events/:id - get details on an event, including expanded job information
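
As a rough illustration of the four routes listed above, the following sketch builds the URL for each one. The base URL is an assumption (these notes do not state the prefix under which /tasks is mounted), and `TASK_ROUTES` and `task_url` are hypothetical helper names, not part of the Hydrolix API.

```python
from typing import Optional
from urllib.parse import quote

# The four task-monitoring routes named in these release notes,
# expressed relative to an API base URL (the base itself is an assumption).
TASK_ROUTES = {
    "list_tasks": "/tasks/",
    "get_task": "/tasks/{id}",
    "list_events": "/tasks/events",
    "get_event": "/tasks/events/{id}",
}


def task_url(base_url: str, route: str, item_id: Optional[str] = None) -> str:
    """Build the full URL for one of the task-monitoring routes."""
    template = TASK_ROUTES[route]
    if "{id}" in template:
        if item_id is None:
            raise ValueError(f"route {route!r} needs an id")
        # Percent-encode the id so it is safe to place in a URL path.
        template = template.replace("{id}", quote(str(item_id), safe=""))
    return base_url.rstrip("/") + template
```

A caller would pass the resulting URL to whatever HTTP client and authentication scheme the cluster already uses.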
New Kinesis Consumer
- To support our upcoming observability product on AWS called “Cascade,” an additional type of Kinesis source directly utilizes the AWS Kinesis Client Library to provide robust Kinesis ingest. Use of this new kinesis-kcl-consumer requires a manual switch-over from the older Kinesis ingest system.
Intake Heads and Stream Heads can now be assigned separate pools via the API
- Ingest heads (intake-head and stream-head) are now poolable via the API, and users can create, get, update, patch, and delete pools for both services. A new GET endpoint lists the available ingest URLs: https://{hostname}/config/v1/pools/ingest_endpoints
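
A minimal sketch of preparing a request to the ingest endpoints listing above. Only the URL path comes from these notes. The bearer-token header is an assumption about how the cluster authenticates API calls, and `ingest_endpoints_request` is a hypothetical helper name.

```python
from urllib.request import Request


def ingest_endpoints_request(hostname: str, token: str) -> Request:
    """Prepare (but do not send) a GET request for the ingest endpoint list."""
    url = f"https://{hostname}/config/v1/pools/ingest_endpoints"
    # Assumption: the API accepts a bearer token in the Authorization header.
    return Request(url, headers={"Authorization": f"Bearer {token}"}, method="GET")
```

The prepared request can be sent with `urllib.request.urlopen(req)`. The response format is not described in these notes, so it is left unparsed here.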

Upgrade on GKE

kubectl apply -f "https://www.hydrolix.io/operator/v4.18.2/operator-resources?namespace=${HDX_KUBERNETES_NAMESPACE}&gcp-storage-sa=${GCP_STORAGE_SA}"

Upgrade on EKS

kubectl apply -f "https://www.hydrolix.io/operator/v4.18.2/operator-resources?namespace=${HDX_KUBERNETES_NAMESPACE}&aws-storage-role=${AWS_STORAGE_ROLE}"

Upgrade on LKE

kubectl apply -f "https://www.hydrolix.io/operator/v4.18.2/operator-resources?namespace=${HDX_KUBERNETES_NAMESPACE}"

General

API
- Added support for using the force_operation query parameter when deleting buckets. Added a new endpoint which lists resources currently using a given bucket.
- Exposed emailVerified field to users
- Intake Heads and Stream Heads can now be assigned separate pools via the API.
- Config changes are published asynchronously, improving API performance. Added the following endpoints under /tasks for monitoring the status of background tasks:
  - GET /tasks/ - list all scheduled tasks
  - GET /tasks/:id - get details on a scheduled task, including any related events
  - GET /tasks/events - list all events
  - GET /tasks/events/:id - get details on an event, including expanded job information
- To avoid broken summary tables due to changes to transforms, automatically generated summary transforms are now read-only. Modify a table’s summary SQL via the summary settings object in the table patch API.
- Upgraded psycopg from version 2.9.9 to 3.2.1
- Batch job and auto ingest updates for cross-cloud support. We’ve updated the configured data source options for a batch job from a URL to a JSON object containing the following fields:
```
{  
  "bucket_name": {string},  
  "bucket_path": {string} (default: "/"),  
  "region": {string},  
  "endpoint": {string},  
  "cloud": {string},  
  "credential_id": {string-uuid} (nullable: true)  
}
```
  Existing batch jobs created in versions prior to this change will continue to be returned from the config API in the format they were created with.
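
To make the new source object concrete, here is a small sketch that assembles one with the documented default for `bucket_path` and a nullable `credential_id`. The helper name `batch_source` and the sample values in the usage note are illustrative assumptions. These notes do not list the accepted values for `cloud` or `region`, or say where the object sits within the batch job request.

```python
import uuid
from typing import Optional


def batch_source(
    bucket_name: str,
    region: str,
    endpoint: str,
    cloud: str,
    bucket_path: str = "/",               # documented default
    credential_id: Optional[str] = None,  # documented as nullable
) -> dict:
    """Assemble a batch job source object with the fields listed above."""
    if credential_id is not None:
        # Normalize and validate: raises ValueError if not a well-formed UUID.
        credential_id = str(uuid.UUID(str(credential_id)))
    return {
        "bucket_name": bucket_name,
        "bucket_path": bucket_path,
        "region": region,
        "endpoint": endpoint,
        "cloud": cloud,
        "credential_id": credential_id,
    }
```

For example, `batch_source("my-bucket", "us-east-1", "https://s3.example.test", "aws")` yields an object with `bucket_path` set to `"/"` and `credential_id` set to `None`, which `json.dumps` serializes as `null`.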
Control
- Reduced the size of init-* containers to reduce overall cluster load, especially when spinning up many peers.
Core
- Upgraded Zstandard library from version 1.5.0 to 1.5.6.
- Queries now return the following additional headers when hdx_query_debug=true:
  - limit_optimization
  - hdx_query_pool_name
  - query_detail_runtime_stats
  - result_rows
- Upgraded the c-ares library from version 1.16.1 to 1.32.2
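
The debug headers listed above (returned when hdx_query_debug=true) can be separated from the rest of a response's headers with a small filter. This is only a sketch: `extract_debug_headers` is a hypothetical helper, and it assumes the headers arrive under exactly the names shown in these notes, compared case-insensitively as is usual for HTTP headers.

```python
from typing import Dict, Mapping

# Header names added in this release for hdx_query_debug=true.
DEBUG_HEADERS = (
    "limit_optimization",
    "hdx_query_pool_name",
    "query_detail_runtime_stats",
    "result_rows",
)


def extract_debug_headers(headers: Mapping[str, str]) -> Dict[str, str]:
    """Return only the debug headers present in a response's headers."""
    # HTTP header names are case-insensitive, so compare in lowercase.
    lowered = {name.lower(): value for name, value in headers.items()}
    return {name: lowered[name] for name in DEBUG_HEADERS if name in lowered}
```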
Data
- To support our upcoming observability product on AWS called “Cascade,” an additional type of Kinesis source directly utilizes the AWS Kinesis Client Library to provide robust Kinesis ingest. Use of this new kinesis-kcl-consumer requires a manual switch-over from the older Kinesis ingest system.
- Hydrolix now verifies that the Turbine indexer is running before issuing indexer requests. This prevents request failures when the indexer is temporarily unavailable, for example while it restarts.
UI
- Added support for dictionary load levels in the UI
- Added row count, data volume, and cardinality to raw (non-summary) tables. Reorganized summary table analysis tab within the Data page.
- Improvements to the dictionary page.
- Pool form has been modified so the name is read-only when editing
- Batch job and auto ingest updates for cross-cloud support. We’ve expanded the configured data source options when adding a batch job for a table.

Bug Fixes

API
- Fixed Transform updates that previously failed with a conflict when array- or map-type elements had the primary field added or removed on update
- Fixed erroneous limitations applied to the scale_profile field and relaxed validation for pooling scale profiles:
  - The field can now be used with any pool type, not just merge-peer.
  - Users can assign custom scale profiles to the small, medium, and large merge pools.
  - Any pool running workloads for the merge-peer service type can be assigned to small, medium, or large.
  - Scale profile names other than I, II, and III are now allowed.
- Kinesis endpoint no longer responds to HTTP PATCH with 500 error
- Removed dictionary verification query during dictionary creation to avoid errors due to bad user query settings.
- Sample data submitted with a transform is now validated. This ensures it’s the same type as the transform handles, whether it’s CSV or JSON.
- The API now validates the memory_coefficient table setting. Non-numeric strings and negative numbers are disallowed.
- Mismatches between Keycloak and internal user databases are gracefully handled, rather than returning 500 errors. Users with this kind of mismatch are marked for auditing.
- Ensured that all Keycloak users are present in the config API’s database.
- Fixed HTTP 500 errors when attempting to upload a catalog file but not actually sending it. The upload is validated and will return an HTTP 400 if there’s a problem.
- Fixed a bug in which a valid PATCH request to https://{hostname}/config/v1/orgs/{org_id}/projects/{project_id}/tables/{id}/ would throw a 500 error
- When attempting to delete an org, the API will now send an HTTP 405 “Method not allowed” rather than an HTTP 500.
- When creating dictionaries, an array data type must have elements inside. The API now accepts arrays with elements in dictionary definitions.
- Fixed a bug causing summary tables to incorrectly retain their original source table as the specified value for the parents setting when updated to use a different source table. This change has made the parents setting read-only.
Core
- Fixed a bug returning erroneous results for IS NULL queries against non-nullable fields
- Partial fix for the ‘hydro’ project cache not removing files as expected, which caused some pods to enter a failed state after exceeding their container storage limits. Added the following tunables for disk cache removal:
  - disk_cache_cull_start_perc: percentage of cache disk space used before starting to remove files
  - disk_cache_cull_stop_perc: percentage of cache disk space used before stopping removing files
  - disk_cache_redzone_start_perc: minimum percentage of cache disk space used to be considered as redzone
- Correctly return values for query_id and initial_query_id columns in hydro.logs for Indexer and Alter queries
- Queries no longer produce “Unknown HDX SETTING” errors when using hdx_query_optimize_order_by_primary
- Fixed a bug resulting in Turbine server segfaults
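
One plausible reading of the disk_cache_cull_start_perc and disk_cache_cull_stop_perc tunables above is a pair of high and low watermarks: removal begins once usage reaches the start percentage and continues until usage falls to the stop percentage. The sketch below illustrates that interpretation only. It is an assumption, not the actual Turbine logic, `should_cull` is a hypothetical name, and disk_cache_redzone_start_perc is not modeled because these notes do not describe its effect.

```python
def should_cull(used_perc: float, culling: bool,
                start_perc: float, stop_perc: float) -> bool:
    """Decide whether cache culling should be active.

    used_perc  -- current percentage of cache disk space in use
    culling    -- whether culling was already active before this check
    start_perc -- assumed meaning of disk_cache_cull_start_perc
    stop_perc  -- assumed meaning of disk_cache_cull_stop_perc
    """
    if stop_perc >= start_perc:
        raise ValueError("stop_perc must be lower than start_perc")
    if used_perc >= start_perc:
        return True       # above the high watermark: start (or keep) culling
    if used_perc <= stop_perc:
        return False      # at or below the low watermark: stop culling
    return culling        # between watermarks: keep the previous state
```

Under this reading, the gap between the two percentages keeps the cache from repeatedly starting and stopping removal when usage hovers near a single threshold.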
Data
- Users can now decorate rows with the "basic" Scientia Mobile license. Previously, these decorators worked only with the "standard" license and returned an error if a "basic" asset was specified in the transform configuration.
UI
- Fixed bug preventing Audit Trail results from displaying when the result was non-paginated
- Audit Trails tab on Security page: changed the query parameters date_min to min_date and date_max to max_date
- Removed expected_tb_per_day from the table and summary table UI
- Fixed Alter Jobs to ensure the ‘commit’ option displays when appropriate. When an Alter job has a pending status, the 'commit' option is now available. When the Alter job has the status done, the 'commit' option is unavailable.
- Dashboard now shows correct data for Queries per Second chart.
- Fixed a bug in which lengthy transform names would intersect with their respective descriptions
- Added two arguments to generate-stream related to sequencing: --seqColumn <string>, which designates a string-typed column for sequence numbers, and --seqPrefix <string>, which prepends the indicated string to the sequence for concurrent generate-stream runs.
- Fixed a bug that caused bucket values to be overwritten when multiple column mappings were used
- Fixed a bug when deleting a column from a transform with multiple columns. Previously, the first column would be deleted regardless of the column selected for deletion.
- Fixed a bug that prevented users from editing the bucket settings of a table that was created without those settings defined.