3 October 2024 - v4.19.1
3 months ago by Rich Vanderwal
Cross-cloud settings in batch sources, better SaaS metering, and a better transform verification UI
Notable New Features
- Cross-Cloud settings are now available in batch sources
- Rather than simply specifying a credential ID and URL, batch jobs now allow a full specification of the data source in a way that’s compatible with the multi-cloud functionality available with storages. Examples are on our batch import page.
- Better SaaS metering functionality
- SaaS metering functionality (also known as "usagemeter") has been added, and we re-worked the usage API to support Hydrolix Cascade for AWS. Hydrolix now reports more granular and complete billing information.
- Better transform verification UI
- The transform verification UI has been enhanced to show all keys returned from transform verification when editing or adding a new transform. There’s also a new tab that will show you the complete results of the transform verification.
Breaking Changes
Batch Job API Change
Sources for batch jobs must be specified using the new format that supports multi-cloud credentials. In your batch source settings object, rather than just including url and perhaps credential_id, include these fields:
{ "bucket_name": "string", "bucket_path": "/", "region": "string", "endpoint": "string", "cloud": "string", "credential_id": "3fa85f64-5717-4562-b3fc-2c963f66afa6" }
For example, if you’re using a local Minio server, your configuration would look like this:
{ "bucket_name": "qe-datasets", "bucket_path": "/", "region": "antartica-central1", "endpoint": "http://minio", "cloud": "aws", "credential_id": "3fa85f64-5717-4562-b3fc-2c963f66afa6" }
See Batch Ingest for more details about the new job format.
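If you have existing job definitions in the old style, the conversion can be sketched as below. This is a minimal illustration, not an official migration tool: it assumes the legacy url had the form scheme://bucket/path, and the function name legacy_source_to_settings is hypothetical. The region, cloud, and endpoint values can't be derived from the old fields, so they are passed in explicitly.

```python
import json
from urllib.parse import urlsplit


def legacy_source_to_settings(url, credential_id, cloud, region, endpoint=""):
    """Split a legacy bucket URL into the fields of the new settings object.

    Assumes the legacy URL looks like scheme://bucket/path (an assumption,
    not something the release note states).
    """
    parts = urlsplit(url)
    if not parts.netloc:
        raise ValueError(f"cannot find a bucket name in {url!r}")
    return {
        "bucket_name": parts.netloc,
        "bucket_path": parts.path or "/",  # default to the bucket root
        "region": region,
        "endpoint": endpoint,
        "cloud": cloud,
        "credential_id": credential_id,
    }


# Values taken from the Minio example above; the s3:// URL itself is hypothetical.
settings = legacy_source_to_settings(
    "s3://qe-datasets/",
    credential_id="3fa85f64-5717-4562-b3fc-2c963f66afa6",
    cloud="aws",
    region="antartica-central1",
    endpoint="http://minio",
)
print(json.dumps(settings, indent=2))
```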
Upgrade
Upgrade on GKE
kubectl apply -f "https://www.hydrolix.io/operator/v4.19.1/operator-resources?namespace=${HDX_KUBERNETES_NAMESPACE}&gcp-storage-sa=${GCP_STORAGE_SA}"
Upgrade on EKS
kubectl apply -f "https://www.hydrolix.io/operator/v4.19.1/operator-resources?namespace=${HDX_KUBERNETES_NAMESPACE}&aws-storage-role=${AWS_STORAGE_ROLE}"
Upgrade on LKE
kubectl apply -f "https://www.hydrolix.io/operator/v4.19.1/operator-resources?namespace=$HDX_KUBERNETES_NAMESPACE"
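The three upgrade commands differ only in their query parameters. If you script upgrades across several clusters, the URL can be assembled as in this rough sketch. The helper name operator_resources_url and the example namespace and role values are hypothetical; the base URL and parameter names come from the commands above. Note that urlencode percent-encodes special characters, which the raw shell interpolation above does not.

```python
from urllib.parse import urlencode

BASE = "https://www.hydrolix.io/operator/{version}/operator-resources"


def operator_resources_url(version, namespace, **extra):
    """Build the operator-resources URL passed to `kubectl apply -f`."""
    params = {"namespace": namespace}
    # Python keyword names use underscores; the query parameters use hyphens.
    params.update({key.replace("_", "-"): value for key, value in extra.items()})
    return BASE.format(version=version) + "?" + urlencode(params)


# EKS-style URL; "hydrolix" and "my-role" are placeholder values.
url = operator_resources_url("v4.19.1", "hydrolix", aws_storage_role="my-role")
print(url)
```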
Changelog
General
- API
- In preparation for a UI table that shows calculated query options by global, project, and table settings, a new {{base_orgs_url}}/{{org_uuid}}/projects/{{project_id}}/tables/{{table_id}}/query_options/ endpoint has been added to show this hierarchy for the table specified.
- The API now returns a scope_name field when returning role information to reduce the number of queries the UI performs to display roles.
- Updated Procrastinate library from 2.9.2 to 2.13.2. This resolves several bugs and introduces a new periodic task to automatically retry any stalled jobs.
- Rather than defining SQL functions individually, customers can now use a new API endpoint for bulk uploading SQL functions. See the API documentation for details.
- Batch ingest now supports multiple buckets across different cloud providers, just as storages do. Specify more fields in the batch source settings by following the new examples on our batch import page.
- In the config API, a new publish_task_id is included in the response of requests that trigger a Hydrolix configuration publish. This identifier can be used to check the status of the publish using GET /config/v1/tasks/:id.
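One way to use publish_task_id is to poll the task endpoint until the publish completes. The sketch below shows only the polling pattern: the release note does not describe the response body, so the HTTP call (fetch) and the completion check (is_finished) are supplied by the caller, and the "status" field in the stubbed usage is purely hypothetical.

```python
import time


def task_status_path(publish_task_id):
    """Path from the release note: GET /config/v1/tasks/:id"""
    return f"/config/v1/tasks/{publish_task_id}"


def wait_for_publish(publish_task_id, fetch, is_finished,
                     interval=2.0, max_attempts=30):
    """Poll the task endpoint until is_finished(response) is true.

    fetch(path) performs the GET and returns the decoded response;
    is_finished(response) decides whether the publish is complete.
    """
    path = task_status_path(publish_task_id)
    for _ in range(max_attempts):
        response = fetch(path)
        if is_finished(response):
            return response
        time.sleep(interval)
    raise TimeoutError(f"publish task {publish_task_id} did not finish")


# Stubbed usage with no network access. The "status" field and its
# values are hypothetical, not the documented response format.
responses = iter([{"status": "pending"}, {"status": "done"}])
result = wait_for_publish(
    "abc123",
    fetch=lambda path: next(responses),
    is_finished=lambda response: response["status"] == "done",
    interval=0,
)
```

In real use, fetch would wrap an authenticated HTTP client pointed at your cluster.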
- Control
- SaaS metering functionality (also known as "usagemeter") has been added, and we re-worked the usage API to support Hydrolix Cascade for AWS. Hydrolix now reports more granular and complete billing information.
- We’ve added protection against misuse of traefik-cfg global variables and set client timeouts in traefik-cfg, promwaltz, and operator services.
- Core
- Postgres-related libraries have been updated:
- libpq: 13.0 -> 16.4
- libpqxx: 7.7.2 -> 7.9.2
- crc32c: 1.1.1 -> 1.1.2
- Turbine (the Hydrolix indexer) errors are now captured and reported to the API and the UI. In particular, the user is warned when attempting to create an ALTER job when that same job is already running.
- Data
- The Decay service uses the catalog database more efficiently, taking better advantage of indexed columns and using a better algorithm overall. It performs work in batches controlled by new tunables decay_max_deactivate_iterations, decay_reap_batch_size, and decay_max_reap_iterations. It also more gracefully handles unset tunables.
- Updated jsonpointer library from v0.19.6 to v0.21.0 to avoid a panic involving bad paths, and also updated other libraries indirectly.
- When a batch import job encounters data at the beginning of a file that it can’t parse correctly according to the specified transform, the job is aborted, rather than filling the logs with rejection messages and creating lots of reject data. Jobs are also canceled if there’s a problem uploading files to object storage.
- UI
- Security enhancements have been made to the login page: passwords aren’t displayed on the console, autofill is disabled, and username/password fields have random strings as names.
- The transform verification UI has been enhanced to show all keys returned from transform verification when editing or adding a new transform. There’s also a new tab that will show you the complete results of the transform verification.
- Security-related updates were applied:
- cypress: 12.7.0 -> 13.14.1
- next.js: 13.2.0 -> 14.1.1
- wait-on: 6.0.1 -> 7.0.0
- Improvements to the summary table column analysis UI have been made. Some columns have been renamed, and the “Columns to Analyze” table no longer jumps to the top when an item is checked.
- The create and edit function pages return to the edit function page, rather than redirecting to the list of functions. This facilitates easier iterative editing and testing of functions.
- The “Users -> Invite new user” form has been extended to send multiple invitations, rather than just one at a time.
Bug Fixes
- API
- JSON transforms may now be published when submitted without sample data.
- When a transform is deleted, any associated sources and pools are now also deleted. The delete operation is aborted if there is a job in progress that uses the transform.
- All endpoints in the config API return the correct Allow header information when responding to an HTTP OPTIONS request. The only exceptions to this are change_password and dictionaries/files/{filename}.
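To spot-check the Allow header fix, you can compare the methods an endpoint advertises against the ones you expect. The Allow header is a comma-separated list of HTTP methods, so the check reduces to a set comparison. The helper names below are hypothetical, and the header values are made-up examples rather than actual Hydrolix responses.

```python
def parse_allow(header_value):
    """Return the set of HTTP methods listed in an Allow header value."""
    return {m.strip().upper() for m in header_value.split(",") if m.strip()}


def missing_methods(header_value, expected):
    """Return the expected methods that the Allow header does not list."""
    return set(expected) - parse_allow(header_value)


# Example header values only; substitute what your OPTIONS request returns.
allowed = parse_allow("GET, POST, HEAD, OPTIONS")
gaps = missing_methods("GET, OPTIONS", ["GET", "DELETE"])
```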
- Core
- Range indexes are now read correctly. Previously, there were edge cases in WHERE clauses on indexed columns that returned incorrect results.
- Internal LRU cache files weren’t being cleaned up correctly, causing local ephemeral storage to run out of space. This has been fixed, and there’s now a disk_cache_entry_max_ttl_minutes tunable to change the default TTL (6 hours) for files in the cache.
- result_rows is now updated during queries even when hdx_query_debug is set to false. The X-HDX-Query-Stats response header will now include the correct result_rows value.
- The ORDER BY LIMIT optimization has been disabled for queries that use OFFSET, as there is no way to know min_timestamp or max_timestamp based on the OFFSET.
- Data
- The Vacuum service is improved, including a fix for a malformed path bug and better deletion of rejected data. Previously, rejected data in the unknown/unknown and unknown/stream namespaces would be skipped by the Vacuum service.
- Azure files that are “soft deleted” no longer show up in internal file listings.
- The Vacuum service now considers both inactive and active partitions when examining the catalog, since both types of partitions exist in object storage. Failure to do so was causing partitions to be prematurely deleted from object storage.
- As tables and their transforms are modified, tables undergo a series of revisions. Each revision has its own time bucketer, and previously, these time bucketers and their associated goroutines weren’t being removed. They are now cleaned up.
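The premature-deletion fix in the Vacuum service comes down to which catalog entries count as "known" when deciding what is orphaned in object storage. The following is only an illustration of that set logic, not the Vacuum service's actual code; the function name and partition paths are hypothetical.

```python
def orphaned_objects(storage_paths, active_partitions, inactive_partitions):
    """Return storage paths that no catalog partition refers to.

    Both active and inactive partitions exist in object storage, so both
    must be treated as known before anything is considered an orphan.
    """
    known = set(active_partitions) | set(inactive_partitions)
    return set(storage_paths) - known


# Hypothetical partition paths.
storage = ["p1", "p2", "p3"]
active = ["p1"]
inactive = ["p2"]

orphans = orphaned_objects(storage, active, inactive)

# Checking against active partitions alone would also flag the inactive
# partition "p2", which is the kind of premature deletion described above.
active_only = orphaned_objects(storage, active, [])
```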
- UI
- The extraneous top-level “Values” setting has been removed from the bucket settings dialogue. All values that aren’t in the maps below the top-level setting will be sent to the default storage.
- Updated webpack from v5.75.0 to v5.94.0, resolving a cross-realm object access security issue in Webpack for versions below v5.76.0.
- We fixed an optional chaining operator that was keeping tables with no bucket settings from later having bucket settings defined.