3 October 2024 - v4.19.1

Cross-cloud settings for batch sources, better SaaS metering, and better transform verification UI

Notable New Features

  • Cross-Cloud settings are now available in batch sources
    • Rather than simply specifying a credential ID and URL, batch jobs now allow a full specification of the data source in a way that’s compatible with the multi-cloud functionality available with storages. Examples are on our batch import page.
  • Better SaaS metering functionality
    • SaaS metering functionality (also known as "usagemeter") has been added, and we re-worked the usage API to support Hydrolix Cascade for AWS. Hydrolix now reports more granular and complete billing information.
  • Better transform verification UI
    • The transform verification UI has been enhanced to show all keys returned from transform verification when editing or adding a new transform. There’s also a new tab that will show you the complete results of the transform verification.

Breaking Changes

Batch Job API Change

Sources for batch jobs must be specified using the new format that supports multi-cloud credentials. In your batch source settings object, rather than including only url and, optionally, credential_id, include these fields:

{
  "bucket_name": "string",
  "bucket_path": "/",
  "region": "string",
  "endpoint": "string",
  "cloud": "string",
  "credential_id": "3fa85f64-5717-4562-b3fc-2c963f66afa6"
}

For example, if you’re using a local Minio server, your configuration would look like this:

{  
  "bucket_name": "qe-datasets",  
  "bucket_path": "/",  
  "region": "antartica-central1",  
  "endpoint": "http://minio",  
  "cloud": "aws",  
  "credential_id": "3fa85f64-5717-4562-b3fc-2c963f66afa6"  
}

See Batch Ingest for more details about the new job format.
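
If you have existing job definitions that use the old url field, a small helper can split each URL into the new fields. The following is a minimal sketch, not an official migration tool. It assumes the legacy URL has the form scheme://bucket/path (for example s3://qe-datasets/), and the function name legacy_url_to_source_settings is hypothetical. The cloud, region, and endpoint values can't be derived from the URL, so the caller supplies them.

```python
from urllib.parse import urlparse


def legacy_url_to_source_settings(url, *, cloud, region, endpoint, credential_id):
    """Build a new-style batch source settings dict from a legacy bucket URL.

    Assumption: the legacy URL looks like scheme://bucket/path.
    The field names match the settings object shown in these release notes.
    """
    parsed = urlparse(url)
    if not parsed.netloc:
        raise ValueError(f"no bucket name found in {url!r}")
    return {
        "bucket_name": parsed.netloc,
        # Fall back to the bucket root when the URL has no path component.
        "bucket_path": parsed.path or "/",
        "region": region,
        "endpoint": endpoint,
        "cloud": cloud,
        "credential_id": credential_id,
    }
```

Calling it with url="s3://qe-datasets/", cloud="aws", region="antartica-central1", endpoint="http://minio", and the credential ID above reproduces the Minio example.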

Upgrade

Upgrade on GKE

kubectl apply -f "https://www.hydrolix.io/operator/v4.19.1/operator-resources?namespace=${HDX_KUBERNETES_NAMESPACE}&gcp-storage-sa=${GCP_STORAGE_SA}"

Upgrade on EKS

kubectl apply -f "https://www.hydrolix.io/operator/v4.19.1/operator-resources?namespace=${HDX_KUBERNETES_NAMESPACE}&aws-storage-role=${AWS_STORAGE_ROLE}"

Upgrade on LKE

kubectl apply -f "https://www.hydrolix.io/operator/v4.19.1/operator-resources?namespace=${HDX_KUBERNETES_NAMESPACE}"

Changelog

General

  • API
    • In preparation for a UI table that shows calculated query options by global, project, and table settings, a new {{base_orgs_url}}/{{org_uuid}}/projects/{{project_id}}/tables/{{table_id}}/query_options/ endpoint has been added to show this hierarchy for the specified table.
    • The API now returns a scope_name field when returning role information to reduce the number of queries the UI performs to display roles.
    • Updated Procrastinate library from 2.9.2 to 2.13.2. This resolves several bugs and introduces a new periodic task to automatically retry any stalled jobs.
    • Rather than defining SQL functions individually, customers can now use a new API endpoint for bulk uploading SQL functions. See the API documentation for details.
    • Batch ingest now supports multi-buckets across different cloud providers just like storages do. Specify more fields in the batch source settings by following the new examples on our batch import page.
    • In the config API, a new publish_task_id is included in the response of requests that trigger a Hydrolix configuration publish. This identifier can be used to check the status of the publish using GET /config/v1/tasks/:id.
  • Control
    • SaaS metering functionality (also known as "usagemeter") has been added, and we re-worked the usage API to support Hydrolix Cascade for AWS. Hydrolix now reports more granular and complete billing information.
    • We’ve added protection against misuse of traefik-cfg global variables and set client timeouts in traefik-cfg, promwaltz and operator services.
  • Core
    • Postgres-related libraries have been updated:
      • libpq: 13.0 -> 16.4
      • libpqxx: 7.7.2 -> 7.9.2
      • crc32c: 1.1.1 -> 1.1.2
    • Turbine (the Hydrolix indexer) errors are now captured and reported to the API and the UI. In particular, the user is warned when attempting to create an ALTER job when that same job is already running.
  • Data
    • The Decay service now uses the catalog database more efficiently, taking better advantage of indexed columns and using a better algorithm overall. It performs work in batches controlled by the new tunables decay_max_deactivate_iterations, decay_reap_batch_size, and decay_max_reap_iterations. It also handles unset tunables more gracefully.
    • Updated jsonpointer library from v0.19.6 to v0.21.0 to avoid a panic involving bad paths, and also updated other libraries indirectly.
    • When a batch import job encounters data at the beginning of a file that it can’t parse correctly according to the specified transform, the job is aborted, rather than filling the logs with rejection messages and creating lots of reject data. Jobs are also canceled if there’s a problem uploading files to object storage.
  • UI
    • Security enhancements have been made to the login page: passwords aren’t displayed on the console, autofill is disabled, and username/password fields have random strings as names.
    • The transform verification UI has been enhanced to show all keys returned from transform verification when editing or adding a new transform. There’s also a new tab that will show you the complete results of the transform verification.
    • Security-related updates were applied:
      • cypress: 12.7.0 -> 13.14.1
      • next.js: 13.2.0 -> 14.1.1
      • wait-on: 6.0.1 -> 7.0.0
    • Improvements to the summary table column analysis UI have been made. Some columns have been renamed, and the “Columns to Analyze” table no longer jumps to the top when an item is checked.
    • The create and edit function pages return to the edit function page, rather than redirecting to the list of functions. This facilitates easier iterative editing and testing of functions.
    • The “Users -> Invite new user” form has been extended to send multiple invitations, rather than just one at a time.
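
The publish_task_id described in the API notes above can be checked with a small client. This is a minimal sketch under stated assumptions: base_url and token are placeholders, the Bearer authorization header is assumed rather than confirmed by these notes, and the helper names are hypothetical. The notes don't describe the response body, so the sketch returns the parsed JSON as-is without interpreting any status fields.

```python
import json
import urllib.request


def publish_task_url(base_url, publish_task_id):
    """Build the status URL for GET /config/v1/tasks/:id."""
    return f"{base_url.rstrip('/')}/config/v1/tasks/{publish_task_id}"


def get_publish_task(base_url, publish_task_id, token):
    """Fetch the publish task and return the decoded JSON body.

    Assumption: the API accepts a Bearer token in the Authorization header.
    """
    request = urllib.request.Request(
        publish_task_url(base_url, publish_task_id),
        headers={"Authorization": f"Bearer {token}"},
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

A caller would take publish_task_id from the response to a request that triggered a configuration publish, then call get_publish_task with it until the returned task indicates completion.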

Bug Fixes

  • API
    • JSON transforms may now be published when submitted without sample data.
    • When a transform is deleted, any associated sources and pools are now also deleted. The delete operation is aborted if there is a job in progress that uses the transform.
    • All endpoints in the config API return the correct Allow header information when responding to an HTTP OPTIONS request. The only exceptions to this are change_password and dictionaries/files/{filename}.
  • Core
    • Range indexes are now read correctly. Previously, there were edge cases in WHERE clauses on indexed columns that returned incorrect results.
    • Internal LRU cache files weren’t being cleaned up correctly, causing space outages in local ephemeral storage. This has been fixed, and there’s now a disk_cache_entry_max_ttl_minutes tunable to change the default TTL (6 hours) for files in the cache.
    • result_rows is now updated during queries even when hdx_query_debug is set to false, so the X-HDX-Query-Stats response header includes the correct result_rows value.
    • The ORDER BY LIMIT optimization has been disabled for queries that use OFFSET, as there is no way to know min_timestamp or max_timestamp based on the OFFSET.
  • Data
    • The Vacuum service is improved, including a fix for a malformed path bug and better deletion of rejected data. Previously, rejected data in the unknown/unknown and unknown/stream namespaces would be skipped by the Vacuum service.
    • Azure files that are “soft deleted” no longer show up in internal file listings.
    • The Vacuum service now considers both inactive and active partitions when examining the catalog, since both types of partitions exist in object storage. Failure to do so was causing partitions to be prematurely deleted from object storage.
    • As tables and their transforms are modified, tables undergo a series of revisions. Each revision has its own time bucketer, and previously, these time bucketers and their associated goroutines weren’t being removed. They are now cleaned up.
  • UI
    • The extraneous top-level “Values” setting has been removed from the bucket settings dialogue. All values that aren’t in the maps below the top-level setting will be sent to the default storage.
    • Updated webpack from v5.75.0 to v5.94.0, resolving a cross-realm object access security issue in Webpack for versions below v5.76.0.
    • We fixed an optional chaining operator that was keeping tables with no bucket settings from later having bucket settings defined.