SSO, Query Option Hierarchy Display

Notable New Features

SSO
- Authenticate to Hydrolix with your preferred single sign-on (SSO) provider. Any Keycloak-supported identity provider can be used, such as Google Workspace or GitHub. See our SSO documentation for more details.
Query Option Hierarchy Display
- The query section of the UI has a new tab that shows the final calculated query options for the query. It also shows the hierarchy of values from organization, project, and table to illustrate how the final value was obtained.

Breaking Changes

EKS and GKE Clusters Need More Permission

To continue ingesting data, this version of Hydrolix needs access to the Kubernetes API to perform certain operations. For Amazon EKS and Google GKE deployments, perform these steps before you upgrade to version 4.21.2:
- EKS
  
  First, reset the SA_POLICY_DOC environment variable that you created during Hydrolix setup. The SA_POLICY_DOC now includes a new line for ingest services:
```
"system:serviceaccount:${HDX_KUBERNETES_NAMESPACE}:ingest"
```
  Once you've set your SA_POLICY_DOC variable, use it in this command to set the new role policy:
```
aws iam update-assume-role-policy --role-name "${HDX_KUBERNETES_NAMESPACE}-bucket" \
--assume-role-policy-document "${SA_POLICY_DOC}"
```
- GKE
  
  Add a new service account policy binding with this command:
```
gcloud iam service-accounts add-iam-policy-binding ${GCP_STORAGE_SA} \
--role roles/iam.workloadIdentityUser \
--member "serviceAccount:${PROJECT_ID}.svc.id.goog[${HDX_KUBERNETES_NAMESPACE}/ingest]" \
--project $PROJECT_ID
```
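Before running the EKS command above, it can help to confirm that the policy document really contains the new ingest entry. The following is a minimal sketch, not part of the official procedure: the helper name `policy_has_ingest` is hypothetical, and it assumes SA_POLICY_DOC was built with the namespace already expanded into the document text.

```shell
# Hypothetical helper: returns 0 if the policy document lists the ingest
# service account for the given namespace, non-zero otherwise.
policy_has_ingest() {
  doc=$1
  namespace=$2
  # Match the quoted entry exactly as it appears in the policy document.
  printf '%s' "$doc" | grep -qF -- "\"system:serviceaccount:${namespace}:ingest\""
}

# Usage (both variables come from your Hydrolix setup):
# policy_has_ingest "$SA_POLICY_DOC" "$HDX_KUBERNETES_NAMESPACE" \
#   || echo "SA_POLICY_DOC is missing the ingest entry" >&2
```

If the check fails, recreate SA_POLICY_DOC before updating the role policy.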
Web Browser Cookies May Need to Be Cleared

When you visit the Hydrolix user interface for the first time after upgrading, you may need to clear your web browser cookies. Clear only the cookies for the hostname of your Hydrolix UI. Refer to the cookie-clearing instructions for your browser (Google Chrome, Mozilla Firefox, or Safari).

Upgrade

Upgrade on GKE

kubectl apply -f "https://www.hydrolix.io/operator/v4.21.2/operator-resources?namespace=${HDX_KUBERNETES_NAMESPACE}&gcp-storage-sa=${GCP_STORAGE_SA}"

Upgrade on EKS

kubectl apply -f "https://www.hydrolix.io/operator/v4.21.2/operator-resources?namespace=${HDX_KUBERNETES_NAMESPACE}&aws-storage-role=${AWS_STORAGE_ROLE}"

Upgrade on LKE

kubectl apply -f "https://www.hydrolix.io/operator/v4.21.2/operator-resources?namespace=$HDX_KUBERNETES_NAMESPACE"

Changelog

General

API
- SSO authentication is now possible through any Keycloak-supported identity provider. See the SSO documentation for more information.
- We've added a new configuration API endpoint, /config/v1/users/current, that returns details about the currently authenticated user.
- To increase security and help prevent DoS attacks, timeouts have been added to all outgoing HTTP requests.
- A new /v1/orgs/$ORG_ID/config_blob/ API endpoint returns the configuration of your Hydrolix cluster.
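The new /config/v1/users/current endpoint listed above could be called roughly as follows. This is a sketch under assumptions that are not stated in these release notes: HDX_HOSTNAME is your cluster hostname, HDX_TOKEN holds a bearer token the config API accepts, and both function names are hypothetical.

```shell
# Assumptions (not from the release notes): HDX_HOSTNAME is the cluster
# hostname and HDX_TOKEN is a bearer token accepted by the config API.

# Build the URL of the current-user endpoint for a given hostname.
current_user_url() {
  printf 'https://%s/config/v1/users/current' "$1"
}

# Fetch details about the currently authenticated user.
hdx_current_user() {
  curl --silent --show-error --fail \
    --header "Authorization: Bearer ${HDX_TOKEN}" \
    "$(current_user_url "${HDX_HOSTNAME}")"
}

# Usage:
# HDX_HOSTNAME=hydrolix.example.com HDX_TOKEN=... hdx_current_user
```

The `--fail` flag makes curl exit non-zero on HTTP errors, which makes it easier to spot an expired or missing token in scripts.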
Control
- To support running Kafka Peers on IPv6 on certain networks, kafka-peers can now run with hostNetwork: true.
- Pools can now be specified as a dictionary (keyed on pool name) within the Hydrolix cluster configuration. Previously, they could only be specified as a list.
- The turbine-api pod now deploys its own query-head container to allow it to make validation queries. If a user scales down their cluster’s query-head deployment to zero replicas, this will no longer have unintended side-effects outside of query functionality.
- The MaxMind Geo database is now better kept up-to-date with each release.
- To support SSO, unified authentication is now handled on a more granular per-route basis instead of a per-service basis.
- To increase resilience in unreliable Kubernetes environments, a new HDX-Node daemonset has been introduced that monitors node-to-node connectivity. It also automatically cordons and deletes nodes when they're unreachable.
- The catalog PostgreSQL port can now be specified using the catalog_db_port tunable.
- The new hydrolix.io/ignore-diff=true annotation in the Hydrolix operator prevents manual changes in Kubernetes resources from being reverted by the operator. Read more about it in Manual Resource Configuration.
- A new prometheus_scrape_interval tunable controls the Prometheus scrape_interval global value.
Core
- Added support for materialized views. Summary SQL statements only receive a GROUP BY expression if aggregation keys are found in the SQL.
- To support an upcoming release of the Spark connector, the API can now generate pre-signed URLs for partitions, simplifying object storage authentication.
- Stack traces during queries are now suppressed by default. To re-enable, append SETTINGS hdx_query_debug=1 to your query.
- The indexOf() function now has index-only scan support to increase performance.
- The turbine storage subsystem keeps an internal cache of authorization tokens. Log messages now differentiate between errors in the system and similar warnings involved in authorization token cache refreshing.
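To re-enable stack traces for a single query, the `SETTINGS hdx_query_debug=1` clause described above is appended to the SQL text. The sketch below shows one way to do that in a script. The function name `with_query_debug` and the sample table are hypothetical, and it assumes a single-line query with no existing SETTINGS clause.

```shell
# Hypothetical helper: append the debug setting to a single-line SQL query.
# Assumes the query does not already contain a SETTINGS clause.
with_query_debug() {
  # Trim trailing whitespace and an optional trailing semicolon.
  q=$(printf '%s' "$1" | sed -e 's/[[:space:]]*$//' -e 's/;$//' -e 's/[[:space:]]*$//')
  printf '%s SETTINGS hdx_query_debug=1' "$q"
}

# Usage:
# with_query_debug 'SELECT count() FROM sample.logs;'
```

The resulting string can then be sent through whichever query client you already use.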
Data
- A new optional Partition Cleaner runs as a cron job and provides an alternative to Partition Vacuum for deleting partitions. It's disabled by default for now and is controlled by three tunables: partition_cleaner_enabled, partition_cleaner_schedule, and partition_cleaner_dry_run.
- A new command-line report-usage tool reports and re-uploads cluster usage data.
- To help troubleshoot and reduce out-of-memory conditions for the intake services intake-head, kinesis-peer, kafka-peer, and akamai-siem-peer, Hydrolix can optionally detect OOM scenarios in calls to the turbine indexer. If one is detected, the bucket of data is split into smaller files and the call is retried. This is turned off by default and can be enabled with the k8s_oom_kill_detection_enabled tunable.
- Context has been added to merge log lines so that both input and output partition names appear on the same line. This helps with debugging partition issues.
- The Rust SQL library sqlx has been updated from 0.7.4 to 0.8.1 to address SQL injection vulnerabilities.
- A new manifest.json file format and versioned config filename has been rolled out across Data, Core, and API. This versioning makes configuration file changes easier to detect.
- Upgraded Go from 1.21.0 to 1.23.2.
- All intake HTTP services now have a read/write timeout of 30 minutes to help guard against attacks that open many simultaneous connections.
- Incoming data can now be written to an array of storages in a random order using the spread_list directive in the hydrolixcluster.yaml file. This can help avoid cloud storage throttling.
UI
- The query section of the UI has a new tab that shows the final calculated query options for the query. It also shows the hierarchy of values from organization, project, and table to illustrate how the final value was obtained.
- The administrative UI's favicon.ico has been updated.
- Forms for SIEM have been updated, adding "copy" buttons and allowing updates to client secret fields.
- Bucket creation and editing UI now supports the "westus3" Azure region.
- A robots.txt file has been added to the Hydrolix Administrative UI to keep web search indexers away.

Bug Fixes

API
- Transforms containing epochs were sometimes incorrectly identified as conflicting with existing transforms. Datetime64 and datetime handling has been changed to fix this.
Control
- Security issues from a recent penetration test were addressed.
- The default scale profiles now have intake-head enabled, and other scale profile cleanup has been done.
- The otel_endpoint tunable now works correctly.
Core
- The SQL url() and urlCluster() functions are now disabled to provide better cluster security. Use of these functions will now return a FUNCTION_NOT_ALLOWED error.
- Query Peers no longer attempt to perform catalog queries, fixing an occasional segfault if Query Head partition evaluation is incomplete.
- Query authentication now supports JWT tokens in HYDROLIX_TOKEN cookies, enabling SSO support.
- A configuration synchronization issue has been fixed, preventing unpredictable results when a project or table with the same name is repeatedly added and deleted.
Data
- The partition cleaner now considers the PARTITION_GRACE_PERIOD at startup to ensure partitions created within the last day are not deleted.
- Zero-length uploads to Google Cloud Storage no longer result in hung uploads. Instead, the upload is logged as a failed attempt and a new attempt is made.
- The cloud storage I/O library has had preparatory work for supporting batch delete jobs during periodic vacuums.
UI
- The Analyze Tables UI now works for non-summary tables.
- When creating or editing a batch job or bucket, the UI no longer requires an endpoint.
- Roles may now be edited, no matter what policies or permissions are assigned to them.
- The Query statistics tab now works again, reading data from the API's x-hdx-query-stats header rather than from the response body.
- To improve security, dynamically-generated regular expressions are now created according to pre-defined patterns. Only known confirmation words (DELETE, TRUNCATE, FORCE) are now used for validation.
- The Axios interceptor now correctly extracts the path from the URL regardless of the domain, fixing an error in the dashboard section of the UI.
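Since query statistics are now delivered in the x-hdx-query-stats response header rather than the response body, scripts that inspect them need to read the headers. The following sketch pulls that header out of a raw HTTP header dump. The function name `extract_query_stats` and the HDX_QUERY_URL variable in the usage comment are hypothetical, and the format of the header value is not described in these notes, so the value is passed through unparsed.

```shell
# Hypothetical helper: read HTTP response headers on stdin and print the
# value of the x-hdx-query-stats header, if present.
extract_query_stats() {
  # Header names are case-insensitive; drop CRs, then strip "Name: ".
  tr -d '\r' | awk 'tolower($0) ~ /^x-hdx-query-stats:/ {
    sub(/^[^:]*:[ \t]*/, "")
    print
    exit
  }'
}

# Usage (HDX_QUERY_URL is a placeholder for your query endpoint):
# curl --silent --dump-header - --output /dev/null \
#   --data 'SELECT 1' "$HDX_QUERY_URL" | extract_query_stats
```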