22 April 2025 - v2.0.0-5.1.x

Download the Spark Connector JAR.

[v2.0.0-5.1.x]

Added

Added additional Spark UI metrics:
- PartitionReader fetch time with statistics: min, max, median, and p90
- SparkScanBuilder.build time
- Partition Reader start and last timestamps (useful for monitoring and debugging)
- Statistics for Turbine query time metrics
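For readers unfamiliar with the summary statistics named above, here is a minimal sketch of how min, max, median, and p90 could be computed from a list of fetch times. The sample values and the nearest-rank definition of p90 are assumptions for illustration; the connector may compute its percentiles differently.

```python
import statistics


def summarize(samples_ms):
    """Return min, max, median, and p90 for a non-empty list of timings."""
    if not samples_ms:
        raise ValueError("at least one sample is required")
    ordered = sorted(samples_ms)
    n = len(ordered)
    # Nearest-rank p90 (an assumed definition): rank = ceil(0.9 * n),
    # computed with integer arithmetic to avoid floating-point rounding.
    rank = (9 * n + 9) // 10
    return {
        "min": ordered[0],
        "max": ordered[-1],
        "median": statistics.median(ordered),
        "p90": ordered[rank - 1],
    }


# Hypothetical fetch times in milliseconds, not real connector output.
fetch_times_ms = [12, 15, 11, 40, 18, 14, 13, 95, 16, 17]
stats = summarize(fetch_times_ms)
```

A gap between median and p90, as in this sample, usually points at a few slow partitions rather than uniformly slow reads.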
Introduced a new optional configuration setting:
- hdx_partitions_per_task, which specifies the number of Hydrolix partitions per Spark partition.
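To make the effect of this setting concrete, the sketch below groups a list of Hydrolix partitions into Spark partitions of a fixed size. The function name and the consecutive-chunking strategy are assumptions for illustration only; the release notes do not describe how the connector actually assigns partitions.

```python
def group_partitions(hdx_partitions, partitions_per_task):
    """Split a list of partitions into consecutive groups of a fixed size."""
    if partitions_per_task < 1:
        raise ValueError("partitions_per_task must be at least 1")
    return [
        hdx_partitions[i:i + partitions_per_task]
        for i in range(0, len(hdx_partitions), partitions_per_task)
    ]


# Seven hypothetical partitions with 3 partitions per task -> 3 groups.
groups = group_partitions(list(range(7)), 3)
```

Under this assumption, a larger value means fewer, longer-running Spark tasks, and a smaller value means more parallelism at the cost of more task overhead.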
Added pushdown support (performance optimization) for ORDER BY + LIMIT (top-N) queries
Added pushdown support (performance optimization) for the OFFSET clause
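The following is a conceptual sketch of why pushing ORDER BY + LIMIT and OFFSET down to the data source helps: each partition only needs to return its own first `offset + limit` rows, and the final merge works on that much smaller candidate set. It is an illustration in plain Python, not the connector's implementation; how the work is actually divided between Turbine and Spark is not described in these notes.

```python
import heapq


def top_n(partitions, limit, offset=0, key=None):
    """Emulate ORDER BY key ASC LIMIT limit OFFSET offset over partitions."""
    needed = offset + limit
    # "Pushed down" step: each partition returns at most `needed` rows.
    candidates = [
        row
        for rows in partitions
        for row in heapq.nsmallest(needed, rows, key=key)
    ]
    # Final step: merge the small candidate set and skip the offset.
    return heapq.nsmallest(needed, candidates, key=key)[offset:]


# Three hypothetical partitions of unsorted values.
partitions = [[5, 1, 9], [3, 8, 2], [7, 4, 6]]
first_three = top_n(partitions, limit=3)
next_two = top_n(partitions, limit=2, offset=2)
```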
Added shading rules for the Apache httpcore and httpclient libraries to avoid classpath conflicts on Microsoft Fabric
Added a generated BuildInfo class to log the Turbine version at startup
Added an experimental columnar (transposition-based) partition reader for use with the force_columnar query mode. The transposition batch size can be set via the columnar_batch_size config
Added three config keys, spark.sql.catalog.hydrolix.cluster_url, spark.sql.catalog.hydrolix.jdbc_protocol, and spark.sql.catalog.hydrolix.jdbc_port, as an optional alternative to spark.sql.catalog.hydrolix.api_url and spark.sql.catalog.hydrolix.jdbc_url
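As a rough sketch, the two alternative ways of pointing the catalog at a cluster could be written as the configuration maps below. All values are placeholders, since valid values are not given in these notes, and any other settings the connector needs are omitted. The helper `to_submit_args` is hypothetical; it only renders a map as `--conf key=value` pairs of the kind `spark-submit` accepts.

```python
CATALOG = "spark.sql.catalog.hydrolix"

# Option A: explicit URLs (placeholder values).
explicit_urls = {
    f"{CATALOG}.api_url": "<api-url>",
    f"{CATALOG}.jdbc_url": "<jdbc-url>",
}

# Option B: the three keys added in this release (placeholder values).
cluster_style = {
    f"{CATALOG}.cluster_url": "<cluster-url>",
    f"{CATALOG}.jdbc_protocol": "<jdbc-protocol>",
    f"{CATALOG}.jdbc_port": "<jdbc-port>",
}


def to_submit_args(conf):
    """Render a config map as a flat list of --conf key=value arguments."""
    args = []
    for key, value in sorted(conf.items()):
        args += ["--conf", f"{key}={value}"]
    return args


submit_args = to_submit_args(cluster_style)
```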
Added support for page-number-paginated Turbine API endpoints
- Impacted endpoints:
  - orgs/$orgId/projects/
  - orgs/$orgId/projects/${project.uuid}/tables/
  - orgs/$orgId/storages
  - orgs/$orgId/projects/$projectId/tables/$tableId/views
  - orgs/$orgId/projects/$projectId/tables/$tableId/transforms/
- All pages are fetched recursively when more than one page of data is available
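The recursive page fetching described above can be sketched as follows. The response shape (a `results` list and a `next` field that is empty on the last page) is an assumption made for this example, and `fake_fetch` stands in for a real HTTP call; neither is taken from the Turbine API.

```python
def fetch_all_pages(fetch_page, page=1):
    """Collect items from page `page` onward by following page numbers."""
    body = fetch_page(page)
    items = list(body["results"])
    # Recurse only while the (assumed) `next` field signals another page.
    if body.get("next"):
        items += fetch_all_pages(fetch_page, page + 1)
    return items


# Hypothetical two-page response used in place of a real endpoint.
PAGES = {
    1: {"results": ["a", "b"], "next": 2},
    2: {"results": ["c"], "next": None},
}


def fake_fetch(page):
    return PAGES[page]


all_items = fetch_all_pages(fake_fetch)
```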

Changed

Changed the embedded Turbine's listen port from 8088 to 8123
Reworked summary table interaction
- All summary aliases are now accessed with the syntax SELECT hdxAgg('my_summary_alias') FROM hydrolix.my_project.my_summary_table
- No user interaction is required to pre-register summary aliases before querying them
  - A SQL extension is now required to use summary tables, configured with spark.sql.extensions=io.hydrolix.connectors.spark.SummaryUdfExtension
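A minimal sketch of the new workflow: set the extension in the Spark configuration, then query the alias with `hdxAgg`. The `summary_query` helper and its alias check are hypothetical conveniences for building the query string; the extension class name and the query syntax are the ones given above.

```python
import re

# Spark setting required for summary tables, as stated in the notes above.
SUMMARY_CONF = {
    "spark.sql.extensions": "io.hydrolix.connectors.spark.SummaryUdfExtension",
}


def summary_query(alias, table):
    """Build a summary query for one alias (hypothetical helper)."""
    # Illustrative precaution: accept only simple identifier-like aliases,
    # so the alias cannot break out of the quoted string literal.
    if not re.fullmatch(r"[A-Za-z_][A-Za-z0-9_]*", alias):
        raise ValueError(f"unsupported alias: {alias!r}")
    return f"SELECT hdxAgg('{alias}') FROM {table}"


query = summary_query(
    "my_summary_alias", "hydrolix.my_project.my_summary_table"
)
```

The resulting string can be passed to `spark.sql(...)` in a session started with the setting in `SUMMARY_CONF`.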

Fixed

Fixed an issue where multiple uses of the same aggregator in a query would give incorrect results
Fixed an issue where queries with unexpected case-sensitivity would fail
Corrected backquote-escaping of column names containing backquotes
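For context on the last fix, Spark SQL escapes a backquote inside a backquoted identifier by doubling it. The sketch below applies that rule; `quote_identifier` is a hypothetical helper written for this example, not the connector's own code.

```python
def quote_identifier(name):
    """Backquote an identifier, doubling any embedded backquotes."""
    return "`" + name.replace("`", "``") + "`"


plain = quote_identifier("status")
tricky = quote_identifier("weird`col")
```

Here `tricky` is the column name wrapped in backquotes with its embedded backquote doubled, which is the form Spark SQL parses back to the original name.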
