22 April 2025 - v2.0.0

Download the Spark Connector JAR.

[v2.0.0-5.1.x]

Added

  • Added additional Spark UI metrics:
    • PartitionReader fetch time, with min, max, median, and p90 statistics
    • SparkScanBuilder.build time
    • PartitionReader start and last timestamps (useful for monitoring and debugging)
    • Turbine query time statistics
  • Added a new optional configuration setting, hdx_partitions_per_task, which specifies the number of Hydrolix partitions per Spark partition
  • Added pushdown support (performance optimization) for ORDER BY + LIMIT (top-N) queries
  • Added pushdown support (performance optimization) for OFFSET clause
  • Added shading rules for the Apache HttpCore and HttpClient libraries to avoid classpath conflicts on Microsoft Fabric
  • Added a generated BuildInfo class to log the Turbine version at startup
  • Added an experimental columnar (transposition-based) partition reader for use with the force_columnar query mode; the transposition batch size may be set via the columnar_batch_size setting
  • Added three configuration keys, spark.sql.catalog.hydrolix.cluster_url, spark.sql.catalog.hydrolix.jdbc_protocol, and spark.sql.catalog.hydrolix.jdbc_port, as an optional alternative to spark.sql.catalog.hydrolix.api_url and spark.sql.catalog.hydrolix.jdbc_url
  • Added support for page-number-paginated Turbine API endpoints; the connector now recursively fetches all pages when more than one page of data is available. Impacted endpoints:
    • orgs/$orgId/projects/
    • orgs/$orgId/projects/${project.uuid}/tables/
    • orgs/$orgId/storages
    • orgs/$orgId/projects/$projectId/tables/$tableId/views
    • orgs/$orgId/projects/$projectId/tables/$tableId/transforms/
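The new configuration settings above can be combined in a Spark properties file. A minimal sketch with illustrative values; the spark.sql.catalog.hydrolix.* prefix for hdx_partitions_per_task and columnar_batch_size is an assumption here, mirroring how the URL keys are namespaced:

```properties
# Alternative to api_url/jdbc_url: derive endpoints from the cluster URL
spark.sql.catalog.hydrolix.cluster_url     https://my-cluster.example.com
spark.sql.catalog.hydrolix.jdbc_protocol   https
spark.sql.catalog.hydrolix.jdbc_port       8123

# Hydrolix partitions per Spark partition (assumed key prefix; value illustrative)
spark.sql.catalog.hydrolix.hdx_partitions_per_task   2

# Transposition batch size for the experimental columnar reader
# (assumed key prefix; value illustrative)
spark.sql.catalog.hydrolix.columnar_batch_size       4096
```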

Changed

  • Changed embedded Turbine's listen port from 8088 to 8123
  • Reworked summary table interaction:
    • All summary aliases are now accessed with the syntax SELECT hdxAgg('my_summary_alias') FROM hydrolix.my_project.my_summary_table
    • Summary aliases no longer need to be pre-registered before querying them
    • A SQL extension is now required to use summary tables, configured with spark.sql.extensions=io.hydrolix.connectors.spark.SummaryUdfExtension
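The reworked summary-table flow above amounts to two steps. A minimal sketch, where my_summary_alias, my_project, and my_summary_table are the placeholder names from the example syntax:

```sql
-- 1. Enable the summary-table SQL extension (spark-defaults.conf or --conf):
--      spark.sql.extensions=io.hydrolix.connectors.spark.SummaryUdfExtension
-- 2. Query a summary alias directly; no pre-registration is required:
SELECT hdxAgg('my_summary_alias')
FROM hydrolix.my_project.my_summary_table;
```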

Fixed

  • Fixed an issue where multiple uses of the same aggregator in a query would give incorrect results
  • Fixed an issue where queries with unexpected identifier case-sensitivity would fail
  • Corrected backquote-escaping of column names containing backquotes