22 April 2025 - v2.0.0-5.1.x
Download the Spark Connector JAR: [v2.0.0-5.1.x]
Added
- Added additional Spark UI metrics:
  - `PartitionReader` fetch time with statistics: min, max, median, and p90
  - `SparkScanBuilder.build` time
  - Partition Reader `start` and `last` timestamps (useful for monitoring and debugging)
- Added statistics for `Turbine query time` metrics
- Introduced a new optional configuration setting, `hdx_partitions_per_task`, which specifies the number of Hydrolix partitions per Spark partition (see the configuration sketch after this list)
- Added pushdown support (performance optimization) for ORDER BY + LIMIT (top-N) queries
- Added pushdown support (performance optimization) for the OFFSET clause (see the query example after this list)
- Added shading rules for the Apache `httpcore` and `httpclient` libraries to avoid classpath conflicts on MS Fabric
- Added a generated `BuildInfo` class to log the Turbine version at startup
- Added an experimental columnar (transposition-based) partition reader for use with the `force_columnar` query mode; the transposition batch size may be set via the `columnar_batch_size` config (see the configuration sketch after this list)
- Added three config keys, `spark.sql.catalog.hydrolix.cluster_url`, `spark.sql.catalog.hydrolix.jdbc_protocol`, and `spark.sql.catalog.hydrolix.jdbc_port`, as an optional alternative to `spark.sql.catalog.hydrolix.api_url` and `spark.sql.catalog.hydrolix.jdbc_url`
- Added support for page-number-paginated Turbine API endpoints:
  - Impacted endpoints:
    - `orgs/$orgId/projects/`
    - `orgs/$orgId/projects/${project.uuid}/tables/`
    - `orgs/$orgId/storages`
    - `orgs/$orgId/projects/$projectId/tables/$tableId/views`
    - `orgs/$orgId/projects/$projectId/tables/$tableId/transforms/`
  - All pages are now fetched recursively when more than one page of data is available
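The sketch below shows one way the configuration settings introduced above might be supplied when building a Spark session. It is a minimal sketch, not a definitive setup: the URL, port, and numeric values are placeholders, and scoping `hdx_partitions_per_task`, `force_columnar`, and `columnar_batch_size` under the `spark.sql.catalog.hydrolix` prefix is an assumption.

```scala
import org.apache.spark.sql.SparkSession

// Minimal sketch; all values below are placeholders. Scoping the
// connector-specific keys under spark.sql.catalog.hydrolix is an assumption.
val spark = SparkSession.builder()
  .appName("hydrolix-config-example")
  // New in this release: optional alternative to api_url and jdbc_url
  .config("spark.sql.catalog.hydrolix.cluster_url", "https://my-cluster.example.com")
  .config("spark.sql.catalog.hydrolix.jdbc_protocol", "https")
  .config("spark.sql.catalog.hydrolix.jdbc_port", "8123")
  // New in this release: Hydrolix partitions per Spark partition
  .config("spark.sql.catalog.hydrolix.hdx_partitions_per_task", "4")
  // New in this release: experimental columnar reader and its batch size
  .config("spark.sql.catalog.hydrolix.force_columnar", "true")
  .config("spark.sql.catalog.hydrolix.columnar_batch_size", "8192")
  .getOrCreate()
```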
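For illustration, here is a query shape the new pushdowns can accelerate. The project, table, and column names are hypothetical, and `Dataset.offset` assumes Spark 3.4 or later.

```scala
import org.apache.spark.sql.functions.col

// Hypothetical table and column names. ORDER BY + LIMIT (top-N) and OFFSET
// are the query shapes this release can push down to Hydrolix rather than
// executing in Spark.
val topN = spark.table("hydrolix.my_project.my_table")
  .orderBy(col("timestamp").desc)
  .offset(50)  // requires Spark 3.4+; OFFSET pushdown is new in this release
  .limit(100)
topN.show()
```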
Changed
- Changed the embedded Turbine listen port from 8088 to 8123
- Reworked summary table interaction:
  - All summary aliases are now accessed with the syntax `SELECT hdxAgg('my_summary_alias') FROM hydrolix.my_project.my_summary_table`
  - No user interaction is required to pre-register summary aliases before querying them
  - A SQL extension is now required to use summary tables, configured with `spark.sql.extensions=io.hydrolix.connectors.spark.SummaryUdfExtension` (see the example after this list)
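A sketch of the reworked summary table flow: enabling the required SQL extension, then querying a summary alias with `hdxAgg`. The extension class and query syntax are from this changelog; the project, table, and alias names are placeholders.

```scala
import org.apache.spark.sql.SparkSession

// The SummaryUdfExtension is required for summary tables as of this release;
// project, table, and alias names below are placeholders.
val spark = SparkSession.builder()
  .appName("hydrolix-summary-example")
  .config("spark.sql.extensions", "io.hydrolix.connectors.spark.SummaryUdfExtension")
  .getOrCreate()

// Summary aliases no longer need pre-registration; reference them directly:
spark.sql("SELECT hdxAgg('my_summary_alias') FROM hydrolix.my_project.my_summary_table").show()
```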
Fixed
- Fixed an issue where multiple uses of the same aggregator in a query would give incorrect results
- Fixed an issue where queries with unexpected case sensitivity would fail
- Corrected backquote-escaping of column names containing backquotes (see the example below)
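For reference, a hypothetical query against a column whose name itself contains a backquote; in Spark SQL, an embedded backquote inside a quoted identifier is escaped by doubling it, which is the escaping this fix corrects.

```scala
// Hypothetical column name containing a backquote: weird`col
// In Spark SQL, the identifier is quoted with backquotes and the embedded
// backquote is doubled.
spark.sql("SELECT `weird``col` FROM hydrolix.my_project.my_table").show()
```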