22 April 2025 - v2.0.0-5.1.x
Download the Spark Connector JAR.
[v2.0.0-5.1.x]
Added
- Added additional Spark UI metrics (useful for monitoring and debugging):
  - PartitionReader fetch time, with statistics: min, max, median, and p90
  - SparkScanBuilder.build time
  - PartitionReader start and last timestamps
- Added statistics for Turbine query time metrics
- Introduced a new optional configuration setting, `hdx_partitions_per_task`, which specifies the number of Hydrolix partitions per Spark partition
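As a sketch, following the `spark.sql.catalog.hydrolix.*` naming used by the other connector settings (the exact prefix for this key is an assumption, not confirmed here), the option might be set in `spark-defaults.conf` like:

```
# Hypothetical example: read 4 Hydrolix partitions per Spark partition
spark.sql.catalog.hydrolix.hdx_partitions_per_task  4
```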
- Added pushdown support (performance optimization) for ORDER BY + LIMIT (top-N) queries
- Added pushdown support (performance optimization) for OFFSET clause
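A hedged illustration of a query shape that benefits from both pushdowns (the project, table, and column names are placeholders): the ORDER BY + LIMIT (top-N) and OFFSET clauses below can now be evaluated by Hydrolix rather than sorted and skipped in Spark.

```sql
-- Top-N with offset: both clauses are candidates for pushdown
SELECT timestamp, status_code
FROM hydrolix.my_project.my_table
ORDER BY timestamp DESC
LIMIT 100 OFFSET 200;
```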
- Added shading rules for the Apache HttpCore and HttpClient libraries to avoid classpath conflicts on Microsoft Fabric
- Added a BuildInfo generated class to log the Turbine version at startup
- Added an experimental columnar (transposition-based) partition reader for use with the `force_columnar` query mode. The transposition batch size may be set via the `columnar_batch_size` config
- Added three config keys, `spark.sql.catalog.hydrolix.cluster_url`, `spark.sql.catalog.hydrolix.jdbc_protocol`, and `spark.sql.catalog.hydrolix.jdbc_port`, as an optional alternative to `spark.sql.catalog.hydrolix.api_url` and `spark.sql.catalog.hydrolix.jdbc_url`
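As a sketch, the alternative connection settings might be combined in `spark-defaults.conf` like this (the hostname, protocol, and port values are placeholders, not defaults):

```
# Alternative to api_url + jdbc_url: specify the cluster endpoint pieces directly
spark.sql.catalog.hydrolix.cluster_url     https://my-cluster.example.com
spark.sql.catalog.hydrolix.jdbc_protocol   https
spark.sql.catalog.hydrolix.jdbc_port       8443
```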
- Added support for page-number-paginated Turbine API endpoints
  - Impacted endpoints:
    - orgs/$orgId/projects/
    - orgs/$orgId/projects/${project.uuid}/tables/
    - orgs/$orgId/storages
    - orgs/$orgId/projects/$projectId/tables/$tableId/views
    - orgs/$orgId/projects/$projectId/tables/$tableId/transforms/
  - All pages are fetched recursively if more than one page of data is available
Changed
- Changed the embedded Turbine's listen port from 8088 to 8123
- Reworked summary table interaction
  - All summary aliases are now accessed via the syntax `SELECT hdxAgg('my_summary_alias') FROM hydrolix.my_project.my_summary_table`
  - No user interaction is required to pre-register summary aliases before querying them
  - A SQL extension is now required to use summary tables, configured with `spark.sql.extensions=io.hydrolix.connectors.spark.SummaryUdfExtension`
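Putting the two changes together: with the `SummaryUdfExtension` enabled via `spark.sql.extensions`, summary aliases can be queried directly, with no pre-registration step (the alias, project, and table names below are the placeholders from the syntax above):

```sql
-- hdxAgg resolves the summary alias at query time
SELECT hdxAgg('my_summary_alias')
FROM hydrolix.my_project.my_summary_table;
```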
Fixed
- Fixed an issue where multiple uses of the same aggregator in a query would give incorrect results
- Fixed an issue where queries with unexpected case-sensitivity would fail
- Corrected backquote-escaping of column names containing backquotes