22 April 2025 - v2.0.0-5.1.x

Download the Spark Connector JAR.

[v2.0.0-5.1.x]

Added

  • Added additional Spark UI metrics:
    • PartitionReader fetch time with statistics: min, max, median, and p90
    • SparkScanBuilder.build time
    • PartitionReader start and last timestamps (useful for monitoring and debugging)
    • Statistics for Turbine query time metrics
  • Introduced a new optional configuration setting:
    • hdx_partitions_per_task, which specifies the number of Hydrolix partitions per Spark partition.
  • Added pushdown support (performance optimization) for ORDER BY + LIMIT (top-N) queries
  • Added pushdown support (performance optimization) for the OFFSET clause
  • Added shading rules for the Apache HttpCore and HttpClient libraries to avoid classpath conflicts on Microsoft Fabric
  • Added a generated BuildInfo class to log the Turbine version at startup
  • Added an experimental columnar (transposition-based) partition reader for use with the force_columnar query mode. The transposition batch size can be set via the columnar_batch_size config
  • Added three config keys, spark.sql.catalog.hydrolix.cluster_url, spark.sql.catalog.hydrolix.jdbc_protocol, and spark.sql.catalog.hydrolix.jdbc_port, as
    an optional alternative to spark.sql.catalog.hydrolix.api_url and spark.sql.catalog.hydrolix.jdbc_url
  • Added support for Turbine API endpoints that paginate by page number
    • Impacted endpoints:
      • orgs/$orgId/projects/
      • orgs/$orgId/projects/${project.uuid}/tables/
      • orgs/$orgId/storages
      • orgs/$orgId/projects/$projectId/tables/$tableId/views
      • orgs/$orgId/projects/$projectId/tables/$tableId/transforms/
    • All pages are fetched recursively when more than one page of data is available
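
The page-following behaviour described for the paginated endpoints can be sketched as below. This is a minimal illustration under stated assumptions, not the connector's implementation: the fetch_page callable, the "results" and "next" field names, and the in-memory fake endpoint are all hypothetical, since these notes do not describe the Turbine API response shape.

```python
from typing import Callable, Dict, List

# Hypothetical response shape: {"results": [...], "next": <page number or None>}.
Page = Dict[str, object]


def fetch_all(fetch_page: Callable[[int], Page], page: int = 1) -> List[object]:
    """Recursively collect items from a page-number-paginated endpoint."""
    response = fetch_page(page)
    items = list(response["results"])
    next_page = response.get("next")
    if next_page is None:
        # Last page reached: stop recursing.
        return items
    return items + fetch_all(fetch_page, next_page)


def make_fake_endpoint(items: List[str], page_size: int) -> Callable[[int], Page]:
    """In-memory stand-in for an endpoint such as orgs/$orgId/projects/."""
    def fetch_page(page: int) -> Page:
        start = (page - 1) * page_size
        chunk = items[start:start + page_size]
        has_more = start + page_size < len(items)
        return {"results": chunk, "next": page + 1 if has_more else None}
    return fetch_page


if __name__ == "__main__":
    endpoint = make_fake_endpoint(["p1", "p2", "p3", "p4", "p5"], page_size=2)
    print(fetch_all(endpoint))
```

Recursion mirrors the wording of the note; for endpoints with a very large number of pages, an iterative loop would avoid deep call stacks.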
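
As a rough illustration of the new configuration keys listed above, the following sketch renders them as spark-submit style --conf arguments. All values are placeholders to be replaced with your own. The fully qualified name used here for hdx_partitions_per_task is an assumption (these notes give only the short name), and to_conf_flags is a hypothetical helper written for this example.

```python
from typing import Dict, List

PREFIX = "spark.sql.catalog.hydrolix."

# Placeholder values only; substitute the values for your own cluster.
settings: Dict[str, str] = {
    PREFIX + "cluster_url": "<cluster-url>",
    PREFIX + "jdbc_protocol": "<jdbc-protocol>",
    PREFIX + "jdbc_port": "<jdbc-port>",
    # Assumption: hdx_partitions_per_task uses the same catalog prefix.
    PREFIX + "hdx_partitions_per_task": "<partitions-per-task>",
}


def to_conf_flags(conf: Dict[str, str]) -> List[str]:
    """Render settings as spark-submit style --conf arguments."""
    flags: List[str] = []
    for key, value in conf.items():
        flags += ["--conf", f"{key}={value}"]
    return flags


if __name__ == "__main__":
    print(" ".join(to_conf_flags(settings)))
```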

Changed

  • Changed embedded Turbine's listen port from 8088 to 8123
  • Reworked summary table interaction
    • All summary aliases are now accessed by the syntax SELECT hdxAgg('my_summary_alias') FROM hydrolix.my_project.my_summary_table
    • No user interaction is required to pre-register summary aliases before querying them
      • A SQL extension is now required to use summary tables, configured with spark.sql.extensions=io.hydrolix.connectors.spark.SummaryUdfExtension
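
The reworked summary table access can be sketched as follows. The extension setting and the hdxAgg('...') syntax are taken from the notes above; the summary_query helper is hypothetical, and selecting several aliases in one statement is an assumption that these notes do not confirm. The helper does no quoting or validation of alias names.

```python
from typing import Iterable

# Setting taken from the release notes above.
SUMMARY_EXTENSION_CONF = {
    "spark.sql.extensions": "io.hydrolix.connectors.spark.SummaryUdfExtension",
}


def summary_query(table: str, aliases: Iterable[str]) -> str:
    """Build a SELECT that reads summary aliases through hdxAgg('...')."""
    columns = ", ".join(f"hdxAgg('{alias}')" for alias in aliases)
    return f"SELECT {columns} FROM {table}"


if __name__ == "__main__":
    print(summary_query("hydrolix.my_project.my_summary_table", ["my_summary_alias"]))
```

In a PySpark session, the setting would typically be applied with SparkSession.builder.config(key, value) before the session is created, and the resulting string passed to spark.sql(...).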

Fixed

  • Fixed an issue where multiple uses of the same aggregator in a query would give incorrect results
  • Fixed an issue where queries with unexpected case-sensitivity would fail
  • Corrected backquote-escaping of column names containing backquotes
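
For context on the backquote fix, Spark SQL escapes a backquote inside a quoted identifier by doubling it. The sketch below shows that convention only; quote_identifier is a hypothetical helper, and whether the connector's corrected code takes exactly this form is an assumption.

```python
def quote_identifier(name: str) -> str:
    """Wrap a column name in backquotes, doubling any embedded backquote."""
    return "`" + name.replace("`", "``") + "`"


if __name__ == "__main__":
    # A column literally named: weird`name
    print(quote_identifier("weird`name"))
```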