22 April 2025 - v2.0.0

Download the Spark Connector JAR.

[v2.0.0-5.1.x]

Added

  • Added additional Spark UI metrics:
    • PartitionReader fetch time, with min, max, median, and p90 statistics
    • SparkScanBuilder.build time
    • PartitionReader start and last timestamps (useful for monitoring and debugging)
    • Turbine query time statistics
  • Added a new optional configuration setting, hdx_partitions_per_task, which specifies the number of Hydrolix partitions per Spark partition
  • Added pushdown support (performance optimization) for ORDER BY + LIMIT (top-N) queries
  • Added pushdown support (performance optimization) for OFFSET clause
  • Added shading rules for the Apache HttpCore and HttpClient libraries to avoid classpath conflicts on Microsoft Fabric
  • Added a generated BuildInfo class to log the Turbine version at startup
  • Added an experimental columnar (transposition-based) partition reader for use with the force_columnar query mode; the transposition batch size may be set via the columnar_batch_size setting
  • Added three configuration keys, spark.sql.catalog.hydrolix.cluster_url, spark.sql.catalog.hydrolix.jdbc_protocol, and spark.sql.catalog.hydrolix.jdbc_port, as an optional alternative to spark.sql.catalog.hydrolix.api_url and spark.sql.catalog.hydrolix.jdbc_url
  • Added support for page-number-paginated Turbine API endpoints; the connector now recursively fetches all pages when more than one page of data is available. Impacted endpoints:
    • orgs/$orgId/projects/
    • orgs/$orgId/projects/${project.uuid}/tables/
    • orgs/$orgId/storages
    • orgs/$orgId/projects/$projectId/tables/$tableId/views
    • orgs/$orgId/projects/$projectId/tables/$tableId/transforms/
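The new configuration settings above can be combined in a Spark properties file. A minimal sketch with illustrative values; the spark.sql.catalog.hydrolix.* prefix for hdx_partitions_per_task and columnar_batch_size is an assumption here, mirroring how the URL keys are namespaced:

```properties
# Alternative to api_url/jdbc_url: derive endpoints from the cluster URL
spark.sql.catalog.hydrolix.cluster_url     https://my-cluster.example.com
spark.sql.catalog.hydrolix.jdbc_protocol   https
spark.sql.catalog.hydrolix.jdbc_port       8123

# Hydrolix partitions per Spark partition (assumed key prefix; value illustrative)
spark.sql.catalog.hydrolix.hdx_partitions_per_task   2

# Transposition batch size for the experimental columnar reader
# (assumed key prefix; value illustrative)
spark.sql.catalog.hydrolix.columnar_batch_size       4096
```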

Changed

  • Changed embedded Turbine's listen port from 8088 to 8123
  • Reworked summary table interaction:
    • All summary aliases are now accessed with the syntax SELECT hdxAgg('my_summary_alias') FROM hydrolix.my_project.my_summary_table
    • Summary aliases no longer need to be pre-registered before querying them
    • A SQL extension is now required to use summary tables, configured with spark.sql.extensions=io.hydrolix.connectors.spark.SummaryUdfExtension
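The reworked summary-table flow above amounts to two steps. A minimal sketch, where my_summary_alias, my_project, and my_summary_table are the placeholder names from the example syntax:

```sql
-- 1. Enable the summary-table SQL extension (spark-defaults.conf or --conf):
--      spark.sql.extensions=io.hydrolix.connectors.spark.SummaryUdfExtension
-- 2. Query a summary alias directly; no pre-registration is required:
SELECT hdxAgg('my_summary_alias')
FROM hydrolix.my_project.my_summary_table;
```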

Fixed

  • Fixed an issue where multiple uses of the same aggregator in a query would give incorrect results
  • Fixed an issue where queries with unexpected identifier case-sensitivity would fail
  • Corrected backquote-escaping of column names containing backquotes