Use Apache Spark

Overview

The Hydrolix Connector for Apache Spark combines the cost and query efficiency of the Hydrolix platform with the data analysis and distributed computing capabilities of Apache Spark. With the connector, Hydrolix serves as the backing store for your Apache Spark ecosystem, while the data stays available in familiar notebooks and coding environments.

The latest Hydrolix Connector for Apache Spark JAR can be downloaded here.

Versioning

Hydrolix Connector for Apache Spark versions use the following format:

Version Format

`{Hydrolix Connector for Apache Spark version: major.minor.patch}-{Embedded HDX version: major.minor.patch}`

For example, the following JAR name carries connector version 1.0.0 and embedded HDX version 4.22.1:

Version Format Example

`v1.0.0-v4.22.1.jar`
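As a minimal sketch of how this format could be split into its two parts, the following Python snippet parses a version string such as the example above. The names `ConnectorVersion` and `parse_connector_version` are hypothetical and not part of the connector. Treating the `v` prefixes and the `.jar` suffix as optional is an assumption based on the example.

```python
import re
from typing import NamedTuple, Tuple

Version = Tuple[int, int, int]


class ConnectorVersion(NamedTuple):
    """Hypothetical container for the two halves of the version string."""
    connector: Version      # Hydrolix Connector for Apache Spark version
    embedded_hdx: Version   # embedded HDX version


# Assumption: each half is major.minor.patch, optionally prefixed with "v",
# and the whole string may end in ".jar" as in the documented example.
_PATTERN = re.compile(
    r"v?(\d+)\.(\d+)\.(\d+)-v?(\d+)\.(\d+)\.(\d+)(?:\.jar)?"
)


def parse_connector_version(name: str) -> ConnectorVersion:
    """Split a string like 'v1.0.0-v4.22.1.jar' into its two versions."""
    match = _PATTERN.fullmatch(name)
    if match is None:
        raise ValueError(f"unrecognized connector version: {name!r}")
    numbers = tuple(int(group) for group in match.groups())
    return ConnectorVersion(numbers[:3], numbers[3:])


if __name__ == "__main__":
    parsed = parse_connector_version("v1.0.0-v4.22.1.jar")
    print(parsed.connector, parsed.embedded_hdx)
```

Keeping each half as a tuple of integers means versions compare numerically, so `(4, 22, 1)` sorts after `(4, 3, 0)`, which a plain string comparison would get wrong.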

Each Hydrolix Connector for Apache Spark version requires a minimum Hydrolix cluster version. Unless the table lists an upper bound, it is also compatible with all more recent versions. The following compatibility matrix maps Hydrolix Connector for Apache Spark versions to their compatible Hydrolix cluster versions:

Hydrolix Connector for Apache Spark Version	Hydrolix Versions	Changelog
v3.0.0-v5.6.0	v5.6.0+	September 2, 2025 - v3.0.0-v5.6.0
v2.0.0-v5.1.1	v4.22.1+	April 22, 2025 - v2.0.0-v5.1.1
v1.0.0-v4.22.1	v4.22.1 to v5.3.1	February 4, 2025 - v1.0.0-v4.22.1
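
The matrix above can be read as a simple range check. The following Python sketch encodes the three rows as data and tests whether a given cluster version falls inside a connector's supported range. The names `COMPATIBILITY`, `parse_version`, and `is_compatible` are hypothetical, and treating the `v5.3.1` upper bound as inclusive is an assumption.

```python
from typing import Dict, Optional, Tuple

Version = Tuple[int, int, int]

# Rows copied from the compatibility matrix: (minimum, maximum) cluster
# versions. None means no documented upper bound (the "+" rows).
COMPATIBILITY: Dict[str, Tuple[Version, Optional[Version]]] = {
    "v3.0.0-v5.6.0": ((5, 6, 0), None),
    "v2.0.0-v5.1.1": ((4, 22, 1), None),
    "v1.0.0-v4.22.1": ((4, 22, 1), (5, 3, 1)),  # assumed inclusive bound
}


def parse_version(text: str) -> Version:
    """Turn 'v5.3.1' or '5.3.1' into (5, 3, 1)."""
    major, minor, patch = text.lstrip("v").split(".")
    return (int(major), int(minor), int(patch))


def is_compatible(connector: str, cluster: str) -> bool:
    """Check a cluster version against the connector's documented range."""
    minimum, maximum = COMPATIBILITY[connector]
    version = parse_version(cluster)
    if version < minimum:
        return False
    return maximum is None or version <= maximum


if __name__ == "__main__":
    print(is_compatible("v1.0.0-v4.22.1", "v5.4.0"))
    print(is_compatible("v3.0.0-v5.6.0", "v5.6.0"))
```

A check like this could run before a deployment to confirm that the chosen JAR matches the target cluster.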

Deployment Environments

The Hydrolix Connector for Apache Spark can be deployed to multiple platforms. Follow the installation instructions for your preferred platform.

Databricks

Microsoft Fabric

AWS EMR