Component Version Override
Overview
As of v5.10, Hydrolix supports decoupling the operator version from the Hydrolix platform version.
By default, the operator version and the image manifest version are the same. When spec.version is set to a different value, the operator downloads and deploys component images from that version's manifest without changing the version of the operator.
Use image manifest versioning to:
- Upgrade the operator independently. Get operator bug fixes and features without changing the Hydrolix platform version.
- Stage rollouts. Upgrade the operator first, then update spec.version to the new platform version in a separate step.
- Test operator compatibility. Validate a newer operator against your current workload before committing to a full platform upgrade.
To override individual component images and tags rather than the full manifest, see Independent Container Versioning.
Configuration
Set spec.version in the HydrolixCluster configuration to specify which image manifest the operator should use.
| image manifest version example | |
|---|---|
In this example, the operator runs its own version but deploys component images from the v5.9.0 manifest. Because this feature requires v5.10 or later, the operator itself must be at least v5.10. The spec.version value can target any published version.
Manifest resolution
| Scenario | Behaviour |
|---|---|
| version isn't set | The operator uses its built-in manifest. Operator version and component image versions are the same. |
| version matches the operator version | The operator uses its built-in manifest. |
| version differs from the operator version | The operator downloads the manifest for the specified version and uses it as the baseline for component images. |
The operator version itself is never downgraded. Only the component container images change.
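The three scenarios in the table reduce to a single comparison. The following is a minimal Python sketch of that decision, not the operator's actual code. The function name `manifest_source` and the return labels `"built-in"` and `"download"` are hypothetical and exist only for illustration.

```python
from typing import Optional


def manifest_source(operator_version: str, spec_version: Optional[str]) -> str:
    """Return which manifest the operator would use, per the table above.

    "built-in" and "download" are illustrative labels, not operator output.
    """
    # Unset or identical version: the manifest shipped with the operator is used.
    if spec_version is None or spec_version == operator_version:
        return "built-in"
    # Any other value: the manifest for that version is fetched as the baseline.
    return "download"


if __name__ == "__main__":
    for spec in (None, "v5.10.0", "v5.9.0"):
        print(spec, "->", manifest_source("v5.10.0", spec))
```

Note that the comparison is symmetric: an older or a newer spec.version both lead to a download, and the operator version is left untouched in either case.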
Manifest versioning applies to all workload types. When spec.version changes, the operator reconciles and patches all affected resources, including:
- Deployments
- StatefulSets
- Jobs
- DaemonSets
Deployments use a RollingUpdate strategy by default with maxSurge: 25% and maxUnavailable: 0, so pods are replaced gradually with no downtime during the transition. Certain controller components (batch controller, merge controller, catalog proxy) use a Recreate strategy and are briefly unavailable while they restart. The operator marks the cluster status as "Upgrading" while the current version differs from the target version.
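To make the rolling update settings concrete, the sketch below works out the pod counts they allow during a rollout. It relies on the standard Kubernetes rule that a percentage maxSurge is rounded up and a percentage maxUnavailable is rounded down. The function name `rolling_update_bounds` is hypothetical, and the replica counts are illustrative rather than Hydrolix defaults.

```python
from typing import Tuple


def rolling_update_bounds(
    replicas: int, max_surge_pct: int = 25, max_unavailable_pct: int = 0
) -> Tuple[int, int]:
    """Return (max total pods, min available pods) during a rolling update."""
    # Kubernetes rounds a percentage maxSurge up (ceiling division on integers).
    surge = -(-replicas * max_surge_pct // 100)
    # Kubernetes rounds a percentage maxUnavailable down (floor division).
    unavailable = replicas * max_unavailable_pct // 100
    return replicas + surge, replicas - unavailable


if __name__ == "__main__":
    for n in (1, 4, 10):
        total, available = rolling_update_bounds(n)
        print(f"{n} replicas: up to {total} pods, at least {available} available")
```

With maxUnavailable: 0, the minimum available count always equals the desired replica count, which is why the transition causes no downtime for these Deployments. The cluster needs spare capacity for the surge pods while the rollout runs.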
Resolution order
The operator resolves component images in a specific order.
- If spec.version is set and differs from the operator version, it downloads that version's manifest. Otherwise, it uses the built-in manifest.
- It overrides the manifest's OPERATOR_VERSION entry with the locally built-in operator version.
- If spec.containers entries exist, it overrides the matching image or tag from the manifest baseline. See Independent Container Versioning.
- It runs the HydrolixCluster spec through schema validation.
- It patches Kubernetes resources and rolls pods with updated images.
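The layering in the steps above can be sketched in Python. This is an illustration under stated assumptions, not the operator's implementation: the manifest is modeled as a flat mapping from component name to image reference, overrides replace a whole image reference rather than an image or tag separately, and an override only applies when the name already exists in the baseline. The function name `resolve_images` and the `registry.example` image names are hypothetical.

```python
from typing import Dict, Optional


def resolve_images(
    base_manifest: Dict[str, str],
    operator_version: str,
    container_overrides: Optional[Dict[str, str]] = None,
) -> Dict[str, str]:
    """Layer the operator version and per-container overrides on a baseline."""
    # Work on a copy so the baseline manifest is left unmodified.
    resolved = dict(base_manifest)
    # The OPERATOR_VERSION entry always reflects the running operator.
    resolved["OPERATOR_VERSION"] = operator_version
    # Container-level overrides win over the baseline (assumed: matching names only).
    for name, image in (container_overrides or {}).items():
        if name in resolved:
            resolved[name] = image
    return resolved


if __name__ == "__main__":
    baseline = {
        "OPERATOR_VERSION": "v5.9.0",
        "turbine-api": "registry.example/turbine-api:v5.9.0",
        "intake-head": "registry.example/intake-head:v5.9.0",
    }
    overrides = {"turbine-api": "registry.example/turbine-api:v5.8.0"}
    print(resolve_images(baseline, "v5.10.0", overrides))
```

Schema validation and the Kubernetes patch step are left out of the sketch because their inputs and outputs aren't described in enough detail here to model them.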
Upgrade workflow
To upgrade the operator independently from the platform version:
- Set spec.version to the current cluster version.
- Upgrade the operator to the target version.
- Verify the operator is healthy. Check that operator_apply shows complete and the operator pod is running without errors.
- Update spec.version to the target platform version.
- Monitor the rollout.
To revert to a previous platform version, set spec.version back to the prior value. The operator treats a rollback the same as any other version change: it downloads the manifest for the specified version and re-applies all resources. No special rollback procedure is required. After reverting, monitor the rollout and confirm all pods have stabilized before considering the rollback complete.
Combine with container-level overrides
spec.version and spec.containers compose (see Resolution order). The version field sets the baseline manifest, and container-level overrides are applied on top. Pods aren't re-deployed if their resolved image references haven't changed.
| Image Manifest Versioning with Container Versioning Override | |
|---|---|
In this example, the operator is deployed with v5.9.0. All components are running v5.9.0 images except turbine-api, which uses v5.8.0.
When spec.containers overrides a component's image or tag, the operator annotates the pod with hydrolix.io/image-override, a comma-separated list of the overridden container names. Using spec.version alone doesn't set this annotation. See Monitor overrides with Prometheus for setup and query details.
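Because the annotation value is a comma-separated list, tooling that inspects pods needs to split it. The following is a minimal Python sketch that assumes the pod's annotations are already available as a dictionary, for example from parsed pod metadata. The helper name `overridden_containers` and the sample container names are illustrative.

```python
from typing import Dict, List

ANNOTATION = "hydrolix.io/image-override"


def overridden_containers(annotations: Dict[str, str]) -> List[str]:
    """Return the container names listed in the image-override annotation."""
    raw = annotations.get(ANNOTATION, "")
    # Split on commas and drop surrounding whitespace and empty entries.
    return [name.strip() for name in raw.split(",") if name.strip()]


if __name__ == "__main__":
    print(overridden_containers({ANNOTATION: "turbine-api, intake-head"}))
    print(overridden_containers({}))  # no annotation: spec.version alone was used
```

An empty result is consistent with a pod whose images come purely from the manifest baseline.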
Validation and error handling
When spec.version differs from the operator version, the operator runs a full validation of the HydrolixCluster spec against the current operator's schema before applying changes. This validation covers tunables, configuration structure, and required fields.
If spec validation fails or the manifest can't be fetched, the operator retries on the next reconciliation cycle. See Check operator status for details.
In both cases, the operator doesn't fall back to the built-in manifest or apply a partial configuration.
Check operator status
The operator logs its version and the spec.version at startup. It also logs spec validation and sync failures. On failure, the operator updates its operator_apply status file located at /home/turbine/operator_state to failed. When the apply cycle completes successfully, the operator updates this file to complete.
The operator logs manifest fetch failures, but doesn't update the status file. See Manifest fetch fails.
Bypass validation
In environments where you need to force deployment despite a validation failure, set the IGNORE_STARTUP_VALIDATION_ERRORS environment variable on the operator Deployment:
| Bypass Startup Validation | |
|---|---|
To remove the override after debugging:
| Remove Validation Bypass | |
|---|---|
Bypassing validation may result in missing or incompatible resources.
This is intended for debugging, not production use.
Troubleshooting
Manifest fetch fails
Symptom: The operator logs show a FailedToFetchManifest error. Pods remain on their previous images and the cluster doesn't update.
Common causes:
- The version string in spec.version is incorrect.
- The manifest for the specified version hasn't been published yet.
- The operator pod can't reach the registry (network policy, firewall, or air-gapped environment).
Resolution: Verify that the value of spec.version matches a released version of Hydrolix. If the version exists, the issue is likely network connectivity from the operator pod. The fetch timeout is 10 seconds.
Spec validation fails
Symptom: The operator logs show a TemporaryError with a bulleted list of validation errors. The operator_apply status file shows failed. The cluster remains on its previous configuration.
Common causes:
- The HydrolixCluster spec uses tunables or configuration fields that aren't compatible with the current operator's schema.
- A very old manifest version is targeted and the operator's schema has changed significantly since that version.
Resolution: Review the specific validation errors in the operator logs. Each error identifies the field and the issue. Correct the spec or choose a version closer to the operator version. To temporarily bypass validation for debugging, see Bypass validation.
Pods not updating after changing spec.version
Symptom: You updated spec.version but pods continue running the previous images.
Common causes:
- The operator is in a retry loop due to a validation or fetch failure. Check operator logs for errors.
- The resolved image references haven't actually changed.
Resolution: Check the operator_apply status file located at /home/turbine/operator_state in the operator container as well as the operator logs. If operator_apply shows complete but images look unchanged, use the commands in Verify running images to confirm the actual running images against the expected manifest.
Limitations
- No hard-coded version bounds. Whether an older manifest works depends on whether it passes the current operator's schema validation. Very old manifests may fail if the schema has changed significantly.
- Sequential upgrade rules still apply. This feature decouples the operator from component images but doesn't change the requirement to upgrade sequentially through minor versions. Don't use spec.version to skip platform versions during an upgrade.
- Network access required. The operator fetches manifests from a Hydrolix registry. This endpoint must be reachable from the operator pod when spec.version differs from the operator version. In air-gapped or network-restricted environments, the operator and platform versions must match so the built-in manifest is used.
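The sequential upgrade rule can be checked before editing spec.version. The sketch below is a hypothetical pre-flight helper, not something the operator provides. It assumes version strings of the form v5.9.0 or v5.9, and it only compares versions within the same major release. Anything else raises an error rather than guessing.

```python
import re
from typing import Tuple


def parse_version(version: str) -> Tuple[int, int]:
    """Extract (major, minor) from strings such as 'v5.9.0' or '5.10'."""
    match = re.fullmatch(r"v?(\d+)\.(\d+)(?:\.\d+)?", version.strip())
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    return int(match.group(1)), int(match.group(2))


def skips_minor_version(current: str, target: str) -> bool:
    """Return True if moving from current to target jumps over a minor version."""
    cur_major, cur_minor = parse_version(current)
    tgt_major, tgt_minor = parse_version(target)
    if cur_major != tgt_major:
        # Cross-major upgrades are outside the scope of this sketch.
        raise ValueError("this sketch only compares versions within one major release")
    return tgt_minor - cur_minor > 1


if __name__ == "__main__":
    print(skips_minor_version("v5.8.0", "v5.9.0"))   # adjacent minor versions
    print(skips_minor_version("v5.8.0", "v5.10.0"))  # jumps over v5.9
```

Comparing the parsed integers rather than the raw strings matters here, because a plain string comparison would order v5.10 before v5.9.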