Upgrading Hydrolix

Hydrolix continuously releases new features and updates. To stay up to date, we suggest downloading the newest version of hdxctl and then using the hdxctl update command. New versions of hdxctl are available on the hdxctl install page.

There are two methods for updating a Hydrolix architecture: in-place and Blue/Green. In most cases a Blue/Green update is recommended, especially where a production environment is involved.

Blue/Green Update

A Blue/Green update first updates the stateful core components, then establishes a new cluster of stateless compute components alongside the existing (old) compute components. Once the new compute components are created and running, traffic and workloads are routed to them and the old compute components can be retired. No data or configuration information is lost during the process.

Update Process

  1. (Optional) If you have production ingest traffic, review the additional steps that may be needed to scale your ingest in the Updates with active ingest section below.

  2. Using the new version of hdxctl, update the stateful core components (a worked example of the full sequence appears after this list).

$ hdxctl update <client-id> --deploy

  3. Create the new stateless cluster. Traffic will continue to be directed to the old cluster.

$ hdxctl --region <region> create-cluster --wait <client-id>

  4. Route traffic and workloads to the new stateless cluster.

$ hdxctl route <client-id> <new-cluster-id>

The route command can take 5 minutes to complete due to updates in DNS and waiting for existing HTTP sessions to drain.

  5. Delete the old cluster.

$ hdxctl delete --wait <client-id> <old-cluster-id>

  6. (Optional) Re-scale ingest components if required.
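For illustration, here is a minimal sketch of the full Blue/Green sequence. The client ID, cluster IDs, region, and hostname below are hypothetical placeholders, and the dig lookup is simply one way to confirm that DNS has switched to the new cluster.

# Blue/Green update sketch; hdxcli-abc123, hdx-old456, hdx-new789, us-east-2,
# and the hostname are hypothetical placeholders.
$ hdxctl update hdxcli-abc123 --deploy                           # update the stateful core
$ hdxctl --region us-east-2 create-cluster --wait hdxcli-abc123  # create the new stateless cluster
$ hdxctl route hdxcli-abc123 hdx-new789                          # switch traffic to the new cluster
$ dig +short <your-host>                                         # confirm DNS now resolves to the new cluster
$ hdxctl delete --wait hdxcli-abc123 hdx-old456                  # retire the old cluster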

📘

Note:

Prometheus, Grafana, and Superset use a storage volume to maintain state across updates; this volume must be moved from the old cluster to the new one. As the volume can't be shared between instances, you may experience some 502/503 responses from these components until the old cluster is deleted (step 5). If you wish to avoid these 502/503 errors, you can shut these components down first, do the update, route to the new cluster, and then restart them (so they attach to the new instances).
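If you want to watch for those 502/503 responses during the switchover, a minimal sketch is a polling loop like the one below; the hostname and path are hypothetical placeholders for wherever your cluster serves Grafana.

# Print the HTTP status of the (hypothetical) Grafana URL every 10 seconds;
# 502/503 responses are expected until the old cluster is deleted in step 5.
$ while true; do curl -s -o /dev/null -w "%{http_code}\n" https://<your-host>/grafana/; sleep 10; done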

In-Place Update

An in-place update updates components of the infrastructure without creating a new compute cluster. The update is applied through a rolling restart, so if a large number of compute components are running, the update can take longer to complete.

To run the command as an in-place update you must supply both the <client-id> and the <cluster-id> in the update command.

Update Process

  1. (Optional) Scale down components.
$ hdxctl scale <client-id> <cluster-id> --<query-peer/batch-peer/stream-peer>-count 0
  2. Update the cluster.
$ hdxctl update <client-id> <cluster-id> --deploy
  3. (Optional) Scale the components back up.
$ hdxctl scale <client-id> <cluster-id> --<query-peer/batch-peer/stream-peer>-count <original count>
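As an illustration, an in-place sequence for a cluster that only needs its query peers drained might look like the following; the client ID, cluster ID, and peer counts are hypothetical.

# In-place update sketch with hypothetical IDs and counts.
$ hdxctl scale hdxcli-abc123 hdx-def456 --query-peer-count 0   # drain query peers
$ hdxctl update hdxcli-abc123 hdx-def456 --deploy              # rolling-restart update
$ hdxctl scale hdxcli-abc123 hdx-def456 --query-peer-count 3   # restore the original count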

Updates with active ingest

Where an update is to be applied to an infrastructure that is actively loading data via Batch, Kafka, or Stream ingest, the following is recommended.

Updates with Kafka Ingest

  1. Scale down the Hydrolix Kafka servers via the Portal or the /sources/kafka API endpoint (see the sketch after this list).

  2. Check the Grafana Kafka Dashboard or Prometheus to ensure Kafka-peer utilization drops to zero.

  3. Update the cluster using your chosen method.

  4. Scale up the Hydrolix Kafka servers.

  5. Confirm data is flowing into the Kafka-peer servers by inspecting the Grafana Kafka dashboard.
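As a rough sketch of the API route, the calls below list the Kafka sources for a table and then set a source's replica count to zero. The host, token, config path prefix, and the replica field in the request body are all assumptions, so check your API reference for the exact schema.

# Hypothetical sketch of pausing a Kafka source via the API. The host, token,
# path prefix, and request body fields are assumptions, not confirmed names.
$ curl -s -H "Authorization: Bearer $HDX_TOKEN" "https://<your-host>/config/v1/orgs/<org-id>/projects/<project-id>/tables/<table-id>/sources/kafka/"
$ curl -s -X PATCH -H "Authorization: Bearer $HDX_TOKEN" -H "Content-Type: application/json" -d '{"settings": {"replicas": 0}}' "https://<your-host>/config/v1/orgs/<org-id>/projects/<project-id>/tables/<table-id>/sources/kafka/<source-id>"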

📘

You may need to increase the number of Kafka-peer servers for a short period to catch up on any lag that may have accumulated on your Kafka infrastructure. This lag can be seen in the Grafana Kafka dashboards.

Updates with Stream Ingest

Where streaming ingest is utilized, a Blue/Green deployment is suggested. When the hdxctl route command is issued, the DNS of the infrastructure is switched from the old stateless cluster to the new stateless cluster. As the switch relies on DNS, it is suggested that any client application honor DNS TTLs and have sufficient retry capabilities, including a fresh DNS lookup, should a load event fail.
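As a minimal sketch of that client behavior, the loop below retries a streaming POST; because each curl invocation is a new process, each retry performs a fresh DNS lookup and so picks up the routed cluster once the TTL expires. The hostname, ingest path, and table header are assumptions based on a typical Hydrolix streaming setup.

# Hypothetical retry loop: each curl run re-resolves DNS, so retries follow the
# route switch. The host, path, and x-hdx-table value are assumptions.
$ for attempt in 1 2 3 4 5; do curl -s --fail -X POST "https://<your-host>/ingest/event" -H "x-hdx-table: <project>.<table>" -H "Content-Type: application/json" --data-binary @events.json && break; sleep 5; done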

Updates with Batch Ingest

Before commencing the update, it is recommended to let all batch jobs finish their workloads and to pause any automated scripts that use the batch ingest API. Batch workloads can be reviewed by inspecting the Portal, the API, or the Grafana dashboards supplied with the service. Once the update is complete, jobs can recommence. One way to poll for outstanding jobs is sketched below.
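As a rough sketch, a poll like the one below could confirm that no batch jobs are still running before you begin; the host, token, endpoint path, response shape, and status field are assumptions, so consult your API reference for the exact route.

# Hypothetical sketch: count batch jobs still running. The host, token, path,
# response shape, and status field are assumptions.
$ curl -s -H "Authorization: Bearer $HDX_TOKEN" "https://<your-host>/config/v1/orgs/<org-id>/jobs/batch/" | jq '[.[] | select(.status == "running")] | length'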

Updates with Batch Autoingest

Batch autoingest uses a storage notification queue to manage the incoming files to be ingested by the batch-peers. This queue is independent of the compute cluster and persists across updates, which means batch-peers should be scaled down to 0 before the update is started. The recommended process is as follows:

  1. Reduce the batch-peer count to 0.
$ hdxctl scale --batch-peer-count 0 <client-id> <cluster-id>
  2. Check the Grafana Batch Dashboard or Prometheus to ensure batch-peer utilization drops to zero.

  3. Update the cluster using your chosen method.

  4. Reinstate the batch-peer count to prior levels.
$ hdxctl scale --batch-peer-count <count> <client-id> <cluster-id>

You may need to increase the number of batch-peer servers for a short period to catch up on any lag that may have accumulated within the queue, as sketched below.
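For example, a temporary catch-up bump reuses the same scale command; the counts and IDs here are hypothetical.

# Hypothetical catch-up: run extra batch peers until the Grafana Batch dashboard
# shows the queue has drained, then restore the original count.
$ hdxctl scale --batch-peer-count 8 hdxcli-abc123 hdx-def456
$ hdxctl scale --batch-peer-count 3 hdxcli-abc123 hdx-def456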

Rolling Back to an Older Version

On rare occasions a roll back to an older version may be required. This is accomplished by downloading the older version of the hdxctl tool and using the same update methods described above to "update" to the old version. If you have concerns or need help, please reach out to Hydrolix support.
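As a sketch, assuming you have downloaded the older hdxctl binary (the filename and version below are hypothetical), an in-place rollback reuses the update command:

# Hypothetical rollback: run the older hdxctl binary against the existing cluster.
$ chmod +x ./hdxctl-v3.40.5
$ ./hdxctl-v3.40.5 update hdxcli-abc123 hdx-def456 --deploy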