Prepare a Cluster

To deploy the Hydrolix data platform on GCP, you first need to set up a Google Kubernetes Engine (GKE) cluster.

Set Environment Variables

To begin, define the following environment variables in a script named env.sh. Replace each < > placeholder with your own value:

export HDX_BUCKET_REGION=<e.g. us-central1> # must be all lowercase, even though Google's documentation lists regions in uppercase
export HDX_HYDROLIX_URL=<e.g. https://my.domain.com>
export HDX_KUBERNETES_NAMESPACE=<e.g. production-service>
export HDX_ADMIN_EMAIL=<your admin email address>
export HDX_DB_BUCKET_URL=gs://${HDX_KUBERNETES_NAMESPACE}

Load these environment variables with the following command:

source env.sh
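
Before moving on, it can help to confirm that every variable actually loaded. The following is a minimal sketch, not part of the Hydrolix tooling: check_hdx_env is a hypothetical helper name, and it only checks that each variable from env.sh is non-empty and that the region contains no uppercase letters.

```shell
# check_hdx_env: hypothetical helper that reports any required variable
# that is unset or empty. Returns 0 when everything looks fine, 1 otherwise.
check_hdx_env() {
  missing=0
  for name in HDX_BUCKET_REGION HDX_HYDROLIX_URL HDX_KUBERNETES_NAMESPACE \
              HDX_ADMIN_EMAIL HDX_DB_BUCKET_URL; do
    # Indirect lookup of the variable whose name is stored in $name.
    eval "value=\${$name:-}"
    if [ -z "$value" ]; then
      echo "missing: $name" >&2
      missing=1
    fi
  done
  # The region must be lowercase, as noted in the env.sh comment.
  case "${HDX_BUCKET_REGION:-}" in
    *[[:upper:]]*)
      echo "HDX_BUCKET_REGION must be lowercase" >&2
      missing=1
      ;;
  esac
  return "$missing"
}
```

Run check_hdx_env after source env.sh. A non-zero exit status means at least one value still needs attention.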

Create a Google Project

👍

Already Created a Google Project?

If you've already created a Google Project for Hydrolix, skip this step.

Use the following gcloud command to create a Google Project:

gcloud projects create hydrolix-cluster

Once you've created the project, enable the Identity and Access Management (IAM) API for it.
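
One way to enable the API from the command line is sketched below. It assumes that the API meant here is the one served as iam.googleapis.com, and enable_iam_api is a hypothetical wrapper name used only for illustration.

```shell
# enable_iam_api PROJECT_ID: enable the IAM API for the given project.
# Assumption: the API in question is iam.googleapis.com.
enable_iam_api() {
  gcloud services enable iam.googleapis.com --project "$1"
}
# Usage: enable_iam_api hydrolix-cluster
```

You can also enable the API from the Google Cloud console if you prefer.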

Add Your Project ID to env.sh

We recommend adding your project ID to your environment variables. Run the following command to get the project ID for your current Google Project:

gcloud config get-value project

Add the following line to env.sh, replacing <your project id> with your actual project ID:

export PROJECT_ID=<your project id>

Finally, run the following command to load the new environment variable:

source env.sh

Create a Google Storage Bucket

👍

Already Created a Google Cloud Storage Bucket?

If you've already created a Google Cloud Storage Bucket for Hydrolix, skip this step.

Hydrolix needs access to a Google Cloud Storage bucket for data storage. Run the following command to create a bucket with the gsutil command-line tool:

gsutil mb -l ${HDX_BUCKET_REGION} $HDX_DB_BUCKET_URL
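
To confirm that the bucket exists before continuing, a check along these lines may be useful. This is only a sketch: bucket_exists is a hypothetical helper, and it relies on gsutil ls -b returning a non-zero status when the bucket cannot be found or accessed.

```shell
# bucket_exists BUCKET_URL: succeed if gsutil can list the bucket itself.
# Output is discarded; only the exit status is used.
bucket_exists() {
  gsutil ls -b "$1" >/dev/null 2>&1
}
# Usage: bucket_exists "$HDX_DB_BUCKET_URL" && echo "bucket is ready"
```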

Deploy a Kubernetes Cluster

Next, deploy a Kubernetes cluster into the project. The command below creates a single node pool with these characteristics:

default-pool

  • Count: Autoscale 0-20
  • Type: n2-standard-16

This pool provides enough capacity for a basic load.

gcloud container --project "$PROJECT_ID" clusters create "$HDX_KUBERNETES_NAMESPACE" \
--region "$HDX_BUCKET_REGION" \
--no-enable-basic-auth \
--release-channel "regular" \
--machine-type "n2-standard-16" \
--image-type "COS_CONTAINERD" \
--disk-type "pd-ssd" \
--disk-size "128" \
--metadata disable-legacy-endpoints=true \
--scopes "https://www.googleapis.com/auth/devstorage.read_write","https://www.googleapis.com/auth/logging.write","https://www.googleapis.com/auth/monitoring","https://www.googleapis.com/auth/servicecontrol","https://www.googleapis.com/auth/service.management.readonly","https://www.googleapis.com/auth/trace.append" \
--max-pods-per-node "110" \
--num-nodes "3" \
--logging=SYSTEM,WORKLOAD \
--monitoring=SYSTEM \
--enable-ip-alias \
--network "projects/$PROJECT_ID/global/networks/default" \
--subnetwork "projects/$PROJECT_ID/regions/$HDX_BUCKET_REGION/subnetworks/default" \
--no-enable-intra-node-visibility \
--default-max-pods-per-node "110" \
--enable-autoscaling \
--min-nodes "0" \
--max-nodes "20" \
--no-enable-master-authorized-networks \
--addons HorizontalPodAutoscaling,HttpLoadBalancing,GcePersistentDiskCsiDriver \
--enable-autoupgrade \
--enable-autorepair \
--max-surge-upgrade 1 \
--max-unavailable-upgrade 0 \
--enable-shielded-nodes \
--workload-pool="$PROJECT_ID.svc.id.goog" \
--workload-metadata=GKE_METADATA

📘

Account Scaling

Some accounts have limits on how far they can scale. Before you deploy, confirm that your account can scale to the instance sizes and node counts this cluster uses.

📘

Workload Identity

The deployment uses Google's Workload Identity to access Google Cloud services in a secure and manageable way. For more information, see the Workload Identity documentation. Enable Workload Identity on any new node pools you add, because several of the deployed services depend on it.

For more information about the options specified in this command, see the Create Cluster Flags Reference.
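
As an illustration of the Workload Identity note above, an additional node pool could be created with the same metadata setting as the cluster command. This is a sketch under assumptions: create_wi_node_pool and the pool name are hypothetical, and the machine type simply mirrors the default pool; adjust both to your needs.

```shell
# create_wi_node_pool POOL_NAME: add a node pool to the cluster with the
# GKE metadata server enabled, which Workload Identity requires.
# Reads PROJECT_ID, HDX_KUBERNETES_NAMESPACE and HDX_BUCKET_REGION from env.sh.
create_wi_node_pool() {
  gcloud container node-pools create "$1" \
    --project "$PROJECT_ID" \
    --cluster "$HDX_KUBERNETES_NAMESPACE" \
    --region "$HDX_BUCKET_REGION" \
    --machine-type "n2-standard-16" \
    --workload-metadata=GKE_METADATA
}
# Usage: create_wi_node_pool extra-pool
```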

Create a Service Account for the Cluster

The cluster uses a service account to access Google Cloud services such as Google Cloud Storage. Run the following command to create a service account:

gcloud iam service-accounts create hdx-${HDX_KUBERNETES_NAMESPACE}-sa --project=${PROJECT_ID}
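
Google Cloud service account IDs must be between 6 and 30 characters long, so a long namespace name can make this command fail. The sketch below checks the length of the ID that the command above would produce; sa_id_is_valid is a hypothetical helper and it checks length only, not the allowed character set.

```shell
# sa_id_is_valid NAMESPACE: succeed if "hdx-NAMESPACE-sa" is 6 to 30
# characters long, the length range allowed for service account IDs.
sa_id_is_valid() {
  id="hdx-$1-sa"
  len=${#id}
  [ "$len" -ge 6 ] && [ "$len" -le 30 ]
}
# Usage: sa_id_is_valid "$HDX_KUBERNETES_NAMESPACE" || echo "namespace too long"
```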

Next, add the service account name to your environment variables by adding the following line to env.sh:

export GCP_STORAGE_SA=hdx-${HDX_KUBERNETES_NAMESPACE}-sa@${PROJECT_ID}.iam.gserviceaccount.com

Load the new environment variable:

source env.sh

Grant the service account permissions to access the storage bucket:

gsutil iam ch serviceAccount:${GCP_STORAGE_SA}:roles/storage.objectAdmin $HDX_DB_BUCKET_URL

Run the following commands to grant the hydrolix, turbine-api, vector, merge-controller, kinesis-coordinator, and ingest services permission to use the service account you created:

gcloud iam service-accounts add-iam-policy-binding ${GCP_STORAGE_SA} \
--role roles/iam.workloadIdentityUser \
--member "serviceAccount:${PROJECT_ID}.svc.id.goog[${HDX_KUBERNETES_NAMESPACE}/hydrolix]" \
--project $PROJECT_ID
gcloud iam service-accounts add-iam-policy-binding ${GCP_STORAGE_SA} \
--role roles/iam.workloadIdentityUser \
--member "serviceAccount:${PROJECT_ID}.svc.id.goog[${HDX_KUBERNETES_NAMESPACE}/turbine-api]" \
--project $PROJECT_ID
gcloud iam service-accounts add-iam-policy-binding ${GCP_STORAGE_SA} \
--role roles/iam.workloadIdentityUser \
--member "serviceAccount:${PROJECT_ID}.svc.id.goog[${HDX_KUBERNETES_NAMESPACE}/vector]" \
--project $PROJECT_ID
gcloud iam service-accounts add-iam-policy-binding ${GCP_STORAGE_SA} \
--role roles/iam.workloadIdentityUser \
--member "serviceAccount:${PROJECT_ID}.svc.id.goog[${HDX_KUBERNETES_NAMESPACE}/merge-controller]" \
--project $PROJECT_ID
gcloud iam service-accounts add-iam-policy-binding ${GCP_STORAGE_SA} \
--role roles/iam.workloadIdentityUser \
--member "serviceAccount:${PROJECT_ID}.svc.id.goog[${HDX_KUBERNETES_NAMESPACE}/kinesis-coordinator]" \
--project $PROJECT_ID
gcloud iam service-accounts add-iam-policy-binding ${GCP_STORAGE_SA} \
--role roles/iam.workloadIdentityUser \
--member "serviceAccount:${PROJECT_ID}.svc.id.goog[${HDX_KUBERNETES_NAMESPACE}/ingest]" \
--project $PROJECT_ID
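
The six commands above differ only in the Kubernetes service account name, so they can also be written as a loop. This is an equivalent sketch rather than a required step; bind_workload_identity is a hypothetical function name, and it stops at the first binding that fails.

```shell
# bind_workload_identity: grant roles/iam.workloadIdentityUser on the
# storage service account to each Kubernetes service account in turn.
# Reads GCP_STORAGE_SA, PROJECT_ID and HDX_KUBERNETES_NAMESPACE from env.sh.
bind_workload_identity() {
  for ksa in hydrolix turbine-api vector merge-controller kinesis-coordinator ingest; do
    gcloud iam service-accounts add-iam-policy-binding "${GCP_STORAGE_SA}" \
      --role roles/iam.workloadIdentityUser \
      --member "serviceAccount:${PROJECT_ID}.svc.id.goog[${HDX_KUBERNETES_NAMESPACE}/${ksa}]" \
      --project "$PROJECT_ID" || return 1
  done
}
# Usage: bind_workload_identity
```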

To manage access to your Kubernetes cluster in GCP, see Google's documentation on cluster access.

Create a k8s Namespace

Create a dedicated namespace in your Kubernetes cluster:

kubectl create namespace $HDX_KUBERNETES_NAMESPACE

For ease of use, set the new namespace as the default for your current kubectl context:

kubectl config set-context --current --namespace="$HDX_KUBERNETES_NAMESPACE"
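
To confirm that the default took effect, you can read the namespace back from the active context. This is a small sketch; current_namespace is a hypothetical helper name.

```shell
# current_namespace: print the namespace configured for the active
# kubectl context (prints nothing if no namespace is set).
current_namespace() {
  kubectl config view --minify --output 'jsonpath={..namespace}'
}
# Usage: [ "$(current_namespace)" = "$HDX_KUBERNETES_NAMESPACE" ] && echo "namespace set"
```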
