Preparing your GKE Cluster

In order to deploy the Hydrolix data platform on GCP, you first need to set up a Google Kubernetes Engine (GKE) cluster.

Setting environment variables

We recommend setting up some environment variables using an env.sh script. Replace the < > values below with your actual values. Note that quoting the 'EOT' delimiter stops the shell from expanding ${HDX_KUBERNETES_NAMESPACE} while the file is written; it is expanded later, when env.sh is sourced.

cat <<'EOT' >> env.sh
export HDX_BUCKET_REGION=<e.g. us-central1>
export HDX_HYDROLIX_URL=<e.g. https://my.domain.com>
export HDX_KUBERNETES_NAMESPACE=<e.g. production-service>
export HDX_ADMIN_EMAIL=<e.g. [email protected]>
export HDX_DB_BUCKET_URL=gs://${HDX_KUBERNETES_NAMESPACE}
EOT
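
Load these variables into your current shell (re-run this whenever you change env.sh):

source env.sh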

Create a Google Project (Optional)

You may have already created the Google Cloud project you want to deploy Hydrolix into. If you haven't, you can use the following gcloud command to create it:

gcloud projects create hydrolix-cluster

📘

Make sure you enable the IAM Service Account Credentials API in your project:

https://console.cloud.google.com/apis/api/iamcredentials.googleapis.com/
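
Alternatively, you can enable the same API from the command line, substituting your project ID:

gcloud services enable iamcredentials.googleapis.com --project <your new/existing project id>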

Add your Project ID to the environment variables

To add the project ID to your environment variables, use the following:

echo "export PROJECT_ID=<your new/existing project id>" >> env.sh && source env.sh
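
Optionally, you can also make this project the default for subsequent gcloud commands:

gcloud config set project "$PROJECT_ID"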

Create a Google Storage Bucket

Hydrolix needs access to a Google Cloud Storage bucket to store its data. The next step is to create this bucket with the gsutil CLI:

gsutil mb -l ${HDX_BUCKET_REGION} $HDX_DB_BUCKET_URL
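
You can verify that the bucket was created in the expected location:

gsutil ls -L -b $HDX_DB_BUCKET_URL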

Deploying the Kubernetes Cluster

To deploy a Kubernetes cluster into this new project, we recommend using the command below; it provisions node counts and capacity sizing that we know are sufficient for a basic load.

📘

Account Scaling

Some users may have scale limits on their account. If so, make sure your account can scale to the instance sizes listed below. Note that this is the initial, basic scale deployed when the architecture is created.

gcloud beta container --project "$PROJECT_ID" clusters create "$HDX_KUBERNETES_NAMESPACE" --region "$HDX_BUCKET_REGION" --no-enable-basic-auth --release-channel "regular" --machine-type "n2-standard-16" --image-type "COS_CONTAINERD" --disk-type "pd-ssd" --disk-size "128" --metadata disable-legacy-endpoints=true --scopes "https://www.googleapis.com/auth/devstorage.read_write","https://www.googleapis.com/auth/logging.write","https://www.googleapis.com/auth/monitoring","https://www.googleapis.com/auth/servicecontrol","https://www.googleapis.com/auth/service.management.readonly","https://www.googleapis.com/auth/trace.append" --max-pods-per-node "110" --num-nodes "3" --logging=SYSTEM,WORKLOAD --monitoring=SYSTEM --enable-ip-alias --network "projects/$PROJECT_ID/global/networks/default" --subnetwork "projects/$PROJECT_ID/regions/$HDX_BUCKET_REGION/subnetworks/default" --no-enable-intra-node-visibility --default-max-pods-per-node "110" --enable-autoscaling --min-nodes "0" --max-nodes "20" --no-enable-master-authorized-networks --addons HorizontalPodAutoscaling,HttpLoadBalancing,GcePersistentDiskCsiDriver --enable-autoupgrade --enable-autorepair --max-surge-upgrade 1 --max-unavailable-upgrade 0 --enable-shielded-nodes --workload-pool="$PROJECT_ID.svc.id.goog" --workload-metadata=GKE_METADATA

The command creates a single node pool with the following characteristics.

default-pool

  • Count: Autoscale 0-20
  • Type: n2-standard-16
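
Cluster creation can take several minutes. Once it finishes, you can check that the cluster reports a RUNNING status:

gcloud container clusters describe "$HDX_KUBERNETES_NAMESPACE" --region "$HDX_BUCKET_REGION" --project "$PROJECT_ID" --format="value(status)"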

👍

Workload Identity

The deployment uses Google's Workload Identity to access Google Cloud services in a secure and manageable way. More information can be found in Google's Workload Identity documentation. Any new node pools should have it enabled, as it is used by a number of the deployed services; see the sketch below.
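
As an illustration, an additional node pool with Workload Identity metadata enabled could be created as follows (the pool name extra-pool and the machine type here are placeholder choices, not part of the Hydrolix deployment):

# "extra-pool" is an illustrative name; choose your own
gcloud container node-pools create extra-pool \
--cluster "$HDX_KUBERNETES_NAMESPACE" \
--region "$HDX_BUCKET_REGION" \
--project "$PROJECT_ID" \
--machine-type "n2-standard-16" \
--workload-metadata=GKE_METADATA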

More information on the flags used in this command can be found in the Reference Create Cluster Flags section below.

Create a Service Account for the Cluster

The cluster uses a service account to access Google components (for example, Google Cloud Storage).

gcloud iam service-accounts create hdx-${HDX_KUBERNETES_NAMESPACE}-sa --project=${PROJECT_ID}
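
You can confirm the account was created:

gcloud iam service-accounts list --project=${PROJECT_ID} --filter="email:hdx-${HDX_KUBERNETES_NAMESPACE}-sa"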

Add the service account name to your environment variables.

echo "export GCP_STORAGE_SA=hdx-${HDX_KUBERNETES_NAMESPACE}-sa@${PROJECT_ID}.iam.gserviceaccount.com" >> env.sh \
&& source env.sh

Give the service account permissions to access the storage bucket.

gsutil iam ch serviceAccount:${GCP_STORAGE_SA}:roles/storage.objectAdmin $HDX_DB_BUCKET_URL
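
To double-check the bucket-level binding, you can print the bucket's IAM policy:

gsutil iam get $HDX_DB_BUCKET_URL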

The following three commands grant the Workload Identity user role so that the Kubernetes service accounts hydrolix, turbine-api, and vector can use the Google service account you created:

gcloud iam service-accounts add-iam-policy-binding ${GCP_STORAGE_SA} \
--role roles/iam.workloadIdentityUser \
--member "serviceAccount:${PROJECT_ID}.svc.id.goog[${HDX_KUBERNETES_NAMESPACE}/hydrolix]"


gcloud iam service-accounts add-iam-policy-binding ${GCP_STORAGE_SA} \
--role roles/iam.workloadIdentityUser \
--member "serviceAccount:${PROJECT_ID}.svc.id.goog[${HDX_KUBERNETES_NAMESPACE}/turbine-api]"


gcloud iam service-accounts add-iam-policy-binding ${GCP_STORAGE_SA} \
--role roles/iam.workloadIdentityUser \
--member "serviceAccount:${PROJECT_ID}.svc.id.goog[${HDX_KUBERNETES_NAMESPACE}/vector]"

To manage your Kubernetes cluster in GCP, follow the instructions in Google's documentation.
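
For example, fetch credentials so that kubectl can connect to the cluster:

gcloud container clusters get-credentials "$HDX_KUBERNETES_NAMESPACE" --region "$HDX_BUCKET_REGION" --project "$PROJECT_ID"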

Create a Kubernetes Namespace

Create a dedicated namespace in your Kubernetes Cluster:

kubectl create namespace $HDX_KUBERNETES_NAMESPACE

For ease of use, set your new namespace as a default:

kubectl config set-context --current --namespace="$HDX_KUBERNETES_NAMESPACE"
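
You can confirm which namespace is now the default for your current context:

kubectl config view --minify --output "jsonpath={..namespace}"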

Reference Create Cluster Flags

The following provides more information on the flags used in the create cluster command above. In addition, more information can be found in Google's reference for gcloud container clusters create (https://cloud.google.com/sdk/gcloud/reference/container/clusters/create).

Flag | Description | Recommended value
--project | Project the cluster will be deployed to. | --project "$PROJECT_ID"
--region | The region to deploy the nodes to. | --region "$HDX_BUCKET_REGION"
--no-enable-basic-auth | Disables basic (username/password) authentication to the cluster. | --no-enable-basic-auth
--release-channel | Clusters subscribed to 'regular' receive versions that are considered GA quality. 'regular' is recommended for production users. | --release-channel "regular"
--machine-type | The type of machine to use. Hydrolix recommends n2-standard-16 machines for running a cluster. | --machine-type "n2-standard-16"
--image-type | The base OS that the nodes in the cluster run on. | --image-type "COS_CONTAINERD"
--disk-type | Type of VM boot disk to be used. | --disk-type "pd-ssd"
--disk-size | Size of VM boot disk to be used. | --disk-size "128"
--metadata | Compute Engine metadata made available to the guest operating system running on nodes; disable-legacy-endpoints is set to true. | --metadata disable-legacy-endpoints=true
--scopes | OAuth access scopes granted to the node VMs; these aliases expand to the scope URLs used in the command above. | --scopes storage-rw,logging-write,monitoring,service-control,service-management,trace
--max-pods-per-node | The max number of pods per node for a node pool. | --max-pods-per-node "110"
--num-nodes | Number of initial nodes to create. | --num-nodes "3"
--logging | Set the components that have logging enabled. | --logging=SYSTEM,WORKLOAD
--monitoring | Set the components that have monitoring enabled. | --monitoring=SYSTEM
--enable-ip-alias | Enable use of alias IPs for Pod IPs. | --enable-ip-alias
--network | The Compute Engine network that the cluster will connect to. | --network "projects/$PROJECT_ID/global/networks/default"
--subnetwork | The Compute Engine subnetwork to which the cluster is connected. | --subnetwork "projects/$PROJECT_ID/regions/$HDX_BUCKET_REGION/subnetworks/default"
--no-enable-intra-node-visibility | Turn off intra-node visibility. | --no-enable-intra-node-visibility
--default-max-pods-per-node | The default max number of pods per node for node pools in the cluster. | --default-max-pods-per-node "110"
--enable-autoscaling | Enables autoscaling for a node pool. | --enable-autoscaling
--min-nodes | Minimum number of nodes per zone in the node pool. | --min-nodes "0"
--max-nodes | Maximum number of nodes available for autoscaling. | --max-nodes "20"
--no-enable-master-authorized-networks | Turn off restricting access to the Kubernetes master to a specified set of CIDR blocks. | --no-enable-master-authorized-networks
--addons | Additional Kubernetes cluster components. Hydrolix uses HorizontalPodAutoscaling, HttpLoadBalancing, and GcePersistentDiskCsiDriver. | --addons HorizontalPodAutoscaling,HttpLoadBalancing,GcePersistentDiskCsiDriver
--enable-autoupgrade | Sets the autoupgrade feature for a cluster's default node pool. | --enable-autoupgrade
--enable-autorepair | Enable the node autorepair feature for a cluster's default node pool. | --enable-autorepair
--max-surge-upgrade | Number of extra (surge) nodes to be created on each upgrade of a node pool. | --max-surge-upgrade 1
--max-unavailable-upgrade | Number of nodes that can be unavailable at the same time on each upgrade of a node pool. | --max-unavailable-upgrade 0
--enable-shielded-nodes | Shielded Nodes enables a more secure node credential bootstrapping implementation. | --enable-shielded-nodes
--workload-pool | Enable Workload Identity on the cluster. | --workload-pool="$PROJECT_ID.svc.id.goog"
--workload-metadata | Type of metadata server available to pods running in the node pool. | --workload-metadata=GKE_METADATA