Prepare a Cluster
In order to deploy the Hydrolix data platform on GCP, you first need to set up a Google Kubernetes Engine (GKE) cluster.
Set Environment Variables
To begin, set up some environment variables using an env.sh script. Replace the < > values below with your actual values:
export HDX_BUCKET_REGION=<e.g. us-central1> # must be all lowercase, even though Google documentation lists regions in uppercase
export HDX_HYDROLIX_URL=<e.g. https://my.domain.com>
export HDX_KUBERNETES_NAMESPACE=<e.g. production-service>
export HDX_ADMIN_EMAIL=<e.g. [email protected]>
export HDX_DB_BUCKET_URL=gs://${HDX_KUBERNETES_NAMESPACE}
Load these environment variables with the following command:
source env.sh
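If you want to confirm the variables loaded before proceeding, a small helper like the following can check them. This helper (check_env) is our own illustration, not part of the Hydrolix tooling:

```shell
# check_env: succeed only if every named environment variable is non-empty.
# A hypothetical convenience helper, not part of Hydrolix itself.
check_env() {
  for var in "$@"; do
    eval "val=\${$var}"           # indirect lookup of the variable's value
    if [ -z "$val" ]; then
      echo "Missing: $var" >&2
      return 1
    fi
  done
  echo "All variables set."
}

check_env HDX_BUCKET_REGION HDX_HYDROLIX_URL HDX_KUBERNETES_NAMESPACE \
  HDX_ADMIN_EMAIL HDX_DB_BUCKET_URL
```

If any variable is missing, the helper names it and returns a non-zero status, so it can also be used at the top of follow-on scripts.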
Create a Google Project
Already Created a Google Project?
If you've already created a Google Project for Hydrolix, skip this step.
Use the following gcloud command to create a Google Project:
gcloud projects create hydrolix-cluster
Once you've created the project, enable the IAM API for it.
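One way to enable the IAM API is from the command line (this assumes the project ID matches the project name used above):

```shell
# Enable the IAM API (iam.googleapis.com) for the new project.
gcloud services enable iam.googleapis.com --project hydrolix-cluster
```

You can also enable APIs from the Google Cloud console under "APIs & Services".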
Add Your Project ID to env.sh
We recommend adding your project ID to your environment variables. Run the following command to get the project ID for your current Google Project:
gcloud config get-value project
Add the following line to env.sh, replacing "your project id" with your actual project ID:
export PROJECT_ID=<your project id>
Finally, run the following command to load the new environment variable:
source env.sh
Create a Google Storage Bucket
Already Created a Google Cloud Storage Bucket?
If you've already created a Google Cloud Storage Bucket for Hydrolix, skip this step.
Hydrolix needs access to a Google Cloud Storage bucket for data storage. Run the following command to create a bucket with the gsutil CLI:
gsutil mb -l ${HDX_BUCKET_REGION} $HDX_DB_BUCKET_URL
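To confirm the bucket was created in the intended region, you can inspect its metadata:

```shell
# Sanity check: print the bucket's metadata, including its Location.
gsutil ls -L -b $HDX_DB_BUCKET_URL
```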
Deploy a Kubernetes Cluster
Next, you'll deploy a Kubernetes cluster into the project. The following command creates a cluster with a single node pool, default-pool, with the following characteristics:
- Count: autoscales from 0 to 20 nodes
- Type: n2-standard-16
This pool provides sufficient node count and capacity for a basic load.
gcloud container --project "$PROJECT_ID" clusters create "$HDX_KUBERNETES_NAMESPACE" \
  --region "$HDX_BUCKET_REGION" \
  --no-enable-basic-auth \
  --release-channel "regular" \
  --machine-type "n2-standard-16" \
  --image-type "COS_CONTAINERD" \
  --disk-type "pd-ssd" \
  --disk-size "128" \
  --metadata disable-legacy-endpoints=true \
  --scopes "https://www.googleapis.com/auth/devstorage.read_write","https://www.googleapis.com/auth/logging.write","https://www.googleapis.com/auth/monitoring","https://www.googleapis.com/auth/servicecontrol","https://www.googleapis.com/auth/service.management.readonly","https://www.googleapis.com/auth/trace.append" \
  --max-pods-per-node "110" \
  --num-nodes "3" \
  --logging=SYSTEM,WORKLOAD \
  --monitoring=SYSTEM \
  --enable-ip-alias \
  --network "projects/$PROJECT_ID/global/networks/default" \
  --subnetwork "projects/$PROJECT_ID/regions/$HDX_BUCKET_REGION/subnetworks/default" \
  --no-enable-intra-node-visibility \
  --default-max-pods-per-node "110" \
  --enable-autoscaling \
  --min-nodes "0" \
  --max-nodes "20" \
  --no-enable-master-authorized-networks \
  --addons HorizontalPodAutoscaling,HttpLoadBalancing,GcePersistentDiskCsiDriver \
  --enable-autoupgrade \
  --enable-autorepair \
  --max-surge-upgrade 1 \
  --max-unavailable-upgrade 0 \
  --enable-shielded-nodes \
  --workload-pool="$PROJECT_ID.svc.id.goog" \
  --workload-metadata=GKE_METADATA
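The kubectl commands later in this guide assume your kubeconfig points at the new cluster. If it doesn't yet, fetch credentials for it:

```shell
# Fetch cluster credentials so kubectl targets the new cluster.
gcloud container clusters get-credentials "$HDX_KUBERNETES_NAMESPACE" \
  --region "$HDX_BUCKET_REGION" --project "$PROJECT_ID"
```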
Account Scaling
Some accounts have quota limits that cap instance counts or sizes. Before deploying, ensure your account can scale to the instance type and node counts above.
Workload Identity
The deployment uses Google's Workload Identity to access Google Cloud services in a secure and manageable way. For more information, see the Workload Identity documentation. Enable Workload Identity on any new node pools, since several of the deployed services rely on it.
For more information about the options specified in this command, see the Create Cluster Flags Reference.
Create a Service Account for the Cluster
The cluster uses a service account to access Google components such as Google Cloud Storage. Run the following command to create a service account:
gcloud iam service-accounts create hdx-${HDX_KUBERNETES_NAMESPACE}-sa --project=${PROJECT_ID}
Next, add the service account name to your environment variables by adding the following line to env.sh:
export GCP_STORAGE_SA=hdx-${HDX_KUBERNETES_NAMESPACE}-sa@${PROJECT_ID}.iam.gserviceaccount.com
Load the new environment variable:
source env.sh
Grant the service account permissions to access the storage bucket:
gsutil iam ch serviceAccount:${GCP_STORAGE_SA}:roles/storage.objectAdmin $HDX_DB_BUCKET_URL
Run the following commands to grant the hydrolix, turbine-api, vector, merge-controller, and kinesis-coordinator services permission to use the service account you created:
gcloud iam service-accounts add-iam-policy-binding ${GCP_STORAGE_SA} \
--role roles/iam.workloadIdentityUser \
--member "serviceAccount:${PROJECT_ID}.svc.id.goog[${HDX_KUBERNETES_NAMESPACE}/hydrolix]" \
--project $PROJECT_ID
gcloud iam service-accounts add-iam-policy-binding ${GCP_STORAGE_SA} \
--role roles/iam.workloadIdentityUser \
--member "serviceAccount:${PROJECT_ID}.svc.id.goog[${HDX_KUBERNETES_NAMESPACE}/turbine-api]" \
--project $PROJECT_ID
gcloud iam service-accounts add-iam-policy-binding ${GCP_STORAGE_SA} \
--role roles/iam.workloadIdentityUser \
--member "serviceAccount:${PROJECT_ID}.svc.id.goog[${HDX_KUBERNETES_NAMESPACE}/vector]" \
--project $PROJECT_ID
gcloud iam service-accounts add-iam-policy-binding ${GCP_STORAGE_SA} \
--role roles/iam.workloadIdentityUser \
--member "serviceAccount:${PROJECT_ID}.svc.id.goog[${HDX_KUBERNETES_NAMESPACE}/merge-controller]" \
--project $PROJECT_ID
gcloud iam service-accounts add-iam-policy-binding ${GCP_STORAGE_SA} \
--role roles/iam.workloadIdentityUser \
--member "serviceAccount:${PROJECT_ID}.svc.id.goog[${HDX_KUBERNETES_NAMESPACE}/kinesis-coordinator]" \
--project $PROJECT_ID
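To confirm all five bindings took effect, you can inspect the service account's IAM policy:

```shell
# Lists the roles/iam.workloadIdentityUser members bound to the service account.
gcloud iam service-accounts get-iam-policy ${GCP_STORAGE_SA} --project $PROJECT_ID
```

Each of the five Kubernetes service accounts should appear as a member of the roles/iam.workloadIdentityUser binding.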
To manage access to your Kubernetes cluster in GCP, see Google's documentation on cluster access.
Create a k8s Namespace
Create a dedicated namespace in your Kubernetes cluster:
kubectl create namespace $HDX_KUBERNETES_NAMESPACE
For ease of use, set your new namespace as a default:
kubectl config set-context --current --namespace="$HDX_KUBERNETES_NAMESPACE"
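To verify the default namespace is set on your current context:

```shell
# Print the namespace recorded in the active kubectl context.
kubectl config view --minify --output 'jsonpath={..namespace}'
```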