Preparing your GKE Cluster

In order to deploy the Hydrolix data platform on GCP, you first need to set up a Google Kubernetes Engine (GKE) cluster.

Setting environment variables

We recommend setting up some environment variables using an env.sh script. Replace the < > values below with your actual values. Note that quoting the 'EOT' delimiter stops the shell from expanding ${HDX_KUBERNETES_NAMESPACE} while the file is written; it is expanded later, when env.sh is sourced.

cat <<'EOT' >> env.sh
export HDX_BUCKET_REGION=<e.g. us-central1>
export HDX_HYDROLIX_URL=<e.g. https://my.domain.com>
export HDX_KUBERNETES_NAMESPACE=<e.g. production-service>
export HDX_ADMIN_EMAIL=<e.g. [email protected]>
export HDX_DB_BUCKET_URL=gs://${HDX_KUBERNETES_NAMESPACE}
EOT
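
Load these variables into your current shell (re-run this whenever you change env.sh):

source env.sh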

Create a Google Project (Optional)

You may have already created the Google Cloud project you want to deploy Hydrolix into. If you haven't, you can use the following gcloud command to create it:

gcloud projects create hydrolix-cluster

📘

Make sure you enable the IAM Service Account Credentials API in your project:

https://console.cloud.google.com/apis/api/iamcredentials.googleapis.com/
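
Alternatively, you can enable the same API from the command line, substituting your project ID:

gcloud services enable iamcredentials.googleapis.com --project <your new/existing project id>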

Add your Project ID to the environment variables

To add the project ID to your environment variables, use the following:

echo "export PROJECT_ID=<your new/existing project id>" >> env.sh && source env.sh
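
Optionally, you can also make this project the default for subsequent gcloud commands:

gcloud config set project "$PROJECT_ID"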

Create a Google Storage Bucket

Hydrolix needs access to a Google Cloud Storage bucket to store its data. The next step is to create this bucket with the gsutil CLI:

gsutil mb -l ${HDX_BUCKET_REGION} $HDX_DB_BUCKET_URL
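
You can verify that the bucket was created in the expected location:

gsutil ls -L -b $HDX_DB_BUCKET_URL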

Deploying the Kubernetes Cluster

To deploy a Kubernetes cluster into this new project, we recommend using the command below; it provisions node counts and capacity sizing that we know are sufficient for a basic load.

📘

Account Scaling

Some users may have scale limits on their account. If so, make sure your account can scale to the instance sizes listed below. Note that this is the initial, basic scale deployed when the architecture is created.

gcloud beta container --project "$PROJECT_ID" clusters create "$HDX_KUBERNETES_NAMESPACE" --region "$HDX_BUCKET_REGION" --no-enable-basic-auth --release-channel "regular" --machine-type "n2-standard-16" --image-type "COS_CONTAINERD" --disk-type "pd-ssd" --disk-size "128" --metadata disable-legacy-endpoints=true --scopes "https://www.googleapis.com/auth/devstorage.read_write","https://www.googleapis.com/auth/logging.write","https://www.googleapis.com/auth/monitoring","https://www.googleapis.com/auth/servicecontrol","https://www.googleapis.com/auth/service.management.readonly","https://www.googleapis.com/auth/trace.append" --max-pods-per-node "110" --num-nodes "3" --logging=SYSTEM,WORKLOAD --monitoring=SYSTEM --enable-ip-alias --network "projects/$PROJECT_ID/global/networks/default" --subnetwork "projects/$PROJECT_ID/regions/$HDX_BUCKET_REGION/subnetworks/default" --no-enable-intra-node-visibility --default-max-pods-per-node "110" --enable-autoscaling --min-nodes "0" --max-nodes "20" --no-enable-master-authorized-networks --addons HorizontalPodAutoscaling,HttpLoadBalancing,GcePersistentDiskCsiDriver --enable-autoupgrade --enable-autorepair --max-surge-upgrade 1 --max-unavailable-upgrade 0 --enable-shielded-nodes --workload-pool="$PROJECT_ID.svc.id.goog" --workload-metadata=GKE_METADATA

The command creates a single node pool with the following characteristics.

default-pool

  • Count: Autoscale 0-20
  • Type: n2-standard-16
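
Cluster creation can take several minutes. Once it finishes, you can check that the cluster reports a RUNNING status:

gcloud container clusters describe "$HDX_KUBERNETES_NAMESPACE" --region "$HDX_BUCKET_REGION" --project "$PROJECT_ID" --format="value(status)"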

👍

Workload Identity

The deployment uses Google's Workload Identity to access Google Cloud services in a secure and manageable way. More information can be found in Google's Workload Identity documentation. Any new node pools should have it enabled, as it is used by a number of the deployed services; see the sketch below.
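
As an illustration, an additional node pool with Workload Identity metadata enabled could be created as follows (the pool name extra-pool and the machine type here are placeholder choices, not part of the Hydrolix deployment):

# "extra-pool" is an illustrative name; choose your own
gcloud container node-pools create extra-pool \
--cluster "$HDX_KUBERNETES_NAMESPACE" \
--region "$HDX_BUCKET_REGION" \
--project "$PROJECT_ID" \
--machine-type "n2-standard-16" \
--workload-metadata=GKE_METADATA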

More information on the flags used in this command can be found in the Reference Create Cluster Flags section below.

Create a Service Account for the Cluster

The cluster uses a service account to access Google components (for example, Google Cloud Storage).

gcloud iam service-accounts create hdx-${HDX_KUBERNETES_NAMESPACE}-sa --project=${PROJECT_ID}
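
You can confirm the account was created:

gcloud iam service-accounts list --project=${PROJECT_ID} --filter="email:hdx-${HDX_KUBERNETES_NAMESPACE}-sa"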

Add the service account name to your environment variables.

echo "export GCP_STORAGE_SA=hdx-${HDX_KUBERNETES_NAMESPACE}-sa@${PROJECT_ID}.iam.gserviceaccount.com" >> env.sh \
&& source env.sh

Give the service account permissions to access the storage bucket.

gsutil iam ch serviceAccount:${GCP_STORAGE_SA}:roles/storage.objectAdmin $HDX_DB_BUCKET_URL
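
To double-check the bucket-level binding, you can print the bucket's IAM policy:

gsutil iam get $HDX_DB_BUCKET_URL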

The following three commands grant the Workload Identity user role so that the Kubernetes service accounts hydrolix, turbine-api, and vector can use the Google service account you created:

gcloud iam service-accounts add-iam-policy-binding ${GCP_STORAGE_SA} \
--role roles/iam.workloadIdentityUser \
--member "serviceAccount:${PROJECT_ID}.svc.id.goog[${HDX_KUBERNETES_NAMESPACE}/hydrolix]"


gcloud iam service-accounts add-iam-policy-binding ${GCP_STORAGE_SA} \
--role roles/iam.workloadIdentityUser \
--member "serviceAccount:${PROJECT_ID}.svc.id.goog[${HDX_KUBERNETES_NAMESPACE}/turbine-api]"


gcloud iam service-accounts add-iam-policy-binding ${GCP_STORAGE_SA} \
--role roles/iam.workloadIdentityUser \
--member "serviceAccount:${PROJECT_ID}.svc.id.goog[${HDX_KUBERNETES_NAMESPACE}/vector]"

To manage your Kubernetes cluster in GCP, follow the instructions in Google's documentation.
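
For example, fetch credentials so that kubectl can connect to the cluster:

gcloud container clusters get-credentials "$HDX_KUBERNETES_NAMESPACE" --region "$HDX_BUCKET_REGION" --project "$PROJECT_ID"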

Create a Kubernetes Namespace

Create a dedicated namespace in your Kubernetes Cluster:

kubectl create namespace $HDX_KUBERNETES_NAMESPACE

For ease of use, set your new namespace as a default:

kubectl config set-context --current --namespace="$HDX_KUBERNETES_NAMESPACE"
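
You can confirm which namespace is now the default for your current context:

kubectl config view --minify --output "jsonpath={..namespace}"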

Reference Create Cluster Flags

The following provides more information on the flags used in the create cluster command above. In addition, more information can be found in Google's reference for gcloud container clusters create (https://cloud.google.com/sdk/gcloud/reference/container/clusters/create).

Flag | Description | Recommended value
--project | Project the cluster will be deployed to. | --project "$PROJECT_ID"
--region | The region to deploy the nodes to. | --region "$HDX_BUCKET_REGION"
--no-enable-basic-auth | Disables basic (username/password) authentication to the cluster. | --no-enable-basic-auth
--release-channel | Clusters subscribed to 'regular' receive versions that are considered GA quality. 'regular' is recommended for production users. | --release-channel "regular"
--machine-type | The type of machine to use. Hydrolix recommends n2-standard-16 machines for running a cluster. | --machine-type "n2-standard-16"
--image-type | The base OS that the nodes in the cluster run on. | --image-type "COS_CONTAINERD"
--disk-type | Type of VM boot disk to be used. | --disk-type "pd-ssd"
--disk-size | Size of VM boot disk to be used. | --disk-size "128"
--metadata | Compute Engine metadata made available to the guest operating system running on nodes; disable-legacy-endpoints is set to true. | --metadata disable-legacy-endpoints=true
--scopes | OAuth access scopes granted to the node VMs; these aliases expand to the scope URLs used in the command above. | --scopes storage-rw,logging-write,monitoring,service-control,service-management,trace
--max-pods-per-node | The max number of pods per node for a node pool. | --max-pods-per-node "110"
--num-nodes | Number of initial nodes to create. | --num-nodes "3"
--logging | Set the components that have logging enabled. | --logging=SYSTEM,WORKLOAD
--monitoring | Set the components that have monitoring enabled. | --monitoring=SYSTEM
--enable-ip-alias | Enable use of alias IPs for Pod IPs. | --enable-ip-alias
--network | The Compute Engine network that the cluster will connect to. | --network "projects/$PROJECT_ID/global/networks/default"
--subnetwork | The Compute Engine subnetwork to which the cluster is connected. | --subnetwork "projects/$PROJECT_ID/regions/$HDX_BUCKET_REGION/subnetworks/default"
--no-enable-intra-node-visibility | Turn off intra-node visibility. | --no-enable-intra-node-visibility
--default-max-pods-per-node | The default max number of pods per node for node pools in the cluster. | --default-max-pods-per-node "110"
--enable-autoscaling | Enables autoscaling for a node pool. | --enable-autoscaling
--min-nodes | Minimum number of nodes per zone in the node pool. | --min-nodes "0"
--max-nodes | Maximum number of nodes available for autoscaling. | --max-nodes "20"
--no-enable-master-authorized-networks | Turn off restricting access to the Kubernetes master to a specified set of CIDR blocks. | --no-enable-master-authorized-networks
--addons | Additional Kubernetes cluster components. Hydrolix uses HorizontalPodAutoscaling, HttpLoadBalancing, and GcePersistentDiskCsiDriver. | --addons HorizontalPodAutoscaling,HttpLoadBalancing,GcePersistentDiskCsiDriver
--enable-autoupgrade | Sets the autoupgrade feature for a cluster's default node pool. | --enable-autoupgrade
--enable-autorepair | Enable the node autorepair feature for a cluster's default node pool. | --enable-autorepair
--max-surge-upgrade | Number of extra (surge) nodes to be created on each upgrade of a node pool. | --max-surge-upgrade 1
--max-unavailable-upgrade | Number of nodes that can be unavailable at the same time on each upgrade of a node pool. | --max-unavailable-upgrade 0
--enable-shielded-nodes | Shielded Nodes enables a more secure node credential bootstrapping implementation. | --enable-shielded-nodes
--workload-pool | Enable Workload Identity on the cluster. | --workload-pool="$PROJECT_ID.svc.id.goog"
--workload-metadata | Type of metadata server available to pods running in the node pool. | --workload-metadata=GKE_METADATA