Prepare a Cluster
This guide shows how to configure Amazon Elastic Kubernetes Service (EKS) to host a Hydrolix cluster. It uses Amazon's command-line tools to build and configure the cluster.
Terraform Examples
For examples of configuring an EKS environment with Terraform, see the hydrolix/terraform GitHub repository.
Prepare an EKS Cluster
Setup
Prepare Command Line Tools
Install and configure the following tools on your local machine:
- aws to interact with the Amazon services
- eksctl to build the EKS cluster
- kubectl to interact with the cluster
Create a Local Environment Variables File
This guide uses environment variables as a templating mechanism. We recommend putting them into a file so you can load them into scope within your terminal shell. Write the following environment variables into a file called env.sh, replacing <> with appropriate values for your deployment:
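As a sketch only, env.sh might look like the following. The variable names used here and in the later examples (CLUSTER_NAME, CLUSTER_REGION, HDX_KUBERNETES_NAMESPACE, AWS_ACCOUNT_ID) are illustrative placeholders rather than canonical Hydrolix names:

```bash
# env.sh -- illustrative variable names; replace <> with values for your deployment
export CLUSTER_NAME=<your EKS cluster name>
export CLUSTER_REGION=<AWS region, e.g. us-east-1>
export HDX_KUBERNETES_NAMESPACE=<Kubernetes namespace for Hydrolix; also used as the bucket name below>
export AWS_ACCOUNT_ID=<your 12-digit AWS account ID>
```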
Next, run the following command to bring the variables into scope:
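Assuming the file is saved as env.sh in your current directory:

```bash
source env.sh
```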
Configure the Cluster
Create an EKS Cluster
We use eksctl to build the EKS cluster. With eksctl, you can define the cluster scale and node groups in a YAML file. Because Hydrolix contains a few StatefulSet deployments, you must add the Amazon Elastic Block Store (Amazon EBS) Container Storage Interface (CSI) driver in the addons section. To create your configuration file, write the following to a file named eksctl.yaml:
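A minimal ClusterConfig along these lines is one possible starting point; the node group name, node counts, volume size, and IAM add-on policies shown here are illustrative and should be tuned to your workload:

```yaml
# eksctl.yaml -- illustrative sketch; ${CLUSTER_NAME} and ${CLUSTER_REGION} are filled in by the next step
apiVersion: eksctl.io/v1alpha5
kind: ClusterConfig

metadata:
  name: ${CLUSTER_NAME}
  region: ${CLUSTER_REGION}

iam:
  withOIDC: true             # enables the IAM OIDC provider used later for service accounts

addons:
  - name: aws-ebs-csi-driver # required for the StatefulSet volumes

managedNodeGroups:
  - name: hdx-workers
    instanceType: c5n.4xlarge
    desiredCapacity: 3
    minSize: 3
    maxSize: 12
    volumeSize: 100
    iam:
      withAddonPolicies:
        autoScaler: true
        ebs: true
```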
Instance Type
The choice of instanceType depends on your needs. We strongly recommend the c5n instance family due to its network bandwidth guarantees. c5n.4xlarge works well for development clusters. For production use cases, use c5n.9xlarge at minimum.
Use the following command to replace the environment variables above with their values:
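One way to do this is with envsubst (part of GNU gettext); the output file name here is arbitrary:

```bash
envsubst < eksctl.yaml > eksctl-resolved.yaml
```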
Run the following command to create a cluster based on your configuration file:
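For example, assuming the substituted configuration from the previous step is named eksctl-resolved.yaml:

```bash
eksctl create cluster -f eksctl-resolved.yaml
```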
Sharing an existing VPC
The above command creates a new VPC. eksctl also provides extended syntax for reusing an existing VPC, detailed in the full schema definition. Review the Amazon EKS VPC requirements documentation when selecting a VPC to join.
This step can take several minutes to complete. eksctl prints detailed progress updates in the terminal as it works.
Create an S3 Bucket & IAM Policy
Hydrolix stores your data in cloud storage. For this guide, the bucket uses the same region as the cluster you created in the previous step and the same name as your namespace. Run the following command to create an S3 bucket for Hydrolix data storage:
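As a sketch, assuming the namespace variable doubles as the bucket name:

```bash
# Note: omit --create-bucket-configuration when the region is us-east-1
aws s3api create-bucket \
  --bucket "$HDX_KUBERNETES_NAMESPACE" \
  --region "$CLUSTER_REGION" \
  --create-bucket-configuration LocationConstraint="$CLUSTER_REGION"
```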
To enable Hydrolix access to this bucket, associate it with an IAM policy. Run the following command to define the permissions required for Hydrolix:
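The exact set of S3 actions Hydrolix requires isn't reproduced here; the following is an illustrative minimal policy document covering listing, reading, writing, and deleting objects in the bucket (the file name is arbitrary):

```bash
cat <<EOF > hdx-bucket-policy.json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": ["s3:ListBucket"],
      "Resource": ["arn:aws:s3:::${HDX_KUBERNETES_NAMESPACE}"]
    },
    {
      "Effect": "Allow",
      "Action": ["s3:GetObject", "s3:PutObject", "s3:DeleteObject"],
      "Resource": ["arn:aws:s3:::${HDX_KUBERNETES_NAMESPACE}/*"]
    }
  ]
}
EOF
```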
Then, run the following command to create the IAM policy associated with the permissions you just defined:
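For example, with the policy name being an illustrative choice:

```bash
aws iam create-policy \
  --policy-name "${CLUSTER_NAME}-hdx-bucket-policy" \
  --policy-document file://hdx-bucket-policy.json
```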
Create an IAM Policy for Service Accounts
Hydrolix service accounts interact with the cluster using AssumeRoleWithWebIdentity. This is a session-token-based mechanism managed by an OpenID Connect (OIDC) provider.
When you create the cluster via eksctl, Amazon automatically enables an IAM OIDC provider. Run the following command to access the information needed to connect to this provider:
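One common way to capture the provider's hostname and path is to strip the scheme from the cluster's OIDC issuer URL:

```bash
OIDC_PROVIDER=$(aws eks describe-cluster \
  --name "$CLUSTER_NAME" \
  --region "$CLUSTER_REGION" \
  --query "cluster.identity.oidc.issuer" \
  --output text | sed -e 's|^https://||')
echo "$OIDC_PROVIDER"
```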
Add the OIDC_PROVIDER environment variable to your env.sh script so it's available whenever you administer your Hydrolix cluster.
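For example, appending it to the file created earlier:

```bash
echo "export OIDC_PROVIDER=${OIDC_PROVIDER}" >> env.sh
```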
Run the following command to define the OIDC managed service account policies:
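This takes the form of a trust (assume-role) policy document. The sketch below allows any service account in the Hydrolix namespace to assume the role; whether Hydrolix needs a broader or narrower subject condition is an assumption you should verify for your deployment (the file name is arbitrary):

```bash
cat <<EOF > hdx-trust-policy.json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "Federated": "arn:aws:iam::${AWS_ACCOUNT_ID}:oidc-provider/${OIDC_PROVIDER}"
      },
      "Action": "sts:AssumeRoleWithWebIdentity",
      "Condition": {
        "StringLike": {
          "${OIDC_PROVIDER}:sub": "system:serviceaccount:${HDX_KUBERNETES_NAMESPACE}:*"
        }
      }
    }
  ]
}
EOF
```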
Run the following command to create a role using this policy:
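For example, with an illustrative role name:

```bash
aws iam create-role \
  --role-name "${CLUSTER_NAME}-hdx-role" \
  --assume-role-policy-document file://hdx-trust-policy.json
```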
Finally, attach the service account IAM policy to the service account IAM role.
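Assuming the policy and role names from the earlier sketches:

```bash
aws iam attach-role-policy \
  --role-name "${CLUSTER_NAME}-hdx-role" \
  --policy-arn "arn:aws:iam::${AWS_ACCOUNT_ID}:policy/${CLUSTER_NAME}-hdx-bucket-policy"
```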
Create Namespace & GP3 Performance Disks
It's best to deploy Hydrolix in its own namespace. Run the following command to create that namespace:
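For example:

```bash
kubectl create namespace "$HDX_KUBERNETES_NAMESPACE"
```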
Next, set the Hydrolix namespace as the default namespace in kubectl:
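One way to do this is to update the current kubectl context:

```bash
kubectl config set-context --current --namespace="$HDX_KUBERNETES_NAMESPACE"
```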
Hydrolix deploys StatefulSet infrastructure, which benefits greatly from high-performance EBS storage due to the volume of data Hydrolix processes. Let's define a high-performance GP3 storage class to provision those disks:
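A sketch of such a StorageClass, using the EBS CSI driver installed with the cluster; the class name and default-class annotation are illustrative choices:

```yaml
# gp3-storageclass.yaml -- illustrative sketch
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: gp3
  annotations:
    storageclass.kubernetes.io/is-default-class: "true"
provisioner: ebs.csi.aws.com
parameters:
  type: gp3
volumeBindingMode: WaitForFirstConsumer
allowVolumeExpansion: true
```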
Finally, run the following command to create the GP3 disks in your cluster:
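Assuming the definition above is saved as gp3-storageclass.yaml:

```bash
kubectl apply -f gp3-storageclass.yaml
```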
Generate Operator Config
The Hydrolix operator resources API generates all of the Kubernetes resource definitions required to deploy the operator, including service accounts and role permissions. Once deployed, the operator manages your Hydrolix cluster deployment. To upgrade your deployment to a new version, repeat this step.
Run the following command to generate the operator YAML file, named operator.yaml:
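The operator resources API is queried over HTTP; the endpoint and query parameters below are assumptions for illustration, so confirm the exact URL for your target Hydrolix version before running it:

```bash
# Illustrative only: verify the operator resources URL and parameters for your Hydrolix version
curl -s "https://www.hydrolix.io/operator/latest/operator-resources?namespace=${HDX_KUBERNETES_NAMESPACE}" > operator.yaml
```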
Configure Cluster Autoscaling
Deploy the Metrics Server
Autoscaling requires a metrics server. Deploy one into your cluster directly from its published manifest URL:
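For example, using the manifest published by the upstream metrics-server project:

```bash
kubectl apply -f https://github.com/kubernetes-sigs/metrics-server/releases/latest/download/components.yaml
```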
Create Autoscaler Node Group Policy
You must grant the autoscaler the autoscaling permissions it needs through an IAM policy for the nodes in your cluster. Run the following command to define the permissions:
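A sketch of such a policy document, based on the permissions the Kubernetes Cluster Autoscaler commonly requires on AWS (the file name is arbitrary):

```bash
cat <<EOF > cluster-autoscaler-policy.json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "autoscaling:DescribeAutoScalingGroups",
        "autoscaling:DescribeAutoScalingInstances",
        "autoscaling:DescribeLaunchConfigurations",
        "autoscaling:DescribeTags",
        "autoscaling:SetDesiredCapacity",
        "autoscaling:TerminateInstanceInAutoScalingGroup",
        "ec2:DescribeLaunchTemplateVersions",
        "ec2:DescribeInstanceTypes"
      ],
      "Resource": "*"
    }
  ]
}
EOF
```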
Run the following command to create a policy for the autoscaler using those permissions:
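For example, with an illustrative policy name:

```bash
aws iam create-policy \
  --policy-name "${CLUSTER_NAME}-cluster-autoscaler-policy" \
  --policy-document file://cluster-autoscaler-policy.json
```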
Create Autoscaler Service Account Permissions
Next, run the following command to define the role that the cluster autoscaler's service account will assume:
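The role definition is a trust policy scoped to the autoscaler's service account. This sketch assumes the autoscaler runs as the cluster-autoscaler service account in kube-system, which is the default in the autodiscovery manifest used below:

```bash
cat <<EOF > cluster-autoscaler-trust-policy.json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "Federated": "arn:aws:iam::${AWS_ACCOUNT_ID}:oidc-provider/${OIDC_PROVIDER}"
      },
      "Action": "sts:AssumeRoleWithWebIdentity",
      "Condition": {
        "StringEquals": {
          "${OIDC_PROVIDER}:sub": "system:serviceaccount:kube-system:cluster-autoscaler"
        }
      }
    }
  ]
}
EOF
```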
Then, create the role using the role definition:
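For example, with an illustrative role name:

```bash
aws iam create-role \
  --role-name "${CLUSTER_NAME}-cluster-autoscaler-role" \
  --assume-role-policy-document file://cluster-autoscaler-trust-policy.json
```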
Finally, attach the autoscaler policy to the role:
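Assuming the names used in the sketches above:

```bash
aws iam attach-role-policy \
  --role-name "${CLUSTER_NAME}-cluster-autoscaler-role" \
  --policy-arn "arn:aws:iam::${AWS_ACCOUNT_ID}:policy/${CLUSTER_NAME}-cluster-autoscaler-policy"
```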
Deploy Cluster Autoscaler Autodiscovery
Run the following command to download the cluster autoscaler autodiscovery configuration into a file named cluster-autoscaler-autodiscover.yaml:
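The autodiscovery example manifest is published in the kubernetes/autoscaler repository:

```bash
curl -o cluster-autoscaler-autodiscover.yaml \
  https://raw.githubusercontent.com/kubernetes/autoscaler/master/cluster-autoscaler/cloudprovider/aws/examples/cluster-autoscaler-autodiscover.yaml
```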
Replace the placeholder <YOUR CLUSTER NAME> with your actual namespace. If you've loaded your environment variables defined in env.sh, you can replace it with the following command:
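For instance, assuming CLUSTER_NAME holds that value (substitute your namespace variable if the two differ):

```bash
# GNU sed shown; on macOS use `sed -i ''`
sed -i "s/<YOUR CLUSTER NAME>/${CLUSTER_NAME}/g" cluster-autoscaler-autodiscover.yaml
```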
Otherwise, you can manually replace <YOUR CLUSTER NAME> in cluster-autoscaler-autodiscover.yaml with a text editor.
Then apply the autoscaler autodiscovery configuration changes to your cluster with kubectl:
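For example:

```bash
kubectl apply -f cluster-autoscaler-autodiscover.yaml
```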
As a final step, annotate the cluster autoscaler service account:
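The annotation points the service account at the IAM role created above; the role name here matches the earlier illustrative sketch:

```bash
kubectl annotate serviceaccount cluster-autoscaler \
  -n kube-system \
  eks.amazonaws.com/role-arn="arn:aws:iam::${AWS_ACCOUNT_ID}:role/${CLUSTER_NAME}-cluster-autoscaler-role"
```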
Congratulations! You are now ready to deploy Hydrolix on Amazon EKS. Proceed to the next step to get Hydrolix running.