Prepare a Cluster
This guide shows how to configure Amazon Elastic Kubernetes Service (EKS) to host a Hydrolix cluster. It uses Amazon's CLI tools to build and configure the cluster.
Terraform Examples
For examples of configuring an EKS environment with Terraform, see the hydrolix/terraform GitHub repository.
Prepare an EKS Cluster
Setup
Prepare Command Line Tools
Install and configure the following tools on your local machine:
aws, to interact with the Amazon services
eksctl, to build the EKS cluster
kubectl, to interact with the cluster
Create a Local Environment Variables File
This guide uses environment variables as a templating mechanism. We recommend putting them into a file so you can load them into scope within your terminal shell. Write the following environment variables into a file called env.sh, replacing the <> placeholders with appropriate values for your deployment:
export AWS_PROFILE=<e.g. dev/staging/prod>
export HDX_BUCKET_REGION=<e.g. us-east-2>
export HDX_HYDROLIX_URL=<e.g. https://my.domain.com>
export HDX_KUBERNETES_NAMESPACE=<e.g. production-service>
export HDX_DB_BUCKET_URL=s3://$HDX_KUBERNETES_NAMESPACE
export HDX_ADMIN_EMAIL=<e.g. me@domain.com>
export AWS_ACCOUNT_ID="$(aws sts get-caller-identity --query "Account" --output text)"
export AWS_STORAGE_ROLE="arn:aws:iam::${AWS_ACCOUNT_ID}:role/${HDX_KUBERNETES_NAMESPACE}-bucket"
Next, run the following command to bring the variables into scope:
source env.sh
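Before continuing, it can help to confirm that every variable is actually set. The following is a minimal sketch, not part of the Hydrolix tooling; the check_env helper name is hypothetical.

```shell
# check_env is a hypothetical helper: it prints each variable that is unset
# or empty and returns non-zero if any are missing.
check_env() {
  missing=0
  for name in "$@"; do
    # Indirect lookup of the variable whose name is stored in $name.
    eval "value=\${$name:-}"
    if [ -z "$value" ]; then
      echo "missing: $name"
      missing=1
    fi
  done
  return "$missing"
}

check_env AWS_PROFILE HDX_BUCKET_REGION HDX_HYDROLIX_URL \
  HDX_KUBERNETES_NAMESPACE HDX_DB_BUCKET_URL HDX_ADMIN_EMAIL \
  AWS_ACCOUNT_ID AWS_STORAGE_ROLE || echo "fix env.sh, then run: source env.sh"
```

If anything is reported as missing, correct env.sh and source it again before moving on.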
Configure the Cluster
Create an EKS Cluster
We use eksctl to build the EKS cluster. With eksctl, you can define scale and node group definitions in a YAML file. Because Hydrolix contains a few StatefulSet deployments, you must add the Amazon Elastic Block Store (Amazon EBS) Container Storage Interface (CSI) driver in the addons section. To create your configuration file, write the following to a file named eksctl.yaml:
apiVersion: eksctl.io/v1alpha5
kind: ClusterConfig
metadata:
  name: $HDX_KUBERNETES_NAMESPACE
  region: $HDX_BUCKET_REGION
addons:
  - name: aws-ebs-csi-driver
iam:
  withOIDC: true
managedNodeGroups:
  - name: nodegroup0
    instanceType: c5n.4xlarge
    minSize: 0
    maxSize: 30
    desiredCapacity: 3
    volumeSize: 128
    privateNetworking: true
Instance Type
The choice of instanceType depends on your needs. We strongly recommend the c5n compute range due to its network bandwidth guarantees. c5n.4xlarge works well for development clusters. For production use cases, use c5n.9xlarge at minimum.
Use the following command to replace the environment variables above with their values:
eval "echo \"$(cat eksctl.yaml)\"" > eksctl.yaml
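Note that the command above overwrites eksctl.yaml in place, so the template with its $ placeholders is gone after the first run. As one alternative sketch, you can keep the template and write the rendered result to a separate file. The render_template helper and the file names below are hypothetical and not part of this guide's required steps.

```shell
# render_template is a hypothetical helper. It applies the same eval/echo
# expansion as the command above, but writes to a separate output file.
# Only use it on templates you trust: eval also runs command substitutions.
render_template() {
  eval "echo \"$(cat "$1")\"" > "$2"
}

# Example with a throwaway template and a stand-in namespace value.
HDX_KUBERNETES_NAMESPACE="${HDX_KUBERNETES_NAMESPACE:-demo-namespace}"
workdir="$(mktemp -d)"
printf 'metadata:\n  name: $HDX_KUBERNETES_NAMESPACE\n' > "$workdir/template.yaml"
render_template "$workdir/template.yaml" "$workdir/rendered.yaml"
cat "$workdir/rendered.yaml"
```

If you use this variant, pass the rendered file to eksctl instead of the template.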
Run the following command to create a cluster based on your configuration file:
eksctl create cluster -f eksctl.yaml
Sharing an existing VPC
The above command creates a new VPC. Amazon also provides an extended syntax for reusing an existing VPC, which is detailed in the full schema definition. Please review the Amazon EKS VPC requirements document when selecting a VPC to join.
This step can take several minutes to complete. Thankfully, it provides a lot of progress updates in the terminal.
Create an S3 Bucket & IAM Policy
Hydrolix stores your data in cloud storage. For this guide, we'll use the same region as the cluster you created in the previous step and use the same name as your namespace. Run the following command to create an S3 bucket for Hydrolix data storage:
aws s3 mb --region $HDX_BUCKET_REGION $HDX_DB_BUCKET_URL
To enable Hydrolix access to this bucket, associate it with an IAM policy. Run the following command to define the permissions required for Hydrolix:
read -r -d '' POLICY_DOC << EOF
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "ListObjectsInBucket",
      "Effect": "Allow",
      "Action": [
        "s3:ListBucket"
      ],
      "Resource": [
        "arn:aws:s3:::${HDX_KUBERNETES_NAMESPACE}",
        "arn:aws:s3:::hdx-public"
      ]
    },
    {
      "Sid": "AllObjectActions",
      "Effect": "Allow",
      "Action": "s3:*Object",
      "Resource": [
        "arn:aws:s3:::${HDX_KUBERNETES_NAMESPACE}/*",
        "arn:aws:s3:::hdx-public/*"
      ]
    }
  ]
}
EOF
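If you run these steps from a script rather than an interactive shell, be aware that read -d '' is a Bash extension and returns a non-zero status when it reaches the end of the here-document, which stops scripts that use set -e. As a sketch of one portable alternative, a command substitution around cat fills the variable the same way. The shortened document body and the EXAMPLE_DOC name are for illustration only.

```shell
# Stand-in value so the example runs on its own.
HDX_KUBERNETES_NAMESPACE="${HDX_KUBERNETES_NAMESPACE:-demo-namespace}"

# Portable alternative to read -r -d '': capture the here-document via cat.
EXAMPLE_DOC="$(cat << EOF
{
  "Resource": "arn:aws:s3:::${HDX_KUBERNETES_NAMESPACE}/*"
}
EOF
)"

echo "${EXAMPLE_DOC}"
```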
Then, run the following command to create the IAM policy associated with the permissions you just defined:
aws iam create-policy --policy-name "${HDX_KUBERNETES_NAMESPACE}-bucket" --policy-document "${POLICY_DOC}"
Create an IAM Policy for Service Accounts
Hydrolix service accounts interact with the cluster using AssumeRoleWithWebIdentity. This is a session-token-based mechanism managed by an OpenID Connect (OIDC) provider.
When you create the cluster via eksctl with withOIDC: true, as in the configuration above, an IAM OIDC provider is enabled automatically. Run the following command to access the information needed to connect to this provider:
export OIDC_PROVIDER="$(aws --region ${HDX_BUCKET_REGION} eks describe-cluster --name ${HDX_KUBERNETES_NAMESPACE} \
--query "cluster.identity.oidc.issuer" --output text | sed -e "s/^https:\/\///")"
echo $OIDC_PROVIDER
Add the OIDC_PROVIDER environment variable to your env.sh script so it's available whenever you administer your Hydrolix cluster.
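As a sketch of these two steps on sample data: the sed expression strips the leading https:// from the issuer URL, and the result can then be appended to the environment file as an export line. The issuer URL below is a made-up placeholder, and the example writes to a temporary file rather than your real env.sh.

```shell
# Placeholder issuer URL for illustration; your cluster returns its own value.
issuer="https://oidc.eks.us-east-2.amazonaws.com/id/EXAMPLE0123456789"

# Same substitution as above: drop the scheme, keep host and path.
OIDC_PROVIDER="$(echo "$issuer" | sed -e "s/^https:\/\///")"
echo "$OIDC_PROVIDER"

# Append an export line to a throwaway stand-in for env.sh.
env_file="$(mktemp)"
echo "export OIDC_PROVIDER=\"${OIDC_PROVIDER}\"" >> "$env_file"
cat "$env_file"
```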
Run the following command to define the OIDC managed service account policies:
read -r -d '' SA_POLICY_DOC << EOF
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "Federated": "arn:aws:iam::${AWS_ACCOUNT_ID}:oidc-provider/${OIDC_PROVIDER}"
      },
      "Action": "sts:AssumeRoleWithWebIdentity",
      "Condition": {
        "StringEquals": {
          "${OIDC_PROVIDER}:aud": "sts.amazonaws.com",
          "${OIDC_PROVIDER}:sub": [
            "system:serviceaccount:${HDX_KUBERNETES_NAMESPACE}:hydrolix",
            "system:serviceaccount:${HDX_KUBERNETES_NAMESPACE}:turbine-api",
            "system:serviceaccount:${HDX_KUBERNETES_NAMESPACE}:vector",
            "system:serviceaccount:${HDX_KUBERNETES_NAMESPACE}:merge-controller",
            "system:serviceaccount:${HDX_KUBERNETES_NAMESPACE}:kinesis-coordinator"
          ]
        }
      }
    }
  ]
}
EOF
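Each entry under sub follows the pattern system:serviceaccount:<namespace>:<service account name>. The loop below only illustrates how those strings are composed from the service account names listed in the policy; it is not a required step, and the list_subjects helper name is hypothetical.

```shell
# Stand-in value so the example runs on its own.
HDX_KUBERNETES_NAMESPACE="${HDX_KUBERNETES_NAMESPACE:-demo-namespace}"

# list_subjects is a hypothetical helper that prints one subject per line,
# using the service account names from the trust policy above.
list_subjects() {
  for sa in hydrolix turbine-api vector merge-controller kinesis-coordinator; do
    echo "system:serviceaccount:${HDX_KUBERNETES_NAMESPACE}:${sa}"
  done
}

list_subjects
```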
Run the following command to create a role using this policy:
aws iam create-role --role-name "${HDX_KUBERNETES_NAMESPACE}-bucket" \
--assume-role-policy-document "${SA_POLICY_DOC}" \
--description "${HDX_KUBERNETES_NAMESPACE}-bucket"
Finally, attach the bucket IAM policy you created earlier to the service account IAM role:
aws iam attach-role-policy --role-name "${HDX_KUBERNETES_NAMESPACE}-bucket" \
--policy-arn="arn:aws:iam::${AWS_ACCOUNT_ID}:policy/${HDX_KUBERNETES_NAMESPACE}-bucket"
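The role and the policy share the name <namespace>-bucket but have different ARNs: one under role/ and one under policy/. The sketch below prints both with stand-in values (the account ID 123456789012 and the namespace are placeholders) so you can compare the role ARN with AWS_STORAGE_ROLE from env.sh.

```shell
# Stand-in values so the example runs on its own.
AWS_ACCOUNT_ID="${AWS_ACCOUNT_ID:-123456789012}"
HDX_KUBERNETES_NAMESPACE="${HDX_KUBERNETES_NAMESPACE:-demo-namespace}"

# Same name, different resource type in the ARN path.
role_arn="arn:aws:iam::${AWS_ACCOUNT_ID}:role/${HDX_KUBERNETES_NAMESPACE}-bucket"
policy_arn="arn:aws:iam::${AWS_ACCOUNT_ID}:policy/${HDX_KUBERNETES_NAMESPACE}-bucket"

echo "role:   ${role_arn}"
echo "policy: ${policy_arn}"
```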
Create Namespace & GP3 Performance Disks
It's best to deploy Hydrolix in its own namespace. Run the following command to create that namespace:
kubectl create namespace $HDX_KUBERNETES_NAMESPACE
Next, set the Hydrolix namespace as the default namespace in kubectl:
kubectl config set-context --current --namespace=$HDX_KUBERNETES_NAMESPACE
Hydrolix deploys StatefulSet infrastructure, which benefits greatly from high-performance EBS storage due to the volume of data Hydrolix processes. To define a storage class backed by high-performance GP3 disks, write the following to a file named gp3.yaml:
---
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: gp3
provisioner: ebs.csi.aws.com
parameters:
  type: gp3
  encrypted: 'true'
reclaimPolicy: Delete
volumeBindingMode: WaitForFirstConsumer
Finally, run the following command to create the GP3 storage class in your cluster:
kubectl apply -f gp3.yaml
Generate operator config
The Hydrolix operator resources API generates all of the Kubernetes resource definitions required to deploy the operator, including service accounts and role permissions. Once deployed, the operator manages your Hydrolix cluster deployment. To upgrade your deployment to a new version, repeat this step.
Run the following command to generate the operator YAML file, named operator.yaml:
curl "https://www.hydrolix.io/operator/latest/operator-resources?namespace=${HDX_KUBERNETES_NAMESPACE}&aws-storage-role=${AWS_STORAGE_ROLE}" > operator.yaml
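Because the URL is assembled from environment variables, an unset variable silently produces an incomplete request. The sketch below builds the URL first and refuses to continue when a variable is empty. The operator_url helper is hypothetical, and the sample values are placeholders.

```shell
# operator_url is a hypothetical helper that prints the request URL, or
# fails with a message if a required variable is unset or empty.
operator_url() {
  if [ -z "${HDX_KUBERNETES_NAMESPACE:-}" ] || [ -z "${AWS_STORAGE_ROLE:-}" ]; then
    echo "HDX_KUBERNETES_NAMESPACE and AWS_STORAGE_ROLE must be set" >&2
    return 1
  fi
  echo "https://www.hydrolix.io/operator/latest/operator-resources?namespace=${HDX_KUBERNETES_NAMESPACE}&aws-storage-role=${AWS_STORAGE_ROLE}"
}

# Stand-in values so the example runs on its own.
HDX_KUBERNETES_NAMESPACE="${HDX_KUBERNETES_NAMESPACE:-demo-namespace}"
AWS_STORAGE_ROLE="${AWS_STORAGE_ROLE:-arn:aws:iam::123456789012:role/demo-namespace-bucket}"

# Inspect the URL before fetching it with the curl command shown above.
operator_url
```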
Configure Cluster Autoscaling
Deploy the Metrics Server
Autoscaling requires a metrics server. Use the URL endpoint to deploy a metrics server in your cluster:
kubectl apply -f https://github.com/kubernetes-sigs/metrics-server/releases/latest/download/components.yaml
Create Autoscaler Node Group Policy
You must provide the autoscaler with all the autoscale permissions in the IAM policy for the nodes in your namespace. Run the following command to define the permissions:
read -r -d '' AUTOSCALER_POLICY_DOC << EOF
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "VisualEditor0",
      "Effect": "Allow",
      "Action": [
        "autoscaling:SetDesiredCapacity",
        "autoscaling:TerminateInstanceInAutoScalingGroup"
      ],
      "Resource": "*",
      "Condition": {
        "StringEquals": {
          "aws:ResourceTag/k8s.io/cluster-autoscaler/${HDX_KUBERNETES_NAMESPACE}": "owned"
        }
      }
    },
    {
      "Sid": "VisualEditor1",
      "Effect": "Allow",
      "Action": [
        "autoscaling:DescribeAutoScalingInstances",
        "autoscaling:DescribeAutoScalingGroups",
        "ec2:DescribeLaunchTemplateVersions",
        "autoscaling:DescribeTags",
        "autoscaling:DescribeLaunchConfigurations"
      ],
      "Resource": "*"
    }
  ]
}
EOF
Run the following command to create a policy for the autoscaler using those permissions:
aws iam create-policy --policy-name eks-${HDX_KUBERNETES_NAMESPACE}-autoscaler --policy-document "${AUTOSCALER_POLICY_DOC}"
Create Autoscaler Service Account Permissions
Next, run the following command to define the trust policy for a role that the cluster autoscaler service account can assume:
read -r -d '' AUTOSCALER_TRUST_DOC << EOF
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "Federated": "arn:aws:iam::${AWS_ACCOUNT_ID}:oidc-provider/${OIDC_PROVIDER}"
      },
      "Action": "sts:AssumeRoleWithWebIdentity",
      "Condition": {
        "StringEquals": {
          "${OIDC_PROVIDER}:aud": "sts.amazonaws.com",
          "${OIDC_PROVIDER}:sub": "system:serviceaccount:kube-system:cluster-autoscaler"
        }
      }
    }
  ]
}
EOF
Then, create the role using the role definition:
aws iam create-role --role-name "eks-${HDX_KUBERNETES_NAMESPACE}-autoscaler" \
--assume-role-policy-document "${AUTOSCALER_TRUST_DOC}" --description "eks-${HDX_KUBERNETES_NAMESPACE}-autoscaler"
Finally, attach the autoscaler policy to the role:
aws iam attach-role-policy --role-name "eks-${HDX_KUBERNETES_NAMESPACE}-autoscaler" \
--policy-arn="arn:aws:iam::${AWS_ACCOUNT_ID}:policy/eks-${HDX_KUBERNETES_NAMESPACE}-autoscaler"
Deploy Cluster Autoscaler Autodiscovery
Run the following command to download the cluster autoscaler autodiscovery configuration into a file named cluster-autoscaler-autodiscover.yaml:
curl -o cluster-autoscaler-autodiscover.yaml https://raw.githubusercontent.com/kubernetes/autoscaler/master/cluster-autoscaler/cloudprovider/aws/examples/cluster-autoscaler-autodiscover.yaml
Replace the placeholder <YOUR CLUSTER NAME> with your cluster name, which in this guide is the same as your namespace. If you've loaded the environment variables defined in env.sh, you can replace it with the following command:
sed -i'' -e 's/<YOUR CLUSTER NAME>/'"${HDX_KUBERNETES_NAMESPACE}"'/g' cluster-autoscaler-autodiscover.yaml
Otherwise, you can manually replace <YOUR CLUSTER NAME> in cluster-autoscaler-autodiscover.yaml with a text editor.
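To preview what the sed substitution does before editing the real file, you can apply the same expression to a sample string. This is a minimal sketch; the sample line is shortened for illustration and the namespace value is a stand-in.

```shell
# Stand-in value so the example runs on its own.
HDX_KUBERNETES_NAMESPACE="${HDX_KUBERNETES_NAMESPACE:-demo-namespace}"

# Shortened sample line containing the placeholder.
sample='k8s.io/cluster-autoscaler/<YOUR CLUSTER NAME>'

# Same expression as above, applied to a string instead of editing the file.
replaced="$(echo "$sample" | sed -e 's/<YOUR CLUSTER NAME>/'"${HDX_KUBERNETES_NAMESPACE}"'/g')"
echo "$replaced"

# After editing the real file, a grep like this reports any placeholder left:
# grep -n '<YOUR CLUSTER NAME>' cluster-autoscaler-autodiscover.yaml
```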
Then apply the autoscaler autodiscovery configuration changes to your cluster with kubectl:
kubectl apply -f cluster-autoscaler-autodiscover.yaml
As a final step, annotate the cluster autoscaler service account:
kubectl annotate serviceaccount cluster-autoscaler -n kube-system \
eks.amazonaws.com/role-arn=arn:aws:iam::${AWS_ACCOUNT_ID}:role/eks-${HDX_KUBERNETES_NAMESPACE}-autoscaler
Congratulations! You are now ready to deploy Hydrolix on Amazon EKS. Proceed to the next step to get Hydrolix running.