Deploy Production PostgreSQL - EKS

By default, Hydrolix provisions a single internal PostgreSQL pod to store the catalog. For production-scale deployments that need the best performance, use either an external PostgreSQL instance or a high-availability PostgreSQL hosted in Kubernetes (Crunchy Data). This page describes how to use both options with a Hydrolix deployment on EKS.

❗️

Potential Unrecoverable Data Loss - Please read.

If you have already loaded data and this is a migration, do not proceed unless you fully understand the migration process. Losing the catalog can make data unrecoverable. To migrate an existing deployment, we strongly suggest that you contact Hydrolix support and review the page Migrate to External PostgreSQL.

Deploy a Kubernetes HA Postgres

Hydrolix has built-in support for the Postgres Kubernetes Operator from Crunchy Data. Crunchy Data is supplied separately from Hydrolix; the instructions for installing it into Kubernetes are in the Crunchy Data documentation. We have found the default install, kustomize/install/default, to be a good place to start.

Once Crunchy Data is deployed into your Kubernetes cluster, edit hydrolixcluster.yaml to add the following to the spec:

apiVersion: hydrolix.io/v1
kind: HydrolixCluster
metadata:
  name: hdx
  namespace: .......
spec:
  admin_email: .......
  use_crunchydata_postgres: true
  db_bucket_region: .......
  db_bucket_url: .......
  env: {}
  hydrolix_name: hdx
  hydrolix_url: .......
  ip_allowlist:
  - .........
  kubernetes_namespace: .......
  scale:
    postgres:
      replicas: 2

Setting use_crunchydata_postgres: true enables Hydrolix to use Crunchy Data Postgres.

We also suggest raising the Postgres replica count to at least two when this option is enabled:

  scale:
    postgres:
      replicas: 2

To confirm that your new Crunchy Data Postgres deployment is running, look for its pods, which should have started successfully. Their names begin with main-main:

$ kubectl get pods | grep main-main
main-main-4qjd-0                      4/4     Running     0             60m
main-main-cgxw-0                      4/4     Running     0             60m
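The check above can also be scripted. The following is a minimal sketch, not part of Hydrolix or Crunchy Data tooling: pg_pods_ready is a hypothetical helper that reads kubectl get pods output on standard input and succeeds only when at least the requested number of pods with the given name prefix are Running with every container ready. It assumes the default column layout shown above (NAME, READY, STATUS).

```shell
# Hypothetical helper: pg_pods_ready PREFIX MIN_PODS
# Reads `kubectl get pods` output on stdin. Exits 0 only if at least MIN_PODS
# pods whose name starts with PREFIX are Running with all containers ready,
# and no pod with that prefix is in any other state.
pg_pods_ready() {
  awk -v prefix="$1" -v min="$2" '
    index($1, prefix) == 1 {
      split($2, ready, "/")                 # READY column, for example "4/4"
      if ($3 == "Running" && ready[1] == ready[2]) ok++
      else bad++
    }
    END { exit !(ok >= min && bad == 0) }
  '
}

# Usage against a live cluster:
#   kubectl get pods | pg_pods_ready main-main 2 && echo "Postgres pods are ready"
```

Passing the replica count from your scale.postgres.replicas setting as the second argument keeps the check aligned with the configuration.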

Configure RDS PostgreSQL

The RDS PostgreSQL instance must be in the same account, VPC, and AWS region as your EKS cluster. You need this information to create the instance.

Run the following command to find the VPC ID of your cluster:

aws eks describe-cluster --name ${HDX_KUBERNETES_NAMESPACE} --query 'cluster.resourcesVpcConfig.vpcId'

  1. In the AWS console, switch to the RDS service and click Create Database. Select PostgreSQL and engine version 11.12-R1 or greater.

  2. Select the "Production" template. Choose the "Multi-AZ DB instance" availability and durability option. Enter a database name.

  3. Supply a root username and password; Hydrolix uses these to access the database.

  4. In the storage section, select the general purpose gp3 storage type with 100 GiB of allocated storage.

  5. In the connectivity section, select the VPC ID associated with your EKS cluster. The dropdown lists all VPCs in the region as: VPC Tag name (VPC ID). Select the default DB Subnet group.

    📘

    If you don't know your VPC ID, see the command provided at the beginning of this section.

  6. Select both the EKS cluster and node security groups from the Existing VPC security groups dropdown list. Use the default certificate authority and password authentication.

  7. Disable the following settings by unchecking their checkboxes:

    • Turn on Performance Insights
    • Enable auto minor version upgrade

  8. Click Create Database to confirm your settings.

It takes about 10 minutes to create your database. When ready, AWS provides an endpoint to connect to your database. Find this endpoint in the Connectivity & security tab of the database details page. Use this endpoint as the catalog_db_host in the next step.
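If you prefer the AWS CLI to the console, the steps above map roughly onto aws rds create-db-instance. The sketch below is an illustration under assumptions, not a Hydrolix-provided script: rds_create and rds_endpoint are hypothetical helper names, and the instance class, engine version, and DB subnet group are arguments you must supply (the subnet group must belong to your EKS cluster's VPC, because the CLI has no separate VPC option). The master password is read from an RDS_PASSWORD environment variable, also an assumption of this example. With DRY_RUN set, the commands are printed instead of executed.

```shell
# Hypothetical helper: rds_create ID INSTANCE_CLASS ENGINE_VERSION SUBNET_GROUP SG_ID...
# Mirrors the console choices: Multi-AZ, gp3 storage, 100 GiB, Performance
# Insights and auto minor version upgrade turned off.
# With DRY_RUN set, the command (including the password) is printed, not run.
rds_create() (
  id=$1 class=$2 version=$3 subnet_group=$4
  shift 4                                   # remaining arguments: security group IDs
  ${DRY_RUN:+echo} aws rds create-db-instance \
    --db-instance-identifier "$id" \
    --engine postgres \
    --engine-version "$version" \
    --db-instance-class "$class" \
    --multi-az \
    --master-username postgres \
    --master-user-password "$RDS_PASSWORD" \
    --storage-type gp3 \
    --allocated-storage 100 \
    --db-subnet-group-name "$subnet_group" \
    --vpc-security-group-ids "$@" \
    --no-enable-performance-insights \
    --no-auto-minor-version-upgrade
)

# Hypothetical helper: rds_endpoint ID
# Prints the endpoint address to use as catalog_db_host.
rds_endpoint() {
  ${DRY_RUN:+echo} aws rds describe-db-instances \
    --db-instance-identifier "$1" \
    --query 'DBInstances[0].Endpoint.Address' \
    --output text
}

# Usage (review the printed command before running it for real):
#   DRY_RUN=1 rds_create <id> <instance-class> <engine-version> <subnet-group> <sg-id> <sg-id>
```

The master username is set to postgres here only to match the catalog_db_admin_user value used later on this page.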

Define the External PostgreSQL Connection

Disable the internal PostgreSQL instance by setting scale.postgres.replicas to 0. Provide values for catalog_db_admin_user, catalog_db_admin_db, and catalog_db_host so your Hydrolix instance can connect to your newly created external PostgreSQL endpoint:

apiVersion: hydrolix.io/v1
kind: HydrolixCluster
metadata:
  name: hdx
  namespace: ${HDX_KUBERNETES_NAMESPACE}
spec:
  admin_email: ${HDX_ADMIN_EMAIL}
  db_bucket_region: ${HDX_BUCKET_REGION}
  db_bucket_url: ${HDX_DB_BUCKET_URL}
  env: {}
  hydrolix_name: hdx
  hydrolix_url: ${HDX_HYDROLIX_URL}
  catalog_db_admin_user: postgres  #<--- Add the admin user "postgres" to your config
  catalog_db_admin_db: postgres    #<--- Add the admin db "postgres" to your config
  catalog_db_host: <YOUR HOST/IP>  #<--- Add the endpoint of your RDS instance
  ip_allowlist:
  - 42.78.92.98/32                 # TODO: Replace this with your IP address!
  kubernetes_namespace: ${HDX_KUBERNETES_NAMESPACE}
  overcommit: false
  scale:
    postgres:
      replicas: 0                  #<---- Don't forget to set the internal postgres to 0!
  scale_profile: prod

Use the following command to replace the environment variables above with their values. The values come from the variables defined in env.sh when you initially prepared your cluster, so they must be set in the current shell. The command prints the rendered YAML to standard output:

eval "echo \"$(cat hydrolixcluster.yaml)\""
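The eval approach silently turns any unset variable into an empty string, and it executes any command substitution found in the file, so it helps to check the template first. The following is a minimal sketch of such a check; unset_template_vars is a hypothetical helper name, not part of Hydrolix tooling. It lists every ${VAR} placeholder in a file whose variable is unset or empty in the current shell.

```shell
# Hypothetical helper: unset_template_vars FILE
# Prints the name of each ${VAR} placeholder in FILE whose variable is unset
# or empty in the current shell, one name per line.
unset_template_vars() {
  grep -o '\${[A-Za-z_][A-Za-z0-9_]*}' "$1" | tr -d '${}' | sort -u |
  while IFS= read -r name; do
    # Safe to eval: the grep pattern only lets plain identifiers through.
    eval "value=\${$name:-}"
    [ -n "$value" ] || printf '%s\n' "$name"
  done
}

# Usage:
#   . ./env.sh
#   missing=$(unset_template_vars hydrolixcluster.yaml)
#   [ -z "$missing" ] || echo "Unset variables: $missing"
```

An empty result means every placeholder in the file will be replaced with a non-empty value when you render it.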

Create Secret

Store the PostgreSQL password in a Kubernetes secret named curated:

---
apiVersion: v1
kind: Secret
metadata:
  name: curated
  namespace: <namespace>
stringData:
  ROOT_DB_PASSWORD: <the password to your postgres>
type: Opaque
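A password written as an unquoted YAML value breaks the manifest if it contains sequences such as " #" or ": ". One way around this is to generate the manifest and emit the password as a single-quoted YAML scalar. The sketch below assumes a hypothetical helper name, render_db_secret, and does not handle passwords that contain newlines.

```shell
# Hypothetical helper: render_db_secret NAMESPACE PASSWORD
# Prints the curated secret manifest with the password as a single-quoted
# YAML scalar (single quotes inside the password are doubled, as YAML requires).
render_db_secret() (
  escaped=$(printf '%s' "$2" | sed "s/'/''/g")
  cat <<EOF
---
apiVersion: v1
kind: Secret
metadata:
  name: curated
  namespace: $1
stringData:
  ROOT_DB_PASSWORD: '$escaped'
type: Opaque
EOF
)

# Usage: write the manifest to a file, or pipe it straight to kubectl:
#   render_db_secret "$HDX_KUBERNETES_NAMESPACE" "$ROOT_DB_PASSWORD" | kubectl apply -f -
```

Piping the output to kubectl avoids leaving the password in a file on disk.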

Apply Your Configuration

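Run the following commands to add the secret and apply the settings to your Kubernetes cluster. They assume that you saved the secret as db-secret.yaml and the cluster configuration as hydrolix.yaml: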

kubectl apply -f db-secret.yaml
kubectl apply -f hydrolix.yaml

📘

Already Running Cluster

If your cluster is already running, run the following command to redeploy the cluster with these settings applied:

kubectl rollout restart deployment