External RDS Postgres - Multi-AZ

A standard Hydrolix deployment includes an internal Postgres instance running in a single availability zone. This instance is a critical component of the architecture: it holds the catalog of data partitions that many Hydrolix services rely on.

For production deployments, we recommend using an external multi-AZ Postgres instance instead. You can set one up through the AWS RDS console, as described in this guide.

🚧

Note

If you want to migrate a cluster that already has data loaded, contact [email protected] before making this change. The catalog (Postgres) is a critical part of the Hydrolix service, and moving it incorrectly can cause irreparable damage to the cluster.

AWS RDS console settings

The RDS Postgres instance must be in the same AWS account, VPC, and region as your EKS cluster. You will need these details to create the instance.

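To find the VPC ID of your cluster, run the following command. It assumes your EKS cluster is named after your Hydrolix namespace (HDX_KUBERNETES_NAMESPACE); replace the name if yours differs.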

aws eks describe-cluster --name ${HDX_KUBERNETES_NAMESPACE} --query 'cluster.resourcesVpcConfig.vpcId'
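
By default the query above prints the VPC ID as a quoted JSON string. If you want a bare value to paste into the console or reuse in scripts, adding `--output text` removes the quotes. A minimal sketch as a bash helper, assuming the AWS CLI is configured for the cluster's account and region; the function name eks_vpc_id is hypothetical.

```shell
# Hypothetical helper: print the VPC ID of an EKS cluster as plain text.
# Assumes the AWS CLI is installed and configured for the cluster's
# account and region.
eks_vpc_id() {
  local cluster_name="${1:?usage: eks_vpc_id <eks-cluster-name>}"
  # --output text drops the JSON quotes that the default output adds.
  aws eks describe-cluster \
    --name "$cluster_name" \
    --query 'cluster.resourcesVpcConfig.vpcId' \
    --output text
}

# Usage (replace with your EKS cluster name if it differs):
#   eks_vpc_id "${HDX_KUBERNETES_NAMESPACE}"
```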

In the AWS console, open the RDS service and click **Create database**. Select PostgreSQL as the engine and choose engine version 11.12-R1 or later.

Under Templates, select Production if it is not already selected. Choose a Multi-AZ deployment and provide a name for the database instance.

Supply a master (root) password for the database. You will store it in a Kubernetes secret later so that Hydrolix services can connect, so keep it somewhere safe.
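
If you don't already have a password in mind, one way to generate a random one locally is sketched below. It assumes a Unix-like shell with `od` and `/dev/urandom`; the function name gen_db_password is hypothetical. Hex output keeps the value free of characters such as `/`, `"` and `@`, which RDS does not accept in master passwords.

```shell
# Hypothetical helper: generate a random master password for the RDS instance.
# Reads 16 random bytes and prints them as 32 lowercase hex characters
# (128 bits of entropy), with no punctuation that RDS might reject.
gen_db_password() {
  od -An -N16 -tx1 /dev/urandom | tr -d ' \n'
}

# Usage: keep the value safe, it is needed again for the Kubernetes secret.
#   export ROOT_DB_PASSWORD="$(gen_db_password)"
```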

In the Storage section, select the gp3 storage type and allocate 100 GiB.

In the Connectivity section, select the VPC associated with your EKS cluster (use the command at the top of this guide if you don't know its ID). The drop-down lists every VPC in the region in the form "VPC tag name (VPC ID)". Then select the default subnet group.

Select the default security group, the default certificate authority, and password authentication.
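
Depending on how your EKS node groups are configured, the VPC's default security group may not accept traffic from the Hydrolix pods. If the services cannot reach the database, one thing to check is whether the RDS security group allows inbound Postgres traffic (TCP 5432) from the cluster. A minimal sketch, assuming your nodes carry the EKS cluster security group (as managed node groups do by default); both group IDs are placeholders and allow_postgres_from_eks is a hypothetical name.

```shell
# Hypothetical helper: allow Postgres (TCP 5432) into the RDS security group
# from the EKS cluster security group. Both IDs are placeholders. The cluster
# group can be looked up with:
#   aws eks describe-cluster --name <cluster> \
#     --query 'cluster.resourcesVpcConfig.clusterSecurityGroupId' --output text
allow_postgres_from_eks() {
  local rds_sg="${1:?rds security group id}"
  local eks_sg="${2:?eks cluster security group id}"
  aws ec2 authorize-security-group-ingress \
    --group-id "$rds_sg" \
    --protocol tcp \
    --port 5432 \
    --source-group "$eks_sg"
}
```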

Finally, uncheck **Turn on Performance Insights** and **Enable auto minor version upgrade**, then click **Create database**.
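
For repeatable setups, the console choices above map roughly onto a single `aws rds create-db-instance` call. This is a sketch under stated assumptions, not an exact reproduction of the console's Production template: the instance identifier, instance class, subnet group and security group are placeholders you must supply, and the engine version should be the 11.12 or later release you chose. create_catalog_db is a hypothetical name, and the password is read from a ROOT_DB_PASSWORD environment variable.

```shell
# Hypothetical wrapper around `aws rds create-db-instance` mirroring the
# console settings in this guide: Postgres, Multi-AZ, gp3, 100 GiB,
# password authentication, Performance Insights and auto minor version
# upgrades turned off. Every argument is a placeholder for your environment.
create_catalog_db() {
  local db_id="${1:?db instance identifier, e.g. hdx-rds-test}"
  local engine_version="${2:?engine version, 11.12 or later}"
  local instance_class="${3:?instance class sized for your load}"
  local subnet_group="${4:?DB subnet group in the EKS cluster VPC}"
  local security_group="${5:?security group ID (sg-...)}"
  local password="${ROOT_DB_PASSWORD:?set ROOT_DB_PASSWORD first}"

  aws rds create-db-instance \
    --db-instance-identifier "$db_id" \
    --engine postgres \
    --engine-version "$engine_version" \
    --db-instance-class "$instance_class" \
    --multi-az \
    --storage-type gp3 \
    --allocated-storage 100 \
    --master-username postgres \
    --master-user-password "$password" \
    --db-subnet-group-name "$subnet_group" \
    --vpc-security-group-ids "$security_group" \
    --no-enable-performance-insights \
    --no-auto-minor-version-upgrade
}
```

The master username is set to postgres so that it matches the catalog_db_admin_user value used in hydrolix.yaml later in this guide.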

The database takes about 10 minutes to create. When it is ready, its connection endpoint is displayed on the **Connectivity & security** tab of the database details page. You will use this endpoint as the catalog_db_host value in hydrolix.yaml.
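
Instead of watching the console, you can wait for the instance and read the endpoint with the AWS CLI. A minimal sketch, assuming the instance identifier you chose earlier (hdx-rds-test in the example further down); `aws rds wait db-instance-available` polls until the instance is available or the waiter gives up. rds_endpoint is a hypothetical name.

```shell
# Hypothetical helper: block until the RDS instance is available, then print
# the endpoint address to use as catalog_db_host in hydrolix.yaml.
rds_endpoint() {
  local db_id="${1:?usage: rds_endpoint <db-instance-identifier>}"
  # The waiter returns non-zero if the instance does not become available
  # within its built-in polling limit.
  aws rds wait db-instance-available --db-instance-identifier "$db_id" || return 1
  aws rds describe-db-instances \
    --db-instance-identifier "$db_id" \
    --query 'DBInstances[0].Endpoint.Address' \
    --output text
}

# Usage:
#   rds_endpoint hdx-rds-test
```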

Create a Hydrolix cluster YAML file

The hkt hydrolix-cluster command generates the hydrolix.yaml deployment file. We provide several scale profiles for different cloud providers and deployment sizes; choose one with the --scale-profile flag. You can also edit hydrolix.yaml to tune the deployment to your resource requirements. The following instructions create a prod-scale deployment and apply it to your cluster.

hkt hydrolix-cluster --scale-profile prod --ip-allowlist `curl -s ifconfig.me`/32 > hydrolix.yaml
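
If curl fails, the backtick substitution above silently expands to an empty string and the allowlist entry becomes just "/32". A minimal bash sketch of a guarded variant that checks the lookup result is a dotted-quad IPv4 address before building the CIDR; ipv4_cidr is a hypothetical name, and `curl -4` is used in the usage comment so that ifconfig.me reports an IPv4 address.

```shell
# Hypothetical guard: print "<ip>/32" only if the argument is a valid
# dotted-quad IPv4 address (four octets, each 0-255); otherwise return 1.
ipv4_cidr() {
  local ip="$1" i
  local re='^([0-9]{1,3})\.([0-9]{1,3})\.([0-9]{1,3})\.([0-9]{1,3})$'
  [[ "$ip" =~ $re ]] || return 1
  for i in 1 2 3 4; do
    # 10# forces base 10 so octets with leading zeros are not read as octal.
    (( 10#${BASH_REMATCH[i]} <= 255 )) || return 1
  done
  printf '%s/32\n' "$ip"
}

# Usage:
#   CIDR="$(ipv4_cidr "$(curl -4 -s ifconfig.me)")" || { echo "no public IPv4 found" >&2; exit 1; }
#   hkt hydrolix-cluster --scale-profile prod --ip-allowlist "$CIDR" > hydrolix.yaml
```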

Once the basic file is created, you need to edit it.

Edit your hydrolix.yaml

Open hydrolix.yaml in your text editor and edit the values below. It is important to disable the internal Postgres instance (set its replicas to 0) and to set catalog_db_host to the endpoint of the newly created external Postgres instance.

---
apiVersion: hydrolix.io/v1
kind: HydrolixCluster
metadata:
  name: <NameSpace>                #<--- Should already be set
spec:
  admin_email: <admin email>       #<--- Should already be set
  db_bucket_url: <bucket path>     #<--- Should already be set
  db_bucket_region: <region>       #<--- Should already be set
  hydrolix_url: <hostname to use>  #<--- Should already be set
  catalog_db_admin_user: postgres  #<--- Add the admin user "postgres" to your config
  catalog_db_admin_db: postgres    #<--- Add the admin db "postgres" to your config
  catalog_db_host: <YOUR HOST/IP>  #<--- Add the endpoint of your RDS instance
  ip_allowlist:
    - <your IP>/32                 #<--- Should already be set
  scale_profile: prod              #<--- Should already be set
  scale:
    postgres:
      replicas: 0                  #<---- Don't forget to set the internal postgres to 0!

For example:

---
apiVersion: hydrolix.io/v1
kind: HydrolixCluster
metadata:
  name: my-namespace
spec:
  admin_email: [email protected]
  db_bucket_url: s3://my-namespace
  db_bucket_region: us-east-1
  hydrolix_url: http://my.domain.com
  catalog_db_admin_user: postgres
  catalog_db_admin_db: postgres
  catalog_db_host: hdx-rds-test.xxxyyyzzz.us-east-1.rds.amazonaws.com
  pg_ssl_mode: disable
  ip_allowlist:
    - 203.0.113.10/32
  scale_profile: prod
  scale:
    postgres:
      replicas: 0

❗️

Postgres

In the scale section of your file, make sure Postgres is set to 0 replicas so that the internal Postgres instance is not started.

  scale:
    postgres:
      replicas: 0
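
Before applying the file, a quick pre-flight check can catch the two edits that are easiest to miss. This is a plain-text sketch rather than a YAML parser: it assumes the file keeps the layout that hkt generates and only checks that catalog_db_host is filled in and that the first replicas line under postgres is 0. check_hydrolix_yaml is a hypothetical name.

```shell
# Hypothetical pre-flight check for hydrolix.yaml (text matching, no YAML
# parser). Returns non-zero and prints a reason if either edit is missing.
check_hydrolix_yaml() {
  local file="${1:-hydrolix.yaml}" status=0

  # 1. catalog_db_host must be present and not left as a <placeholder>.
  if ! grep -Eq '^[[:space:]]*catalog_db_host:[[:space:]]*[^<[:space:]]' "$file"; then
    echo "catalog_db_host is missing or still a placeholder" >&2
    status=1
  fi

  # 2. The first "replicas:" line after a bare "postgres:" key must be 0.
  if ! awk '
    /^[ \t]*postgres:[ \t]*$/ { in_pg = 1; next }
    in_pg && /^[ \t]*replicas:/ { found = ($2 == "0"); exit }
    END { exit !found }
  ' "$file"; then
    echo "scale.postgres.replicas is not set to 0" >&2
    status=1
  fi

  return "$status"
}

# Usage:
#   check_hydrolix_yaml hydrolix.yaml && echo "hydrolix.yaml looks ready"
```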

Create your secret

The Postgres password is stored in a Kubernetes secret named curated. Create a file such as db-secret.yaml with the following content, replacing the namespace and password placeholders:

---
apiVersion: v1
kind: Secret
metadata:
  name: curated
  namespace: <namespace>
stringData:
  ROOT_DB_PASSWORD: <the password to your postgres>
type: Opaque
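
To avoid typing the password into the file by hand, you could generate db-secret.yaml from environment variables. A minimal bash sketch, assuming HDX_KUBERNETES_NAMESPACE and ROOT_DB_PASSWORD are already set and that the password contains no `"` or `\` characters (it is written as a double-quoted YAML string); write_db_secret is a hypothetical name.

```shell
# Hypothetical helper: write the curated secret manifest from environment
# variables. Requires HDX_KUBERNETES_NAMESPACE and ROOT_DB_PASSWORD.
write_db_secret() {
  local out="${1:-db-secret.yaml}"
  : "${HDX_KUBERNETES_NAMESPACE:?set HDX_KUBERNETES_NAMESPACE}"
  : "${ROOT_DB_PASSWORD:?set ROOT_DB_PASSWORD}"
  # umask 077 in a subshell keeps the file readable only by the current user.
  ( umask 077
    cat > "$out" <<EOF
---
apiVersion: v1
kind: Secret
metadata:
  name: curated
  namespace: ${HDX_KUBERNETES_NAMESPACE}
stringData:
  ROOT_DB_PASSWORD: "${ROOT_DB_PASSWORD}"
type: Opaque
EOF
  )
}
```

Alternatively, `kubectl create secret generic curated --from-literal=ROOT_DB_PASSWORD=... --namespace ... --dry-run=client -o yaml` produces an equivalent manifest, with the value base64-encoded under data instead of stringData.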

Apply your configuration

The following commands apply these settings, creating the secret before the cluster configuration.

kubectl apply -f db-secret.yaml
kubectl apply -f hydrolix.yaml
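
The same two steps can be wrapped in a function that stops at the first failure and then confirms the curated secret exists without printing its value. A minimal sketch, assuming kubectl points at the Hydrolix cluster and both files are in the current directory; apply_hydrolix is a hypothetical name.

```shell
# Hypothetical wrapper: apply the secret, then the cluster spec, and finally
# check that the curated secret carries a ROOT_DB_PASSWORD key.
apply_hydrolix() {
  local ns="${1:?usage: apply_hydrolix <namespace>}"
  kubectl apply -f db-secret.yaml || return 1
  kubectl apply -f hydrolix.yaml || return 1
  # grep -q . succeeds only if the key has a non-empty value; nothing is printed.
  kubectl get secret curated --namespace "$ns" \
    -o jsonpath='{.data.ROOT_DB_PASSWORD}' | grep -q .
}

# Usage:
#   apply_hydrolix "${HDX_KUBERNETES_NAMESPACE}"
```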

📘

Already Running Cluster

If you make these changes after the cluster has already been deployed, run kubectl rollout restart deployment in the cluster's namespace so that the Hydrolix services restart with the new configuration.
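
A minimal sketch of that restart, assuming kubectl points at the cluster: with no deployment name, `kubectl rollout restart deployment --namespace <ns>` restarts every Deployment in the namespace, and the loop then waits for each rollout to finish. restart_hydrolix and the 10-minute timeout are hypothetical choices.

```shell
# Hypothetical helper: restart all Deployments in the Hydrolix namespace and
# wait for each rollout to complete (or fail after the timeout).
restart_hydrolix() {
  local ns="${1:?usage: restart_hydrolix <namespace>}" d
  kubectl rollout restart deployment --namespace "$ns" || return 1
  # "-o name" prints entries such as deployment.apps/<name>.
  for d in $(kubectl get deployments --namespace "$ns" -o name); do
    kubectl rollout status "$d" --namespace "$ns" --timeout=10m || return 1
  done
}

# Usage:
#   restart_hydrolix "${HDX_KUBERNETES_NAMESPACE}"
```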