Deploy External PostgreSQL

By default, Hydrolix provisions an internal PostgreSQL instance to store the catalog. For the best performance when processing production-scale data loads, use an external instance of PostgreSQL. An external instance exists outside of the Kubernetes cluster that runs Hydrolix. This page describes how to configure an external PostgreSQL instance for a Hydrolix instance running on AWS.


This Guide Only Applies to New Deployments

This guide explains how to initially configure a Hydrolix instance to use an external PostgreSQL instance. To migrate an existing Hydrolix cluster to an external PostgreSQL instance, see Migrate to External PostgreSQL.

Configure RDS PostgreSQL

The RDS PostgreSQL instance must be in the same AWS account, VPC, and region as your EKS cluster. You need the VPC ID of the cluster to create the instance.

Run the following command to find the VPC ID of your cluster:

aws eks describe-cluster --name ${HDX_KUBERNETES_NAMESPACE} --query 'cluster.resourcesVpcConfig.vpcId'

  1. In the AWS console, switch to the RDS service and click Create database. Select PostgreSQL and engine version 11.12-R1 or greater.

  2. Select the "Production" template. Choose the "Multi-AZ DB instance" availability and durability option. Enter a database name.

  3. Supply a root username and password; Hydrolix uses these to access the database.

  4. In the storage section, select the general purpose gp3 storage type with 100 GiB of allocated storage.

  5. In the connectivity section, select the VPC ID associated with your EKS cluster. The dropdown lists every VPC in the region in the format: VPC Tag name (VPC ID). Select the default DB Subnet group.


    If you don't know your VPC ID, see the command provided at the beginning of this guide.

  6. Select both the EKS cluster and node security groups from the Existing VPC security groups dropdown list. Use the default certificate authority and password authentication.

  7. Disable the following settings by unchecking their checkboxes:

    • Turn on Performance Insights
    • Enable auto minor version upgrade

  8. Click Create database to confirm your settings.

It takes about 10 minutes to create your database. When ready, AWS provides an endpoint to connect to your database. Find this endpoint in the Connectivity & security tab of the database details page. Use this endpoint as the catalog_db_host in the next step.
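The console steps above can also be approximated with the AWS CLI. The following is a minimal sketch, not a definitive procedure: the instance identifier, instance class, password, subnet group name, and security group IDs are hypothetical placeholders, the engine version is left unpinned, and the console's "Production" template has no direct CLI flag. The CLI has no VPC parameter, so the VPC is chosen through the DB subnet group. The script only prints the command unless RUN_AWS=1 is set.

```shell
# Sketch: AWS CLI approximation of the console steps.
# Every value below is a hypothetical placeholder; replace it with your own.
DB_ID="hdx-catalog"                 # hypothetical RDS instance identifier
DB_CLASS="db.m5.large"              # hypothetical instance class
DB_PASSWORD="change-me"             # hypothetical; use your real password
DB_SUBNET_GROUP="default"           # hypothetical DB subnet group in the EKS VPC
CLUSTER_SG="sg-0123456789abcdef0"   # hypothetical EKS cluster security group
NODE_SG="sg-0fedcba9876543210"      # hypothetical EKS node security group

# Build the command as positional parameters so each argument stays intact.
set -- aws rds create-db-instance \
  --db-instance-identifier "$DB_ID" \
  --engine postgres \
  --db-instance-class "$DB_CLASS" \
  --storage-type gp3 \
  --allocated-storage 100 \
  --multi-az \
  --master-username postgres \
  --master-user-password "$DB_PASSWORD" \
  --db-subnet-group-name "$DB_SUBNET_GROUP" \
  --vpc-security-group-ids "$CLUSTER_SG" "$NODE_SG" \
  --no-enable-performance-insights \
  --no-auto-minor-version-upgrade

CREATE_CMD="$*"
if [ "${RUN_AWS:-0}" = "1" ]; then
  "$@"
  # Block until the instance is available, then print its endpoint address.
  aws rds wait db-instance-available --db-instance-identifier "$DB_ID"
  aws rds describe-db-instances --db-instance-identifier "$DB_ID" \
    --query 'DBInstances[0].Endpoint.Address' --output text
else
  # Dry run: show the command that would be executed.
  echo "$CREATE_CMD"
fi
```

When run with RUN_AWS=1, the last command prints the endpoint address, which is the value to use for catalog_db_host.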

Define the External PostgreSQL Connection

Disable the internal PostgreSQL instance by setting scale.postgres.replicas to 0. Provide values for catalog_db_admin_user, catalog_db_admin_db, and catalog_db_host so your Hydrolix instance can connect to your newly created external PostgreSQL endpoint:

apiVersion: hydrolix.io/v1
kind: HydrolixCluster
metadata:
  name: hdx
spec:
  admin_email: ${HDX_ADMIN_EMAIL}
  db_bucket_region: ${HDX_BUCKET_REGION}
  db_bucket_url: ${HDX_DB_BUCKET_URL}
  env: {}
  hydrolix_name: hdx
  hydrolix_url: ${HDX_HYDROLIX_URL}
  catalog_db_admin_user: postgres  #<--- Add the admin user "postgres" to your config
  catalog_db_admin_db: postgres    #<--- Add the admin db "postgres" to your config
  catalog_db_host: <YOUR HOST/IP>  #<--- Replace with the endpoint of your external PostgreSQL instance
  kubernetes_namespace: ${HDX_KUBERNETES_NAMESPACE}
  overcommit: false
  scale:
    postgres:
      replicas: 0                  #<--- Don't forget to set the internal postgres replicas to 0!
  scale_profile: prod

Use the following command to replace the placeholders above with the values of the environment variables you defined when you initially prepared your cluster:

eval "echo \"$(cat hydrolixcluster.yaml)\""
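To see what this substitution does, the following sketch applies the same eval technique to a small temporary file. The two variable values and the two-line template are hypothetical and exist only for the demonstration.

```shell
# Hypothetical demo values; in practice these come from your environment.
HDX_ADMIN_EMAIL="admin@example.com"
HDX_KUBERNETES_NAMESPACE="hdx-demo"

# Write a two-line template to a temporary file (hypothetical content).
TEMPLATE="$(mktemp)"
cat > "$TEMPLATE" <<'EOF'
admin_email: ${HDX_ADMIN_EMAIL}
kubernetes_namespace: ${HDX_KUBERNETES_NAMESPACE}
EOF

# Same technique as the command above: the shell re-reads the file's text
# inside double quotes, so each ${...} reference is expanded.
RENDERED="$(eval "echo \"$(cat "$TEMPLATE")\"")"
echo "$RENDERED"

rm -f "$TEMPLATE"
```

The command prints the rendered text to standard output, so redirect it to a new file if you want to keep the result. Variables that are not set expand silently to empty strings, and because the file's text is evaluated by the shell, characters such as double quotes, backticks, and `$(...)` in the file are interpreted rather than copied verbatim. Review the rendered output before applying it.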

Create Secret

Store the PostgreSQL password in a Kubernetes secret named curated:

apiVersion: v1
kind: Secret
metadata:
  name: curated
  namespace: <namespace>
stringData:
  ROOT_DB_PASSWORD: <the password to your postgres>
type: Opaque
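Kubernetes Secret manifests accept plain-text values under stringData and base64-encoded values under data. If you prefer to supply the password under data, the value must be encoded first. The following sketch shows one way to do that; the password is a hypothetical example.

```shell
# Hypothetical password used only for illustration.
DB_PASSWORD="example-password"

# printf avoids the trailing newline that echo would add to the encoded value.
ENCODED="$(printf '%s' "$DB_PASSWORD" | base64)"
echo "ROOT_DB_PASSWORD: $ENCODED"

# Decode again to confirm the value survives the round trip.
DECODED="$(printf '%s' "$ENCODED" | base64 -d)"
```

A stray newline in the encoded value is a common cause of authentication failures, which is why the sketch uses printf rather than echo.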

Apply Your Configuration

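Run the following commands to add the secret and apply the settings to your Kubernetes cluster. The commands assume you saved the secret as db-secret.yaml and the cluster configuration as hydrolix.yaml; substitute your own file names if they differ: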

kubectl apply -f db-secret.yaml
kubectl apply -f hydrolix.yaml


Already Running Cluster

If your cluster is already running, run the following command to redeploy the cluster with these settings applied:

kubectl rollout restart deployment
