Deploy External PostgreSQL
By default, Hydrolix provisions an internal PostgreSQL instance to store the catalog. For the best performance when processing production-scale data loads, use an external instance of PostgreSQL. An external instance exists outside of the Kubernetes cluster that runs Hydrolix. This page describes how to configure an external PostgreSQL instance for a Hydrolix instance running on AWS.
This Guide Only Applies to New Deployments
This guide explains how to initially configure a Hydrolix instance to use an external PostgreSQL instance. To migrate an existing Hydrolix cluster to an external PostgreSQL instance, see Migrate to External PostgreSQL.
Configure RDS PostgreSQL
The RDS PostgreSQL instance must be in the same AWS account, VPC, and region as your EKS cluster. You need the VPC ID of the cluster to create the database instance.
Run the following command to find the VPC ID of your cluster:
aws eks describe-cluster --name ${HDX_KUBERNETES_NAMESPACE} --query 'cluster.resourcesVpcConfig.vpcId'
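To reuse the VPC ID in later steps, it can help to capture it as plain text. The sketch below wraps the command above in a shell function and adds a second function that reads the cluster security group ID from the same describe-cluster response. Both function names are hypothetical helpers, not part of Hydrolix or the AWS CLI, and the sketch assumes the AWS CLI is installed and configured for the account and region of your cluster.

```shell
# Print the VPC ID of an EKS cluster as plain text.
# $1: EKS cluster name
get_cluster_vpc_id() {
  aws eks describe-cluster \
    --name "$1" \
    --query 'cluster.resourcesVpcConfig.vpcId' \
    --output text
}

# Print the cluster security group ID that EKS created for the cluster.
# $1: EKS cluster name
get_cluster_security_group_id() {
  aws eks describe-cluster \
    --name "$1" \
    --query 'cluster.resourcesVpcConfig.clusterSecurityGroupId' \
    --output text
}
```

For example, VPC_ID="$(get_cluster_vpc_id "${HDX_KUBERNETES_NAMESPACE}")" stores the value for later use. Node security groups are not part of this response and still need to be identified separately.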
- In the AWS console, switch to the RDS service and click Create database. Select PostgreSQL and engine version 11.12-R1 or greater.
- Select the "Production" template. Choose the "Multi-AZ DB instance" availability and durability option. Enter a database name.
- Supply a root username and password. Hydrolix uses these credentials to access the database.
- In the storage section, select the General Purpose gp3 storage type with 100 GiB of allocated storage.
- In the connectivity section, select the VPC ID associated with your EKS cluster. The dropdown lists every VPC in the region as: VPC Tag name (VPC ID). Select the default DB subnet group.
  If you don't know your VPC ID, run the command provided at the beginning of this section.
- Select both the EKS cluster and node security groups from the Existing VPC security groups dropdown list. Use the default certificate authority and password authentication.
- Disable the following settings by unchecking their checkboxes: Turn on Performance Insights, Enable auto minor version upgrade.
- Click Create database to confirm your settings.
It takes about 10 minutes to create your database. When ready, AWS provides an endpoint to connect to your database. Find this endpoint in the Connectivity & security tab of the database details page. Use this endpoint as the catalog_db_host in the next step.
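The endpoint can also be read from the command line instead of the console. The following is a minimal sketch, assuming the AWS CLI is configured for the same account and region. The function name get_rds_endpoint and the identifier hdx-catalog in the usage note are hypothetical; use the DB instance identifier you chose in the console.

```shell
# Wait until the RDS instance is available, then print its endpoint hostname.
# $1: DB instance identifier as shown in the RDS console
get_rds_endpoint() {
  # The waiter polls the instance status and returns once it is "available".
  aws rds wait db-instance-available --db-instance-identifier "$1" &&
    aws rds describe-db-instances \
      --db-instance-identifier "$1" \
      --query 'DBInstances[0].Endpoint.Address' \
      --output text
}
```

For example, get_rds_endpoint hdx-catalog prints the hostname to use as catalog_db_host.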
Define the External PostgreSQL Connection
Disable the internal PostgreSQL instance by setting scale.postgres.replicas to 0. Provide values for catalog_db_admin_user, catalog_db_admin_db, and catalog_db_host so your Hydrolix instance can connect to your newly created external PostgreSQL endpoint:
apiVersion: hydrolix.io/v1
kind: HydrolixCluster
metadata:
  name: hdx
  namespace: ${HDX_KUBERNETES_NAMESPACE}
spec:
  admin_email: ${HDX_ADMIN_EMAIL}
  db_bucket_region: ${HDX_BUCKET_REGION}
  db_bucket_url: ${HDX_DB_BUCKET_URL}
  env: {}
  hydrolix_name: hdx
  hydrolix_url: ${HDX_HYDROLIX_URL}
  catalog_db_admin_user: postgres  # <--- Add the admin user postgres to your config
  catalog_db_admin_db: postgres    # <--- Add the admin db postgres to your config
  catalog_db_host: <YOUR HOST/IP>  # <--- Add the endpoint for your database
  ip_allowlist:
  - 42.78.92.98/32  # TODO: Replace this with your IP address!
  kubernetes_namespace: ${HDX_KUBERNETES_NAMESPACE}
  overcommit: false
  scale:
    postgres:
      replicas: 0  # <--- Don't forget to set the internal postgres to 0!
  scale_profile: prod
Use the following command to substitute the variables above with the values defined in env.sh when you initially prepared your cluster:
eval "echo \"$(cat hydrolixcluster.yaml)\""
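The command above prints the rendered YAML to standard output, and an unset variable silently expands to an empty string. The sketch below is one possible wrapper that writes the result to a file and fails when a referenced variable is unset. The function name render_template is hypothetical, and the usage note assumes the rendered output is the hydrolix.yaml file applied later. Because it relies on eval, only use it on template files you trust.

```shell
# Render a template by expanding ${VAR} references with the current
# environment, failing if any referenced variable is unset.
# $1: template file, $2: output file
render_template() (
  # The parentheses run the body in a subshell, so set -u does not leak out.
  set -u
  eval "echo \"$(cat "$1")\"" > "$2"
)
```

For example, after sourcing env.sh, render_template hydrolixcluster.yaml hydrolix.yaml writes the substituted configuration. If the function returns a non-zero status, the output file may be empty or incomplete and should not be applied.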
Create Secret
Store the PostgreSQL password in a Kubernetes secret named curated:
---
apiVersion: v1
kind: Secret
metadata:
  name: curated
  namespace: <namespace>
stringData:
  ROOT_DB_PASSWORD: <the password to your postgres>
type: Opaque
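As an alternative to writing the manifest by hand, kubectl can generate an equivalent one. The following is a minimal sketch, assuming kubectl is installed. The function name make_db_secret is hypothetical. The generated manifest stores the password base64-encoded under data rather than in plain text under stringData, which results in the same secret once applied.

```shell
# Print a manifest for the "curated" secret without creating it in the cluster.
# $1: Kubernetes namespace, $2: PostgreSQL root password
make_db_secret() {
  kubectl create secret generic curated \
    --namespace "$1" \
    --from-literal=ROOT_DB_PASSWORD="$2" \
    --dry-run=client -o yaml
}
```

For example, make_db_secret "${HDX_KUBERNETES_NAMESPACE}" "$DB_PASSWORD" > db-secret.yaml writes the manifest to a file, where DB_PASSWORD is a hypothetical variable holding the password. Passing the password as an argument can expose it in shell history, so reading it from a variable as shown is preferable to typing it inline.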
Apply Your Configuration
Run the following commands to add the secret and apply the settings to your Kubernetes cluster:
kubectl apply -f db-secret.yaml
kubectl apply -f hydrolix.yaml
Already Running Cluster
If your cluster is already running, run the following command to redeploy the cluster with these settings applied:
kubectl rollout restart deployment
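After a restart, it can be useful to block until the deployments are available again before sending traffic. The sketch below is one way to do this with kubectl wait. The function name wait_for_deployments and the 600s default timeout are assumptions, not Hydrolix recommendations.

```shell
# Block until every deployment in the namespace reports the Available
# condition, or fail once the timeout expires.
# $1: Kubernetes namespace, $2: timeout (optional, defaults to 600s)
wait_for_deployments() {
  kubectl wait deployment --all \
    --namespace "$1" \
    --for=condition=Available \
    --timeout="${2:-600s}"
}
```

For example, wait_for_deployments "${HDX_KUBERNETES_NAMESPACE}" returns a non-zero status if any deployment is still unavailable after ten minutes.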