Deploy Production PostgreSQL - EKS
By default, Hydrolix provisions a single internal PostgreSQL pod to store the catalog. For production-scale deployments that need the best performance, use either an external PostgreSQL instance or a high-availability PostgreSQL hosted in Kubernetes (Crunchy Data). This page describes how to use both options when running a Hydrolix deployment on EKS.
Potential Unrecoverable Data Loss - Please read.
If you have already loaded data and this is a migration, do not proceed unless you fully understand the migration process. Losing the catalog can make your data unrecoverable. To migrate an existing deployment, we strongly suggest talking to Hydrolix support and reviewing the page Migrate to External PostgreSQL.
Deploy a Kubernetes HA Postgres
Hydrolix has built-in support for the Postgres Kubernetes Operator from Crunchy Data. Crunchy Data is supplied separately from Hydrolix; the instructions for installing it into Kubernetes can be found in their documentation here. We have found that the default install, kustomize/install/default, is a good place to start.
Once Crunchy Data is deployed into your Kubernetes cluster, edit hydrolixcluster.yaml to add the following to the spec:
apiVersion: hydrolix.io/v1
kind: HydrolixCluster
metadata:
  name: hdx
  namespace: .......
spec:
  admin_email: .......
  use_crunchydata_postgres: true
  db_bucket_region: .......
  db_bucket_url: .......
  env: {}
  hydrolix_name: hdx
  hydrolix_url: .......
  ip_allowlist:
  - .........
  kubernetes_namespace: .......
  scale:
    postgres:
      replicas: 2
Setting use_crunchydata_postgres: true enables Hydrolix to use Crunchy Data Postgres. When this is enabled, we also suggest raising the Postgres replica count to at least two:
scale:
  postgres:
    replicas: 2
To confirm that your new Crunchy Data Postgres deployment is running, look for its pods, which are named with the prefix main-main and should have started successfully:
$ kubectl get pods | grep main-main
main-main-4qjd-0 4/4 Running 0 60m
main-main-cgxw-0 4/4 Running 0 60m
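The visual check above can also be scripted. The following is a minimal sketch, not part of the Hydrolix tooling: pods_all_running is a hypothetical helper that reads `kubectl get pods` lines on stdin and succeeds only when the expected number of pods is present and each one reports all containers ready with status Running. It assumes the default column layout shown above (NAME, READY, STATUS, ...).

```shell
# Hypothetical helper: reads `kubectl get pods` lines on stdin.
# $1 = minimum number of pods expected (defaults to 1).
# Exits 0 only if at least that many lines are seen and every line
# shows READY as n/n and STATUS as Running.
pods_all_running() {
  want="${1:-1}"
  awk -v want="$want" '
    {
      seen++
      split($2, ready, "/")            # READY column, e.g. 4/4
      if ($3 != "Running" || ready[1] != ready[2]) bad++
    }
    END { exit !(seen >= want && bad == 0) }'
}

# Example usage (requires kubectl access to the cluster):
#   kubectl get pods --no-headers | grep main-main | pods_all_running 2 \
#     && echo "Crunchy Data Postgres pods are ready"
```

Passing 2 as the argument matches the replica count suggested above; adjust it if you run a different number of replicas.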
Configure RDS PostgreSQL
The RDS PostgreSQL instance needs to be in the same account, VPC, and AWS region as your EKS cluster. You will need the VPC ID to complete the creation of the instance.
Run the following command to find the VPC ID of your cluster:
aws eks describe-cluster --name ${HDX_KUBERNETES_NAMESPACE} --query 'cluster.resourcesVpcConfig.vpcId'
- In the AWS console, switch to the RDS service and click Create Database. Select PostgreSQL and Engine version 11.12-R1 or greater.
- Select the "Production" template. Choose the "Multi-AZ DB instance" availability and durability option. Enter a database name.
- Supply a root username and password; Hydrolix uses these to access the database.
- In the storage section, select the general purpose gp3 storage type with 100 GiB of allocated storage.
- In the connectivity section, select the VPC ID associated with your EKS cluster. The dropdown lists all VPCs in the region as: VPC Tag name (VPC ID). Select the default DB Subnet group. If you don't know your VPC ID, see the command provided at the beginning of this section.
- Select both EKS cluster & node security groups from the Existing VPC security groups dropdown list. Use the default certificate authority and password authentication.
- Disable the following settings by unchecking their checkboxes: Turn on Performance Insights and Enable auto minor version upgrade.
- Click Create Database to confirm your settings.
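The connectivity steps above need the VPC ID and the EKS cluster security group. As a rough sketch, both could be looked up in one AWS CLI call; eks_network_info is a hypothetical helper name, and the node security group is not returned by this query and still has to be identified separately.

```shell
# Hypothetical helper. $1 = EKS cluster name.
# Prints the VPC ID and the cluster security group ID, tab-separated.
eks_network_info() {
  aws eks describe-cluster --name "$1" \
    --query 'cluster.resourcesVpcConfig.[vpcId, clusterSecurityGroupId]' \
    --output text
}

# Example usage (requires AWS credentials):
#   eks_network_info "${HDX_KUBERNETES_NAMESPACE}"
```

The example passes ${HDX_KUBERNETES_NAMESPACE} as the cluster name only because the command earlier in this section does the same; substitute your actual cluster name if it differs.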
It takes about 10 minutes to create your database. When ready, AWS provides an endpoint to connect to your database. Find this endpoint in the Connectivity & security tab of the database details page. Use this endpoint as the catalog_db_host in the next step.
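If you prefer the command line to the console, the endpoint can probably be retrieved with the AWS CLI as well. This is a sketch under assumptions: rds_endpoint is a hypothetical helper, and its argument is whatever DB instance identifier you chose when creating the database.

```shell
# Hypothetical helper. $1 = RDS DB instance identifier.
# Blocks until the instance is available, then prints its endpoint address.
rds_endpoint() {
  aws rds wait db-instance-available --db-instance-identifier "$1" &&
  aws rds describe-db-instances --db-instance-identifier "$1" \
    --query 'DBInstances[0].Endpoint.Address' --output text
}

# Example usage (requires AWS credentials; the identifier is a placeholder):
#   rds_endpoint my-catalog-db
```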
Define the External PostgreSQL Connection
Disable the internal PostgreSQL instance by setting scale.postgres.replicas to 0. Provide values for catalog_db_admin_user, catalog_db_admin_db, and catalog_db_host so your Hydrolix instance can connect to your newly created external PostgreSQL endpoint:
apiVersion: hydrolix.io/v1
kind: HydrolixCluster
metadata:
  name: hdx
  namespace: ${HDX_KUBERNETES_NAMESPACE}
spec:
  admin_email: ${HDX_ADMIN_EMAIL}
  db_bucket_region: ${HDX_BUCKET_REGION}
  db_bucket_url: ${HDX_DB_BUCKET_URL}
  env: {}
  hydrolix_name: hdx
  hydrolix_url: ${HDX_HYDROLIX_URL}
  catalog_db_admin_user: postgres #<--- Add the admin user "postgres" to your config
  catalog_db_admin_db: postgres #<--- Add the admin db "postgres" to your config
  catalog_db_host: <YOUR HOST/IP> #<--- Add the endpoint of your PostgreSQL instance
  ip_allowlist:
  - 42.78.92.98/32 # TODO: Replace this with your IP address!
  kubernetes_namespace: ${HDX_KUBERNETES_NAMESPACE}
  overcommit: false
  scale:
    postgres:
      replicas: 0 #<---- Don't forget to set the internal postgres to 0!
  scale_profile: prod
Use the following command to replace the environment variables above with their values, sourced from the environment variables defined in env.sh when you initially prepared your cluster:
eval "echo \"$(cat hydrolixcluster.yaml)\""
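Note that the command above prints the rendered YAML to standard output, and any variable that is unset expands silently to an empty string. A small pre-check can catch that before rendering. The sketch below is an illustration only; missing_vars is a hypothetical helper, not something Hydrolix provides.

```shell
# Hypothetical helper: reports every named variable that is unset or empty.
# Returns 0 when all are set, 1 otherwise.
missing_vars() {
  rc=0
  for name in "$@"; do
    eval "value=\${$name:-}"           # look up the variable by name
    if [ -z "$value" ]; then
      echo "missing: $name"
      rc=1
    fi
  done
  return $rc
}

# Example usage, with the variables referenced in the manifest above:
#   missing_vars HDX_KUBERNETES_NAMESPACE HDX_ADMIN_EMAIL HDX_BUCKET_REGION \
#                HDX_DB_BUCKET_URL HDX_HYDROLIX_URL \
#     && eval "echo \"$(cat hydrolixcluster.yaml)\""
```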
Create Secret
Store the PostgreSQL password in a Kubernetes secret named curated. Save the following as db-secret.yaml:
---
apiVersion: v1
kind: Secret
metadata:
  name: curated
  namespace: <namespace>
stringData:
  ROOT_DB_PASSWORD: <the password to your postgres>
type: Opaque
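As an alternative to writing the manifest by hand, an equivalent Opaque secret could be created directly with kubectl. This is a minimal sketch; create_db_secret is a hypothetical helper name, and the command fails if a secret named curated already exists in the namespace.

```shell
# Hypothetical helper. $1 = namespace, $2 = PostgreSQL root password.
# Creates an Opaque secret named "curated" holding ROOT_DB_PASSWORD.
create_db_secret() {
  kubectl create secret generic curated \
    --namespace "$1" \
    --from-literal=ROOT_DB_PASSWORD="$2"
}

# Example usage (requires kubectl access to the cluster):
#   create_db_secret "${HDX_KUBERNETES_NAMESPACE}" "$DB_PASSWORD"
```

If you create the secret this way, there is no secret manifest left to apply afterwards. $DB_PASSWORD is a placeholder for wherever you keep the password.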
Apply your configuration
Run the following commands to add the secret and apply the settings to your Kubernetes cluster:
kubectl apply -f db-secret.yaml
kubectl apply -f hydrolixcluster.yaml
Already Running Cluster
If your cluster is already running, run the following command to redeploy the cluster with these settings applied:
kubectl rollout restart deployment
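After a restart it can help to wait until every deployment has finished rolling out. The following sketch shows one way to do that; wait_for_rollouts is a hypothetical helper, and the 10-minute timeout is an arbitrary assumption.

```shell
# Hypothetical helper. $1 = namespace.
# Waits for each deployment in the namespace to complete its rollout;
# stops at the first one that fails or times out.
wait_for_rollouts() {
  for d in $(kubectl get deployments --namespace "$1" -o name); do
    kubectl rollout status "$d" --namespace "$1" --timeout=10m || return 1
  done
}

# Example usage (requires kubectl access to the cluster):
#   wait_for_rollouts "${HDX_KUBERNETES_NAMESPACE}"
```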