Deploy Hydrolix

Hydrolix deployments follow the Kubernetes operator pattern. To deploy Hydrolix, generate an operator configuration (operator.yaml) and a Hydrolix cluster custom resource configuration (hydrolixcluster.yaml). You'll use these files to deploy Hydrolix on your Kubernetes cluster.

📘

Prerequisite: Environment Variables

These CLI commands require you to set environment variables before generating the configuration. See Prepare your GKE Cluster for more information about the required inputs.
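For reference, here is a sketch of the variables these commands expect. All values below are placeholders; Prepare your GKE Cluster explains how to determine yours:

export HDX_KUBERNETES_NAMESPACE=hydrolix                          # Kubernetes namespace for the deployment
export GCP_STORAGE_SA=hdx-sa@my-project.iam.gserviceaccount.com   # GCP storage service account
export HDX_ADMIN_EMAIL=admin@example.com                          # receives the initial login email
export HDX_BUCKET_REGION=us-central1                              # region of your storage bucket
export HDX_DB_BUCKET_URL=gs://my-hdx-bucket                       # storage bucket URL
export HDX_HYDROLIX_URL=https://hdx.example.com                   # public URL for the cluster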

Configure and Deploy the Hydrolix Operator

The operator-resources command generates the Kubernetes resource definitions required for deploying the operator, service accounts, and role permissions. The operator manages all Hydrolix cluster deployments. Run the following command to generate a YAML operator configuration file for your cluster:

curl "https://www.hydrolix.io/operator/latest/operator-resources?namespace=${HDX_KUBERNETES_NAMESPACE}&gcp-storage-sa=${GCP_STORAGE_SA}" > operator.yaml

Next, use the Kubernetes command line tool (kubectl) to apply the generated configuration to your Kubernetes cluster:

kubectl apply -f operator.yaml
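Before moving on, you can optionally confirm that the operator came up. The deployment is named operator, matching the rollout restart command used later in this guide:

kubectl rollout status deployment/operator --namespace $HDX_KUBERNETES_NAMESPACE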

Configure and Deploy a Hydrolix Cluster

The hydrolix-cluster command can generate the hydrolixcluster.yaml deployment file; you can also write it by hand, as shown below. Hydrolix provides scale profiles for various cloud providers and deployment sizes, which you can optionally select with the scale-profile flag (or the scale_profile field in the YAML). By default, Hydrolix uses a minimal profile. Add the following to a file named hydrolixcluster.yaml to define a dev scale deployment:

apiVersion: hydrolix.io/v1
kind: HydrolixCluster
metadata:
  name: hdx
  namespace: ${HDX_KUBERNETES_NAMESPACE}
spec:
  admin_email: ${HDX_ADMIN_EMAIL}
  db_bucket_region: ${HDX_BUCKET_REGION}
  db_bucket_url: ${HDX_DB_BUCKET_URL}
  env: {}
  hydrolix_name: hdx
  hydrolix_url: ${HDX_HYDROLIX_URL}
  ip_allowlist:
  - 0.0.0.0/0 # TODO: Replace this with your IP address in CIDR notation, e.g. 12.13.14.15/32
  kubernetes_namespace: ${HDX_KUBERNETES_NAMESPACE}
  overcommit: false
  scale_profile: dev

The above config deploys, among other things, a default internal Postgres instance that is not highly available. If you want to run a more resilient version, read our Deploy Production Postgres guide.

Use the following command to replace the environment variables above with their values:

eval "echo \"$(cat hydrolixcluster.yaml)\"" > hydrolixcluster.yaml

Don't forget to replace the 0.0.0.0/0 entry in the allowlist with your own IP address. You can find your public IP address by running curl -s ifconfig.me.
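For example, this one-liner rewrites the placeholder in place (a sketch assuming GNU sed; on macOS, use sed -i '' instead):

MY_IP=$(curl -s ifconfig.me)
sed -i "s|0.0.0.0/0|${MY_IP}/32|" hydrolixcluster.yaml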

📘

Manually Edit Configuration Files

You can also edit the hydrolixcluster.yaml to tune each deployment to your resource requirements.

Next, use the Kubernetes command line tool (kubectl) to apply the generated configuration to your Kubernetes cluster:

kubectl apply -f hydrolixcluster.yaml

Create Your DNS Record

Next, create a DNS record so you can access your cluster. Run the following command to retrieve the external IP of the traefik service:

kubectl get service/traefik --namespace=$HDX_KUBERNETES_NAMESPACE

You should see output similar to the following:

NAME      TYPE           CLUSTER-IP    EXTERNAL-IP     PORT(S)                       AGE
traefik   LoadBalancer   10.64.14.42   34.66.136.134   80:31708/TCP,9000:32344/TCP   2m50s

If you instead receive the following error:

Error from server (NotFound): services "traefik" not found

restart the operator:

kubectl -n $HDX_KUBERNETES_NAMESPACE rollout restart deployment operator
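Create an A record that points the hostname from HDX_HYDROLIX_URL at the EXTERNAL-IP shown above. For example, with Google Cloud DNS (a sketch; the zone my-zone and hostname hdx.example.com are placeholders for your own values):

gcloud dns record-sets create hdx.example.com. \
    --zone=my-zone --type=A --ttl=300 \
    --rrdatas=34.66.136.134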

Check Deployment Status

You can now check the status of your deployment. Run the following kubectl command to see the status of all pods in your cluster:

kubectl get pods --namespace $HDX_KUBERNETES_NAMESPACE

You should see output similar to the following:

NAME                             READY   STATUS      RESTARTS   AGE
autoingest-658f799497-czw59      1/1     Running     0          5m44s
batch-head-bcf7869bc-fm794       1/1     Running     0          5m46s
batch-peer-555df86d8-svlmw       2/2     Running     0          5m45s
decay-78775df79d-ppxpf           1/1     Running     0          5m45s
init-cluster-v3-16-0-6fcml       0/1     Completed   0          5m45s
init-turbine-api-v3-16-0-jqt4m   0/1     Completed   0          5m46s
intake-api-747cdd5d4d-vrsjm      1/1     Running     0          5m45s
keycloak-68fcff9b69-p4lt5        1/1     Running     0          5m46s
load-sample-project-nv8dl        1/1     Running     0          5m44s
merge-head-7df478d57-7qgwn       1/1     Running     0          5m44s
merge-peer-dbb68cc75-c8fl4       1/1     Running     0          5m45s
merge-peer-dbb68cc75-ntwpj       1/1     Running     0          5m45s
operator-55d4dfff6f-pktrl        1/1     Running     0          7m10s
postgres-0                       1/1     Running     0          5m46s
prometheus-0                     2/2     Running     0          5m45s
query-head-65bf688594-l9prj      1/1     Running     0          5m45s
query-peer-67dfcccb56-h6rkw      1/1     Running     0          5m44s
rabbitmq-0                       1/1     Running     0          5m46s
reaper-647d474f5-mfgww           1/1     Running     0          5m44s
redpanda-0                       2/2     Running     0          5m46s
redpanda-1                       2/2     Running     0          5m23s
redpanda-2                       2/2     Running     0          3m38s
stream-head-6ccc9779df-7jvzf     1/1     Running     0          5m43s
stream-peer-6db9464bd5-cgq6x     2/2     Running     0          5m44s
traefik-6f898fd647-lxf84         2/2     Running     0          5m43s
turbine-api-65d44c7d54-crpcm     1/1     Running     0          5m43s
ui-5b8bc9c9d4-pgjtv              1/1     Running     0          5m43s
validator-769ff76ddb-5mm5w       2/2     Running     0          5m43s
vector-557q5                     1/1     Running     0          4m58s
vector-5ttd4                     1/1     Running     0          5m46s
vector-5z8zq                     1/1     Running     0          5m46s
vector-qnpn9                     1/1     Running     0          5m46s
vector-r8pj6                     1/1     Running     0          3m4s
version-848c8c964c-j2khx         1/1     Running     0          5m43s
zookeeper-0                      1/1     Running     0          5m46s

You can also check your cluster status in the Google Cloud console.
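If you'd rather poll from the terminal until everything settles, standard kubectl commands work here; the hdx name below matches metadata.name in hydrolixcluster.yaml:

# Watch pod status update live (Ctrl-C to exit)
kubectl get pods --namespace $HDX_KUBERNETES_NAMESPACE --watch

# Inspect the Hydrolix cluster custom resource itself
kubectl get hydrolixcluster hdx --namespace $HDX_KUBERNETES_NAMESPACE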

Enable IP Access and TLS

Configure IP Access control and a TLS certificate. You can find instructions in Enabling Access & TLS.

Login

You should have received an email that allows you to set a password and log in. If you do not receive this email, or have trouble logging in, try these things:

  • Verify the email address in your hydrolixcluster.yaml file is correct and that you can receive mail sent to it.
  • Try the "Forgot my password" option on the login page.
  • If those two steps fail, contact us at support@hydrolix.io and we'll happily assist you.

Once you are able to log in to your Hydrolix cluster, setup is complete and you are ready to store and query data. Proceed to the next step only if you want to query your data via the Hydrolix Spark Connector.

(Spark Connector only) Add a Credential to the Storage Bucket

To query the Hydrolix Cluster via the Hydrolix Spark Connector, configure a credential for your storage bucket. The following steps will walk you through generating a new credential and updating your storage bucket to use the credential.

Step 1: Create a credential

This step is best accomplished in the UI. Download the credentials.json file from Google containing your keys. If you need to create a new credential, or you're not sure where to find this file, see Google's Create credentials for a service account instructions.

Within the Hydrolix cluster UI, select Add new -> Credential. Fill out the ensuing form with the following:

  • Supply a name and description for your credential
  • Select gcp_service_account_keys for Cloud Provider Type
  • Upload your Google credentials file
  • Review the fields populated from the supplied credentials file, then select Create credential

For example, a completed form might look like this:

Name: gcp_credential
Description: A credential for the default GCP bucket
Cloud Provider Type: gcp_service_account_keys
Upload Credential JSON (optional): credentials.json
Type: service_account
Project Id: hdx-cluster-docs
Private Key Id: private_key_id
Private Key: private_key_goes_here
Client Email: user@hdx-cluster-docs.iam.gserviceaccount.com
Client Id: {id}
Auth Uri: https://accounts.google.com/o/oauth2/auth
Token Uri: https://oauth2.googleapis.com/token
Auth Provider X509 Cert Url: https://www.googleapis.com/oauth2/v1/certs
Client X509 Cert Url: https://www.googleapis.com/robot/v1/metadata/x509

You can review your new credential by navigating to Security -> Credentials, then selecting your credential by name. You can also do this using the API via the List Credentials endpoint. You will need your credential ID for the next step.
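For example, a request along these lines lists your credentials (a sketch: the credentials path is an assumption patterned on the storage endpoint used below, so confirm the exact URL against the List Credentials API reference):

curl --request GET \
     --url https://{hdx-cluster-host}/config/v1/orgs/{org-id}/credentials/ \
     --header 'accept: application/json' \
     --header 'authorization: Bearer {token}'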

Step 2: Attach the Credential to the Storage Bucket

In the next steps, you will use the update storage endpoint to attach your newly created credential to the storage bucket.

Set settings.credential_id to the ID of the credential you created in the previous step. This is the Credential ID in the UI or uuid in the API response to List Credentials.

Credential ID in the UI

Credential ID: d70d9fc4-8422-496c-98ce-f59aed82099d
Name: gcp service account credential
Description: A test credential for a gcp storage bucket
Cloud Provider Type: gcp_service_account_keys
HDX credential key: K1E4FB4F63DC14C9CAD3C1ED56D412FA0

Credential ID in the API Response

[
    {
        "name": "gcp service account credential",
        "type": "gcp_service_account_keys",
        "cloud": "gcp",
        "org": "ae5e3698-b13a-4f8f-ab82-ad2fa391a1a8",
        "description": "A test credential for a gcp storage bucket",
        "uuid": "d70d9fc4-8422-496c-98ce-f59aed82099*,
        ...

Then append ?force_operation=true to the request URL, as shown in the example below.

The following is an example cURL request attaching a credential to the default Google storage bucket:

curl --request PUT \
     --url https://{hdx-cluster-host}/config/v1/orgs/{org-id}/storages/{bucket-id}\?force_operation\=true \
     --header 'accept: application/json' \
     --header 'authorization: Bearer {token}' \
     --header 'content-type: application/json' \
     --data '
{
  "settings": {
    "bucket_path": "/",
    "is_default": true,
    "bucket_name": "{bucket-name}",
    "cloud": "gcp",
    "credential_id": "{credential-id-as-a-string}",
    "region": "{region}"
  },
  "name": "hdx_primary",
  "uuid": "{bucket-id}",
  "description": "The default google storage bucket"
}
'

Once you've completed these steps, your cluster can receive queries from the Hydrolix Spark Connector.