Deploy Hydrolix

Hydrolix deployments follow the Kubernetes operator pattern. To deploy Hydrolix, generate an operator configuration (operator.yaml) and a Hydrolix cluster custom resource configuration (hydrolixcluster.yaml). You'll use these files to deploy Hydrolix on your Kubernetes cluster.

📘

Prerequisite: Environment Variables

These CLI commands require you to set environment variables before generating the configuration. See Prepare your GKE Cluster for more information about the required inputs.
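For reference, here is a sketch of the variables these commands expect. All values below are placeholders; Prepare your GKE Cluster explains how to determine yours:

export HDX_KUBERNETES_NAMESPACE=hydrolix                          # Kubernetes namespace for the deployment
export GCP_STORAGE_SA=hdx-sa@my-project.iam.gserviceaccount.com   # GCP storage service account
export HDX_ADMIN_EMAIL=admin@example.com                          # receives the initial login email
export HDX_BUCKET_REGION=us-central1                              # region of your storage bucket
export HDX_DB_BUCKET_URL=gs://my-hdx-bucket                       # storage bucket URL
export HDX_HYDROLIX_URL=https://hdx.example.com                   # public URL for the cluster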

Configure and Deploy the Hydrolix Operator

The operator-resources command generates the Kubernetes resource definitions required for deploying the operator, service accounts, and role permissions. The operator manages all Hydrolix cluster deployments. Run the following command to generate a YAML operator configuration file for your cluster:

curl "https://www.hydrolix.io/operator/latest/operator-resources?namespace=${HDX_KUBERNETES_NAMESPACE}&gcp-storage-sa=${GCP_STORAGE_SA}" > operator.yaml

Next, use the Kubernetes command line tool (kubectl) to apply the generated configuration to your Kubernetes cluster:

kubectl apply -f operator.yaml
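Before moving on, you can optionally confirm that the operator came up. The deployment is named operator, matching the rollout restart command used later in this guide:

kubectl rollout status deployment/operator --namespace $HDX_KUBERNETES_NAMESPACE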

Configure and Deploy a Hydrolix Cluster

The hydrolix-cluster command can generate the hydrolixcluster.yaml deployment file; you can also write it by hand, as shown below. Hydrolix provides scale profiles for various cloud providers and deployment sizes, which you can optionally select with the scale-profile flag (or the scale_profile field in the YAML). By default, Hydrolix uses a minimal profile. Add the following to a file named hydrolixcluster.yaml to define a dev scale deployment:

apiVersion: hydrolix.io/v1
kind: HydrolixCluster
metadata:
  name: hdx
  namespace: ${HDX_KUBERNETES_NAMESPACE}
spec:
  admin_email: ${HDX_ADMIN_EMAIL}
  db_bucket_region: ${HDX_BUCKET_REGION}
  db_bucket_url: ${HDX_DB_BUCKET_URL}
  env: {}
  hydrolix_name: hdx
  hydrolix_url: ${HDX_HYDROLIX_URL}
  ip_allowlist:
  - 0.0.0.0/0 # TODO: Replace this with your IP address in CIDR notation, e.g. 12.13.14.15/32
  kubernetes_namespace: ${HDX_KUBERNETES_NAMESPACE}
  overcommit: false
  scale_profile: dev

The above config deploys, among other things, a default internal Postgres instance that is not highly available. If you want to run a more resilient version, read our Deploy Production Postgres guide.

Use the following command to replace the environment variables above with their values:

eval "echo \"$(cat hydrolixcluster.yaml)\"" > hydrolixcluster.yaml

Don't forget to replace the 0.0.0.0/0 entry in the allowlist with your own IP address. You can find your public IP address by running curl -s ifconfig.me.
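For example, this one-liner rewrites the placeholder in place (a sketch assuming GNU sed; on macOS, use sed -i '' instead):

MY_IP=$(curl -s ifconfig.me)
sed -i "s|0.0.0.0/0|${MY_IP}/32|" hydrolixcluster.yaml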

📘

Manually Edit Configuration Files

You can also edit the hydrolixcluster.yaml to tune each deployment to your resource requirements.

Next, use the Kubernetes command line tool (kubectl) to apply the generated configuration to your Kubernetes cluster:

kubectl apply -f hydrolixcluster.yaml

Create Your DNS Record

Next, create a DNS record so you can access your cluster. Run the following command to retrieve the external IP of the traefik service:

kubectl get service/traefik --namespace=$HDX_KUBERNETES_NAMESPACE

You should see output similar to the following:

NAME      TYPE           CLUSTER-IP    EXTERNAL-IP     PORT(S)                       AGE
traefik   LoadBalancer   10.64.14.42   34.66.136.134   80:31708/TCP,9000:32344/TCP   2m50s

If you instead receive the following error:

Error from server (NotFound): services "traefik" not found

restart the operator:

kubectl -n $HDX_KUBERNETES_NAMESPACE rollout restart deployment operator
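Create an A record that points the hostname from HDX_HYDROLIX_URL at the EXTERNAL-IP shown above. For example, with Google Cloud DNS (a sketch; the zone my-zone and hostname hdx.example.com are placeholders for your own values):

gcloud dns record-sets create hdx.example.com. \
    --zone=my-zone --type=A --ttl=300 \
    --rrdatas=34.66.136.134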

Check Deployment Status

You can now check the status of your deployment. Run the following kubectl command to see the status of all pods in your cluster:

kubectl get pods --namespace $HDX_KUBERNETES_NAMESPACE

You should see output similar to the following:

NAME                             READY   STATUS      RESTARTS   AGE
autoingest-658f799497-czw59      1/1     Running     0          5m44s
batch-head-bcf7869bc-fm794       1/1     Running     0          5m46s
batch-peer-555df86d8-svlmw       2/2     Running     0          5m45s
decay-78775df79d-ppxpf           1/1     Running     0          5m45s
init-cluster-v3-16-0-6fcml       0/1     Completed   0          5m45s
init-turbine-api-v3-16-0-jqt4m   0/1     Completed   0          5m46s
intake-api-747cdd5d4d-vrsjm      1/1     Running     0          5m45s
keycloak-68fcff9b69-p4lt5        1/1     Running     0          5m46s
load-sample-project-nv8dl        1/1     Running     0          5m44s
merge-head-7df478d57-7qgwn       1/1     Running     0          5m44s
merge-peer-dbb68cc75-c8fl4       1/1     Running     0          5m45s
merge-peer-dbb68cc75-ntwpj       1/1     Running     0          5m45s
operator-55d4dfff6f-pktrl        1/1     Running     0          7m10s
postgres-0                       1/1     Running     0          5m46s
prometheus-0                     2/2     Running     0          5m45s
query-head-65bf688594-l9prj      1/1     Running     0          5m45s
query-peer-67dfcccb56-h6rkw      1/1     Running     0          5m44s
rabbitmq-0                       1/1     Running     0          5m46s
reaper-647d474f5-mfgww           1/1     Running     0          5m44s
redpanda-0                       2/2     Running     0          5m46s
redpanda-1                       2/2     Running     0          5m23s
redpanda-2                       2/2     Running     0          3m38s
stream-head-6ccc9779df-7jvzf     1/1     Running     0          5m43s
stream-peer-6db9464bd5-cgq6x     2/2     Running     0          5m44s
traefik-6f898fd647-lxf84         2/2     Running     0          5m43s
turbine-api-65d44c7d54-crpcm     1/1     Running     0          5m43s
ui-5b8bc9c9d4-pgjtv              1/1     Running     0          5m43s
validator-769ff76ddb-5mm5w       2/2     Running     0          5m43s
vector-557q5                     1/1     Running     0          4m58s
vector-5ttd4                     1/1     Running     0          5m46s
vector-5z8zq                     1/1     Running     0          5m46s
vector-qnpn9                     1/1     Running     0          5m46s
vector-r8pj6                     1/1     Running     0          3m4s
version-848c8c964c-j2khx         1/1     Running     0          5m43s
zookeeper-0                      1/1     Running     0          5m46s

You can also check your cluster status in the Google Cloud console.
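If you'd rather poll from the terminal until everything settles, standard kubectl commands work here; the hdx name below matches metadata.name in hydrolixcluster.yaml:

# Watch pod status update live (Ctrl-C to exit)
kubectl get pods --namespace $HDX_KUBERNETES_NAMESPACE --watch

# Inspect the Hydrolix cluster custom resource itself
kubectl get hydrolixcluster hdx --namespace $HDX_KUBERNETES_NAMESPACE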

Enable IP Access and TLS

Configure IP Access control and a TLS certificate. You can find instructions in Enabling Access & TLS.

Login

You should have received an email that allows you to set a password and log in. If you do not receive this email, or have trouble logging in, try these things:

  • Verify the email address in your hydrolixcluster.yaml file is correct and that you can receive mail sent to it.
  • Try the "Forgot my password" option on the login page.
  • If those two steps fail, contact us at support@hydrolix.io and we'll happily assist you.

Once you are able to log in to your Hydrolix cluster, setup is complete and you are ready to store and query data. Proceed to the next step only if you want to query your data via the Hydrolix Spark Connector.

(Spark Connector only) Add a Credential to the Storage Bucket

To query the Hydrolix Cluster via the Hydrolix Spark Connector, configure a credential for your storage bucket. The following steps will walk you through generating a new credential and updating your storage bucket to use the credential.

Step 1: Create a credential

This step is best accomplished in the UI. Download the credentials.json file from Google containing your keys. If you need to create a new credential, or you're not sure where to find this file, see Google's Create credentials for a service account instructions.

Within the Hydrolix cluster UI, select Add new -> Credential. Fill out the ensuing form with the following:

  • Supply a name and description for your credential
  • Select gcp_service_account_keys for Cloud Provider Type
  • Upload your Google credentials file
  • Review the fields populated from the supplied credentials file, then select Create credential

For example, a completed form might look like this:

Name: gcp_credential
Description: A credential for the default GCP bucket
Cloud Provider Type: gcp_service_account_keys
Upload Credential JSON (optional): credentials.json
Type: service_account
Project Id: hdx-cluster-docs
Private Key Id: private_key_id
Private Key: private_key_goes_here
Client Email: user@hdx-cluster-docs.iam.gserviceaccount.com
Client Id: {id}
Auth Uri: https://accounts.google.com/o/oauth2/auth
Token Uri: https://oauth2.googleapis.com/token
Auth Provider X509 Cert Url: https://www.googleapis.com/oauth2/v1/certs
Client X509 Cert Url: https://www.googleapis.com/robot/v1/metadata/x509

You can review your new credential by navigating to Security -> Credentials, then selecting your credential by name. You can also do this using the API via the List Credentials endpoint. You will need your credential ID for the next step.
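For example, a request along these lines lists your credentials (a sketch: the credentials path is an assumption patterned on the storage endpoint used below, so confirm the exact URL against the List Credentials API reference):

curl --request GET \
     --url https://{hdx-cluster-host}/config/v1/orgs/{org-id}/credentials/ \
     --header 'accept: application/json' \
     --header 'authorization: Bearer {token}'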

Step 2: Attach the Credential to the Storage Bucket

In the next steps, you will use the update storage endpoint to attach your newly created credential to the storage bucket.

Set settings.credential_id to the ID of the credential you created in the previous step. This is the Credential ID in the UI or uuid in the API response to List Credentials.

Credential ID in the UI

Credential ID: d70d9fc4-8422-496c-98ce-f59aed82099d
Name: gcp service account credential
Description: A test credential for a gcp storage bucket
Cloud Provider Type: gcp_service_account_keys
HDX credential key: K1E4FB4F63DC14C9CAD3C1ED56D412FA0

Credential ID in the API Response

[
    {
        "name": "gcp service account credential",
        "type": "gcp_service_account_keys",
        "cloud": "gcp",
        "org": "ae5e3698-b13a-4f8f-ab82-ad2fa391a1a8",
        "description": "A test credential for a gcp storage bucket",
        "uuid": "d70d9fc4-8422-496c-98ce-f59aed82099*,
        ...

Then append ?force_operation=true to the request URL, as shown in the example below.

The following is an example cURL request attaching a credential to the default Google storage bucket:

curl --request PUT \
     --url https://{hdx-cluster-host}/config/v1/orgs/{org-id}/storages/{bucket-id}\?force_operation\=true \
     --header 'accept: application/json' \
     --header 'authorization: Bearer {token}' \
     --header 'content-type: application/json' \
     --data '
{
  "settings": {
    "bucket_path": "/",
    "is_default": true,
    "bucket_name": "{bucket-name}",
    "cloud": "gcp",
    "credential_id": "{credential-id-as-a-string}",
    "region": "{region}"
  },
  "name": "hdx_primary",
  "uuid": "{bucket-id}",
  "description": "The default google storage bucket"
}
'

Once you've completed these steps, your cluster can receive queries from the Hydrolix Spark Connector.