Configuration Options

Hydrolix cluster specification

The hydrolixcluster spec supports many options.
The settings available in the spec, their default values, and their definitions are listed below.

Setting | Default | Description
admin_email | "[email protected]" | Email address of the admin, who receives the password once the cluster is up
basic_auth | [] | A list of Hydrolix services that should be protected with basic auth when accessed. See here for more details.
batch_head_concurrency | 8 | The number of concurrent consumers the batch head should use for LISTING tasks
batch_peer_threads | 1 | The number of concurrent threads the batch-peer should use to process data
catalog_db_admin_user | "turbine" | The admin user of the Postgres server where Hydrolix metadata is stored
catalog_db_admin_db | "turbine" | The default database of the admin user of the Postgres server where Hydrolix metadata is stored
catalog_db_host | "postgres" | The Postgres server where Hydrolix metadata is stored
db_bucket_url | null | The URL of the cloud storage bucket you would like Hydrolix to store data in
db_bucket_credentials_method | "web_identity" | The method Hydrolix uses to acquire credentials for connecting to cloud storage. Examples: "static", "instance_profile", "web_identity"
db_bucket_endpoint | null | The endpoint URL for S3-compatible object storage services. Not required if using AWS S3 or if db_bucket_url is provided
db_bucket_name | null | The name of the bucket you would like Hydrolix to store data in. Not required if db_bucket_url is provided
db_bucket_region | null | The region of the bucket you would like Hydrolix to store data in. Not required if it can be inferred from db_bucket_url. Examples: "us-east-2", "us-central1"
db_bucket_type | null | The object storage type of the bucket you would like Hydrolix to store data in. Not required if db_bucket_url is provided. Examples: "gs", "s3"
db_bucket_use_https | true | If true, use HTTPS when connecting to the cloud storage service. Inferred from db_bucket_url if possible
default_query_pool | "query-peer" | A name for the default query pool
enable_manifest_cache | true | If true, query heads will cache manifests downloaded from the database bucket
enable_query_auth | false | When enabled, requests to the query service, either to URL paths starting with /query or over TCP native, require authentication. See here for more details.
enable_vector | null | Run Vector to send Kubernetes pod logs to JSON files in a bucket. The default is inferred from the value of scale_off
env | {} | Environment variables to set on all Kubernetes pods that are part of the Hydrolix cluster. Useful, for example, for specifying an AWS key
heartbeat_timeout | 600 | How old a task's heartbeat should be (in seconds) before the task is timed out. Tasks are created by batch and alter jobs
hydrolix_name | "hdx" | The name you would like to assign your Hydrolix cluster
hydrolix_url | null | The URL you would like to use to access your Hydrolix cluster
ip_allowlist | [""] | A list of CIDR ranges that should be allowed to connect to the Hydrolix cluster load balancer
job_purge_age | 7 | The age in days after which a job in a terminal state should be deleted from Postgres
job_purge_period | 120 | How often (in minutes) the JobPurge task should run
kafka_tls_ca | "" | A CA certificate used by the kafka_peer to authenticate Kafka servers it connects to
kafka_tls_cert | "" | The PEM format certificate the kafka_peer will use to authenticate itself to a Kafka server
kafka_tls_key | null | The PEM format key the kafka_peer will use to authenticate itself to a Kafka server
kubernetes_premium_storage_class | null | The storage class to use with persistent volumes created in Kubernetes for parts of a Hydrolix cluster where throughput is most critical
kubernetes_profile | generic | Use default settings appropriate to this type of Kubernetes deployment. Examples: "gke", "eks"
kubernetes_storage_class | null | The storage class to use with persistent volumes created in Kubernetes as part of a Hydrolix cluster
logs_kafka_bootstrap_servers | "redpanda" | A comma-separated list of Kafka bootstrap servers to send logs to
logs_kafka_topic | "logs" | A Kafka topic to send logs to
merge_head_batch_size | 10000 | Number of records to pull from the catalog per request by the merge head
merge_interval | "15s" | The time the merge process waits between checking for mergeable partitions
merge_max_partitions_per_candidate | 100 | The maximum number of partitions per merge candidate
merge_queue_limit | 500 | Maximum number of pending merge jobs
otel_endpoint | null | Send OTLP data to the HTTP server at this URL
overcommit | false | When true, turn off memory reservations and limits for Kubernetes pods. Useful when running on a single-node Kubernetes cluster with constrained resources
pg_ssl_mode | "disable" | Determines whether and with what priority an SSL connection will be negotiated when connecting to a Postgres server. See. Examples: "disable", "require", "verify-ca", "verify-full"
pools | null | A list of dictionaries describing pools to deploy as part of the Hydrolix cluster. See here for more details.
registry | "" | A Docker registry to pull Hydrolix containers from
sample_data_url | "" | The URL of the storage bucket from which to load sample data
scale | null | A list of dictionaries describing overrides for scale-related configuration for Hydrolix services. See here for more details.
scale_off | false | When true, override all deployment and statefulset replica counts with a value of 0 and disable Vector. Use this to scale your Hydrolix deployment to 0
sdk_timeout_sec | 300 | How many seconds the Merge SDK should be given to run before it is killed
stale_job_batch_size | 300 | How many jobs to probe in a single request
stale_job_limit | 3000 | How many jobs in total StaleJob will process per cycle
stale_job_period | 120 | How often (in minutes) the StaleJobMonitor should run
str_dict_enabled | true | Enable/disable multi-threaded string dictionary decoding
str_dict_nr_threads | 8 | Sets the maximum number of concurrent vCPUs used for decoding
str_dict_min_dict_size | 32768 | Controls the number of entries in each string dictionary block
stream_partition_count | 50 | The number of partitions for the internal Redpanda topic used by the Stream service
stream_replication_factor | 3 | The replication factor for the internal Redpanda topic used by the Stream service
task_start_timeout | 21600 | How old a ready task should be (in seconds) before it is considered lost and timed out
vacuum_rejects_dry_run | false | If enabled, the Vacuum Rejects job will not delete files, but will instead log its intentions
vacuum_rejects_max_age | "168h" | How old a rejects file should be before it is deleted, expressed as a duration string (e.g. 1h5m4s)
vacuum_rejects_period_mins | 180 | How often (in minutes) to run the Vacuum Rejects job
vector_bucket | null | Bucket where Vector should save JSON-format pod logs
vector_bucket_path | "logs" | Prefix under which Vector will save pod logs
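
As a rough illustration of how a few of these settings might be combined, the fragment below sketches a cluster spec. It is a sketch under stated assumptions, not a reference manifest: the apiVersion, kind, and metadata fields are assumed rather than taken from this page, and every value (email address, URL, bucket name, namespace, CIDR range) is a placeholder. Only the keys under spec come from the list above.

```yaml
# Hypothetical manifest: apiVersion, kind, and metadata are assumptions,
# and all values are placeholders to be replaced with your own.
apiVersion: hydrolix.io/v1        # assumed API group and version
kind: HydrolixCluster             # assumed resource kind
metadata:
  name: hdx
  namespace: hydrolix             # placeholder namespace
spec:
  admin_email: admin@example.com  # receives the password once the cluster is up
  hydrolix_name: hdx
  hydrolix_url: https://hydrolix.example.com
  # Bucket described by name, type, and region instead of db_bucket_url
  db_bucket_name: example-bucket
  db_bucket_type: s3
  db_bucket_region: us-east-2
  db_bucket_credentials_method: web_identity
  kubernetes_profile: eks
  # Only these CIDR ranges may reach the cluster load balancer
  ip_allowlist:
    - 203.0.113.0/24
  pg_ssl_mode: require
  merge_interval: 15s
  # Set on every pod in the cluster
  env:
    AWS_ACCESS_KEY_ID: EXAMPLEKEYID
```

Settings left out of the spec keep the defaults listed above, so a spec usually only needs the values that differ from them.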

Kubernetes Secret variables

Some Hydrolix settings require a password or secret. The following variables are supported in the Kubernetes Secret:

Variable | Default | Description
ROOT_DB_PASSWORD | null | The admin password of the Postgres server where Hydrolix metadata is stored
AWS_SECRET_ACCESS_KEY | null | The AWS secret key used to connect to AWS services
TRAEFIK_PASSWORD | random | Default password when basic_auth is enabled. See here for more details.
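
A minimal sketch of a Kubernetes Secret carrying these variables might look like the following. The Secret name and namespace are placeholders: this page does not say which Secret name the cluster reads, so check your deployment before using it. The values are dummies.

```yaml
# Hypothetical Secret: metadata.name and metadata.namespace are placeholders,
# not names confirmed by this page. Replace every value before applying.
apiVersion: v1
kind: Secret
metadata:
  name: hydrolix-secrets          # placeholder name
  namespace: hydrolix             # placeholder namespace
type: Opaque
stringData:                       # stringData accepts plain text; Kubernetes encodes it
  ROOT_DB_PASSWORD: "change-me"
  AWS_SECRET_ACCESS_KEY: "change-me"
  TRAEFIK_PASSWORD: "change-me"   # omit to keep the random default
```

Using stringData rather than data avoids base64-encoding the values by hand.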