Configuration Options

Hydrolix cluster specification

Many options are available in the hydrolixcluster spec. The table below lists the available settings, their default values, and their definitions.

| Setting | Value | Definition | Reference |
|---|---|---|---|
| admin_email | "[email protected]" | Admin email that receives the password once the cluster is up | |
| aws_load_balancer_tags | "Environment=dev,Team=test" | Additional tags to add to the load balancer of the Traefik service when running in EKS | |
| azure_blob_storage_account | null | The storage account to use when accessing an Azure blob storage container | |
| basic_auth | [] | A list of Hydrolix services that should be protected with basic auth when accessed | Enabling Access & TLS |
| batch_head_concurrency | 8 | The number of concurrent consumers the batch head should use for LISTING tasks | Batch Ingest |
| batch_peer_threads | 1 | The number of concurrent threads the batch peer should use to process data | Batch Ingest |
| batch_peer_heartbeat_period | 5m | How frequently a batch peer should heartbeat any task it is working on, expressed as a duration string | |
| catalog_db_admin_user | "turbine" | The admin user of the Postgres server where Hydrolix metadata is stored; used when an external Postgres is utilized | GKE, EKS |
| catalog_db_admin_db | "turbine" | The default database of the admin user of the Postgres server where Hydrolix metadata is stored | GKE, EKS |
| catalog_db_host | "postgres" | The Postgres server where Hydrolix metadata is stored | GKE, EKS |
| db_bucket_credentials_method | "web_identity" | The method Hydrolix uses to acquire credentials for connecting to cloud storage. Examples: "static", "ec2_profile", "web_identity" | |
| db_bucket_endpoint | null | The endpoint URL for S3-compatible object storage services. Not required when using AWS S3 or when db_bucket_url is provided | |
| db_bucket_name | null | The name of the bucket in which Hydrolix should store data. Not required if db_bucket_url is provided | |
| db_bucket_region | null | The region of the bucket in which Hydrolix should store data. Not required if it can be inferred from db_bucket_url. Examples: "us-east-2", "us-central1" | |
| db_bucket_type | null | The object storage type of the bucket in which Hydrolix should store data. Not required if db_bucket_url is provided. Examples: "gs", "s3" | |
| db_bucket_url | null | The URL of the cloud storage bucket in which Hydrolix should store data. Examples: "gs://my-bucket", "s3://my-bucket", "https://my-bucket.s3.us-east-2.amazonaws.com", "https://s3.us-east-2.amazonaws.com/my-bucket", "https://my-bucket.us-southeast-1.linodeobjects.com", "https://minio.local/my-bucket" | Preparing your GKE Cluster; Preparing your EKS Cluster; Preparing your LKE Cluster |
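To make the storage-related settings concrete, here is a minimal sketch of a hydrolixcluster resource configuring S3 object storage. The apiVersion, kind, namespace, bucket URL, and hostname below are illustrative assumptions, not values prescribed by this document:

```yaml
# Illustrative HydrolixCluster resource; the apiVersion/kind and all
# concrete values here are assumptions for the sketch.
apiVersion: hydrolix.io/v1
kind: HydrolixCluster
metadata:
  name: hdx
  namespace: hdx
spec:
  admin_email: "[email protected]"
  db_bucket_url: "s3://my-bucket"                 # type and region inferred where possible
  db_bucket_credentials_method: "web_identity"    # or "static" / "ec2_profile"
  hydrolix_url: "https://my-host.mydomain.com"
```

When db_bucket_url is set, db_bucket_name, db_bucket_type, and usually db_bucket_region can be omitted, since they are inferred from the URL.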
| db_bucket_use_https | true | If true, use HTTPS when connecting to the cloud storage service. Inferred from db_bucket_url if possible | |
| decay_enabled | true | Whether or not the Decay CronJob should run | |
| decay_schedule | 0 0 * * * | CRON schedule for the Decay CronJob | |
| decay_batch_size | 5000 | Number of entries to fetch on each request to the catalog | |
| default_query_pool | "query-peer" | A name for the default query pool | |
| dns_server_ip | | The IP address of a DNS server used for performance-critical purposes | |
| enable_manifest_cache | true | If true, query heads will cache manifests downloaded from the database bucket | |
| enable_query_auth | false | When enabled, requests to the query service (URL paths starting with /query, or the native TCP interface) require authentication | Query Authentication |
| enable_vector | null | Run Vector to send Kubernetes pod logs to JSON files in a bucket. Default inferred from the value of scale_off | |
| env | {} | Environment variables to set on all Kubernetes pods that are part of the Hydrolix cluster; useful for specifying an AWS key, for example | |
| force_container_user_root | false | Set the initial user for all containers to 0 (root) | |
| hydrolix_name | "hdx" | The name you would like to assign your Hydrolix cluster | |
| hydrolix_url | null | The URL you would like to use to access your Hydrolix cluster. Examples: "https://my-host.hydrolix.live", "https://my-host.mydomain.com", "http://my-host.local" | |
| ip_allowlist | ["127.0.0.1/32"] | A list of CIDR ranges allowed to connect to the Hydrolix cluster load balancer | Enabling Access & TLS |
| job_purge_enabled | true | Whether or not the Job Purge CronJob should run | |
| job_purge_schedule | 0 2 * * * | CRON schedule for the Job Purge CronJob | |
| job_purge_age | 2160h | The age of a batch or alter job after which it should be deleted from Postgres, expressed as a duration string | Batch Ingest |
| kafka_tls_ca | "" | A CA certificate used by the kafka-peer to authenticate the Kafka servers it connects to | via Kafka |
| kafka_tls_cert | "" | The PEM-format certificate the kafka-peer will use to authenticate itself to a Kafka server | via Kafka |
| kafka_tls_key | null | The PEM-format key the kafka-peer will use to authenticate itself to a Kafka server | via Kafka |
| kubernetes_premium_storage_class | null | The storage class to use with persistent volumes for the parts of a Hydrolix cluster where throughput is most critical | |
| kubernetes_profile | generic | Use default settings appropriate to this type of Kubernetes deployment. Examples: "gke", "eks" | |
| kubernetes_storage_class | null | The storage class to use with persistent volumes created in Kubernetes as part of a Hydrolix cluster | |
| kubernetes_version | "1.22" | Make manifests compatible with this version of Kubernetes | |
| limit_cpu | true | If set, container CPU limits are set to match CPU requests in Kubernetes | |
| logs_kafka_bootstrap_servers | "redpanda" | A comma-separated list of Kafka bootstrap servers to send logs to | |
| logs_kafka_topic | "logs" | A Kafka topic to send logs to | |
| log_vacuum_enabled | true | Whether or not the Log Vacuum CronJob should run | |
| log_vacuum_schedule | 0 4 * * * | CRON schedule for the Log Vacuum CronJob | |
| log_vacuum_max_age | 2160h | Maximum age of a log file before it is removed from cloud storage, expressed as a duration string | |
| log_vacuum_concurrency | 8 | Number of concurrent log deletion processes | |
| log_vacuum_dry_run | false | If true, Log Vacuum will only log its intentions and take no action | |
| merge_head_batch_size | 10000 | Number of records the merge head pulls from the catalog per request | Merge |
| merge_interval | "15s" | The time the merge process waits between checks for mergeable partitions | Merge |
| merge_max_partitions_per_candidate | 100 | The maximum number of partitions per merge candidate | Merge |
| merge_queue_limit | 500 | Maximum number of pending merge jobs | Merge |
| merge_cleanup_enabled | true | Whether or not the Merge Clean-up CronJob should run | |
| merge_cleanup_schedule | */5 * * * * | CRON schedule for the Merge Clean-up CronJob | |
| merge_cleanup_delay | 15m | How long before a merged partition should be deleted, expressed as a duration string | |
| merge_cleanup_batch_size | 5000 | Number of entries to fetch on each request to the catalog | |
| otel_endpoint | null | Send OTLP data to the HTTP server at this URL | |
| overcommit | false | When true, turn off memory reservations and limits for Kubernetes pods. Useful when running on a single-node Kubernetes cluster with constrained resources | |
| partition_vacuum_enabled | true | Whether or not the Partition Vacuum CronJob should run | |
| partition_vacuum_schedule | 0 1 * * * | CRON schedule for the Partition Vacuum CronJob | |
| partition_vacuum_dry_run | true | If true, Partition Vacuum will only log its intentions and take no action | |
| partition_vacuum_batch_size | 10000 | Number of entries to fetch from partition providers on each request | |
| partition_vacuum_grace_period | 24h | Minimum age of a partition before it is considered for deactivation or deletion, expressed as a duration string | |
| pg_ssl_mode | "disable" | Determines whether, and with what priority, an SSL connection is negotiated when connecting to a Postgres server. Examples: "disable", "require", "verify-ca", "verify-full" | |
| prune_locks_enabled | true | Whether or not the Prune Locks CronJob should run | |
| prune_locks_grace_period | "24h" | Minimum age of a lock before it is considered for removal, expressed as a duration string | |
| prune_locks_schedule | 30 0 * * * | CRON schedule for the Prune Locks CronJob | |
| pools | null | A list of dictionaries describing pools to deploy as part of the Hydrolix cluster | |
| registry | "public.ecr.aws/l2i3s2a2" | A Docker registry to pull Hydrolix containers from | |
| rejects_vacuum_enabled | true | Whether or not the Rejects Vacuum CronJob should run | |
| rejects_vacuum_schedule | 0 0 * * * | CRON schedule for the Rejects Vacuum CronJob | |
| rejects_vacuum_dry_run | false | If enabled, the Rejects Vacuum CronJob will not delete files, but instead log its intentions | |
| rejects_vacuum_max_age | 168h | How old a rejects file should be before it is deleted, expressed as a duration string (e.g. 1h5m4s) | |
| rejects_vacuum_period_mins | 180 | How often (in minutes) to run the Rejects Vacuum job | |
| sample_data_url | "" | The storage bucket URL to use to load sample data | |
| scale | null | A list of dictionaries describing overrides for scale-related configuration of Hydrolix services | Scaling your Cluster |
| scale_off | false | When true, override all Deployment and StatefulSet replica counts with a value of 0 and disable Vector | Scale your Hydrolix deployment to 0; Scaling your Cluster |
| scale_profile | "eval" | Selects from a set of predefined defaults for scale | |
| sdk_timeout_sec | 300 | How many seconds the merge SDK should be given to run before it is killed | Merge |
| stale_job_monitor_batch_size | 300 | How many jobs to probe in a single request | |
| stale_job_monitor_enabled | true | Whether or not the Stale Job Monitor CronJob should run | |
| stale_job_monitor_limit | 3000 | How many jobs in total the Stale Job Monitor will process per cycle | |
| stale_job_monitor_period | 120 | How often (in minutes) the Stale Job Monitor should run | |
| str_dict_enabled | true | Enable/disable multi-threaded string dictionary decoding | |
| str_dict_nr_threads | 8 | Sets the maximum number of concurrent vCPUs used for decoding | |
| str_dict_min_dict_size | 32768 | Controls the number of entries in each string dictionary block | |
| stream_partition_block | 6 | The number of partitions to use on a non-default Redpanda stream topic, per TB/day of usage | |
| stream_partition_count | 50 | The number of partitions for the internal Redpanda topic used by the Stream service | Stream Ingest |
| stream_replication_factor | 3 | The replication factor for the internal Redpanda topic used by the Stream service | Stream Ingest |
| targeting | null | Specifies the target nodes on which Hydrolix resources can run in the Kubernetes cluster | |
| task_monitor_enabled | true | Whether or not the Task Monitor CronJob should run | |
| task_monitor_heartbeat_timeout | 600 | How old a task's heartbeat should be (in seconds) before it is timed out. Tasks are created by batch and alter jobs | |
| task_monitor_start_timeout | 21600 | How old a ready task should be (in seconds) before it is considered lost and timed out | |
| task_monitor_schedule | */2 * * * * | CRON schedule for the Task Monitor | |
| traefik_service_type | "public_lb" | Specifies the type of load balancer to use for the cluster; one of "public_lb", "private_lb", "node_port", "cluster_ip". The default is a public load balancer. If "private_lb" is chosen and the cluster is running on EKS or GKE, an internal load balancer is provisioned. "node_port" can be used with a customer-provided ELB, and "cluster_ip" sets up nothing related to exposing the service outside the Kubernetes cluster itself | |
| turbine_api_init_pools | false | If enabled, the turbine-api component initializes some pools | |
| vector_bucket | null | Bucket where Vector should save JSON-format pod logs | |
| vector_bucket_path | "logs" | Prefix under which Vector will save pod logs | |
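As a sketch of how the access-control settings above combine, the fragment below restricts the load balancer to one CIDR range and enables basic auth on the query service. The service name and CIDR are assumptions for illustration; consult Enabling Access & TLS for the accepted values:

```yaml
spec:
  traefik_service_type: "public_lb"   # default; "private_lb" on EKS/GKE provisions an internal LB
  ip_allowlist:
    - "203.0.113.0/24"                # example CIDR, replace with your own ranges
  basic_auth:
    - query                           # service name assumed for illustration
```

Remember that the default ip_allowlist of ["127.0.0.1/32"] blocks all external traffic, so this list usually needs to be widened before the cluster is reachable.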

Kubernetes Secret variable

Some Hydrolix settings require a password or secret. The following variables are supported in the Kubernetes secret:

| Variable | Value | Description |
|---|---|---|
| ROOT_DB_PASSWORD | null | The admin password of the Postgres server where Hydrolix metadata is stored |
| AWS_SECRET_ACCESS_KEY | null | AWS secret key used to connect to an external AWS service. Note: service accounts are used in deployments |
| AZURE_ACCOUNT_KEY | null | Azure secret key used to connect to Azure blob storage |
| TRAEFIK_PASSWORD | random | Default password when basic_auth is enabled; see the basic_auth setting above for more details |
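These variables are delivered to the cluster as keys of a Kubernetes Secret. Below is a minimal sketch of such a manifest; the secret name, namespace, and placeholder values are assumptions, so substitute whatever your deployment expects:

```yaml
# Illustrative Secret manifest; the name "hydrolix-secrets", the
# namespace, and all values are assumptions for this sketch.
apiVersion: v1
kind: Secret
metadata:
  name: hydrolix-secrets
  namespace: hdx
type: Opaque
stringData:                      # stringData lets you write plain text; Kubernetes base64-encodes it
  ROOT_DB_PASSWORD: "change-me"
  AZURE_ACCOUNT_KEY: "change-me"
```

Only include the variables your deployment actually needs; for example, AWS_SECRET_ACCESS_KEY is typically unnecessary when service accounts (web identity) provide credentials.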