Configuration Options Reference
Hydrolix Cluster Specification
The spec section of hydrolixcluster.yaml provides many options
that control the behavior of your Hydrolix cluster.
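As a rough orientation, the settings in the table below are keys placed under spec. The following is a minimal sketch only: the apiVersion, kind, and metadata values are assumptions about the custom resource and do not come from this page, and the setting values are taken from the defaults and example values in the table.

```yaml
# Hypothetical hydrolixcluster.yaml skeleton.
# apiVersion, kind, and metadata are assumptions; check them against
# the manifest generated for your own deployment.
apiVersion: hydrolix.io/v1
kind: HydrolixCluster
metadata:
  name: hdx
  namespace: hydrolix
spec:
  # Settings from the table go directly under spec.
  hydrolix_name: hdx
  hydrolix_url: https://my-host.hydrolix.live
  db_bucket_url: s3://my-bucket
  db_bucket_credentials_method: web_identity
  kubernetes_profile: eks
  scale_profile: eval
  ip_allowlist:
    - 127.0.0.1/32
```

Settings left out of spec fall back to the default values listed in the table.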
Setting | Default Value | Definition | Example Values |
---|---|---|---|
acme_enabled | false | Whether or not the ACME certificate challenge server is enabled. | |
admin_email | "[email protected]" | Admin email that receives a password once the cluster starts. | |
aws_credentials_method | "static" | The credentials method used to authenticate with AWS. Deprecated: Use db_bucket_credentials_method . | |
aws_load_balancer_tags | "Environment=dev,Team=test" | Additional tags added to the load balancer of the traefik service when running in EKS. | |
aws_load_balancer_subnets | "service.beta.kubernetes.io/aws-load-balancer-subnets" | A list of subnets assigned to the load balancer of the traefik service when running in EKS. | |
azure_blob_storage_account | null | The storage account to use when accessing an Azure blob storage container. | |
basic_auth | [] | A list of Hydrolix services that should be protected with basic auth when accessed. For more information, see Enabling Access & TLS. This is incompatible with the unified_auth setting, which was on by default starting with the 4.6 release. | |
batch_peer_heartbeat_period | "5m" | How frequently a batch peer should send a heartbeat for any task it is working on, expressed as a duration string. | |
catalog_db_admin_user | "turbine" | When PostgreSQL is managed externally, the admin user of the Catalog. | |
catalog_db_admin_db | "turbine" | The default db of the admin user of the Catalog. | |
catalog_db_host | "postgres" | Hostname used by the external PostgreSQL instance serving as the Hydrolix instance Catalog. | |
catalog_db_port | null | The PostgreSQL catalog port. | |
catalog_intake_connections | "max_lifetime": "10m", "max_idle_time": "1m" | Catalog database connection pool settings for intake services. Available options: (1) 'max_lifetime' - the max duration that a connection can live before being recycled. (2) 'max_idle_time' - the max duration that a connection can be idle before being closed. (3) 'max' - the max number of connections that can be opened by each intake service that connects to the database. (4) 'min' - the minimum number of connections to keep open to the database. (5) 'check_writable' - if true, then a connection is opened to the database to ensure it can handle writes. | |
clickhouse_http_port | 8088 | The port to dedicate to the ClickHouse HTTP interface. | |
db_bucket_credentials_method | "web_identity" | The method Hydrolix uses to acquire credentials for connecting to cloud storage. | "static", "ec2_profile", "web_identity" |
db_bucket_endpoint | null | The endpoint url for S3 compatible object storage services. Not required if using AWS S3 or if db_bucket_url is provided. | |
db_bucket_name | null | The name of the bucket you would like Hydrolix to store data in. Not required if db_bucket_url is provided. | |
db_bucket_region | null | The region of the bucket you would like Hydrolix to store data in. Not required if it can be inferred from db_bucket_url . | "us-east-2", "us-central1" |
db_bucket_type | null | The object storage type of the bucket you would like Hydrolix to store data in. Not required if db_bucket_url is provided. | "gs", "s3" |
db_bucket_url | null | The URL of the cloud storage bucket you would like Hydrolix to store data in. | "gs://my-bucket", "s3://my-bucket", "https://my-bucket.s3.us-east-2.amazonaws.com", "https://s3.us-east-2.amazonaws.com/my-bucket", "https://my-bucket.us-southeast-1.linodeobjects.com", "https://minio.local/my-bucket" |
db_bucket_use_https | true | If true use https when connecting to the cloud storage service. Inferred from db_bucket_url if possible. | |
decay_batch_size | 5000 | Number of entries to fetch for each request to the catalog. | |
decay_enabled | true | Whether or not the Decay CronJob should run. | |
decay_max_deactivate_iterations | | Maximum number of deactivation iterations to execute per table. | |
decay_max_reap_iterations | | Maximum number of reap iterations to execute per table. | |
decay_reap_batch_size | 5000 | Number of entries to fetch for each request when locating entries for reaping. | |
decay_schedule | 0 0 * * * | CRON schedule for Decay CronJob. | |
default_query_pool | "query-peer" | A name for the default query pool. | |
disable_disk_cache | false | Whether or not to disable disk caching. | |
disable_traefik_http_port | false | Whether or not to disable the default HTTP port (80) used by the traefik service. | |
disable_traefik_https_port | false | Whether or not to disable the default HTTPS port (443) used by the traefik service. | |
disable_traefik_native_port | false | Whether or not to disable the default native (TCP) ClickHouse port used by the traefik service. Uses port 9000 when TLS is disabled; 9440 when TLS is enabled. | |
disable_traefik_clickhouse_http_port | false | Whether or not to disable the default ClickHouse HTTP port (8088) used by the traefik service. | |
disable_vector_bucket_logging | false | Whether or not to disable bucket logging via vector. | |
disable_vector_kafka_logging | false | Whether or not to disable Kafka logging via vector. | |
disk_cache_entry_max_ttl_minutes | 360 | Max TTL, in minutes, for a disk cache entry. It is the longest period of time for which the LRU disk cache can keep an entry before it expires. | |
dns_server_ip | | The IP address of a DNS server used for performance critical purposes. | |
dns_aws_max_resolution_attempts | | The maximum number of attempts made by the DNS resolver for AWS and all S3-compatible storages in a given DNS refresh cycle. | |
dns_aws_max_ttl_secs | | The maximum DNS TTL for AWS and S3-compatible storages. It is the longest period of time the DNS resolver can cache a DNS record before it expires and needs to be refreshed. If 0, the DNS cache will strictly respect the TTL from the DNS query response. | |
dns_azure_max_resolution_attempts | | The maximum number of attempts made by the DNS resolver for Azure storages in a given DNS refresh cycle. | |
dns_azure_max_ttl_secs | | The maximum DNS TTL for Azure storages. It is the longest period of time the DNS resolver can cache a DNS record before it expires and needs to be refreshed. If 0, the DNS cache will strictly respect the TTL from the DNS query response. | |
dns_gcs_max_resolution_attempts | | The maximum number of attempts made by the DNS resolver for GCS storages in a given DNS refresh cycle. | |
dns_gcs_max_ttl_secs | | The maximum DNS TTL for GCS storages. It is the longest period of time the DNS resolver can cache a DNS record before it expires and needs to be refreshed. If 0, the DNS cache will strictly respect the TTL from the DNS query response. | |
enable_manifest_cache | true | If true, query heads will cache manifests downloaded from the database bucket. | |
enable_query_auth | false | When enabled, requests to the query service (URL paths starting with /query) and TCP native queries require authentication. For more information, see Query Authentication. | |
enable_traefik_hsts | false | If set to true, Traefik will enforce HSTS on all its connections. WARNING: This may lead to hard-to-diagnose persistent SSL failures if there are any errors in SSL configuration, and cannot be turned off later. | |
enable_vector | null | Run Vector to send Kubernetes pod logs to JSON files in a bucket. Default inferred from the value of scale_off . | |
env | {} | Environment variables to set on all Kubernetes pods that are part of the Hydrolix cluster. Useful, for example, to specify an AWS key. | |
force_container_user_root | false | Set the initial user for all containers to 0 (root). | |
http_connect_timeout_ms | 300 | Maximum time to wait for socket to complete connection to cloud storage. | |
http_port | None | The port for non-HTTPS connections. This setting is rarely needed since hydrolix_url contains this information. | |
http_ssl_connect_timeout_ms | 1000 | Maximum time to wait for SSL/TLS handshake completion during connection to cloud storage. | |
http_response_timeout_ms | 1000 | Maximum time to wait for HTTP headers while reading from cloud storage. | |
http_read_timeout_ms | 1000 | Maximum time to wait, after socket read (recv system call), for cloud storage to make data available. | |
http_write_timeout_ms | 10000 | Maximum time to spend uploading a single partition to cloud storage. | |
https_port | None | The port for HTTPS connections. This setting is rarely needed since hydrolix_url contains this information. | |
hydrolix_name | "hdx" | The name you would like to assign your Hydrolix cluster. | |
hydrolix_url | null | The URL you would like to use to access your Hydrolix cluster. | "https://my-host.hydrolix.live", "https://my-host.mydomain.com", "http://my-host.local" |
intake_head_accept_data_timeout | 0s | Configures the maximum duration that an intake-head pod will wait for a request to be accepted into the partition creation pipeline. If the timeout is reached, the request will be rejected with a 429 status code response. If not configured or set to 0, intake-head pods will not time out. | |
intake_head_index_backlog_enabled | false | Enables the indexing backlog feature: rather than blocking intake requests until storage is ready, the intake-head will keep a backlog of requests when storage is overwhelmed. If enabled, the newest data received will be indexed ahead of older data when the backlog grows. See other intake_head_index_backlog_* options below. | |
intake_head_index_backlog_max_accept_batch_size | 50 | Controls the maximum number of buckets accepted from ingestion and added to the backlog at a time. Only applicable if intake_head_index_backlog_enabled is true . | |
intake_head_index_backlog_max_mb | 256 | Controls the maximum size in MB that the indexing backlog on intake-head is allowed to grow before either dropping data or slowing new entries. Only applicable if intake_head_index_backlog_enabled is true . | |
intake_head_index_backlog_purge_concurrency | 1 | Controls the number of workers used to purge buckets from the intake-head backlog when the max size is breached. Only applicable if intake_head_index_backlog_enabled is true . | |
intake_head_max_outstanding_requests | 0 | Configures the maximum number of requests that an intake-head pod will allow to be outstanding and in process before rejecting new requests with a 429 status code response. If not configured or set to 0, intake-head pods will never reject new requests. | |
ip_allowlist | ["127.0.0.1/32"] | A list of CIDR ranges that should be allowed to connect to the Hydrolix cluster load balancer. For more information, see Enabling Access & TLS. | |
job_purge_age | 2160h | The age of a job, expressed as a duration string (the default of 2160h is 90 days), after which it should be deleted from PostgreSQL Batch / Alter. For more information, see Batch Ingest. | |
job_purge_enabled | true | Whether or not the Job Purge CronJob should run. | |
job_purge_schedule | 0 2 * * * | CRON schedule for Job Purge CronJob. | |
kafka_careful_mode | false | Whether or not to rate-limit Kafka reads to 10 messages per read call and limit concurrency to 1 reader. By default, Hydrolix reads 50 messages per call across 32 concurrent readers. | |
kafka_tls_ca | "" | A CA certificate used by the kafka_peer to authenticate with a Kafka server. For more information, see Ingest via Kafka. | |
kafka_tls_cert | "" | The PEM format certificate the kafka_peer uses to authenticate with a Kafka server. For more information, see Ingest via Kafka. | |
kafka_tls_key | null | The PEM format key the kafka_peer uses to authenticate with a Kafka server. For more information, see Ingest via Kafka. | |
kinesis_coordinate_period | 10s | For Kinesis sources, how often the coordination process runs to distribute shard consumption among available peers. For more information, see Ingest via Kinesis. | |
kubernetes_premium_storage_class | null | The storage class to use with persistent volumes created in Kubernetes for parts of a Hydrolix cluster where throughput is most critical. If set to default-kubernetes-storage-class , it will use your cloud provider's default storage class. | |
kubernetes_profile | generic | Use default settings appropriate to this type of Kubernetes deployment. | "gke", "eks" |
kubernetes_storage_class | null | The storage class to use with persistent volumes created in Kubernetes as part of a Hydrolix cluster. If set to default-kubernetes-storage-class , it will use your cloud provider's default storage class. | |
kubernetes_version | "1.22" | Make manifests compatible with this version of Kubernetes. | |
limit_cpu | true | If set, container CPU limits are set to match CPU requests in Kubernetes. | |
logs_http_remote_table | "hydro.logs" | An existing Hydrolix <project.table> where the log data should be stored within a remote cluster's object store. | "my_project.my_table" |
logs_http_remote_transform | "megaTransform" | The transform schema to use for log data ingested into the remote cluster's object store. | "another_transform_name" |
logs_http_table | "hydro.logs" | An existing Hydrolix <project.table> where the log data should be stored in the local cluster's object store. | "my_project.my_table" |
logs_kafka_bootstrap_servers | ["redpanda"] | A comma separated list of Kafka bootstrap servers to send logs to. | |
logs_kafka_topic | "logs" | Kafka topic that Hydrolix sends logs to. | |
logs_sink_local_url | "http://stream-head:8089/ingest/event" | The full URI to send local HTTP requests containing log data. | |
logs_sink_remote_url | "" | The full URI to send remote HTTP requests containing log data. | "https://{remote_hdx_host}/ingest/event" "https://{remote_hdx_host}/pool/stream-head" "https://{remote_hdx_host}/pool/{custom-ingest-pool}/ingest/event" |
logs_http_transform | "megaTransform" | The transform schema to use for log data ingested into the local Hydrolix cluster's object store. | "another_transform_name" |
logs_sink_remote_auth_enabled | false | When enabled, remote HTTP will use basic authentication from a curated secret. Note that enabling this option requires providing basic auth via the environment variables LOGS_HTTP_AUTH_USERNAME and LOGS_HTTP_AUTH_PASSWORD . | true, false |
logs_sink_type | "kafka" | This determines the sink to send log data to on its way to an object store. The value "kafka" actually places the log data on the cluster's redpanda queue, not a kafka queue. | "kafka", "http" |
log_level | | Allows modification of logging levels for individual Hydrolix services. See Log Level for more. | |
log_vacuum_concurrency | 8 | Number of concurrent log deletion processes. | |
log_vacuum_dry_run | false | If true, Log Vacuum will only log its intentions and take no action. | |
log_vacuum_enabled | true | Whether or not the Log Vacuum CronJob should run. | |
log_vacuum_max_age | 2160h | Maximum age of a log file before it is removed from cloud storage expressed as a duration string. | |
log_vacuum_schedule | 0 4 * * * | CRON schedule for Log Vacuum CronJob. | |
max_http_retries | 3 | The number of retries Hydrolix attempts for a failed HTTP request. These retries happen in addition to the original request, so the total maximum number of requests for value N is always N+1 . | |
merge_cleanup_batch_size | 5000 | Number of entries to fetch for each request to the catalog. | |
merge_cleanup_delay | 15m | How long before a merged partition should be deleted expressed as a duration string. | |
merge_cleanup_enabled | true | Whether or not the Merge Clean-up CronJob should run. | |
merge_cleanup_schedule | */5 * * * * | CRON schedule for Merge Clean-up CronJob. | |
merge_head_batch_size | 10000 | Number of records to pull from the catalog per request by the Merge head. | |
merge_interval | "15s" | The time the merge process waits between checking for Mergeable partitions. | |
merge_max_partitions_per_candidate | 100 | The maximum number of partitions per Merge candidate. | |
merge_min_mb | 1024 | Size in megabytes of the smallest merge tier. All other merge tiers are multiples of this value. | |
monitor_ingest | false | Runs a pod which sends a heartbeat event through ingest to the hydro.monitor table once per second. Tests the entire intake->storage->query path when paired with an alert on that table. | |
monitor_ingest_timeout | 1 | HTTP timeout (in seconds) for POSTs from monitor_ingest . | |
native_port | 9000 | The port on which to serve the ClickHouse plaintext native protocol. | |
native_tls_port | 9440 | The port on which to serve the ClickHouse TLS native protocol. | |
oom_detection | | Configuration options for detecting indexing OOM scenarios and retrying with smaller data sizes, if possible, for services that perform ingest. Outer keys are names of the ingest services. The supported services are intake-head , kafka-peer , kinesis-peer , and akamai-siem-peer . Available keys under each service are k8s_oom_kill_detection_enabled , k8s_oom_kill_detection_max_attempts , circuit_break_oom_detection_enabled , and preemptive_splitting_enabled . See this page for details. | |
otel_endpoint | null | Send OTLP data to the HTTP server at this URL. | |
overcommit | false | When true, turn off memory reservations and limits for Kubernetes pods. Useful when running on a single node Kubernetes cluster with constrained resources. | |
partition_cleaner_grace_period | 24h | Sets a minimum age for a partition before it's deactivated or deleted. Expressed as a duration string. | |
partition_vacuum_batch_size | 10000 | Number of entries to fetch from object storage or the catalog on each request. | |
partition_vacuum_concurrency | 5 | Number of concurrent vacuum operations to run. Each vacuum operation covers a single table at a time. | |
partition_vacuum_dry_run | true | If true, Partition Vacuum will only log its intentions and take no action. | |
partition_vacuum_enabled | true | Whether or not the Partition Vacuum CronJob should run. | |
partition_vacuum_grace_period | 24h | Minimum age of a partition before it is considered for deactivation or deletion expressed as a duration string. | |
partition_vacuum_schedule | 0 1 * * * | Cron schedule for Partition Vacuum CronJob. | |
pg_ssl_mode | "disable" | Determines whether and with what priority an SSL connection will be negotiated when connecting to a Postgres server. See here for more details. | "disable", "require", "verify-ca", "verify-full" |
prune_locks_enabled | true | Whether or not the Prune Locks CronJob should run. | |
prune_locks_grace_period | "24h" | Minimum age of a lock before it is considered for removal expressed as a duration string. | |
prune_locks_schedule | 30 0 * * * | CRON schedule for Prune Locks CronJob. | |
prometheus_remote_write_url | None | URL of the external Prometheus server. | |
prometheus_remote_write_username | "hdx" | User name for external Prometheus server authentication. | |
prometheus_retention_time | | When to remove old Prometheus data. Example: 15d | |
prometheus_retention_size | | The maximum number of bytes of Prometheus data to retain. Units supported: B, KB, MB, GB, TB, PB, EB | |
prometheus_scrape_interval | "15s" | This sets Prometheus' default scrape_interval value, setting the overall frequency of metric scraping. | |
pools | null | A list of dictionaries describing pools to deploy as part of the Hydrolix cluster. See here for more details. | |
query_peer_liveness_check_path | (appropriate path) | The HTTP path used to configure a kubernetes liveness check for query-peers. Set to 'none' to disable. | |
query_readiness_initial_delay | 0 | The time in seconds to wait before starting query readiness checks. | |
registry | "public.ecr.aws/l2i3s2a2" | A docker registry to pull Hydrolix containers from. | |
rejects_vacuum_dry_run | false | If enabled, the Rejects Vacuum CronJob will not delete files, but instead log files it would have deleted. | |
rejects_vacuum_enabled | true | Whether or not the Rejects Vacuum CronJob should run. | |
rejects_vacuum_max_age | 168h | How old a rejects file should be before deleted, expressed as a duration string (e.g. 1h5m4s). | |
rejects_vacuum_period_mins | 180 | How often (in minutes) to run the Vacuum Rejects job. | |
rejects_vacuum_schedule | 0 0 * * * | CRON schedule for Rejects Vacuum CronJob. | |
s3_endpoint | "" | The endpoint used to communicate with S3. | |
sample_data_url | "" | The storage bucket URL used to load sample data. | |
scale | null | A list of dictionaries describing overrides for scale-related configuration for Hydrolix services. For more information, see Scaling your Cluster. | |
scale_off | false | When true, override all deployment and statefulset replica counts with a value of 0 and disable Vector, scaling your Hydrolix deployment to 0. For more information, see Scaling your Cluster. | |
scale_profile | "eval" | Selects from a set of predefined defaults for scale. | |
sdk_timeout_sec | 300 | How many seconds Merge should be given to run before it is killed. | |
silence_linode_alerts | false | Disables email notifications triggered by violations of Notification Thresholds for a Linode Compute Instance. For more information, see Linode's Resource Usage Email Alerts. | |
skip_init_turbine_api | false | Skips running database migrations in the init-turbine-api job. Set to true when running multiple clusters with a shared database. | |
stale_job_monitor_batch_size | 300 | How many jobs to probe in a single request. | |
stale_job_monitor_enabled | true | Whether or not the Stale Job Monitor CronJob should run. | |
stale_job_monitor_limit | 3000 | How many jobs in total StaleJob will process per cycle. | |
str_dict_enabled | true | Enable/disable multi-threaded string dictionary decoding. | |
str_dict_nr_threads | 8 | Sets the maximum number of concurrent vCPU used for decoding. | |
str_dict_min_dict_size | 32768 | Controls the number of entries in each string dictionary block. | |
stream_concurrency_limit | null | The number of concurrent stream requests per CPU, allocated across all pods, beyond which traefik will return 429 busy error responses. If not set or set to null, no limit is enforced. | |
stream_load_balancer_algorithm | round-robin | The load balancer algorithm to use with stream-head and intake-head services. | round-robin , least-connections-p2c |
stream_partition_block | 6 | The number of partitions to use on a non-default redpanda stream topic per TB/day of usage. | |
stream_partition_count | 50 | The number of partitions for the internal redpanda topic used by the Stream Ingest service. | |
stream_replication_factor | 3 | The replication factor for the internal Redpanda topic used by the Stream Ingest service. | |
targeting | null | Specifies the target nodes on which Hydrolix resources can run in the Kubernetes cluster. See here for more details. | |
task_monitor_enabled | true | Whether or not the Task Monitor CronJob should run. | |
task_monitor_heartbeat_timeout | 600 | How old a task's heartbeat should be (in seconds) before it is timed out. | |
task_monitor_schedule | */2 * * * * | CRON schedule for Task Monitor. | |
task_monitor_start_timeout | 21600 | How old a ready task should be (in seconds) before it is considered lost and timed out. | |
traefik_hsts_expire_time | 315360000 | Expiration time for HSTS caching in seconds. | |
traefik_keep_alive_max_time | 26 | The number of seconds a client HTTP connection can be reused before receiving a Connection: close response from the server. Zero means no limit. | |
traefik_service_annotations | null | Creates traefik annotations as a dictionary. Allows additional traefik service annotations. | annotations: external-dns.alpha.kubernetes.io/hostname: nginx.example.com |
traefik_service_type | "public_lb" | Specifies the type of load balancer to use for the cluster. Default is to use a public load balancer. If private_lb is chosen and the cluster is running on EKS or GKE, an internal load balancer will be provisioned. node_port can be used for a customer-provided ELB, or cluster_ip to not set up anything related to exposing the service outside of the Kubernetes cluster itself. | "public_lb", "private_lb", "node_port", "cluster_ip" |
turbine_api_init_pools | false | If enabled, the turbine-api component initializes some pools. | |
unified_auth | true | Enforce centralized API authentication for ingest endpoints and other endpoints served by traefik. Unified Authentication has been true by default since the 4.6 release. It is incompatible with basic_auth. | |
use_https_with_s3 | true | Whether or not to use HTTPS when communicating with S3. | |
use_hydrolix_dns_resolver | true | If true, use Hydrolix DNS resolver. If false, use system resolver. | |
use_crunchydata_postgres | false | Use a postgres managed by Crunchydata's postgres operator instead of the default dev mode postgres. | |
vector_bucket | null | Bucket where Vector should save JSON format pod logs. | |
vector_bucket_path | "logs" | Prefix under which Vector saves pod logs. | |
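Some settings in the table take dictionaries, durations, or cron expressions rather than simple scalars. The fragment below is a sketch of how such values might be written under spec, using the defaults listed in the table. The nesting shown for catalog_intake_connections is an assumption based on its default value and the options named in its definition.

```yaml
spec:
  # Dictionary-valued setting (assumed layout): keys are the options
  # named in the setting's definition.
  catalog_intake_connections:
    max_lifetime: "10m"
    max_idle_time: "1m"

  # The backlog limits only apply when the backlog feature is enabled.
  intake_head_index_backlog_enabled: true
  intake_head_index_backlog_max_mb: 256
  intake_head_index_backlog_max_accept_batch_size: 50

  # Retries are in addition to the original request, so a value of 3
  # allows at most 4 requests in total.
  max_http_retries: 3

  # Duration strings.
  merge_cleanup_delay: "15m"
  log_vacuum_max_age: "2160h"

  # Cron expressions. Quoting matters: an unquoted YAML value cannot
  # start with "*".
  merge_cleanup_schedule: "*/5 * * * *"
  decay_schedule: "0 0 * * *"
```

Quoting duration and cron values keeps them as strings regardless of how the YAML parser would otherwise interpret them.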
Kubernetes Secrets
Some Hydrolix settings require sensitive data, including passwords or secrets:
Variable | Default Value | Description |
---|---|---|
ROOT_DB_PASSWORD | null | The admin password of the PostgreSQL server used as the Hydrolix Catalog. |
AWS_SECRET_ACCESS_KEY | null | AWS service account secret key used to connect to AWS and external AWS services. |
AZURE_ACCOUNT_KEY | null | Azure secret key used to connect to Azure blob storage. |
TRAEFIK_PASSWORD | random | Default password when basic_auth is enabled. For more information, see IP Access and TLS. |
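A minimal sketch of a Kubernetes Secret carrying these variables might look like the following. The Secret name curated is an assumption drawn from the "curated secret" wording in the logs_sink_remote_auth_enabled definition, the namespace is hypothetical, and the placeholder values must be replaced with your own. This page does not describe how the cluster locates the Secret, so treat the manifest as illustrative.

```yaml
# Hypothetical Secret manifest; the name and namespace are assumptions.
apiVersion: v1
kind: Secret
metadata:
  name: curated
  namespace: hydrolix
type: Opaque
# stringData accepts plain-text values; Kubernetes stores them base64-encoded.
stringData:
  ROOT_DB_PASSWORD: "<postgres-admin-password>"
  AWS_SECRET_ACCESS_KEY: "<aws-secret-access-key>"
```

Only the variables relevant to your deployment need to be present; for example, AZURE_ACCOUNT_KEY applies only when using Azure blob storage.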