Pod - Query Peer
Overview⚓︎
Most clusters run multiple instances of the query peer application.
Each query peer registers with the Zookeeper coordinator and awaits work assignments from the query head.
The query peer interacts with Storage and the partition data to retrieve the raw data needed to answer a query.
Scaling choices depend on query load, data size, and expected responsiveness.
Clusters can additionally be configured with Query pools, a feature that allows separating different classes of queries onto distinct CPU or memory resources.
Each query peer awaits instructions from a query head. When a query peer receives an assignment, it fetches and reads the raw Hydrolix partition data from storage, executes the query against the partitions, collects intermediate results, and returns them to the query head.
Responsibilities⚓︎
At startup, each query peer application:
- Verifies that the storage bucket is available (TODO: all storage buckets?)
- Listens on its network sockets for requests from a query head
- Registers its name, IP address, and pool membership with Zookeeper
- Reloads the cluster config.json data file from primary storage every 5 seconds
- Awaits assignments from the query head
Upon receiving an incoming query, the query peer:
- Parses the incoming SQL query written by the query head and generates a plan
- Downloads from storage and caches locally the partition manifest, index, and column dictionaries required for the query
- Evaluates the query's WHERE clause against the index to identify blocks of interest
- Issues byte-range requests for the blocks of interest to the storage system
- Accumulates intermediate results and returns them to the query head
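The pruning and byte-range steps above can be sketched as follows. The real index and partition formats are internal to Hydrolix; every name and field in this snippet is illustrative, assuming only that the index carries per-block min/max statistics and block offsets.

```python
# Hypothetical sketch: prune blocks with index min/max statistics,
# then fetch only the surviving blocks via HTTP Range requests.

def blocks_of_interest(index, predicate):
    """Keep only blocks whose min/max range could match the WHERE clause."""
    return [b for b in index if predicate(b["min"], b["max"])]

def byte_ranges(blocks):
    """Turn surviving blocks into HTTP Range headers for object storage."""
    return [f"bytes={b['offset']}-{b['offset'] + b['length'] - 1}" for b in blocks]

# Example: WHERE value BETWEEN 50 AND 60 prunes blocks that
# cannot overlap the interval [50, 60].
index = [
    {"min": 0,  "max": 40,  "offset": 0,    "length": 1024},
    {"min": 41, "max": 90,  "offset": 1024, "length": 2048},
    {"min": 91, "max": 120, "offset": 3072, "length": 512},
]
hits = blocks_of_interest(index, lambda lo, hi: hi >= 50 and lo <= 60)
print(byte_ranges(hits))  # → ['bytes=1024-3071']
```

Only the middle block survives, so only its 2 KiB slice of the partition object is fetched from storage.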
Deployment and scale⚓︎
| Setting | Value |
|---|---|
| Type | ReplicaSet in a Deployment |
| Replicas | default 3, minimum 1 |
| CPU | 14 |
| Memory | 36 GiB |
| Storage | default 5 GiB, recommended 50 GiB |

Typical production scale is shown here.
TODO: Is there a memory and CPU balance ratio?
On query peers, memory is used for accumulating query results.
Storage inside the cluster is ephemeral. It provides local disk for work operations, query caching when enabled, and caching of partition files, especially the manifest, index, and column dictionaries for each partition.
Scaling considerations⚓︎
Parallelization of data retrieval is a core design principle of the Hydrolix system, and query peers are the primary mechanism for providing it.
Horizontal scaling by adding query peers can improve query response times at the cost of idle CPU at other times. The maximum node count is determined by the Kubernetes cluster; Hydrolix places no maximum on peer count and will use all available peers when parallelizing.
Vertically scaling query peer pods by requesting more CPU, memory, or storage, or by increasing the node size, can help if query peers frequently experience out-of-memory (OOM) conditions. If a query peer has enough memory to read and process `hdx_query_max_concurrent_partitions` multiplied by `hdx_query_max_streams`, it is less likely to hit OOM conditions.
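The sizing rule above is a simple product. As a rough worked example, assuming a per-stream working set (the 64 MiB figure below is a placeholder, not a Hydrolix constant):

```python
# Rough sizing sketch for the rule of thumb above.
# per_stream_bytes is an assumed placeholder, not a Hydrolix constant.
max_concurrent_partitions = 4   # hdx_query_max_concurrent_partitions
max_streams = 16                # hdx_query_max_streams
per_stream_bytes = 64 * 2**20   # assumed 64 MiB working set per stream

needed = max_concurrent_partitions * max_streams * per_stream_bytes
print(f"{needed / 2**30:.0f} GiB")  # → 4 GiB
```

If the computed figure approaches the pod's memory limit, that is a signal to scale vertically.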
Autoscaling is not generally useful for the query system. Queries are a bursty workload and usually complete very quickly. Load-based signals are not a good indicator for autoscaling. Since scaling response time is measured in tens of seconds, queries have long failed or completed before any scaling changes can help.
See also:
- Scale Profiles
- Scale your Cluster
- Custom Scale Profiles
Containers⚓︎
- `wait-for-zookeeper` is an InitContainer
- `query-peer` runs the `turbine_server`
Important files⚓︎
- `/usr/bin/turbine_server` - executable
- `/var/lib/turbine/turbine.ini` - configuration file
Startup dependencies⚓︎
Runtime dependencies⚓︎
- Zookeeper (registers and continually sends a heartbeat)
- Query Head
- Storage
Runtime provides⚓︎
Network listeners⚓︎
External⚓︎
None.
Cluster internal⚓︎
- tcp/9000 - ClickHouse native protocol
- tcp/8088 - ClickHouse HTTP
These ports are reachable only from inside the cluster, for example from the tooling pod.
HTTP APIs⚓︎
- tcp/8088 - metrics at `/metrics`
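A `/metrics` path usually serves Prometheus-style text exposition; assuming that holds here (it is not confirmed by this document), scraped output can be parsed with a few lines of Python:

```python
# Minimal parser for Prometheus-style text exposition, the format a
# /metrics endpoint typically serves (an assumption, not confirmed above).
def parse_metrics(text: str) -> dict[str, float]:
    metrics = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blanks and HELP/TYPE comment lines
        name, _, value = line.rpartition(" ")
        metrics[name] = float(value)
    return metrics

# Hypothetical sample payload; metric names are illustrative only.
sample = """
# HELP queries_total Total queries served
queries_total 42
memory_bytes{kind="rss"} 1048576
"""
print(parse_metrics(sample))
```

In practice a monitoring agent (e.g. a Prometheus scraper) does this for you; the sketch only shows the shape of the data behind tcp/8088.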
Configuration⚓︎
The Hydrolix operator conveys cluster settings, tunables and secrets into the container.
Per-query configuration is assembled by the query head and passed to each query-peer over intra-cluster communication.
Query options⚓︎
The query head handles resolution of query options according to Query Options Precedence. The query head rewrites the original query with all relevant options as a ClickHouse table function query sent to the query peer.
Hydrolix Config API⚓︎
Query peer instances do not read the Hydrolix Config API directly.
Cluster⚓︎
- `CONFIG_API_ENDPOINT` - in-cluster http://turbine-api:3000/config
- `CACHE_SPACE_CAPACITY` - from storage scaling
- `CONTAINER_MEMORY_LIMIT_BYTES` - from memory scaling
Secrets⚓︎
See Hydrolix Tunables List for detailed explanations.
Storage related⚓︎
These settings control interactions with storage buckets.
- `auth_http_read_timeout_ms`
- `auth_http_response_timeout_ms`
- `http_connect_timeout_ms`
- `http_ssl_connect_timeout_ms`
- `http_response_timeout_ms`
- `http_read_timeout_ms`
- `http_write_timeout_ms`
- `max_http_retries`
- `initial_exp_backoff_seconds`
- `max_exp_backoff_seconds`
- `exp_backoff_growth_factor_ms`
- `exp_backoff_additive_jitter`
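The retry and backoff tunables above suggest a capped exponential backoff with additive jitter. The exact formula Hydrolix uses is not documented here; the sketch below only illustrates how such parameters typically combine:

```python
import random

# Illustrative capped exponential backoff with additive jitter,
# parameterized like the tunables above. The mapping of tunables to
# terms in this formula is an assumption, not the actual implementation.
def backoff_seconds(attempt: int,
                    initial_exp_backoff_seconds: float = 1.0,
                    max_exp_backoff_seconds: float = 30.0,
                    growth_factor: float = 2.0,
                    additive_jitter: float = 0.5) -> float:
    delay = initial_exp_backoff_seconds * growth_factor ** attempt
    delay = min(delay, max_exp_backoff_seconds)      # cap the delay
    return delay + random.uniform(0, additive_jitter)  # spread retries out

for attempt in range(6):
    print(f"retry {attempt}: sleep {backoff_seconds(attempt):.2f}s")
```

Jitter prevents many peers that failed at the same moment from retrying storage in lockstep.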
Local disk⚓︎
These tunables influence local disk behavior.
- `disable_disk_cache`
- `disk_cache_cull_start_perc`
- `disk_cache_cull_stop_perc`
- `disk_cache_redzone_start_perc`
- `disk_cache_entry_max_ttl_minutes`
- `io_perf_mappings`
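The `cull_start`/`cull_stop` pair implies hysteresis: culling begins once usage crosses one threshold and stops once it drops below a lower one. The eviction policy and data structures below are purely illustrative, not Hydrolix's actual cache implementation:

```python
# Illustrative hysteresis sketch for disk_cache_cull_start_perc /
# disk_cache_cull_stop_perc: start evicting above one threshold,
# stop once usage falls below the other.
def cull(entries, capacity, cull_start_perc=90, cull_stop_perc=70):
    """entries: list of (age, size) tuples, oldest first. Returns survivors."""
    used = sum(size for _, size in entries)
    if used * 100 < capacity * cull_start_perc:
        return entries  # below the start threshold: nothing to do
    survivors = list(entries)
    while survivors and used * 100 > capacity * cull_stop_perc:
        _, size = survivors.pop(0)  # evict the oldest entry (assumed policy)
        used -= size
    return survivors

entries = [(age, 10) for age in range(10)]  # 100 units in a 100-unit cache
print(len(cull(entries, capacity=100)))  # → 7 (culled down to 70 units)
```

The gap between the two thresholds avoids thrashing: once a cull runs, the cache has headroom before the next one triggers.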
Application⚓︎
These settings set global limits and control the behavior of the query peer.
- `enable_query_auth`
- `default_query_pool`
- `max_concurrent_queries`
- `max_server_memory_usage`
- `user_acl_refresh_interval_secs`
- `user_token_refresh_interval_secs`
- `user_token_expiration_secs`
- `str_dict_enabled`
- `str_dict_nr_threads`
- `str_dict_min_dict_size`
Network⚓︎
These settings control application configuration as a client of the Domain Name System (DNS).
- `use_hydrolix_dns_resolver`
- `dns_server_ip`
- `dns_aws_max_ttl_secs`
- `dns_aws_max_resolution_attempts`
- `dns_azure_max_ttl_secs`
- `dns_azure_max_resolution_attempts`
- `dns_gcs_max_ttl_secs`
- `dns_gcs_max_resolution_attempts`
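The per-provider `*_max_ttl_secs` tunables suggest a resolver cache that caps how long a DNS answer may be reused, since cloud object-store hostnames rotate IPs frequently. The class below is an illustrative sketch of that idea, not Hydrolix's resolver:

```python
import time

# Illustrative TTL-capped DNS cache: cached answers live no longer
# than min(record TTL, configured cap), as the dns_*_max_ttl_secs
# tunables suggest. Names and structure are assumptions.
class DnsCache:
    def __init__(self, max_ttl_secs: float):
        self.max_ttl_secs = max_ttl_secs
        self._entries: dict[str, tuple[str, float]] = {}

    def put(self, host: str, ip: str, record_ttl: float, now=None):
        now = time.monotonic() if now is None else now
        ttl = min(record_ttl, self.max_ttl_secs)  # cap the record's own TTL
        self._entries[host] = (ip, now + ttl)

    def get(self, host: str, now=None):
        now = time.monotonic() if now is None else now
        entry = self._entries.get(host)
        if entry is None or now >= entry[1]:
            return None  # missing or expired: caller must re-resolve
        return entry[0]

cache = DnsCache(max_ttl_secs=5)
cache.put("bucket.example.com", "10.0.0.1", record_ttl=300, now=0)
print(cache.get("bucket.example.com", now=4))  # → 10.0.0.1
print(cache.get("bucket.example.com", now=6))  # → None (cap expired)
```

Even though the DNS record advertises a 300-second TTL, the 5-second cap forces re-resolution so the client tracks IP rotation quickly.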
Deprecated and other⚓︎
These are old, deprecated, or unused tunables.
- `use_https_with_s3` - DEPRECATED: use `db_bucket_url` or `db_bucket_http_enabled`
Metrics⚓︎
Key health metrics⚓︎
TODO: What are the most important metrics?
Other application metrics⚓︎
Logs⚓︎
Effects of breakage⚓︎
- If all query peers are failing, the query subsystem cannot return results. The query head will retry up to `hdx_query_max_attempts` times.
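The retry behavior described above can be sketched as a bounded retry loop. The function names and error handling here are illustrative, not the query head's actual implementation:

```python
# Illustrative sketch of bounded retries governed by a setting like
# hdx_query_max_attempts: give up only after every attempt has failed.
def run_with_retries(execute, hdx_query_max_attempts: int = 3):
    last_error = None
    for attempt in range(1, hdx_query_max_attempts + 1):
        try:
            return execute(attempt)
        except RuntimeError as err:
            last_error = err  # this peer failed; retry (possibly elsewhere)
    raise RuntimeError(
        f"query failed after {hdx_query_max_attempts} attempts"
    ) from last_error

# Example: peers fail twice, then succeed on the third attempt.
def flaky(attempt):
    if attempt < 3:
        raise RuntimeError(f"peer error on attempt {attempt}")
    return {"rows": 128}

print(run_with_retries(flaky))  # → {'rows': 128}
```

When every peer is down, all attempts fail and the caller sees the final error; transient single-peer failures are absorbed by the retries.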