Pod - Query Peer
Overview⚓︎
Most clusters run multiple instances of the query peer application.
Each query peer registers with the Zookeeper coordinator and awaits work assignments from the query head.
The query peer interacts with Storage and the partition data to retrieve the raw data needed to answer a query.
Scaling choices depend on query load, data size, and expected responsiveness.
Clusters can additionally be configured with Query pools, a feature that allows separating different classes of queries onto distinct CPU or memory resources.
Each query peer awaits instructions from a query head. When a query peer receives an assignment, it fetches and reads the raw Hydrolix partition data from storage, executes the query against the partitions, collects intermediate results, and returns them to the query head.
Responsibilities⚓︎
At startup, each query peer application:
- Verifies that the storage bucket is available (TODO: all storage buckets?)
- Listens on its network sockets for requests from a query head
- Registers its name, IP address, and pool membership with Zookeeper
- Reloads the cluster config.json data file from primary storage every 5 seconds
- Awaits assignments from the query head
Upon receiving an incoming query, the query peer:
- Parses the incoming SQL query written by the query head and generates a plan
- Downloads from storage and caches locally the partition manifest, index, and column dictionaries required for the query
- Evaluates the query's WHERE clause against the index to identify blocks of interest
- Issues byte-range requests for the blocks of interest to the storage system
- Accumulates intermediate results and returns them to the query head
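The pruning and byte-range steps above can be sketched as follows. The real index and partition formats are internal to Hydrolix; every name and field in this snippet is illustrative, assuming only that the index carries per-block min/max statistics and block offsets.

```python
# Hypothetical sketch: prune blocks with index min/max statistics,
# then fetch only the surviving blocks via HTTP Range requests.

def blocks_of_interest(index, predicate):
    """Keep only blocks whose min/max range could match the WHERE clause."""
    return [b for b in index if predicate(b["min"], b["max"])]

def byte_ranges(blocks):
    """Turn surviving blocks into HTTP Range headers for object storage."""
    return [f"bytes={b['offset']}-{b['offset'] + b['length'] - 1}" for b in blocks]

# Example: WHERE value BETWEEN 50 AND 60 prunes blocks that
# cannot overlap the interval [50, 60].
index = [
    {"min": 0,  "max": 40,  "offset": 0,    "length": 1024},
    {"min": 41, "max": 90,  "offset": 1024, "length": 2048},
    {"min": 91, "max": 120, "offset": 3072, "length": 512},
]
hits = blocks_of_interest(index, lambda lo, hi: hi >= 50 and lo <= 60)
print(byte_ranges(hits))  # → ['bytes=1024-3071']
```

Only the middle block survives, so only its 2 KiB slice of the partition object is fetched from storage.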
Deployment and scale⚓︎
| Setting | Value |
|---|---|
| Type | ReplicaSet in a Deployment |
| Replicas | default 3, minimum 1 |
| CPU | 14 |
| Memory | 36 GiB |
| Storage | default 5 GiB, recommended 50 GiB |

Typical production scale is shown here.
TODO: Is there a memory and CPU balance ratio?
On query peers, memory is used for accumulating query results.
Storage inside the cluster is ephemeral. It provides local disk for work operations, query caching when enabled, and caching of partition files, especially the manifest, index, and column dictionaries for each partition.
Scaling considerations⚓︎
Parallelization of data retrieval is a core design principle of the Hydrolix system, and query peers are the primary mechanism for providing it.
Horizontal scaling by adding query peers can improve query response times at the cost of idle CPU at other times. The maximum node count is determined by the Kubernetes cluster; Hydrolix places no maximum on peer count and will use all available peers when parallelizing.
Vertically scaling query peer pods by requesting more CPU, memory, or storage, or by increasing the node size, can help if query peers frequently experience out-of-memory (OOM) conditions. If a query peer has enough memory to read and process `hdx_query_max_concurrent_partitions` multiplied by `hdx_query_max_streams`, it is less likely to hit OOM conditions.
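The sizing rule above is a simple product. As a rough worked example, assuming a per-stream working set (the 64 MiB figure below is a placeholder, not a Hydrolix constant):

```python
# Rough sizing sketch for the rule of thumb above.
# per_stream_bytes is an assumed placeholder, not a Hydrolix constant.
max_concurrent_partitions = 4   # hdx_query_max_concurrent_partitions
max_streams = 16                # hdx_query_max_streams
per_stream_bytes = 64 * 2**20   # assumed 64 MiB working set per stream

needed = max_concurrent_partitions * max_streams * per_stream_bytes
print(f"{needed / 2**30:.0f} GiB")  # → 4 GiB
```

If the computed figure approaches the pod's memory limit, that is a signal to scale vertically.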
Autoscaling is not generally useful for the query system. Queries are a bursty workload and usually complete very quickly. Load-based signals are not a good indicator for autoscaling. Since scaling response time is measured in tens of seconds, queries have long failed or completed before any scaling changes can help.
See also:
- Scale Profiles
- Scale your Cluster
- Custom Scale Profiles
Containers⚓︎
- `wait-for-zookeeper` is an InitContainer
- `query-peer` runs the `turbine_server`
Important files⚓︎
- `/usr/bin/turbine_server` - executable
- `/var/lib/turbine/turbine.ini` - configuration file
Startup dependencies⚓︎
Runtime dependencies⚓︎
- Zookeeper (registers and continually sends a heartbeat)
- Query Head
- Storage
Runtime provides⚓︎
Network listeners⚓︎
External⚓︎
None.
Cluster internal⚓︎
- tcp/9000 - ClickHouse native protocol
- tcp/8088 - ClickHouse HTTP
These ports are reachable only from inside the cluster, for example from the tooling pod.
HTTP APIs⚓︎
- tcp/8088 - metrics at `/metrics`
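A `/metrics` path usually serves Prometheus-style text exposition; assuming that holds here (it is not confirmed by this document), scraped output can be parsed with a few lines of Python:

```python
# Minimal parser for Prometheus-style text exposition, the format a
# /metrics endpoint typically serves (an assumption, not confirmed above).
def parse_metrics(text: str) -> dict[str, float]:
    metrics = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blanks and HELP/TYPE comment lines
        name, _, value = line.rpartition(" ")
        metrics[name] = float(value)
    return metrics

# Hypothetical sample payload; metric names are illustrative only.
sample = """
# HELP queries_total Total queries served
queries_total 42
memory_bytes{kind="rss"} 1048576
"""
print(parse_metrics(sample))
```

In practice a monitoring agent (e.g. a Prometheus scraper) does this for you; the sketch only shows the shape of the data behind tcp/8088.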
Configuration⚓︎
The Hydrolix operator conveys cluster settings, tunables and secrets into the container.
Per-query configuration is assembled by the query head and passed to each query-peer over intra-cluster communication.
Query options⚓︎
The query head handles resolution of query options according to Query Options Precedence. The query head rewrites the original query with all relevant options as a ClickHouse table function query sent to the query peer.
Hydrolix Config API⚓︎
Query peer instances do not read the Hydrolix Config API directly.
Cluster⚓︎
- `CONFIG_API_ENDPOINT` - in-cluster http://turbine-api:3000/config
- `CACHE_SPACE_CAPACITY` - from storage scaling
- `CONTAINER_MEMORY_LIMIT_BYTES` - from memory scaling
Secrets⚓︎
See Hydrolix Tunables List for detailed explanations.
Storage related⚓︎
These settings control interactions with storage buckets.
- `auth_http_read_timeout_ms`
- `auth_http_response_timeout_ms`
- `http_connect_timeout_ms`
- `http_ssl_connect_timeout_ms`
- `http_response_timeout_ms`
- `http_read_timeout_ms`
- `http_write_timeout_ms`
- `max_http_retries`
- `initial_exp_backoff_seconds`
- `max_exp_backoff_seconds`
- `exp_backoff_growth_factor_ms`
- `exp_backoff_additive_jitter`
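The retry and backoff tunables above suggest a capped exponential backoff with additive jitter. The exact formula Hydrolix uses is not documented here; the sketch below only illustrates how such parameters typically combine:

```python
import random

# Illustrative capped exponential backoff with additive jitter,
# parameterized like the tunables above. The mapping of tunables to
# terms in this formula is an assumption, not the actual implementation.
def backoff_seconds(attempt: int,
                    initial_exp_backoff_seconds: float = 1.0,
                    max_exp_backoff_seconds: float = 30.0,
                    growth_factor: float = 2.0,
                    additive_jitter: float = 0.5) -> float:
    delay = initial_exp_backoff_seconds * growth_factor ** attempt
    delay = min(delay, max_exp_backoff_seconds)      # cap the delay
    return delay + random.uniform(0, additive_jitter)  # spread retries out

for attempt in range(6):
    print(f"retry {attempt}: sleep {backoff_seconds(attempt):.2f}s")
```

Jitter prevents many peers that failed at the same moment from retrying storage in lockstep.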
Local disk⚓︎
These tunables influence local disk behavior.
- `disable_disk_cache`
- `disk_cache_cull_start_perc`
- `disk_cache_cull_stop_perc`
- `disk_cache_redzone_start_perc`
- `disk_cache_entry_max_ttl_minutes`
- `io_perf_mappings`
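The `cull_start`/`cull_stop` pair implies hysteresis: culling begins once usage crosses one threshold and stops once it drops below a lower one. The eviction policy and data structures below are purely illustrative, not Hydrolix's actual cache implementation:

```python
# Illustrative hysteresis sketch for disk_cache_cull_start_perc /
# disk_cache_cull_stop_perc: start evicting above one threshold,
# stop once usage falls below the other.
def cull(entries, capacity, cull_start_perc=90, cull_stop_perc=70):
    """entries: list of (age, size) tuples, oldest first. Returns survivors."""
    used = sum(size for _, size in entries)
    if used * 100 < capacity * cull_start_perc:
        return entries  # below the start threshold: nothing to do
    survivors = list(entries)
    while survivors and used * 100 > capacity * cull_stop_perc:
        _, size = survivors.pop(0)  # evict the oldest entry (assumed policy)
        used -= size
    return survivors

entries = [(age, 10) for age in range(10)]  # 100 units in a 100-unit cache
print(len(cull(entries, capacity=100)))  # → 7 (culled down to 70 units)
```

The gap between the two thresholds avoids thrashing: once a cull runs, the cache has headroom before the next one triggers.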
Application⚓︎
These settings set global limits and control the behavior of the query peer.
- `enable_query_auth`
- `default_query_pool`
- `max_concurrent_queries`
- `max_server_memory_usage`
- `user_acl_refresh_interval_secs`
- `user_token_refresh_interval_secs`
- `user_token_expiration_secs`
- `str_dict_enabled`
- `str_dict_nr_threads`
- `str_dict_min_dict_size`
Network⚓︎
These settings control application configuration as a client of the Domain Name System (DNS).
- `use_hydrolix_dns_resolver`
- `dns_server_ip`
- `dns_aws_max_ttl_secs`
- `dns_aws_max_resolution_attempts`
- `dns_azure_max_ttl_secs`
- `dns_azure_max_resolution_attempts`
- `dns_gcs_max_ttl_secs`
- `dns_gcs_max_resolution_attempts`
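The per-provider `*_max_ttl_secs` tunables suggest a resolver cache that caps how long a DNS answer may be reused, since cloud object-store hostnames rotate IPs frequently. The class below is an illustrative sketch of that idea, not Hydrolix's resolver:

```python
import time

# Illustrative TTL-capped DNS cache: cached answers live no longer
# than min(record TTL, configured cap), as the dns_*_max_ttl_secs
# tunables suggest. Names and structure are assumptions.
class DnsCache:
    def __init__(self, max_ttl_secs: float):
        self.max_ttl_secs = max_ttl_secs
        self._entries: dict[str, tuple[str, float]] = {}

    def put(self, host: str, ip: str, record_ttl: float, now=None):
        now = time.monotonic() if now is None else now
        ttl = min(record_ttl, self.max_ttl_secs)  # cap the record's own TTL
        self._entries[host] = (ip, now + ttl)

    def get(self, host: str, now=None):
        now = time.monotonic() if now is None else now
        entry = self._entries.get(host)
        if entry is None or now >= entry[1]:
            return None  # missing or expired: caller must re-resolve
        return entry[0]

cache = DnsCache(max_ttl_secs=5)
cache.put("bucket.example.com", "10.0.0.1", record_ttl=300, now=0)
print(cache.get("bucket.example.com", now=4))  # → 10.0.0.1
print(cache.get("bucket.example.com", now=6))  # → None (cap expired)
```

Even though the DNS record advertises a 300-second TTL, the 5-second cap forces re-resolution so the client tracks IP rotation quickly.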
Deprecated and other⚓︎
These are old, deprecated, or unused tunables.
- `use_https_with_s3` - DEPRECATED: use `db_bucket_url` or `db_bucket_http_enabled`
Metrics⚓︎
Key health metrics⚓︎
TODO: What are the most important metrics?
Other application metrics⚓︎
Logs⚓︎
Effects of breakage⚓︎
- If all query peers are failing, the query subsystem cannot return results. The query head will retry up to `hdx_query_max_attempts` times.
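The retry behavior described above can be sketched as a bounded retry loop. The function names and error handling here are illustrative, not the query head's actual implementation:

```python
# Illustrative sketch of bounded retries governed by a setting like
# hdx_query_max_attempts: give up only after every attempt has failed.
def run_with_retries(execute, hdx_query_max_attempts: int = 3):
    last_error = None
    for attempt in range(1, hdx_query_max_attempts + 1):
        try:
            return execute(attempt)
        except RuntimeError as err:
            last_error = err  # this peer failed; retry (possibly elsewhere)
    raise RuntimeError(
        f"query failed after {hdx_query_max_attempts} attempts"
    ) from last_error

# Example: peers fail twice, then succeed on the third attempt.
def flaky(attempt):
    if attempt < 3:
        raise RuntimeError(f"peer error on attempt {attempt}")
    return {"rows": 128}

print(run_with_retries(flaky))  # → {'rows': 128}
```

When every peer is down, all attempts fail and the caller sees the final error; transient single-peer failures are absorbed by the retries.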