
Pod - Query Peer

Overview⚓︎

Most clusters run multiple instances of the query peer application.

Each query peer registers with the ZooKeeper coordinator and awaits work assignments from the query head.

The query peer interacts with Storage and the partition data to retrieve raw data in answering a query.

Scaling choices depend on query load, data size, and expected responsiveness.

Clusters can additionally be configured with Query pools, a feature that separates different classes of queries onto distinct CPU or memory resources.

Each query peer awaits instructions from a query head. When a query peer receives an assignment, it fetches and reads the raw Hydrolix partition data from storage, executes the query against the partitions, collects intermediate results and returns the results to the query head.
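
The flow above can be pictured as a scatter-gather pattern. The following is a minimal sketch only, not the Hydrolix implementation: the in-memory `PARTITIONS` table, the round-robin assignment, and the function names `peer_execute` and `head_execute` are all hypothetical stand-ins chosen for illustration.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

# Hypothetical in-memory stand-in for partitions held in storage:
# partition name -> list of (status_code, bytes_sent) rows.
PARTITIONS = {
    "p1": [(200, 10), (500, 5)],
    "p2": [(200, 7)],
    "p3": [(404, 1), (200, 2)],
}


def peer_execute(assigned):
    """Simulate one query peer: scan its assigned partitions and return
    an intermediate result (row count per status code)."""
    partial = Counter()
    for name in assigned:
        for status, _bytes_sent in PARTITIONS[name]:
            partial[status] += 1
    return partial


def head_execute(partition_names, peer_count):
    """Simulate the query head: split partitions across peers
    (round-robin is an assumption), run the peers in parallel,
    and merge their intermediate results."""
    assignments = [partition_names[i::peer_count] for i in range(peer_count)]
    with ThreadPoolExecutor(max_workers=peer_count) as pool:
        partials = list(pool.map(peer_execute, assignments))
    merged = Counter()
    for partial in partials:
        merged.update(partial)
    return dict(merged)


result = head_execute(sorted(PARTITIONS), peer_count=2)
```

Because each peer returns only a small intermediate result, the merge step on the head stays cheap even when the scanned data is large.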

Responsibilities⚓︎

At startup, each query peer application:

  • Verifies that the storage bucket is available TODO: all storage buckets?
  • Listens on its network sockets for requests from a query head
  • Registers its name, IP address, and pool membership with ZooKeeper
  • Reloads the cluster config.json data file from primary storage every 5 seconds
  • Awaits assignments from query head

Upon receiving an incoming query, each query peer:

  • Parses the incoming SQL query written by the query head and generates a plan
  • Downloads the partition manifest, index, and column dictionaries required for the query from storage and caches them locally
  • Evaluates the filter in the query's WHERE clause against the index to identify blocks of interest
  • Issues byte range requests for blocks of interest to the storage systems
  • Accumulates intermediate results and returns them to the query head
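
The index lookup and byte range steps can be sketched as follows. This is an illustrative model under stated assumptions, not the actual partition format: the `IndexEntry` layout (a min/max value range plus a byte offset and length per block) and both helper functions are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class IndexEntry:
    """Hypothetical index record: the value range covered by one block
    and where that block lives inside the partition's data file."""
    min_value: int
    max_value: int
    offset: int  # first byte of the block
    length: int  # block size in bytes


def blocks_of_interest(index, low, high):
    """Return the index entries whose value range overlaps [low, high]."""
    return [e for e in index if e.max_value >= low and e.min_value <= high]


def to_range_headers(entries):
    """Merge adjacent or overlapping blocks and format HTTP Range header
    values. HTTP byte ranges are inclusive on both ends."""
    spans = []
    for e in sorted(entries, key=lambda e: e.offset):
        start, end = e.offset, e.offset + e.length - 1
        if spans and start <= spans[-1][1] + 1:
            spans[-1][1] = max(spans[-1][1], end)
        else:
            spans.append([start, end])
    return [f"bytes={s}-{t}" for s, t in spans]


index = [
    IndexEntry(0, 99, 0, 4096),
    IndexEntry(100, 199, 4096, 4096),
    IndexEntry(200, 299, 8192, 4096),
    IndexEntry(300, 399, 12288, 4096),
]
# A filter such as "value BETWEEN 150 AND 250" touches only the second
# and third blocks, which are adjacent and collapse into one request.
headers = to_range_headers(blocks_of_interest(index, 150, 250))
```

Merging adjacent blocks into a single range is a design choice in this sketch that reduces the number of requests sent to storage; whether the query peer coalesces ranges this way is not stated in this document.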

Deployment and scale⚓︎

Setting     Comment
Type        ReplicaSet in a Deployment
Replicas    Default 3, minimum 1
CPU         14
Memory      36 GiB
Storage     Default 5 GiB, recommended 50 GiB

Typical production scale shown here.

TODO: Is there a memory and CPU balance ratio?

On query peers, memory is used for accumulating query results.

Storage inside the cluster is ephemeral. It provides local disk for work operations, query caching when enabled, and caching of partition files, especially the manifest, index, and column dictionaries for each partition.

Scaling considerations⚓︎

Parallelization of data retrieval is a core design principle of the Hydrolix system. The query peers are the key support for providing this parallelization.

Horizontal scaling by adding query peers can improve query response times at the cost of idle CPU at other times. The maximum node count is determined by the Kubernetes cluster. Hydrolix places no maximum on peer count and uses all available peers when parallelizing.

Vertically scaling query peer pods, either by requesting more CPU, memory, or storage or by increasing the node size, can help if query peers frequently experience out-of-memory (OOM) conditions. If a query peer has enough memory to read and process hdx_query_max_concurrent_partitions multiplied by hdx_query_max_streams, it is less likely to hit OOM conditions.
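
A rough way to reason about this product is to estimate a working set from it. The sketch below is back-of-the-envelope arithmetic only: the per-stream memory figure and the example tunable values are hypothetical, not Hydrolix defaults, and real memory use depends on the query and the data. The 36 GiB figure is the memory allocation shown in the deployment table on this page.

```python
def peak_concurrent_streams(max_concurrent_partitions, max_streams):
    """Upper bound on streams processed at once, taken as the product of
    hdx_query_max_concurrent_partitions and hdx_query_max_streams."""
    return max_concurrent_partitions * max_streams


def estimated_working_set_gib(max_concurrent_partitions, max_streams, mib_per_stream):
    """Estimate peak memory in GiB. mib_per_stream is a hypothetical
    figure that would have to be measured for a real workload."""
    streams = peak_concurrent_streams(max_concurrent_partitions, max_streams)
    return streams * mib_per_stream / 1024


# Hypothetical values: 3 concurrent partitions, 8 streams, 256 MiB per stream.
estimate = estimated_working_set_gib(3, 8, 256)  # 24 streams * 256 MiB
fits = estimate <= 36  # compare with the pod's 36 GiB memory allocation
```

If the estimate approaches the pod's memory limit, either the limit can be raised or one of the two tunables can be lowered.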

Autoscaling is not generally useful for the query system. Queries are a bursty workload and usually complete very quickly, so load-based signals are a poor indicator for autoscaling. Because scaling response time is measured in tens of seconds, queries have usually failed or completed long before any scaling change can help.

See also:

  • Scale Profiles
  • Scale your Cluster
  • Custom Scale Profiles

Containers⚓︎

  • wait-for-zookeeper is an InitContainer
  • query-peer runs the turbine_server in

Important files⚓︎

  • /usr/bin/turbine_server - executable
  • /var/lib/turbine/turbine.ini - configuration file

Startup dependencies⚓︎

Runtime dependencies⚓︎

Runtime provides⚓︎

Network listeners⚓︎

External⚓︎

None.

Cluster internal⚓︎

  • tcp/9000 - ClickHouse native protocol
  • tcp/8088 - ClickHouse HTTP

These ports are reachable only from inside the cluster, for example from the tooling pod.

HTTP APIs⚓︎

  • tcp/8088, Metrics /metrics
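
The metrics endpoint can be read from inside the cluster with a short script. This sketch assumes the endpoint serves the Prometheus text exposition format without trailing timestamps, which this page does not confirm; the metric names in `SAMPLE` are invented for illustration, and the host name passed to `fetch_metrics` must be supplied by the reader.

```python
import urllib.request


def parse_metrics(text):
    """Parse simple Prometheus-style lines into {name: float}.
    Comment lines (# HELP, # TYPE) and blank lines are skipped.
    Labels, if present, are kept as part of the name."""
    samples = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        name, _, value = line.rpartition(" ")
        try:
            samples[name] = float(value)
        except ValueError:
            continue  # skip lines that do not end in a number
    return samples


def fetch_metrics(host, port=8088, timeout=5.0):
    """Fetch and parse /metrics from a query peer. Only reachable from
    inside the cluster; the host name is supplied by the caller."""
    url = f"http://{host}:{port}/metrics"
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return parse_metrics(resp.read().decode("utf-8"))


# Invented sample payload, used here so the parser can be tried offline.
SAMPLE = """# HELP example_queries_total Hypothetical counter
# TYPE example_queries_total counter
example_queries_total 42
example_memory_bytes{pool="default"} 1.5e9
"""
parsed = parse_metrics(SAMPLE)
```

From a pod inside the cluster, `fetch_metrics("<query-peer-host>")` would return the same kind of dictionary for a live peer.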

Configuration⚓︎

The Hydrolix operator conveys cluster settings, tunables and secrets into the container.

Per-query configuration is assembled by the query head and passed to each query peer over intra-cluster communication.

Query options⚓︎

The query head handles resolution of query options according to Query Options Precedence. The query head rewrites the original query with all relevant options as a ClickHouse table function query sent to the query peer.

Hydrolix Config API⚓︎

Query peer instances do not read the

Cluster⚓︎

  • CONFIG_API_ENDPOINT - in-cluster http://turbine-api:3000/config
  • CACHE_SPACE_CAPACITY - from storage scaling
  • CONTAINER_MEMORY_LIMIT_BYTES - from memory scaling

Secrets⚓︎

See Hydrolix Tunables List for detailed explanations.

These settings control interactions with storage buckets.

  • auth_http_read_timeout_ms
  • auth_http_response_timeout_ms
  • http_connect_timeout_ms
  • http_ssl_connect_timeout_ms
  • http_response_timeout_ms
  • http_read_timeout_ms
  • http_write_timeout_ms
  • max_http_retries
  • initial_exp_backoff_seconds
  • max_exp_backoff_seconds
  • exp_backoff_growth_factor_ms
  • exp_backoff_additive_jitter
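
The retry and backoff tunables suggest a capped exponential backoff with jitter. The sketch below shows how such a schedule is commonly computed; it is an assumption about the general technique, not a description of how Hydrolix interprets these settings. In particular, the growth factor is treated as a plain multiplier and the jitter as seconds, and the parameter names are only loosely modeled on the tunables. See the Hydrolix Tunables List for the authoritative semantics.

```python
import random


def backoff_delays(initial_seconds, max_seconds, growth_factor,
                   additive_jitter_seconds, max_retries, rng=random.random):
    """Return the wait before each retry: the base delay grows
    geometrically, is capped at max_seconds, and gets a random additive
    jitter in [0, additive_jitter_seconds). Illustrative semantics only."""
    delays = []
    delay = initial_seconds
    for _ in range(max_retries):
        jitter = rng() * additive_jitter_seconds
        delays.append(min(delay, max_seconds) + jitter)
        delay *= growth_factor
    return delays


# Deterministic example: rng always returns 0.0, so no jitter is added.
schedule = backoff_delays(
    initial_seconds=1,
    max_seconds=10,
    growth_factor=2,
    additive_jitter_seconds=0.5,
    max_retries=5,
    rng=lambda: 0.0,
)
```

Jitter spreads retries from many peers over time so they do not all hit the storage service at the same instant.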

Local disk⚓︎

These tunables influence local disk behavior.

  • disable_disk_cache
  • disk_cache_cull_start_perc
  • disk_cache_cull_stop_perc
  • disk_cache_redzone_start_perc
  • disk_cache_entry_max_ttl_minutes
  • io_perf_mappings
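
The cull start and stop percentages read like high and low watermarks. The following sketch shows watermark-based eviction in general terms; it assumes culling begins when usage reaches the start percentage and evicts the oldest entries until usage falls to the stop percentage. That reading, the oldest-first order, and the `cull` function itself are assumptions, and the redzone and TTL tunables are not modeled.

```python
from collections import OrderedDict


def cull(cache, capacity_bytes, start_perc, stop_perc):
    """Evict oldest entries from cache (an OrderedDict of name -> size in
    bytes, oldest first) once usage reaches start_perc of capacity, and
    stop when usage is at or below stop_perc. Returns evicted names."""
    used = sum(cache.values())
    evicted = []
    if used * 100 < start_perc * capacity_bytes:
        return evicted  # below the high watermark: nothing to do
    while cache and used * 100 > stop_perc * capacity_bytes:
        name, size = cache.popitem(last=False)  # remove the oldest entry
        used -= size
        evicted.append(name)
    return evicted


# Hypothetical cache: 90 of 100 bytes used, cull from 85% down to 50%.
cache = OrderedDict([("a", 30), ("b", 30), ("c", 30)])
evicted = cull(cache, capacity_bytes=100, start_perc=85, stop_perc=50)
```

Using two thresholds rather than one avoids evicting a single entry on every write once the cache is nearly full.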

Application⚓︎

These settings set global limits and control the behavior of the query peer.

  • enable_query_auth
  • default_query_pool
  • max_concurrent_queries
  • max_server_memory_usage
  • user_acl_refresh_interval_secs
  • user_token_refresh_interval_secs
  • user_token_expiration_secs
  • str_dict_enabled
  • str_dict_nr_threads
  • str_dict_min_dict_size

Network⚓︎

These settings control how the application behaves as a Domain Name System (DNS) client.

  • use_hydrolix_dns_resolver
  • dns_server_ip
  • dns_aws_max_ttl_secs
  • dns_aws_max_resolution_attempts
  • dns_azure_max_ttl_secs
  • dns_azure_max_resolution_attempts
  • dns_gcs_max_ttl_secs
  • dns_gcs_max_resolution_attempts

Deprecated and other⚓︎

These are old, deprecated, or unused tunables.

  • use_https_with_s3 - DEPRECATED: Use db_bucket_url or db_bucket_http_enabled

Metrics⚓︎

Key health metrics⚓︎

TODO: What are the most important metrics?

Other application metrics⚓︎

Logs⚓︎

Effects of breakage⚓︎

  • If all query peers are failing, the query subsystem will not be able to return results. The query head will retry up to hdx_query_max_attempts times.

Troubleshoot⚓︎