Pod - Query Head

Overview⚓︎

Clusters typically run two instances of the query head application. At least one query head must be running in any cluster for the query subsystem to function.

The Traefik reverse proxy directs incoming connections to the Query Interfaces in a query head pod.

Scaling choices depend on query load, data volume, and expected responsiveness.

In clusters configured with Query pools, the query head directs queries to the requested or default query pool.

Responsibilities⚓︎

At startup, the query head application

  • Verifies that the primary storage bucket is available (TODO: all storage buckets?)
  • Listens on network sockets
  • Awaits incoming queries over any configured Query Interfaces
  • Reloads the cluster config.json data file from primary storage every 5 seconds
  • Spawns a thread upon receiving a query

When handling a query received over any interface, the query head

  • Parses the incoming SQL query and generates an optimized plan
  • Constructs query options from the cluster config, HTTP headers and query parameters, and SQL SETTINGS clauses according to Query Options Precedence
  • Queries the Catalog for the relevant partitions
  • Retrieves the available query peer instances for the query from Zookeeper
  • Hashes partition filenames to assign them to the available query peers (see the sketch after this list)
  • Transmits the assignment to each query peer
  • Collects the intermediate results and completes the aggregation and sorting stage of query handling
  • Returns the query result to the client
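
The hashing step above can be pictured as a simple hash-based distribution of partition files across peers. The sketch below is illustrative only; the peer names, partition paths, and choice of hash are placeholders, and the actual assignment logic inside the query head may differ.

```python
import hashlib

def assign_partitions(partition_paths, peers):
    """Illustrative only: distribute partition files across the available
    query peers by hashing each path. The real query head may use a
    different hash or placement strategy."""
    assignments = {peer: [] for peer in peers}
    for path in partition_paths:
        digest = hashlib.sha256(path.encode()).digest()
        index = int.from_bytes(digest[:8], "big") % len(peers)
        assignments[peers[index]].append(path)
    return assignments

# Example: three peers reported by Zookeeper, four partitions from the Catalog
peers = ["query-peer-0", "query-peer-1", "query-peer-2"]
partitions = [f"db/hdx/sample-project/sample-table/partition-{n}" for n in range(4)]
print(assign_partitions(partitions, peers))
```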

Recoverable errors are handled according to configuration. For example, upon a query peer failure, a query head might reassign the work to a different query peer if hdx_query_max_attempts has not been exceeded.

Upon non-recoverable errors the query head logs the failure and returns an informative error message to the client.
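
A minimal sketch of this error handling, assuming the work is simply re-sent to the next available peer until the attempt budget is spent; the function names and the choice of exception type are hypothetical, not the query head's actual interfaces.

```python
import itertools

def run_assignment_with_retry(assignment, peers, send_to_peer, max_attempts):
    """Illustrative retry loop: re-dispatch the assignment to another peer on a
    recoverable failure until max_attempts (hdx_query_max_attempts) is spent."""
    last_error = None
    for attempt, peer in enumerate(itertools.cycle(peers), start=1):
        try:
            return send_to_peer(peer, assignment)  # placeholder transport call
        except ConnectionError as err:             # treated here as recoverable
            last_error = err
            if attempt >= max_attempts:
                break
    # Non-recoverable path: log the failure and return an informative error
    raise RuntimeError(f"query failed after {max_attempts} attempts") from last_error
```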

Deployment and scale⚓︎

Typical production scale:

  • Type - ReplicaSet in a Deployment
  • Replicas - default 2, minimum 1
  • CPU - 14
  • Memory - 36 GiB
  • Storage - default 5 GiB, recommend 50 GiB

Storage inside the cluster is ephemeral. It provides local disk for work operations and optional query caching.

TODO: Are there recommended ratios for CPU, memory and storage?

On query heads, memory is used to accumulate query results until the limits set by hdx_query_max_bytes_before_external_group_by and hdx_query_max_bytes_before_external_sort are exceeded, at which point GROUP BY and ORDER BY operations spill to local disk. If a query head is experiencing out-of-memory (OOM) conditions, lower these limits so that queries spill to disk sooner.
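
For example, these limits can be lowered for a single query with a SQL SETTINGS clause submitted over the HTTP Query API. This is a hedged sketch: the hostname, bearer-token authentication, table names, and threshold values are placeholders, not recommendations.

```python
# Illustrative only: lower the spill thresholds for a single query so that
# GROUP BY and ORDER BY spill to local disk sooner. Host, token, table names,
# and values are placeholders.
import urllib.request

sql = """
SELECT user_id, count() AS hits
FROM sample_project.sample_table
GROUP BY user_id
ORDER BY hits DESC
SETTINGS hdx_query_max_bytes_before_external_group_by = 2000000000,
         hdx_query_max_bytes_before_external_sort     = 2000000000
"""

req = urllib.request.Request(
    "https://hdx.example.com:8088/query",          # placeholder cluster hostname
    data=sql.encode(),
    headers={"Authorization": "Bearer <token>"},   # placeholder credentials
    method="POST",
)
with urllib.request.urlopen(req) as resp:
    print(resp.read().decode())
```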

Scaling considerations⚓︎

Clients connect to the query head over one of the Query Interfaces (ClickHouse native protocol, ClickHouse HTTP, or MySQL protocol). The query heads require sufficient resources to handle the final stages of query processing, aggregation and sorting, as well as all of the coordination with Zookeeper and the query peers.

Horizontal scaling is not usually necessary.

Vertical scaling can help if a query head encounters OOM conditions.

Autoscaling is not generally useful for the query system. Queries are a bursty workload and usually complete very quickly, so load-based signals are a poor trigger for autoscaling. Because scaling response time is measured in tens of seconds, queries have long since completed or failed before any scaling change can help.

See also: Scale Profiles, Scale your Cluster, and Custom Scale Profiles.

Containers⚓︎

Important files⚓︎

  • /usr/bin/turbine_server - executable
  • /var/lib/turbine/turbine.ini - local configuration file

Startup dependencies⚓︎

Runtime dependencies⚓︎

  • Depends on Zookeeper for the presence and absence of query peer instances in each pool
  • Distributes work to each Query Peer
  • Consults the Catalog, read-only, for the relevant partitions for every query
  • Requires the Config API for authenticating clients
  • Polls the config.json blob storage path every 5 seconds for updates (see the sketch after this list)
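
The 5-second refresh can be pictured as a simple polling loop. This is a hedged sketch only; fetch_blob stands in for the storage client call, and the real query head's reload mechanism may differ.

```python
import json
import time

def watch_cluster_config(fetch_blob, interval_secs=5):
    """Illustrative polling loop: re-read config.json from primary storage
    every few seconds and act only when the contents change.
    fetch_blob is a placeholder for the storage client call."""
    current = None
    while True:
        config = json.loads(fetch_blob("config.json"))
        if config != current:
            current = config
            # apply updated org, project, and table query options here
        time.sleep(interval_secs)
```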

Runtime provides⚓︎

Network listeners⚓︎

External⚓︎

External traffic is managed by the Traefik application router before passing to the query-head application.

When TLS is enabled (standard), the following sockets are open

  • tcp/9440 - Traefik handles TLS and passes plaintext to pod tcp/9000; set the port with the tunable native_tls_port
  • tcp/8088 - Traefik handles TLS and passes plaintext to pod tcp/8088; set the port with the tunable clickhouse_http_port

When TLS is disabled (non-standard), the following sockets are open

  • tcp/9000 - ClickHouse native protocol; set the port with the tunable native_port
  • tcp/8088 - ClickHouse HTTP; set the port with the tunable clickhouse_http_port

When the tunable disable_traefik_mysql_port is false, the following socket is also open

  • tcp/9004 - MySQL protocol; set the port with the tunable mysql_port
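
As an illustration of the external listeners, a ClickHouse-native client can connect through Traefik on tcp/9440 with TLS. The hostname and credentials are placeholders, and the use of the third-party clickhouse-driver package is an assumption, not a documented requirement.

```python
# Hypothetical connection over the external native-protocol listener (tcp/9440,
# TLS terminated by Traefik). Requires the third-party clickhouse-driver package.
from clickhouse_driver import Client

client = Client(
    host="hdx.example.com",   # placeholder cluster hostname
    port=9440,
    secure=True,              # Traefik presents the TLS certificate
    user="<user>",
    password="<password>",
)
print(client.execute("SELECT 1"))
```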

Cluster internal⚓︎

  • tcp/9000 - ClickHouse native protocol (not configurable)
  • tcp/8088 - ClickHouse HTTP (not configurable)
  • tcp/9004 - MySQL protocol (not configurable)

These ports are reachable from inside the cluster, for example from the tooling pod.

HTTP APIs⚓︎

All on tcp/8088⚓︎

  • /query - HTTP Query API
  • /support - Hydrolix core system build-time diagnostic output
  • /metrics - Metrics
  • / - ClickHouse HTTP API
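
For instance, metrics can be pulled from a query head over the cluster-internal HTTP port; the service hostname below is a placeholder.

```python
# Illustrative: fetch metrics from a query head over the cluster-internal
# HTTP listener (tcp/8088). The hostname is a placeholder service name.
import urllib.request

with urllib.request.urlopen("http://query-head.example.svc:8088/metrics") as resp:
    print(resp.read().decode()[:500])  # print the first part of the output
```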

Configuration⚓︎

Application configuration is split across multiple sources.

Query options⚓︎

Query options provide flexible control over circuit breakers, format selection, and output destination.

Customers set query options using Query Interfaces or the Config API. See Query Options Precedence and Query Options Reference.

The config.json data file contains org, project, and table query options.

The incoming query may contain HTTP headers or query parameters with query options.

The query head resolves these different sources of query options into the final set used for each query it handles.
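
A minimal sketch of that resolution, assuming later sources simply override earlier ones. The ordering shown (cluster defaults, then config.json table options, then per-request HTTP options, then SQL SETTINGS) is an assumption for illustration; see Query Options Precedence for the authoritative rules.

```python
def resolve_query_options(cluster_defaults, table_options, http_options, sql_settings):
    """Illustrative merge: sources listed from lowest to highest precedence.
    The order is an assumption; see Query Options Precedence for the real rules."""
    resolved = {}
    for source in (cluster_defaults, table_options, http_options, sql_settings):
        resolved.update(source)
    return resolved

options = resolve_query_options(
    cluster_defaults={"hdx_query_max_attempts": 3},
    table_options={"hdx_query_max_attempts": 5},
    http_options={},
    sql_settings={"hdx_query_max_attempts": 1},
)
print(options)  # {'hdx_query_max_attempts': 1}
```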

Hydrolix Config API⚓︎

The query head depends on the Config API to authenticate clients.

Cluster⚓︎

  • CONFIG_API_ENDPOINT - in-cluster http://turbine-api:3000/config
  • CACHE_SPACE_CAPACITY - from storage scaling
  • CONTAINER_MEMORY_LIMIT_BYTES - from memory scaling

Secrets⚓︎

  • general - CATALOG_DB_PASSWORD - Credentials used for communicating with the Catalog
  • general - CI_USER_PASSWORD - TODO: Why is this here?

Hydrolix tunables⚓︎

See Hydrolix Tunables List for detailed explanations.

These settings control interactions with storage buckets.

  • auth_http_read_timeout_ms
  • auth_http_response_timeout_ms
  • http_connect_timeout_ms
  • http_ssl_connect_timeout_ms
  • http_response_timeout_ms
  • http_read_timeout_ms
  • http_write_timeout_ms
  • max_http_retries
  • initial_exp_backoff_seconds
  • max_exp_backoff_seconds
  • exp_backoff_growth_factor_ms
  • exp_backoff_additive_jitter
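
The retry and backoff tunables in this list suggest a capped exponential backoff with additive jitter. The sketch below is an interpretation only: how the query head actually combines these values, and the exact units of exp_backoff_growth_factor_ms and exp_backoff_additive_jitter, are assumptions (everything is treated as seconds here).

```python
import random
import time

def storage_request_with_backoff(do_request, *, max_http_retries=3,
                                 initial_exp_backoff_seconds=1,
                                 max_exp_backoff_seconds=30,
                                 growth_factor=2,
                                 additive_jitter=0.5):
    """Illustrative retry loop named after the tunables above; the real
    combination of these values inside the query head is an assumption."""
    delay = initial_exp_backoff_seconds
    for attempt in range(max_http_retries + 1):
        try:
            return do_request()
        except IOError:
            if attempt == max_http_retries:
                raise
            # capped exponential backoff plus additive jitter
            time.sleep(min(delay, max_exp_backoff_seconds)
                       + random.uniform(0, additive_jitter))
            delay *= growth_factor
```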

Local disk⚓︎

These tunables influence local disk behavior.

  • disable_disk_cache
  • disk_cache_cull_start_perc
  • disk_cache_cull_stop_perc
  • disk_cache_redzone_start_perc
  • disk_cache_entry_max_ttl_minutes
  • io_perf_mappings
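
One way to read the cull thresholds: when disk usage crosses disk_cache_cull_start_perc, the cache evicts entries until usage falls back below disk_cache_cull_stop_perc. The sketch below is an interpretation for illustration, not a description of the implementation; the data structures and eviction order are assumptions.

```python
def cull_disk_cache(entries, disk_usage_perc, cull_start_perc=90, cull_stop_perc=80):
    """Illustrative cull pass keyed off the tunables above.
    entries: list of (name, percent_of_disk) ordered least recently used first.
    The real eviction policy may differ."""
    if disk_usage_perc < cull_start_perc:
        return entries, disk_usage_perc           # below the start threshold: no-op
    kept = list(entries)
    while kept and disk_usage_perc > cull_stop_perc:
        _, entry_perc = kept.pop(0)               # evict least recently used first
        disk_usage_perc -= entry_perc
    return kept, disk_usage_perc
```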

Application⚓︎

These settings set global limits and control the behavior of the query head.

  • enable_query_auth
  • default_query_pool
  • max_concurrent_queries
  • max_server_memory_usage
  • user_acl_refresh_interval_secs
  • user_token_refresh_interval_secs
  • user_token_expiration_secs
  • str_dict_enabled
  • str_dict_nr_threads
  • str_dict_min_dict_size

Network⚓︎

These settings configure the application as a client of the Domain Name System (DNS).

  • use_hydrolix_dns_resolver
  • dns_server_ip
  • dns_aws_max_ttl_secs
  • dns_aws_max_resolution_attempts
  • dns_azure_max_ttl_secs
  • dns_azure_max_resolution_attempts
  • dns_gcs_max_ttl_secs
  • dns_gcs_max_resolution_attempts

Deprecated and other⚓︎

These are old, deprecated, or unused tunables.

  • use_https_with_s3 - DEPRECATED: Use db_bucket_url or db_bucket_http_enabled

Metrics⚓︎

See Query Metrics.

Key health metrics⚓︎

TODO: What are the most important metrics?

Other application metrics⚓︎

TODO: Are there any additional metrics that could fall into the "interesting" category?

Logs⚓︎

Logging output from the query head is handled by Vector and can be found in two places

  • primary storage bucket path logs/query-head/query-head/<YYYY-MM-DD>/<timestamp>-<hash>.log.gz
  • in the Hydrologs, under the system project hydro and table logs

Effects of breakage⚓︎

  • The query head is critical to query availability in the cluster. If all query heads are failing, clients will be unable to query and will receive network errors or timeouts.

Troubleshoot⚓︎