Query System Logical Operation

Query operation

Query System Logical View

Auth Client: Incoming requests to query interfaces must be authorized unless the tunable enable_query_auth has been set to false. For HTTP interfaces, Traefik interacts with the Config API to verify access for the requesting client. For TCP-based interfaces, the query head passes any tokens or credentials to the Config API to perform access checking.
Parse, Plan, and Optimize: The query head assigns an initial query ID, parses the SQL including all query options, and collects all data access controls from the Config API. Column-level access control is applied: clients requesting blocked columns receive a permissions error. The query head produces an optimized query plan. The plan includes derived SQL queries for the peers, query options, and an optional row filter expression.
Identify Partitions: The query head uses shard key columns, if present, and the time range clause of the client's query to generate a catalog query. The catalog's response lists all table partitions covering that time range and matching the other query metadata.
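The column check and partition lookup described above can be sketched as follows. This is a minimal illustration, not the actual query head code: the `Partition` fields, the function names, and the in-memory `catalog` list are all hypothetical stand-ins, and shard key matching is omitted.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Partition:
    # Hypothetical catalog record: a storage path plus the time span it covers.
    path: str
    min_time: datetime
    max_time: datetime


def check_columns(requested, blocked):
    """Raise a permissions error if any requested column is blocked."""
    denied = sorted(set(requested) & set(blocked))
    if denied:
        raise PermissionError(f"access denied to columns: {', '.join(denied)}")


def partitions_for_range(catalog, start, end):
    """Return partitions whose time span overlaps the half-open range [start, end)."""
    return [p for p in catalog if p.min_time < end and p.max_time >= start]
```

The overlap test keeps any partition that contains at least one timestamp inside the requested range, so partitions that only partly cover the range are still read and filtered later.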

Assign to Peers: Zookeeper tracks available query peers. The query head distributes work across the available peers, assigning each partition, together with the derived SQL query, to exactly one peer.
Fetch Manifests: Each query peer receives the derived query and its assigned partitions. Each partition consists of three files. The query peer immediately retrieves the entire first file, the manifest, from object storage.
Load Relevant Indexes: Using the derived query and the manifest, the query peer retrieves parts of the second file, the index, using byte range requests to object storage.
Install Row Filters: The query peer installs the final, combined row filter, which enforces row-level access control, when opening each partition's third file, the data.
Read Raw Rows: The query peer loads data from the raw partition into memory.
Raw Aggregation: Query peers construct intermediate responses. For speed, they use in-memory hash aggregation tables for any GROUP BY operations, though spilling to disk is configurable.
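The peer-side steps above can be sketched in a few lines. This is an illustration under stated assumptions: the round-robin policy in `assign_partitions` is a guess (the text only says each partition goes to a single peer), rows are modeled as plain dictionaries, and both function names are hypothetical. Spilling to disk is not modeled.

```python
from collections import defaultdict


def assign_partitions(partitions, peers):
    """Map each partition to exactly one peer (round-robin is an assumed policy)."""
    if not peers:
        raise ValueError("no query peers available")
    assignment = defaultdict(list)
    for i, part in enumerate(partitions):
        assignment[peers[i % len(peers)]].append(part)
    return dict(assignment)


def partial_aggregate(rows, row_filter, group_key, value_key):
    """Apply the row filter, then build an in-memory hash table of (count, sum) per group."""
    table = {}
    for row in rows:
        if not row_filter(row):
            continue  # Row-level access control: filtered rows never reach the aggregate.
        count, total = table.get(row[group_key], (0, 0))
        table[row[group_key]] = (count + 1, total + row[value_key])
    return table
```

Keeping `(count, sum)` pairs rather than a finished average means the intermediate results from different peers can be combined later without losing accuracy.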

Merge Intermediate Results: Query peers stream intermediate results to the query head.
Sort and Aggregate: For speed, the query head performs ORDER BY and GROUP BY operations in memory. Spilling to disk is configurable for both sorting and grouping.
Format Results: The query head returns the results to the consumer in the desired output format. The HTTP Stream API interface also returns X-Hdx-Query-Stats as an HTTP header.
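The merge and final aggregation on the query head can be sketched as below. This assumes each peer sends a dictionary mapping a group key to a `(count, sum)` pair; that wire shape and the names `merge_partials` and `finalize` are hypothetical, chosen only to illustrate combining partial aggregates in memory before sorting.

```python
def merge_partials(partials):
    """Combine per-peer {group: (count, sum)} tables into a single table."""
    merged = {}
    for table in partials:
        for key, (count, total) in table.items():
            prev_count, prev_total = merged.get(key, (0, 0))
            merged[key] = (prev_count + count, prev_total + total)
    return merged


def finalize(merged, descending=True):
    """Compute the final average per group and sort the rows in memory."""
    rows = [
        {"group": key, "count": count, "avg": total / count}
        for key, (count, total) in merged.items()
    ]
    rows.sort(key=lambda r: r["avg"], reverse=descending)
    return rows
```

Because the head only sees one entry per group from each peer, its memory use scales with the number of distinct groups rather than the number of raw rows.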