Query Performance Debugging

The HTTP Query API can respond with performance details. Set the query option hdx_query_debug=true:

  • Add hdx_query_debug=true to a SQL SETTINGS clause
  • Add query parameter hdx_query_debug=true to a request
  • Add an HTTP header X-HDX-query-settings: hdx_query_debug=true to a request

Responses include an HTTP Header X-Hdx-Query-Stats.

SELECT payment_type, count(pickup_datetime) as count
FROM sample.taxi_trips
WHERE pickup_datetime > '2008-01-01'
AND cab_type = 'yellow'
AND payment_type IN ('1', '2', '3', '4', '5', 'Cash', 'cre', 'CRE', 'No ', 'CRD', 'CSH', 'Dis', 'Dispute', 'Cas', 'DIS', 'NOC', 'CASH','No Charge','UNK','CREDIT')
AND trip_id IN (31190610, 32217069, 37860364, 35766523, 28006587, 211374605, 204381276, 210330995, 344756899, 353014930, 348959937, 520894156, 521592271, 521416965, 704689382, 707758682, 702269037, 883426527, 879781110, 872153954, 1048852862, 18432388, 1039420067, 1042644877, 17314313)
GROUP BY payment_type
ORDER BY count
SETTINGS hdx_query_debug=true

As the SQL statement include hdx_query_debug=true the response include the HTTP header X-HDX-Query-Stats:

X-HDX-Query-Stats: exec_time=732 rows_read=172032 bytes_read=7220125 num_partitions=100 num_peers=10 query_attempts=1 memory_usage=31468292 index_stats=[{"sample.taxi_trips":{"columns_read":["cab_type","payment_type","pickup_datetime","trip_id"],"indexes_used":["cab_type","payment_type","pickup_datetime","trip_id"],"shard_key_values_used":[]}}]

Main hdx_query_debug fields

Key / Field NameDescription
exec_timeIn-cluster query execution time in milliseconds. Does not include transfer time to end user
rows_readNumber of rows read by the query peers
bytes_readNumber of bytes read by the query peers
num_partitionsNumber of partitions read by the query peers
num_peersNumber of query peers used for the query
query_attemptsNumber of query attempts, does not exceed hdx_max_query_attempts
memory_usageMemory used, in bytes, by the query head
pool_nameName of the query pool used
limit_optimizationMethod of optimization used by the query engine for early query termination.
Possible values: where_order_by_limit_n, order_by_limit_n, limit_n, not_applied
query_detail_runtime_statsJSON object containing details about query execution Details below
index_statsJSON object containing details about indexing for each table referenced. Details below

Fields in query_detail_runtime_stats

Field NameDescription
cached_col_dict_bytes_readTotal dictionary bytes read from local query peer cache, rather than fetched from object storage
cached_index_bytes_readTotal index bytes read from local query peer cache, rather than fetched from object storage
cached_manifest_bytes_readTotal manifest bytes read from local query peer cache, rather than fetched from object storage
catalog_resp_time_msExecution time of catalog query in milliseconds
hdx_bad_partitions_skippedCount of partitions skipped due to data corruption
hdx_blocks_readCount of data blocks fetched from object storage
hdx_blocks_skippedCount of data blocks considered but not fetched due to optimization
net_col_dict_bytes_readTotal dictionary bytes read from object storage
net_data_bytes_readTotal data bytes read from object storage
net_index_bytes_readTotal index bytes read from object storage
net_manifest_bytes_readTotal manifest bytes read from object storage

Fields in index_stats

Field NameDescription
columns_readCount of columns read
distributed_join_or_inBoolean showing whether JOIN and IN statements were converted to GLOBAL JOIN and GLOBAL IN statements
indexes_usedCount of indexes used
shard_key_values_usedJSON list of shard key values

If you are writing complex query using JOIN or INNER SELECT statements, it's very useful to look at the index_stats to ensure that Hydrolix fetching only columns used by your statement.

🚧

X-HDX-Query-Stats

This header is present only when you are using the HTTP query API endpoint, don't use TCP clickhouse-native for debugging as the information won't be present.

📘

Stats included in logs

If you want to review those query statistics after the fact, those are present in the query logs