Skip to content

Query Performance Debugging

The HTTP Query API can respond with performance details. Set the query option hdx_query_debug=true:

  • Add hdx_query_debug=true to a SQL SETTINGS clause
  • Add query parameter hdx_query_debug=true to a request
  • Add an HTTP header X-HDX-query-settings: hdx_query_debug=true to a request

Responses include an HTTP Header X-Hdx-Query-Stats.

1
2
3
4
5
6
7
8
9
SELECT payment_type, count(pickup_datetime) as count
FROM sample.taxi_trips
WHERE pickup_datetime > '2008-01-01'
AND cab_type = 'yellow'
AND payment_type IN ('1', '2', '3', '4', '5', 'Cash', 'cre', 'CRE', 'No ', 'CRD', 'CSH', 'Dis', 'Dispute', 'Cas', 'DIS', 'NOC', 'CASH','No Charge','UNK','CREDIT')
AND trip_id IN (31190610, 32217069, 37860364, 35766523, 28006587, 211374605, 204381276, 210330995, 344756899, 353014930, 348959937, 520894156, 521592271, 521416965, 704689382, 707758682, 702269037, 883426527, 879781110, 872153954, 1048852862, 18432388, 1039420067, 1042644877, 17314313)
GROUP BY payment_type
ORDER BY count
SETTINGS hdx_query_debug=true

As the SQL statement include hdx_query_debug=true the response include the HTTP header X-HDX-Query-Stats:

X-HDX-Query-Stats: exec_time=732 rows_read=172032 bytes_read=7220125 num_partitions=100 num_peers=10 query_attempts=1 memory_usage=31468292 index_stats=[{"sample.taxi_trips":{"columns_read":["cab_type","payment_type","pickup_datetime","trip_id"],"indexes_used":["cab_type","payment_type","pickup_datetime","trip_id"],"shard_key_values_used":[]}}]

Main hdx_query_debug fields⚓︎

Key / Field Name Description
exec_time In-cluster query execution time in milliseconds. Does not include transfer time to end user
rows_read Number of rows read by the query peers
bytes_read Number of bytes read by the query peers
num_partitions Number of partitions read by the query peers
num_peers Number of query peers used for the query
query_attempts Number of query attempts, does not exceed hdx_max_query_attempts
memory_usage Memory used, in bytes, by the query head
pool_name Name of the query pool used
limit_optimization Method of optimization used by the query engine for early query termination.
Possible values: where_order_by_limit_n, order_by_limit_n, limit_n, not_applied
query_detail_runtime_stats JSON object containing details about query execution Details below
index_stats JSON object containing details about indexing for each table referenced. Details below

Fields in query_detail_runtime_stats⚓︎

Field Name Description
cached_col_dict_bytes_read Total dictionary bytes read from local query peer cache, rather than fetched from object storage
cached_index_bytes_read Total index bytes read from local query peer cache, rather than fetched from object storage
cached_manifest_bytes_read Total manifest bytes read from local query peer cache, rather than fetched from object storage
catalog_resp_time_ms Execution time of catalog query in milliseconds
hdx_bad_partitions_skipped Count of partitions skipped due to data corruption
hdx_blocks_read Count of data blocks fetched from object storage
hdx_blocks_skipped Count of data blocks considered but not fetched due to optimization
net_col_dict_bytes_read Total dictionary bytes read from object storage
net_data_bytes_read Total data bytes read from object storage
net_index_bytes_read Total index bytes read from object storage
net_manifest_bytes_read Total manifest bytes read from object storage

Fields in index_stats⚓︎

Field Name Description
columns_read Count of columns read
distributed_join_or_in Boolean showing whether JOIN and IN statements were converted to GLOBAL JOIN and GLOBAL IN statements
indexes_used Count of indexes used
shard_key_values_used JSON list of shard key values

If you are writing complex query using JOIN or INNER SELECT statements, it's very useful to look at the index_stats to ensure that Hydrolix fetching only columns used by your statement.

X-HDX-Query-Stats

This header is present only when you are using the HTTP query API endpoint, don't use TCP clickhouse-native for debugging as the information won't be present.

Stats included in logs

If you want to review those query statistics after the fact, those are present in the query logs