Query Performance Debugging
The HTTP Query API can respond with performance details. Set the query option hdx_query_debug=true
:
- Add
hdx_query_debug=true
to a SQL SETTINGS clause - Add query parameter
hdx_query_debug=true
to a request - Add an HTTP header
X-HDX-query-settings: hdx_query_debug=true
to a request
Responses include an HTTP Header X-Hdx-Query-Stats
.
SELECT payment_type, count(pickup_datetime) as count
FROM sample.taxi_trips
WHERE pickup_datetime > '2008-01-01'
AND cab_type = 'yellow'
AND payment_type IN ('1', '2', '3', '4', '5', 'Cash', 'cre', 'CRE', 'No ', 'CRD', 'CSH', 'Dis', 'Dispute', 'Cas', 'DIS', 'NOC', 'CASH','No Charge','UNK','CREDIT')
AND trip_id IN (31190610, 32217069, 37860364, 35766523, 28006587, 211374605, 204381276, 210330995, 344756899, 353014930, 348959937, 520894156, 521592271, 521416965, 704689382, 707758682, 702269037, 883426527, 879781110, 872153954, 1048852862, 18432388, 1039420067, 1042644877, 17314313)
GROUP BY payment_type
ORDER BY count
SETTINGS hdx_query_debug=true
As the SQL statement include hdx_query_debug=true
the response include the HTTP header X-HDX-Query-Stats
:
X-HDX-Query-Stats: exec_time=732 rows_read=172032 bytes_read=7220125 num_partitions=100 num_peers=10 query_attempts=1 memory_usage=31468292 index_stats=[{"sample.taxi_trips":{"columns_read":["cab_type","payment_type","pickup_datetime","trip_id"],"indexes_used":["cab_type","payment_type","pickup_datetime","trip_id"],"shard_key_values_used":[]}}]
Main hdx_query_debug
fields
hdx_query_debug
fieldsKey / Field Name | Description |
---|---|
exec_time | In-cluster query execution time in milliseconds. Does not include transfer time to end user |
rows_read | Number of rows read by the query peers |
bytes_read | Number of bytes read by the query peers |
num_partitions | Number of partitions read by the query peers |
num_peers | Number of query peers used for the query |
query_attempts | Number of query attempts, does not exceed hdx_max_query_attempts |
memory_usage | Memory used, in bytes, by the query head |
pool_name | Name of the query pool used |
limit_optimization | Method of optimization used by the query engine for early query termination. Possible values: where_order_by_limit_n , order_by_limit_n , limit_n , not_applied |
query_detail_runtime_stats | JSON object containing details about query execution Details below |
index_stats | JSON object containing details about indexing for each table referenced. Details below |
Fields in query_detail_runtime_stats
query_detail_runtime_stats
Field Name | Description |
---|---|
cached_col_dict_bytes_read | Total dictionary bytes read from local query peer cache, rather than fetched from object storage |
cached_index_bytes_read | Total index bytes read from local query peer cache, rather than fetched from object storage |
cached_manifest_bytes_read | Total manifest bytes read from local query peer cache, rather than fetched from object storage |
catalog_resp_time_ms | Execution time of catalog query in milliseconds |
hdx_bad_partitions_skipped | Count of partitions skipped due to data corruption |
hdx_blocks_read | Count of data blocks fetched from object storage |
hdx_blocks_skipped | Count of data blocks considered but not fetched due to optimization |
net_col_dict_bytes_read | Total dictionary bytes read from object storage |
net_data_bytes_read | Total data bytes read from object storage |
net_index_bytes_read | Total index bytes read from object storage |
net_manifest_bytes_read | Total manifest bytes read from object storage |
Fields in index_stats
index_stats
Field Name | Description |
---|---|
columns_read | Count of columns read |
distributed_join_or_in | Boolean showing whether JOIN and IN statements were converted to GLOBAL JOIN and GLOBAL IN statements |
indexes_used | Count of indexes used |
shard_key_values_used | JSON list of shard key values |
If you are writing complex query using JOIN or INNER SELECT statements, it's very useful to look at the index_stats to ensure that Hydrolix fetching only columns used by your statement.
X-HDX-Query-Stats
This header is present only when you are using the HTTP query API endpoint, don't use TCP clickhouse-native for debugging as the information won't be present.
Stats included in logs
If you want to review those query statistics after the fact, those are present in the query logs
Updated 21 days ago