Query Performance Debugging

The HTTP Query API can respond with performance details. Set the query option hdx_query_debug=true:

  • Add hdx_query_debug=true to a SQL SETTINGS clause
  • Add query parameter hdx_query_debug=true to a request
  • Add an HTTP header X-HDX-query-settings: hdx_query_debug=true to a request

Responses include an HTTP Header X-Hdx-Query-Stats.

SELECT payment_type, count(pickup_datetime) as count
FROM sample.taxi_trips
WHERE pickup_datetime > '2008-01-01'
AND cab_type = 'yellow'
AND payment_type IN ('1', '2', '3', '4', '5', 'Cash', 'cre', 'CRE', 'No ', 'CRD', 'CSH', 'Dis', 'Dispute', 'Cas', 'DIS', 'NOC', 'CASH','No Charge','UNK','CREDIT')
AND trip_id IN (31190610, 32217069, 37860364, 35766523, 28006587, 211374605, 204381276, 210330995, 344756899, 353014930, 348959937, 520894156, 521592271, 521416965, 704689382, 707758682, 702269037, 883426527, 879781110, 872153954, 1048852862, 18432388, 1039420067, 1042644877, 17314313)
GROUP BY payment_type
ORDER BY count
SETTINGS hdx_query_debug=true

As the SQL statement include hdx_query_debug=true the response include the HTTP header X-HDX-Query-Stats:

X-HDX-Query-Stats: exec_time=732 rows_read=172032 bytes_read=7220125 num_partitions=100 num_peers=10 query_attempts=1 memory_usage=31468292 index_stats=[{"sample.taxi_trips":{"columns_read":["cab_type","payment_type","pickup_datetime","trip_id"],"indexes_used":["cab_type","payment_type","pickup_datetime","trip_id"],"shard_key_values_used":[]}}]
keydefinition
exec_timeIn-cluster query execution time in milliseconds. Does not include transfer time to end user.
rows_readNumber of rows read by the query peers
bytes_readNumber of bytes read by the query peers
num_partitionsNumber of partitions read by the query peers
num_peersNumber of query peers used for the query
query_attemptsNumber of query attempts, does not exceed hdx_max_query_attempts
memory_usageMemory used, in bytes, by the query head
For example GROUP BY is done at the head and requires RAM to do the aggregation.
index_statsShows the table used for the query, and for each table which column is read and also which index is used.

If you are writing complex query using JOIN or INNER SELECT statement it's very useful to look at the index_stats to ensure that Hydrolix fetching only columns used by your statement.

🚧

X-HDX-Query-Stats

This header is present only when you are using the HTTP query API endpoint, don't use TCP clickhouse-native for debugging as the information won't be present.

📘

Stats included in logs

If you want to review those query statistics after the fact, those are present in the query logs