Query Options Reference
Categories
Query options provide control over the Hydrolix query system. They control output format, pool selection, query rate limiting, circuit breakers, and observability features. They're available over the many Query Interfaces.
Use the Format control options to enable writing to cloud storage, specify filenames, and select a different output format.
Use Observability and metrics options to allow system analysts to track query volume and usage across applications and users.
Use Caching control options to activate caching and set lifetimes on cached query results.
Use Circuit breakers to prevent exhaustion of cluster query resources.
Use Query rate limiting to select alternate query pools and influence query peer selection and behavior.
Use Advanced circuit breakers if you have knowledge about internals of Hydrolix cluster query subsystem and have diagnosed an issue with the query plan, query result merging, or latest-N ordering.
Table of Query Options
Table of query option types, defaults, support, and links to detailed descriptions.
Notes:
- Numeric types indicated as
unlimited
can hold arbitrarily large numeric values. Numeric range support should cover all uses for this system.
Detailed descriptions
Format control
hdx_query_output_format
hdx_query_output_format
Select any of the Formats for input and output data known to the underlying ClickHouse engine.
The Hydrolix HTTP query API defaults to plain CSV.
The Hydrolix UI defaults to JSON and allows easy download in CSVWithNames format.
For example, use a GET
query to request CSV format with column names (CSVWithNames):
https://YOUR-HYDROLIX-HOST.hydrolix.live/query/?query=YOUR-SQL-QUERY&hdx_query_output_format=CSVWithNames
Interaction with SQL FORMAT
clause
FORMAT
clauseIf supported by the client's SQL dialect and present in a query, the FORMAT
clause takes precedence over hdx_query_output_format
.
hdx_query_output_file_enabled
hdx_query_output_file_enabled
Indicates whether you want to save a query result to a file on your cloud storage. The query is saved in the format instructed by hdx_query_output_format
in a randomly generated filename at the top-level of the directory.
Default is false
.
When used, query output is sent to the storage location and the client receives a small response.
{
"file_path": "bucket-name/filename",
"type": "output_file"
}
hdx_query_output_filename
hdx_query_output_filename
Indicates whether you want to save a query result to a file on your cloud storage. The query is saved in the format instructed by hdx_query_output_format
in the filename specified at the top-level of the directory tree. This will overwrite the file if it's already present.
Default is empty string; no file will be saved to cloud storage.
When used, query output is sent to a storage location and the client receives a small response.
{
"file_path": "bucket-name/filename",
"type": "output_file"
}
Observability and metrics
hdx_query_comment
hdx_query_comment
Add a comment to the query visible in Active Queries. This allows a user to provide explanation for the query. For example, if you have a query running every X min as part of a reporting tool you can include this information in the comment of the query.
Default is empty string.
hdx_query_admin_comment
hdx_query_admin_comment
Add an admin comment to the query which is stored in Active Queries. This field can be filled automatically by Superset or Grafana to include username information in order to track user activity.
Default is empty string.
hdx_query_debug
hdx_query_debug
Request inclusion of X-HDX-Query-Stats
in the response headers from the HTTP query API. See Query Performance Debugging.
Default is false
.
hdx_query_label
hdx_query_label
Attach the supplied string as a query_label
for query-relevant metrics emitted to Prometheus. This query option is only available on SQL SETTINGS
.
This is a convenient way to collect metrics on queries emitted by specific applications. For example, setting query_label = "grafana-nrt"
would result in prometheus metrics like:
net_http_status_code_bucket{ ... query_label="grafana-nrt" ...}`
Consider Prometheus label cardinality concerns when constructing dynamic patterns for this query option.
Default is empty string, in which case the Prometheus label query_label
isn't present on the relevant metrics.
hdx_log_query
hdx_log_query
When set to false
, suppress this query from the query log. Metrics will still be tallied. Only usable in the SETTINGS
clause of an SQL query.
Default is true
.
Caching control
use_query_cache
use_query_cache
Only usable in the SETTINGS
clause of an SQL query, this marks the query as a candidate for ClickHouse query caching. See "Use Query Caching" on the Query Efficiency page.
Default is false
.
query_cache_ttl
query_cache_ttl
Default is 60.
Circuit breakers
hdx_query_catalog_timeout_ms
hdx_query_catalog_timeout_ms
The duration in milliseconds before a catalog query is canceled. If the setting hdx_query_max_execution_time
is set, the value for that setting will supersede this one.
Default value is 0; no limit.
Sample error: CatalogError Failed to submit transaction: ERROR: canceling statement due to statement timeout
hdx_query_max_rows
hdx_query_max_rows
Specifies the maximum number of rows that can be evaluated to answer a query. The query will be canceled if it exceeds this number of rows.
Default value is 0; no limit.
Sample error: HdxStorageError Maximum number of rows exceeded for query: 335427 rows (maximum is 200000)
hdx_query_max_attempts
hdx_query_max_attempts
Specify a number of failures of a query peer that can occur (for any reason). The query head will cancel the query if an additional attempt exceeds this number.
Default value is 3.
hdx_query_max_result_bytes
hdx_query_max_result_bytes
Limit the number of bytes that can be stored on the query head before returning data. The query will be canceled if the response byte count exceeds this value.
Default value is 0; no limit.
Sample error: Code: 396. DB::Exception: Limit for result exceeded, max bytes: 8.00 KiB, current bytes: 8.72 MiB. (TOO_MANY_ROWS_OR_BYTES)
hdx_query_max_result_rows
hdx_query_max_result_rows
Limit the number of rows that can be stored on the query head before returning the response. The query will be canceled if the resulting row count exceeds this number.
Default value is 0; no limit.
Sample error: Code: 396. DB::Exception: Limit for result exceeded, max rows: 20.00 thousand, current rows: 21.56 thousand. (TOO_MANY_ROWS_OR_BYTES)
hdx_query_max_timerange_sec
hdx_query_max_timerange_sec
Limit the total timerange allowed for a query. Calculates a query's covered time range from the WHERE
clause filter on the primary column and cancels if the time range in seconds exceeds this number.
For example, setting this value to 86400 prevents operating on a time window larger than one day.
To prevent queries lacking WHERE
clauses (or any filters) on the primary column, you must use the hdx_query_timerange_required
query option.
Default value is 0; no limit.
Sample error: HdxStorageError Maximum time range exceeded for query: 601786 seconds (maximum is 86400)
hdx_query_timerange_required
hdx_query_timerange_required
Set this parameter to True
to ensure that queries include a WHERE
clause on the primary column. Any query lacking a WHERE
clause or filter on the primary column will be canceled.
Requiring a time range for queries prevents accidental scanning of all data for all time.
Default is false
.
Sample error: HdxStorageError hdx_query_timerange_required is set to true. Your query needs a time range filter in a WHERE clause
hdx_query_max_partitions
hdx_query_max_partitions
Limit the number of partitions the query may read. The query will be canceled if the total number of partitions required to execute the query exceeds this number.
Default value is 0; no limit.
Sample error: HdxStorageError Maximum number of partitions exceeded for query: 168 partitions (maximum is 10)
hdx_query_max_execution_time
hdx_query_max_execution_time
Limit the total runtime of a query. The query will be canceled after the specified number of seconds have passed since query execution began.
Default value is 0; no limit.
Sample error: Code: 159. DB::Exception: Timeout exceeded: elapsed 2.179121784 seconds, maximum: 2. (TIMEOUT_EXCEEDED)
hdx_query_max_columns_to_read
hdx_query_max_columns_to_read
Limit the number of columns allowed in a SELECT statement. The query will be canceled if the total number of columns required to complete the query exceeds this number.
Columnar data stores are built to optimize for retrieval of specific columns. See also Query Efficiency on SELECT *
.
Default is 0, which sets no limit on the number of columns. Use this setting if you intend to SELECT *
.
Sample error: Code: 161. DB::Exception: Limit for number of columns to read exceeded. Requested: 246, maximum: 100. (TOO_MANY_COLUMNS)
hdx_query_max_memory_usage
hdx_query_max_memory_usage
Limit total memory in bytes allowed for a single query. The query will be canceled if memory usage on any query peers or the query head exceeds this value.
Default value is 0; no limit.
Sample error: Code: 241. DB::Exception: Memory limit (for query) exceeded: would use 65.18 MiB (attempt to allocate chunk of 33554526 bytes), maximum: 50.00 MiB.
Query rate limiting
These options specify limits on the resources available to query processes.
hdx_query_max_peers
hdx_query_max_peers
Set this flag to instruct the query head to use only a subset of available peers for the query. If the number is greater than the number of available peers, all available peers are used.
The effect of this setting may influence total query runtime by restricting work to a subset of query peers. For a systematic approach to separating query resources, see also Query pools.
Default is 0; the query head distributes across all available peers to maximize parallel processing.
hdx_query_pool_name
hdx_query_pool_name
Send the query to a specific query pool. This option is only useful when multiple query pools are available, and the named query pool must exist already. See also Query pools.
Sample error: ClusterError Pool name unknown_pool does not exist
Default is empty string; the default query pool uses all available peers.
hdx_query_max_streams
hdx_query_max_streams
Limit the count of threads per CPU core each query process may initiate. See also interaction with hdx_query_max_concurrent_partitions
.
Default is 1.
hdx_query_max_concurrent_partitions
hdx_query_max_concurrent_partitions
Limit the number of partitions assignable to a single query peer.
Default is 3.
Decreasing this value constrains CPU demand at the cost of query speed (performance). Increasing this value may increase query responsiveness in clusters with sufficient query peers and available CPU.
This setting is applied per-process and combines with hdx_query_max_streams
. Each query process will spawn threads up to hdx_query_max_streams
(per CPU core) and each thread will open hdx_query_max_concurrent_partitions
.
Hydrolix query depends on available network bandwidth and CPU when extracting data from many HDX partitions in parallel. These query options provide controls for balancing memory and CPU usage for optimal query response time.
Setting the value above 25 risks excessive memory pressure on the query peer. Tread carefully.
Advanced circuit breakers
hdx_query_distributed_aggregation_memory_efficient
hdx_query_distributed_aggregation_memory_efficient
Enable memory-saving mode of distributed aggregation.
Default is 0.
hdx_query_max_bytes_before_external_group_by
hdx_query_max_bytes_before_external_group_by
Limit memory in bytes used when applying GROUP BY
before spilling to disk in the query head process. This can help protect against out of memory errors at the cost of local disk utilization. Use with care.
Default is 0; no limit.
hdx_query_max_bytes_before_external_sort
hdx_query_max_bytes_before_external_sort
Limit memory in bytes used when applying SORT
before spilling to disk in the query head process. This can help protect against out of memory errors at the cost of local disk utilization. Use with care.
Default is 0; no limit.
hdx_query_unlimited_cnf
hdx_query_unlimited_cnf
Disable limits on the number of clauses when converting the query to conjunctive normal form (CNF) by setting the value to 1
. See the ClickHouse documentation for more information. Note that disabling limits on CNFs will likely cause the query to be very slow and potentially use much more memory.
Default is 0.
hdx_query_optimize_order_by_primary
hdx_query_optimize_order_by_primary
Employ an optimization to return early when an ORDER BY ... LIMIT
query has collected sufficient results to satisfy the LIMIT
.
Suppress this optimization by setting this value to 0
. This will cause all partitions in the queried time range to be process completely for relevant queries. See also Latest N Rows with Row-Level Partition Pruning.
Default is 1.
Other flags
hdx_summary_override_indexes
hdx_summary_override_indexes
Use exclusively in the context of summary transform SQL. See Summary Tables.
hdx_query_max_memory_usage_perc
hdx_query_max_memory_usage_perc
This is a Hydrolix tunable name, not a query option. It's included here due to its similarity to a query option.
Use exclusively in the context of the merge system. See also Merge. This option is mutually exclusive with hdx_query_max_memory_usage
used for memory usage protection during query operations.
hdx_alter_reliable_update_enabled
hdx_alter_reliable_update_enabled
Use exclusively in the context of the alter table operations. See also Alter Table.
Updated about 9 hours ago