Options Reference
Query options provide control over the Hydrolix query system. They control output format, pool selection, query pooling and concurrency, circuit breakers, and observability features. They're available over the many Query Interfaces.
Categories⚓︎
The table and detailed listing group options by purpose:
- Format control options enable writing to cloud storage, specifying filenames, and selecting a different output format
- Observability and metrics options allow system analysts to track query volume and usage across applications and users
- Caching control options activate caching and set lifetimes on cached query results
- Circuit breakers prevent exhaustion of cluster query resources
- Query pooling and concurrency influence selection of query pools and query peer behavior
- Advanced circuit breakers offer control over query planning, query result merging, and latest-N ordering in the Hydrolix query subsystem
Table of Query Options⚓︎
Table of query option types, defaults, support, and links to detailed descriptions. Query options precedence describes how to set query options.
Notes:
- Numeric types indicated as
unlimitedcan hold arbitrarily large numeric values. Numeric range support should cover all uses for this system.
Detailed descriptions⚓︎
Format control⚓︎
hdx_query_output_format⚓︎
Select any of the Formats for input and output data known to the underlying ClickHouse engine.
The Hydrolix HTTP query API defaults to plain Character Separated Values, specifically TSV.
The Hydrolix UI defaults to JSON and allows easy download in CSVWithNames format.
For example, use a GET query to request CSV format with column names (CSVWithNames):
https://{myhost}.hydrolix.live/query/?query=SQL-QUERY&hdx_query_output_format=CSVWithNames
Interaction with SQL FORMAT clause⚓︎
If supported by the client's SQL dialect and present in a query, the FORMAT clause takes precedence over hdx_query_output_format.
Interaction with X-ClickHouse-Format⚓︎
The Hydrolix query API also respects the X-ClickHouse-Format header. If present, X-ClickHouse-Format will take precedence over organization, project, and table format settings as well as hdx_query_output_format when sent as an HTTP header.
A query parameter or SQL SETTINGS clause with hdx_query_output_format has a higher precedence.
hdx_query_output_file_enabled⚓︎
Indicates whether you want to save a query result to a file on your cloud storage. The query is saved using a randomly generated filename in the format instructed by hdx_query_output_format.
Default is false.
Set to true to send query output to primary storage bucket. The client receives a small response:
{ "file_path": "bucket-name/filename", "type": "output_file" }
hdx_query_output_filename⚓︎
Indicates whether you want to save a query result to a file on your cloud storage. The query output is saved using the designated filename in the format indicated by hdx_query_output_format. This will overwrite an existing file, if present.
Query output is sent to a storage location and the client receives a small response.
{ "file_path": "bucket-name/filename", "type": "output_file" }
Default is empty string. Results are streamed to query client, not saved to cloud storage.
Observability and metrics⚓︎
hdx_query_comment⚓︎
Add a comment to the query visible in Active Queries. This allows a user to provide explanation for the query. For example, if you have a query running every X min as part of a reporting tool you can include this information in the comment of the query.
Default is empty string.
hdx_query_admin_comment⚓︎
Add an admin comment to the query which is stored in Active Queries. This field can be filled automatically by Superset or Grafana to include username information in order to track user activity.
Default is empty string.
hdx_query_debug⚓︎
Request inclusion of X-HDX-Query-Stats in the response headers from the HTTP query API. See Query Performance Debugging.
Default is false.
hdx_query_label⚓︎
Attach the supplied string as a query_label for query-relevant metrics emitted to Prometheus. This query option is only available on SQL SETTINGS.
This is a convenient way to collect metrics on queries emitted by specific applications. For example, setting query_label = "grafana-nrt" would result in Prometheus metrics like net_http_status_code_bucket{ ... query_label="grafana-nrt" ... }.
Consider Prometheus label cardinality concerns when constructing dynamic patterns for this query option.
Default is empty string, in which case the Prometheus label query_label isn't present on the relevant metrics.
hdx_log_query⚓︎
When set to false, suppress this query from the query log. Only usable in the SETTINGS clause of an SQL query. Metrics are always collected.
Default is true.
Caching control⚓︎
use_query_cache⚓︎
Only usable in the SETTINGS clause of an SQL query, this marks the query as a candidate for ClickHouse query caching. See Query caching page.
Default is false.
query_cache_ttl⚓︎
Default time to live is 60 seconds.
Circuit breakers⚓︎
hdx_query_catalog_timeout_ms⚓︎
The duration in milliseconds before a catalog query is canceled. If set, hdx_query_max_execution_time supersedes this setting.
Default value is 0 which sets no limit.
Sample error: CatalogError Failed to submit transaction: ERROR: canceling statement due to statement timeout
hdx_query_max_rows⚓︎
Specifies the maximum number of rows that can be evaluated to answer a query. The query is canceled if it exceeds this number of rows.
Default value is 0 which sets no limit.
Sample error: HdxStorageError Maximum number of rows exceeded for query: 335427 rows (maximum is 200000)
hdx_query_max_attempts⚓︎
Specify a number of failures of a query peer that can occur (for any reason). The query head cancels the query if an additional attempt exceeds this number.
Default value is 3.
hdx_query_max_result_bytes⚓︎
Limit the number of bytes that can be stored on the query head before returning data. The query is canceled if the response byte count exceeds this value.
The Config API enforces a minimum of 10000 for this query option on organization, project, and table.
Users may set this circuit breaker lower using either the Query API or SQL SETTINGS for responses that are expected to be very small.
Default value is 0 which sets no limit.
Sample error: Code: 396. DB::Exception: Limit for result exceeded, max bytes: 8.00 KiB, current bytes: 8.72 MiB. (TOO_MANY_ROWS_OR_BYTES)
hdx_query_max_result_rows⚓︎
Limit the number of rows that can be stored on the query head before returning the response. The query is canceled if the resulting row count exceeds this number.
Default value is 0 which sets no limit.
Sample error: Code: 396. DB::Exception: Limit for result exceeded, max rows: 20.00 thousand, current rows: 21.56 thousand. (TOO_MANY_ROWS_OR_BYTES)
hdx_query_max_timerange_sec⚓︎
Limit the total timerange allowed for a query. Calculates a query's covered time range from the WHERE clause filter on the primary column and cancels if the time range in seconds exceeds this number.
For example, setting this value to 86400 prevents operating on a time window larger than one day.
To prevent queries lacking WHERE clauses (or any filters) on the primary column, you must use the hdx_query_timerange_required query option.
Default value is 0 which sets no limit.
Sample error: HdxStorageError Maximum time range exceeded for query: 601786 seconds (maximum is 86400)
hdx_query_timerange_required⚓︎
Set this parameter to true to ensure that queries include a WHERE clause on the primary column. Any query lacking a WHERE clause or filter on the primary column is canceled.
Requiring a time range for queries prevents accidental scanning of all data for all time.
Default is false.
Sample error: HdxStorageError hdx_query_timerange_required is set to true. Your query needs a time range filter in a WHERE clause
hdx_query_max_partitions⚓︎
Limit the number of partitions the query can read. The query is canceled if the total number of partitions required to execute the query exceeds this number.
Default value is 0 which sets no limit.
Sample error: HdxStorageError Maximum number of partitions exceeded for query: 168 partitions (maximum is 10)
hdx_query_max_execution_time⚓︎
Limit the total runtime of a query. The query is canceled after the specified number of seconds have passed since query execution began.
Default value is 0 which sets no limit.
Sample error: Code: 159. DB::Exception: Timeout exceeded: elapsed 2.179121784 seconds, maximum: 2. (TIMEOUT_EXCEEDED)
hdx_query_max_columns_to_read⚓︎
Limit the number of columns allowed in a SELECT statement. The query is canceled if the total number of columns required to complete the query exceeds this number.
Columnar data stores are built to optimize for retrieval of specific columns. See also Query Efficiency on SELECT *.
Default value is 0 which sets no limit. Use this setting if you intend to SELECT *.
Sample error: Code: 161. DB::Exception: Limit for number of columns to read exceeded. Requested: 246, maximum: 100. (TOO_MANY_COLUMNS)
hdx_query_max_memory_usage⚓︎
Limit total memory in bytes allowed for a query. The query is canceled if memory usage on any query peers or the query head exceeds this value.
Default value is 0 which sets no limit.
Sample error: Code: 241. DB::Exception: Memory limit (for query) exceeded: would use 65.18 MiB (attempt to allocate chunk of 33554526 bytes), maximum: 50.00 MiB.
This option can't be used with hdx_query_max_memory_usage_perc.
hdx_query_max_memory_usage_perc⚓︎
Limit the percentage of memory allowed for handling a query. Queries exceeding this percentage of memory on any query-head or query-peer container will be canceled.
When set, the following computation is used to derive a byte count limit before query processing begins
hdx_query_max_memory_usage = hdx_query_max_memory_usage_perc * CONTAINER_MEMORY_LIMIT_BYTES
Since hdx_query_max_memory_usage is derived from this query option, they can't be used together.
Default value is 0 which sets no limit.
To limit memory in Ingest and Merge operations, an identically named Hydrolix tunable name hdx_query_max_memory_usage_perc serves the same purpose. For Query operations, use the query option detailed here.
Query pooling and concurrency⚓︎
These options specify limits on the resources available to query processes.
hdx_query_max_peers⚓︎
Instruct the query head to use only a subset of available peers for the query. If the number is greater than the number of available peers, all available peers are used.
The effect of this setting may influence total query runtime by restricting work to a subset of query peers. For a systematic approach to separating query resources, see also Query pools.
Default is 0. The query head distributes work across all available peers to maximize parallel processing.
hdx_query_pool_name⚓︎
Send the query to a specific query pool. This option is only useful when multiple query pools are available, and the named query pool must exist already. See also Query pools.
Sample error: ClusterError Pool name unknown_pool does not exist
Default is empty string. The default query pool uses all available peers.
hdx_query_max_streams⚓︎
Limit the number of partition-reading threads on a query peer.
The range of acceptable values is 0 to the number of cores available, up to the container limit on CPU. See also Scale profiles.
When the value exceeds the number of cores, it is clamped to that number.
When 0, the value will be set to the number of cores.
See also interaction with hdx_query_max_concurrent_partitions.
Default value is 0.
hdx_query_max_concurrent_partitions⚓︎
Used by query peers only.
Each query peer spawns threads up to hdx_query_max_streams. Each thread opens up to hdx_query_max_concurrent_partitions.
Default is 3.
Example calculation
This demonstrates a calculation of the effects of this query option with hdx_query_max_streams using default values.
- hdx_query_max_concurrent_partitions = 3
- hdx_query_max_streams = 0
- no CPU limits set on query peer container or pod
- a node with 8 physical cores and Hyper-Threading provides 16 CPUs to a query peer running there.
16 cores * 3 concurrent partitions = 48.
A query peer running with this configuration can process up to 48 partitions concurrently.
Decreasing this value constrains CPU demand at the potential cost of query speed. Increasing this value might increase query responsiveness in clusters with sufficient query peers, bandwidth to storage, and available CPU.
Hydrolix query depends on available network bandwidth and CPU when extracting data from many HDX partitions in parallel. These query options provide controls for balancing memory and CPU usage to optimize for query response time.
Setting the value above 25 risks excessive memory pressure on the query peer. Tread carefully.
Advanced circuit breakers⚓︎
hdx_query_distributed_aggregation_memory_efficient⚓︎
Enable memory-saving mode of distributed aggregation.
Default is 0.
hdx_query_max_bytes_before_external_group_by⚓︎
Limit container memory byte count for GROUP BY operations. After exceeding this byte count in GROUP BY, write intermediate data to disk. Use this to cap demand for memory in aggregation operations, at the cost of local disk utilization.
Default is 0. The query-head uses only memory for group by operations.
This option can't be used with hdx_query_max_perc_before_external_group_by.
hdx_query_max_perc_before_external_group_by⚓︎
Limit percentage of container memory for GROUP BY before using disk to write intermediate data. Use this control demand for memory at the cost of local disk utilization.
Default is 0. The query-head uses only memory for group by operations.
When set, the following computation is used to derive a byte count limit before query processing begins
hdx_query_max_bytes_before_external_group_by = \ hdx_query_max_perc_before_external_group_by * CONTAINER_MEMORY_LIMIT_BYTES
Since hdx_query_max_bytes_before_external_group_by is derived from this query option, they can't be used together.
To control memory in other data lifecycle operations, like Merge, the identically named Hydrolix tunable name hdx_query_max_bytes_before_external_group_by is available. For Query operations, use the query option detailed here.
hdx_query_max_bytes_before_external_sort⚓︎
Limit memory in bytes used when applying SORT before spilling to disk in the query head process. This can help protect against out-of-memory errors at the cost of local disk utilization. Use with care.
Default is 0. The query-head uses only memory for sorting.
hdx_query_unlimited_cnf⚓︎
Disable limits on the number of clauses when converting the query to conjunctive normal form (CNF) by setting the value to 1. See the ClickHouse documentation for more information. Note that disabling limits on CNFs will likely cause the query to be very slow and potentially use much more memory.
Default is 0. The feature isn't enabled.
hdx_query_optimize_order_by_primary⚓︎
Employ an optimization to return early when an ORDER BY ... LIMIT query has collected sufficient results to satisfy the LIMIT.
Suppress this optimization by setting this value to 0. This will cause all partitions in the queried time range to be process completely for relevant queries. See also Latest N Rows with Row-Level Partition Pruning.
Default is 1. The feature is enabled.
Other flags⚓︎
hdx_summary_override_indexes⚓︎
Use exclusively in the context of summary transform SQL. See Summary Tables.
hdx_alter_reliable_update_enabled⚓︎
Use exclusively in the context of the alter table operations. See also Alter Table.