Advanced Query Options

Circuit breakers

Hydrolix queries can cover vast amounts of data, so the following circuit breakers are provided to help ensure that resources are consumed effectively.

Each of the following options specifies a query limit. If a query exceeds a specified limit, it is cancelled automatically.

hdx_query_catalog_timeout_ms

Query Option | Min Value | Max Value | Default Value
hdx_query_catalog_timeout_ms | 1 | n/a | unlimited

The maximum amount of time, in milliseconds, a catalog query will run before timing out. If the setting hdx_query_max_execution_time is set, the value for that setting will supersede this one.

hdx_query_max_rows

Query Option | Min Value | Max Value | Default Value
hdx_query_max_rows | 0 | n/a | unlimited

Specifies the maximum number of rows that can be evaluated to answer a query.
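
As a minimal sketch, assuming the option is passed in the SETTINGS clause of the query (as the hdx_summary_override_indexes example on this page does), a row limit might be applied as follows. The table my_project.my_table and the limit value are hypothetical.

```sql
-- Hypothetical table and limit: the query is expected to be cancelled
-- if more than 10 million rows would have to be evaluated.
SELECT count()
FROM my_project.my_table
WHERE timestamp >= now() - INTERVAL 1 HOUR
SETTINGS hdx_query_max_rows = 10000000
```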

hdx_query_max_attempts

Query Option | Min Value | Max Value | Default Value
hdx_query_max_attempts | 0 | n/a | unlimited

Specifies the maximum number of failures that can occur before the query is cancelled.

hdx_query_max_result_bytes

Query Option | Min Value | Max Value | Default Value
hdx_query_max_result_bytes | 10000 | n/a | unlimited

The maximum number of bytes that can be stored on the query head before returning data.

hdx_query_max_result_rows

Query Option | Min Value | Max Value | Default Value
hdx_query_max_result_rows | 0 | n/a | unlimited

The maximum number of rows that can be stored as a result of a query on the Query Head before returning the response.

hdx_query_max_timerange_sec

Query Option | Min Value | Max Value | Default Value
hdx_query_max_timerange_sec | 0 | n/a | unlimited

Specifies the maximum time range, in seconds, that a query is allowed to cover. For example, 86400 limits the time range to one day. This option has no effect on a query without time range filtering, because the range is calculated as the difference between the timestamps in the WHERE clause.
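
The following sketch illustrates the one-day limit described above, assuming the option is supplied in the SETTINGS clause. The table name and the timestamp column are hypothetical. The WHERE clause spans six hours (21600 seconds), which is within the 86400-second limit.

```sql
-- Hypothetical table; the time range is bounded on both sides so that
-- the difference between the two timestamps can be calculated.
SELECT count()
FROM my_project.my_table
WHERE timestamp >= '2024-01-01 00:00:00'
  AND timestamp <  '2024-01-01 06:00:00'
SETTINGS hdx_query_max_timerange_sec = 86400
```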

hdx_query_timerange_required

Query Option | Possible Values | Default Value
hdx_query_timerange_required | true / false | false

Boolean that controls whether a query must specify a time range in the WHERE clause. When set to true, a query with no time range is not allowed to run. Because of the volume of data that can be stored in Hydrolix, this setting protects against a query forcing a scan of the whole data set.
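
A short sketch of a query that satisfies this requirement, assuming the option is given in the SETTINGS clause. The table and column names are hypothetical.

```sql
-- With the requirement enabled, the WHERE clause carries an explicit
-- time range. Removing the timestamp filter would be expected to stop
-- this query from running.
SELECT cab_type, count()
FROM my_project.my_table
WHERE timestamp >= now() - INTERVAL 1 DAY
GROUP BY cab_type
SETTINGS hdx_query_timerange_required = true
```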

hdx_query_max_partitions

Query Option | Min Value | Max Value | Default Value
hdx_query_max_partitions | 1 | n/a | unlimited

The number of partitions a query is allowed to access. If a query would read more partitions than this number, it is not allowed to run.

hdx_query_max_execution_time

Query Option | Min Value | Max Value | Default Value
hdx_query_max_execution_time | 0 | (Unbound) | 0

The number of seconds before a query is cancelled. A value of 0 means that there is no limit. Any value above 0 cancels the query when the specified number of seconds is reached.
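
For illustration, a 30-second cap might look like the sketch below. It assumes the option is accepted in the SETTINGS clause; the table name and the value are hypothetical.

```sql
-- Cancel the query if it is still running after 30 seconds.
-- A value of 0 would mean no limit.
SELECT count()
FROM my_project.my_table
WHERE timestamp >= now() - INTERVAL 7 DAY
SETTINGS hdx_query_max_execution_time = 30
```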

hdx_query_max_columns_to_read

Query Option | Min Value | Max Value | Default Value
hdx_query_max_columns_to_read | 0 | (Unbound) | 0

The number of columns allowed in a SELECT statement before a query is cancelled. A value of 0 means that there is no limit. Any value above 0 cancels the query when the specified number of columns is reached.

hdx_query_max_memory_usage

Query Option | Min Value | Max Value | Default Value
hdx_query_max_memory_usage | 0 | (Unbound) | 0

The maximum memory, in bytes, for a single query. A query that exceeds hdx_query_max_memory_usage is cancelled.
This setting applies per query on each query peer and on the query head.
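
Because the value is expressed in bytes, it can help to spell out the arithmetic. The sketch below assumes the option is passed in the SETTINGS clause and uses a hypothetical table and a hypothetical limit of 4 GiB (4 x 1024 x 1024 x 1024 = 4294967296 bytes).

```sql
-- 4 GiB memory limit, expressed in bytes.
SELECT trip_type, count()
FROM my_project.my_table
WHERE timestamp >= now() - INTERVAL 1 DAY
GROUP BY trip_type
SETTINGS hdx_query_max_memory_usage = 4294967296
```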


Rate limiting

These options specify limits on the resources available to query processes.

hdx_query_max_peers

Query Option | Min Value | Max Value | Default Value
hdx_query_max_peers | 1 | (Unbound) | null (all peers)

By default, Hydrolix distributes query processing across all available query peers in order to maximize massively parallel processing. Setting this flag instructs the query head to instead use only a subset of available peers. If a number greater than the number of available peers is given, all available peers are used.
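
A minimal sketch of restricting a query to a subset of peers, assuming the flag is set in the SETTINGS clause. The table name and the peer count are hypothetical.

```sql
-- Ask the query head to use at most 2 query peers.
-- If fewer than 2 peers are available, all of them are used.
SELECT count()
FROM my_project.my_table
WHERE timestamp >= now() - INTERVAL 1 HOUR
SETTINGS hdx_query_max_peers = 2
```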

hdx_query_pool_name

Query Option | Min Value | Max Value | Default Value
hdx_query_pool_name | n/a | n/a | "" (empty string)

A pool name instructs Hydrolix where a query should run. Given a set of pools, specifying the name of a pool runs the query only on peers belonging to that pool. If the parameter is not set, the query runs on all available peers from all pools.
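
As a sketch, and assuming the parameter is passed as a string in the SETTINGS clause, a query might be directed to one pool like this. The pool name 'reporting' and the table name are hypothetical; substitute a pool that exists in your deployment.

```sql
-- Run only on peers that belong to the (hypothetical) 'reporting' pool.
SELECT count()
FROM my_project.my_table
WHERE timestamp >= now() - INTERVAL 1 HOUR
SETTINGS hdx_query_pool_name = 'reporting'
```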

hdx_query_max_streams

Query Option | Min Value | Max Value | Default Value
hdx_query_max_streams | 1 | Twice the count of available CPU cores | null (1 per core)

By default, each query peer will run one process per CPU core. To limit the number of processes a query might run, set a value here.

hdx_query_max_concurrent_partitions

Query Option | Min Value | Max Value | Default Value
hdx_query_max_concurrent_partitions | 1 | n/a | 3

Hydrolix query processing generally requires each query process to extract data from many HDX partitions. This flag sets a limit on the number of partitions which a query peer reads from at the same time.

Note that this setting is applied per process. For example, if a four-core query peer runs four processes, and each of these opens up to 25 partitions, then each query peer may have as many as 100 partitions open at once.

Decreasing this setting from its default setting of 3 may slow down query performance. Increasing this setting beyond 25 risks excessive memory pressure on the peer. Tread carefully.
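
The per-process arithmetic can be made concrete with a sketch that combines this flag with hdx_query_max_streams. It assumes both flags are accepted together in the SETTINGS clause and that the stream limit corresponds to the number of processes per peer, as the hdx_query_max_streams description suggests. The table name and values are hypothetical.

```sql
-- 2 processes per peer, each reading up to 10 partitions at once:
-- 2 x 10 = up to 20 partitions open per query peer.
SELECT count()
FROM my_project.my_table
WHERE timestamp >= now() - INTERVAL 1 DAY
SETTINGS
  hdx_query_max_streams = 2,
  hdx_query_max_concurrent_partitions = 10
```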

Other flags

hdx_summary_override_indexes

A summary table indexes all non-aggregate columns by default.

Add this line to the query to exclude columns from indexing:

hdx_summary_override_indexes = 'column_1,column_2,column_3'

In this example, the cab_type and trip_type columns are not indexed.

SELECT
  timestamp,
  trip_type,
  cab_type
FROM
  sample.taxi.trips
GROUP BY
  timestamp,
  trip_type,
  cab_type
FORMAT HDX
SETTINGS
  hdx_primary_key = 'timestamp',
  hdx_summary_override_indexes = 'trip_type,cab_type'

use_query_cache

Query Option | Min Value | Max Value | Default Value
use_query_cache | n/a | false | true

Usable only in the SETTINGS clause of an SQL query, this option marks the query as a candidate for ClickHouse query caching. See "Use Query Caching" on the Query Efficiency page.
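
A minimal sketch of marking a query as a cache candidate in the SETTINGS clause. The table and column names are hypothetical.

```sql
-- Mark this query as a candidate for the query cache.
SELECT cab_type, count()
FROM my_project.my_table
WHERE timestamp >= '2024-01-01 00:00:00'
  AND timestamp <  '2024-01-02 00:00:00'
GROUP BY cab_type
SETTINGS use_query_cache = true
```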

hdx_query_distributed_aggregation_memory_efficient

Query Option | Min Value | Max Value | Default Value
hdx_query_distributed_aggregation_memory_efficient | 0 | n/a | unlimited

Relates to https://github.com/ClickHouse/ClickHouse/pull/20599

hdx_query_max_bytes_before_external_group_by

Query Option | Min Value | Max Value | Default Value
hdx_query_max_bytes_before_external_group_by | 0 | n/a | unlimited

The maximum number of bytes that can be used in memory before data is spilled to disk when applying a GROUP BY. This can help protect against out-of-memory errors, but disk is used instead. Use with care.

hdx_query_max_bytes_before_external_sort

Query Option | Min Value | Max Value | Default Value
hdx_query_max_bytes_before_external_sort | 0 | n/a | unlimited

The maximum number of bytes that can be used in memory before data is spilled to disk when applying an ORDER BY. This can help protect against out-of-memory errors, but disk is used instead. Use with care.
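
The two spill thresholds can be sketched together on a query that both groups and sorts. This assumes both options are accepted in the SETTINGS clause; the table, columns, and the 10 GB threshold (10000000000 bytes) are hypothetical.

```sql
-- Spill GROUP BY and ORDER BY state to disk once either operation
-- uses more than 10 GB of memory.
SELECT cab_type, count() AS trips
FROM my_project.my_table
WHERE timestamp >= now() - INTERVAL 30 DAY
GROUP BY cab_type
ORDER BY trips DESC
SETTINGS
  hdx_query_max_bytes_before_external_group_by = 10000000000,
  hdx_query_max_bytes_before_external_sort = 10000000000
```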

hdx_query_unlimited_cnf

Query Option | Min Value | Max Value | Default Value
hdx_query_unlimited_cnf | 0 | 1 | 0

When set to 1, this disables limits on the number of clauses when converting the query to conjunctive normal form (CNF). See the ClickHouse documentation for more information. Note that disabling this cap on CNFs will likely cause the query to be very slow and potentially use much more memory.