Query Options

Hydrolix lets you control the behaviour of the service and adjust it for individual queries.

How to use query options

Parameters can be set through three different mechanisms:

  • Directly in SQL via SETTINGS.
  • In the query parameters of the HTTP request.
  • Via HTTP headers sent to the query endpoint.

Parameters included in the SQL statement take priority over those in the query string, which in turn take priority over parameters set in the headers.
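
The precedence rule above can be pictured as a layered merge. The following is a minimal sketch, not Hydrolix's actual implementation: the function name resolve_query_options and the sample values are hypothetical, and each source is assumed to have already been parsed into a dictionary.

```python
def resolve_query_options(sql_settings, query_params, header_settings):
    """Merge options from the three sources, lowest priority first.

    Hypothetical helper: it only illustrates the documented precedence
    (SQL SETTINGS > query string > HTTP headers).
    """
    resolved = {}
    resolved.update(header_settings)  # lowest priority
    resolved.update(query_params)     # overrides the headers
    resolved.update(sql_settings)     # highest priority
    return resolved


options = resolve_query_options(
    sql_settings={"hdx_query_max_streams": "4"},
    query_params={"hdx_query_max_streams": "8", "hdx_query_pool_name": "somepool"},
    header_settings={"hdx_query_pool_name": "mypool", "hdx_query_max_peers": "2"},
)
print(options)
```

Here hdx_query_max_streams resolves to 4 (from SQL), hdx_query_pool_name to somepool (from the query string), and hdx_query_max_peers to 2 (set only in the headers).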

Setting query options via SQL SETTINGS

You can add a SETTINGS clause to your SQL query to specify Hydrolix parameters.
The clause must appear at the end of the query.
To set multiple Hydrolix settings, separate them with a comma (,).

SELECT COUNT() FROM
sample.cts
WHERE timestamp >= toDateTime(1636289714) 
AND timestamp <= toDateTime(1636376114) 
AND arrayJoin(data.leaf_cert.all_domains) LIKE '%hydrolix.live%' 
SETTINGS hdx_query_output_file_enabled='true', hdx_query_admin_comment='User: David Sztykman'

In this example, the query results are written to S3, and the admin comment records that the user who generated the query is David Sztykman.
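
If you generate queries from code, the SETTINGS clause can be appended programmatically. The sketch below is illustrative only: with_settings and quote_setting are hypothetical helpers, every value is quoted as a string as in the example above, and the backslash escaping of quotes is an assumption based on ClickHouse-style string literals.

```python
def quote_setting(value):
    """Wrap a value in single quotes, escaping backslashes and quotes.

    Assumption: ClickHouse-style backslash escaping is accepted here.
    """
    escaped = str(value).replace("\\", "\\\\").replace("'", "\\'")
    return "'" + escaped + "'"


def with_settings(sql, settings):
    """Append a SETTINGS clause to the end of a SELECT statement."""
    if not settings:
        return sql
    clause = ", ".join(
        f"{name}={quote_setting(value)}" for name, value in settings.items()
    )
    # Drop trailing whitespace and a trailing semicolon so that SETTINGS
    # is the last clause of the statement.
    return f"{sql.rstrip().rstrip(';')} SETTINGS {clause}"


query = with_settings(
    "SELECT COUNT() FROM sample.cts",
    {
        "hdx_query_output_file_enabled": "true",
        "hdx_query_admin_comment": "User: David Sztykman",
    },
)
print(query)
```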

SETTINGS works only with SELECT

Settings in SQL work only for SELECT queries. INSERT is not compatible with custom settings.

Setting query options via query parameters

You can modify the behaviour of Hydrolix's query engine by attaching one or more optional parameters to your query API requests. These options simply take the form of additional HTTP parameters alongside the required query parameter.
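
As a rough illustration, the request URL can be assembled with Python's standard library. The helper name build_query_url is hypothetical, and the base URL reuses the placeholder host and /query/ path shown elsewhere on this page.

```python
from urllib.parse import urlencode


def build_query_url(base_url, sql, **options):
    """Return a query URL carrying the SQL plus any query options."""
    params = {"query": sql, **options}
    # urlencode percent-encodes each value and joins the pairs with "&".
    return f"{base_url}?{urlencode(params)}"


url = build_query_url(
    "https://YOUR-HYDROLIX-HOST.hydrolix.live/query/",
    "SELECT 1",
    hdx_query_max_peers=2,
)
print(url)
```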

Setting query options via HTTP headers

Additionally, you can set options in the HTTP headers via the X-HDX-query-settings header. The header takes a comma-separated list of key=value pairs. Do not add a space after the comma separators.

As HTTP allows, the header can appear several times in a request. If an option is repeated, the last value in the last header overrides the previous values.

Query example

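In the example below, hdx_query_pool_name is set to somepool, because the query string has priority over the headers, and hdx_query_max_streams is set to 12, because the last header overrides the earlier one.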

GET <YOUR-HYDROLIX-HOST>/?query=....&hdx_query_pool_name=somepool
X-HDX-query-settings: hdx_query_pool_name=mypool,hdx_query_max_streams=10
...
X-HDX-query-settings: hdx_query_max_streams=12
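
The resolution of this example can be traced with a short sketch. It is not Hydrolix's parser: parse_settings_headers is a hypothetical function that assumes header values arrive in request order and that no option value itself contains a comma.

```python
def parse_settings_headers(header_values):
    """Parse repeated X-HDX-query-settings values; later entries win."""
    settings = {}
    for header in header_values:
        for pair in header.split(","):
            if not pair:
                continue  # tolerate an empty segment such as a trailing comma
            name, _, value = pair.partition("=")
            settings[name] = value
    return settings


header_values = [
    "hdx_query_pool_name=mypool,hdx_query_max_streams=10",
    "hdx_query_max_streams=12",
]
from_headers = parse_settings_headers(header_values)

# Query-string parameters take priority over header values.
from_query_string = {"hdx_query_pool_name": "somepool"}
effective = {**from_headers, **from_query_string}
print(effective)
```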

Available options

Output Formatting

While Hydrolix's query API returns results as JSON by default, it also supports every output format recognized by its underlying ClickHouse engine.

To specify the output format for a query, add an hdx_query_output_format parameter to your API request, setting that parameter's value to any of ClickHouse's supported output formats.

For example, to have the response to a query API GET request formatted as CSV:

https://YOUR-HYDROLIX-HOST.hydrolix.live/query/?query=YOUR-SQL-QUERY&hdx_query_output_format=CSV
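
A client could build such a URL and decode the CSV body along these lines. This is a sketch under assumptions: csv_query_url and parse_csv_body are hypothetical helpers, and sample_body is made-up data standing in for a real response.

```python
import csv
import io
from urllib.parse import urlencode


def csv_query_url(base_url, sql):
    """Return a query URL that asks for CSV output."""
    params = {"query": sql, "hdx_query_output_format": "CSV"}
    return f"{base_url}?{urlencode(params)}"


def parse_csv_body(body):
    """Decode a CSV response body into a list of rows (lists of strings)."""
    return list(csv.reader(io.StringIO(body)))


url = csv_query_url("https://YOUR-HYDROLIX-HOST.hydrolix.live/query/", "SELECT 1")

# Hypothetical response body, for illustration only.
sample_body = '"hydrolix.live",42\n"example.com",7\n'
rows = parse_csv_body(sample_body)
print(rows)
```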

Advanced options

The remainder of the API options described by this page set various fine-tuning attributes on how Hydrolix processes a given query.

In most cases, you won't need to change any of these settings from their default values. Hydrolix's query engine is already optimized for the resource-allocation and caching settings that these defaults represent.

If you have any questions about improving your queries' performance, please contact Hydrolix support.

Circuit Breaker

These options set limits on what a query is allowed to do. A query that exceeds one of these limits is not run and is canceled automatically.

hdx_query_max_timerange_sec

Query Option | Min Value | Max Value | Default Value
hdx_query_max_timerange_sec | 0 | n/a | unlimited

Specifies the maximum time range allowed in a query, in seconds. For example, 86400 limits the time range to 1 day. This option has no effect if the query does not filter on a time range, because the range is calculated as the difference between the timestamps in the WHERE clause.
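
To make the calculation concrete, the sketch below applies it to the two timestamps from the SETTINGS example earlier on this page. The function names are hypothetical, and treating a range exactly equal to the limit as allowed is an assumption, not documented behaviour.

```python
def timerange_seconds(start_epoch, end_epoch):
    """Difference between the two timestamp bounds of the WHERE clause."""
    return end_epoch - start_epoch


def within_max_timerange(start_epoch, end_epoch, max_timerange_sec):
    """Assumed check: allow the query when the range does not exceed the limit."""
    return timerange_seconds(start_epoch, end_epoch) <= max_timerange_sec


start, end = 1636289714, 1636376114
print(timerange_seconds(start, end))             # 86400 seconds, i.e. 1 day
print(within_max_timerange(start, end, 86400))   # True
print(within_max_timerange(start, end, 3600))    # False
```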

hdx_query_max_partitions

Query Option | Min Value | Max Value | Default Value
hdx_query_max_partitions | 1 | n/a | unlimited

The number of partitions a query is allowed to access. If a query would read more partitions than this number, it is not allowed to run.

hdx_query_max_timeout_sec

Query Option | Min Value | Max Value | Default Value
hdx_query_max_timeout_sec | 0 | (Unbound) | 0

Number of seconds before a query is canceled. A value of 0 means that there is no limit. Any other value above 0 will cancel the query when the specified number of seconds is reached.

Rate Limiting

These options specify limits on the resources available to query processes.

hdx_query_max_peers

Query Option | Min Value | Max Value | Default Value
hdx_query_max_peers | 1 | (Unbound) | null (all peers)

By default, Hydrolix distributes query processing across all available query peers in order to maximize massively parallel processing. Setting this flag instructs the query head to instead use only a subset of available peers. If a number greater than the number of available peers is given, all available peers are used.
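
The capping behaviour described above amounts to taking a minimum. A minimal sketch, with a hypothetical function name and None standing in for the null default:

```python
def effective_peer_count(available_peers, max_peers=None):
    """Number of peers a query would use under the rule described above."""
    if max_peers is None:
        return available_peers  # default: all available peers
    # A request larger than the number of available peers uses them all.
    return min(max_peers, available_peers)


print(effective_peer_count(8))       # 8
print(effective_peer_count(8, 3))    # 3
print(effective_peer_count(8, 20))   # 8
```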

hdx_query_pool_name

Query Option | Min Value | Max Value | Default Value
hdx_query_pool_name | n/a | n/a | "" (empty string)

A pool name tells Hydrolix where a query should run. Given a set of pools, naming one of them runs the query only on peers belonging to that pool. If the parameter is not set, the query runs on all available peers from all pools.

hdx_query_max_streams

Query Option | Min Value | Max Value | Default Value
hdx_query_max_streams | 1 | Twice the count of available CPU cores | null (1 per core)

By default, each query peer will run one process per CPU core. To limit the number of processes a query might run, set a value here.

hdx_query_max_concurrent_partitions

Query Option | Min Value | Max Value | Default Value
hdx_query_max_concurrent_partitions | 1 | n/a | 3

Hydrolix query processing generally requires each query process to extract data from many HDX partitions. This flag sets a limit on the number of partitions which a query peer reads from at the same time.

Note that this setting is applied per process. For example, if a four-core query peer runs four processes, and each of these opens up to 25 partitions, then each query peer may have as many as 100 partitions open at once.

Decreasing this setting from its default setting of 25 may slow down query performance. Increasing this setting beyond 25 risks excessive memory pressure on the peer. Tread carefully.
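
Because the limit applies per process, the worst case per peer is a simple product. The sketch below reproduces the four-core example above; the function name is hypothetical.

```python
def max_open_partitions(processes_per_peer, max_concurrent_partitions):
    """Upper bound on partitions one query peer may have open at once."""
    return processes_per_peer * max_concurrent_partitions


# Four processes, each reading up to 25 partitions concurrently.
print(max_open_partitions(4, 25))  # 100
```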

Other options

Options that do not fall under the other categories.

hdx_query_output_file_enabled

Query Option | Possible values | Default Value
hdx_query_output_file_enabled | true or false | false

Indicates whether to save the query result to a file on your cloud storage. The result is saved in the format set by hdx_query_output_format, under a randomly generated filename.

hdx_query_output_filename

Query Option | Possible values | Default Value
hdx_query_output_filename | "" | none

Specifies the filename under which to save the query result on your cloud storage. The result is saved in the format set by hdx_query_output_format. If a file with that name is already present, it is overwritten.

hdx_query_comment

Query Option | Possible values | Default Value
hdx_query_comment | "" | none

Adds a comment to the query, which is stored with the active query so that users can explain what the query is for. For example, if a query runs every X minutes as part of a reporting tool, you can include this information in the comment of the query.

hdx_query_admin_comment

Query Option | Possible values | Default Value
hdx_query_admin_comment | "" | none

Adds an admin comment to the query, which is stored with the active query. This field is filled in automatically by Superset or Grafana with username information, in order to track user activity.

