Query API HTTP Options

You can modify the behaviour of Hydrolix's query engine by attaching one or more optional parameters to your query API requests. These options simply take the form of additional HTTP parameters alongside the required query parameter.

Output Formatting

Hydrolix's query API returns results as JSON by default, but it also supports every output format recognized by its underlying ClickHouse engine.

To specify the output format for a query, add a query.output_format parameter to your API request, and set its value to any of ClickHouse's supported output formats.

For example, to have the response to a query API GET request formatted as CSV:

https://YOUR-HYDROLIX-HOST.hydrolix.live/query/?query=YOUR-SQL-QUERY&query.output_format=CSV
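
When the SQL contains spaces or reserved characters, it has to be percent-encoded before it goes into the URL. The following is a minimal sketch of one way to assemble such a request URL with Python's standard library. The helper name build_query_url and the sample query are illustrative assumptions, and the option name follows the query.output_format parameter described above.

```python
from typing import Mapping, Optional
from urllib.parse import urlencode


def build_query_url(host: str, sql: str,
                    options: Optional[Mapping[str, str]] = None) -> str:
    """Return a query API GET URL with the SQL and any options percent-encoded.

    build_query_url is a hypothetical helper, not part of any Hydrolix client.
    """
    params = {"query": sql}
    params.update(options or {})
    return f"https://{host}/query/?{urlencode(params)}"


url = build_query_url(
    "YOUR-HYDROLIX-HOST.hydrolix.live",
    "SELECT 1",
    {"query.output_format": "CSV"},
)
print(url)
# https://YOUR-HYDROLIX-HOST.hydrolix.live/query/?query=SELECT+1&query.output_format=CSV
```

Passing options as a mapping keeps the required query parameter separate from the optional ones, so any of the options on this page could be added the same way.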

Advanced options

The remaining API options described on this page fine-tune how Hydrolix processes a given query.

In most cases, you won't need to change any of these settings from their default values. Hydrolix's query engine is optimized for the resource-allocation and caching behavior that these defaults represent.

If you have any questions about improving your queries' performance, please contact Hydrolix support.

Rate Limiting

These options specify limits on the resources available to query processes.

query.max_peers

| Query Option | Min Value | Max Value | Default Value |
|---|---|---|---|
| query.max_peers | 1 | Total count of available query peers | null (all peers) |

By default, Hydrolix distributes query processing across all available query peers in order to maximize massively parallel processing. Setting this flag instructs the query head to instead use only a subset of available peers.

storage.max_streams

| Query Option | Min Value | Max Value | Default Value |
|---|---|---|---|
| storage.max_streams | 1 | Twice the count of available CPU cores | null (1 per core) |

By default, each query peer will run one process per CPU core. To limit the number of processes a query might run, set a value here.

storage.max_concurrent_partitions

| Query Option | Min Value | Max Value | Default Value |
|---|---|---|---|
| storage.max_concurrent_partitions | 1 | n/a | 3 |

Hydrolix query processing generally requires each query process to extract data from many HDX partitions. This flag sets a limit on the number of partitions which a query peer reads from at the same time.

Note that this setting is applied per process. For example, if a four-core query peer runs four processes, and each of these opens up to 25 partitions, then that query peer may have as many as 100 partitions open at once.

Decreasing this setting below its default value may slow down query performance. Increasing it beyond the default risks excessive memory pressure on the peer. Tread carefully.

storage.max_concurrent_iops

| Query Option | Min Value | Max Value | Default Value |
|---|---|---|---|
| storage.max_concurrent_iops | 1 | n/a | 1 |

Depending on the number of columns required and predicates applied to a particular query, the processing of each HDX partition may generate many independent HTTP range requests from the query peer to cloud storage to extract data. This flag sets a limit on the number of in-flight downloads which may be open at the same time.

Please note that this setting is applied per partition. Extending the previous example, if a query peer has 100 partitions open, and each partition is allowed 2 concurrent I/O operations, then a single peer may have 200 simultaneous HTTP requests awaiting data. Tread carefully.
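
Because these limits multiply, it can help to work out the worst case before changing them. The arithmetic from the two examples above can be sketched as follows. The function names are illustrative assumptions, and the figures (four processes, 25 partitions, 2 concurrent operations) are the example values from the text, not recommended settings.

```python
def max_open_partitions(processes_per_peer: int,
                        max_concurrent_partitions: int) -> int:
    """Upper bound on partitions one peer may have open (limit is per process)."""
    return processes_per_peer * max_concurrent_partitions


def max_inflight_requests(open_partitions: int,
                          max_concurrent_iops: int) -> int:
    """Upper bound on simultaneous HTTP requests (limit is per partition)."""
    return open_partitions * max_concurrent_iops


open_partitions = max_open_partitions(processes_per_peer=4,
                                      max_concurrent_partitions=25)
inflight = max_inflight_requests(open_partitions, max_concurrent_iops=2)

print(open_partitions)  # 100
print(inflight)         # 200
```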

Caching behavior

These options adjust whether the query engine caches metadata while processing a query, and how it makes use of that cached information.

storage.dhash.strategy

| Query Option | Possible Values | Default Value |
|---|---|---|
| storage.dhash.strategy | consistent_hashing or round_robin | consistent_hashing |

Sets the strategy that the query head uses when assigning partitions to query peers. The default strategy, consistent_hashing, takes advantage of cached manifests for (usually) faster results from peers. The round_robin strategy instead distributes partitions evenly across peers, disregarding cached manifests.
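
To illustrate why a hash-based assignment tends to preserve cache locality, here is a generic sketch of the two techniques. It is not Hydrolix's implementation: the HashRing class, the virtual-node count, and the peer and partition names are all assumptions made for the illustration. It compares how many partitions change peers under each strategy when one peer leaves the pool.

```python
import bisect
import hashlib
from itertools import cycle


def _hash(key: str) -> int:
    """Stable 64-bit hash (Python's built-in hash() is randomized per process)."""
    return int.from_bytes(hashlib.sha256(key.encode()).digest()[:8], "big")


def round_robin(partitions, peers):
    """Assign partitions to peers in turn, ignoring any previous assignment."""
    return {part: peer for part, peer in zip(partitions, cycle(peers))}


class HashRing:
    """Toy consistent-hash ring; virtual nodes smooth out the balance."""

    def __init__(self, peers, vnodes=64):
        self._ring = sorted(
            (_hash(f"{peer}#{i}"), peer) for peer in peers for i in range(vnodes)
        )
        self._points = [point for point, _ in self._ring]

    def peer_for(self, partition: str) -> str:
        # First ring point clockwise from the partition's hash, wrapping around.
        idx = bisect.bisect_right(self._points, _hash(partition)) % len(self._ring)
        return self._ring[idx][1]


partitions = [f"partition-{n}" for n in range(1000)]
peers = ["peer-a", "peer-b", "peer-c", "peer-d"]
remaining = peers[:-1]  # peer-d leaves the pool

ring_before, ring_after = HashRing(peers), HashRing(remaining)
before = {p: ring_before.peer_for(p) for p in partitions}
after = {p: ring_after.peer_for(p) for p in partitions}
# With a hash ring, only partitions that lived on the departed peer move.
moved = [p for p in partitions if before[p] != after[p]]

rr_before = round_robin(partitions, peers)
rr_after = round_robin(partitions, remaining)
rr_moved = sum(rr_before[p] != rr_after[p] for p in partitions)

print(f"consistent hashing moved {len(moved)} partitions")
print(f"round robin moved {rr_moved} partitions")
```

In this sketch, every partition that stays on the same peer could reuse whatever that peer cached earlier, which is the trade-off between the two strategies: round robin gives an exactly even split, while hashing gives a stable one.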

storage.fs.cache.enabled

| Query Option | Possible Values | Default Value |
|---|---|---|
| storage.fs.cache.enabled | true or false | true |

If true, query peers cache the contents of the partition manifest files that they read while processing the current query.

storage.fs.http.keep_alive

| Query Option | Possible Values | Default Value |
|---|---|---|
| storage.fs.http.keep_alive | true or false | false |

If true, query peers set a keep-alive header on the HTTP requests they make to the data store.
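
When building requests programmatically, note that some languages render booleans differently from the true and false literals listed for these options. In Python, for instance, str(True) yields "True". The sketch below assumes the API expects the lowercase literals exactly as listed; the encode_options helper is a hypothetical name, not part of any Hydrolix client.

```python
from urllib.parse import urlencode


def encode_options(options: dict) -> str:
    """Encode query options, writing booleans as lowercase true/false."""
    normalized = {
        name: str(value).lower() if isinstance(value, bool) else value
        for name, value in options.items()
    }
    return urlencode(normalized)


encoded = encode_options({
    "query": "SELECT 1",
    "storage.fs.cache.enabled": False,
    "storage.fs.http.keep_alive": True,
})
print(encoded)
# query=SELECT+1&storage.fs.cache.enabled=false&storage.fs.http.keep_alive=true
```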

