Run SQL queries

The hdxsql command runs a user-authored ClickHouse SQL statement against a configured Hydrolix cluster and streams the result rows back into the Splunk pipeline.

Availability

hdxsql is available in Hydrolix Search for Splunk version 1.5.0 and later.

Use hdxsql when hdxsearch's option-based shape doesn't fit the query: JOINs, CTEs (WITH), UNION/INTERSECT/EXCEPT, window functions, system tables, EXPLAIN, or ad-hoc administrative queries such as SELECT version().

Unlike hdxsearch, hdxsql doesn't compose SQL from option slots and doesn't add the Splunk time window. You write the complete statement, and the connector validates it, merges its own safety and telemetry SETTINGS, optionally caps the result size, and sends it.

hdxsql runs only read-only, SELECT-shaped queries and rejects write and administrative statements before they reach the cluster.

Syntax⚓︎

hdxsql Syntax
1 2 3 4 5 6 7	`\| hdxsql query=<string> [cluster=<field>] [params=<json>] [maxrows=<int>] [timeout=<int>] [fetchsize=<int>] [comment=<string>]`

Parameters⚓︎

Parameter	Type	Required	Description
`query`	string	Yes	A complete ClickHouse SELECT-shaped statement: `SELECT`, `WITH … SELECT`, a `UNION`/`INTERSECT`/`EXCEPT` of `SELECT`s, or an `EXPLAIN` of any of those.
`cluster`	string	No	The name of the Hydrolix cluster to query. Defaults to the configured default cluster.
`params`	JSON	No	A JSON object mapping parameter names to values, bound to ClickHouse parameter placeholders (`{name:Type}`) in `query`. Requires Hydrolix 6.0 and later. See Bind parameters.
`maxrows`	integer	No	Overall row cap. Applied both as a `hdx_query_max_result_rows` setting and as a `LIMIT` clamp on the query. `0` (or omitted) means no cap.
`timeout`	integer	No	Query execution time cap in seconds, bound to `hdx_query_max_execution_time`. `0` (or omitted) means no connector-applied timeout, so the Hydrolix-side default applies.
`fetchsize`	integer	No	Server-side emit chunk size, bound to ClickHouse's `max_block_size` setting. Affects memory use and time-to-first-row. `0` (or omitted) leaves the ClickHouse default in place.
`comment`	string	No	A human-readable comment forwarded as `hdx_query_comment` on the Hydrolix query. Useful for attributing dashboard or alert traffic in Hydrolix query telemetry.

Map results to Splunk fields⚓︎

hdxsql doesn't rename or remap result columns: each result column becomes a Splunk field with exactly the column name or alias from the SELECT. Splunk-recognized fields such as _time, host, source, sourcetype, _raw, and _indextime are populated only when you alias a column to that name.

For _time in particular, the value must be Unix-epoch seconds, typically produced with toUnixTimestamp(<column>) AS _time.

Alias Columns to Splunk Fields
1 2 3 4	`\| hdxsql query="SELECT toUnixTimestamp(reqTimeSec) AS _time, reqHost AS host, URL AS _raw FROM akamai.logs"`

Without an alias, the result has a column-named field but no Splunk-recognized _time, and _raw is never populated by serializing the row.

Compose a time window⚓︎

hdxsql doesn't add the Splunk time picker's earliest/latest bounds. Compose the time window using one of two patterns.

SPL subsearch with `addinfo`⚓︎

Using addinfo to bind the time picker doesn't depend on query parameters, so this pattern works regardless of the Hydrolix version. The substitution happens at SPL's command-parsing layer, so by the time hdxsql runs, the query argument is a literal string.

addinfo exposes info_min_time and info_max_time as floating-point epoch seconds, so use toDateTime(...) (which accepts Float64) rather than fromUnixTimestamp(...) (which accepts only integers):

Bind the Time Picker with addinfo

| makeresults
| addinfo
| eval query="SELECT toUnixTimestamp(reqTimeSec) AS _time, reqHost
              FROM akamai.logs
              WHERE reqTimeSec >= toDateTime(" . info_min_time . ")
                AND reqTimeSec <  toDateTime(" . info_max_time . ")"
| return query
| hdxsql query=$query$

Parameter placeholders⚓︎

On Hydrolix version 6.0 and later, reference {earliest_time:Int64} / {latest_time:Int64} placeholders in the SQL and bind them through params. This is the preferred way to pass any user-supplied scalar value. See Bind parameters.

Bind parameters with `params`⚓︎

The params option accepts a JSON object that binds values to ClickHouse parameter placeholders ({name:Type}) referenced in the SQL. Values are bound at the request layer and are never interpolated into the SQL string, which keeps user-supplied values safe from injection.

Parameter Binding
1 2	`\| hdxsql query="SELECT count() FROM akamai.logs WHERE reqHost = {host:String}" params="{\"host\":\"docs.hydrolix.io\"}"`

params requires Hydrolix 6.0 and later

On clusters reporting a server_version below 6.0.0, a query that supplies params exits with a user-visible error. Older clusters should use the addinfo SPL-subsearch pattern instead.

Malformed JSON, or JSON whose top-level value isn't an object, is rejected before any request is sent. Type mismatches between a value and its declared placeholder Type, and mismatches between supplied parameter names and the placeholders in the SQL, surface as Hydrolix-side errors.

Connector-controlled SETTINGS⚓︎

When hdxsql sends a query, it merges a connector-owned SETTINGS bundle into the SETTINGS clause, if you supplied one. Non-reserved keys in the clause (for example, use_query_cache) pass through unchanged:

User SETTINGS Pass Through
1 2	`\| hdxsql query="SELECT reqHost, bytes FROM akamai.logs WHERE reqTimeSec >= now() - INTERVAL 1 HOUR SETTINGS use_query_cache=false" maxrows=1000`

The following keys are reserved. A user-supplied value for any of them is dropped before the connector's value is appended, regardless of whether the matching command option was supplied:

Reserved key	Source
`hdx_query_admin_comment`	Splunk-side attribution (always set)
`readonly`	Connector-controlled read-only flag (always set)
`hdx_query_max_memory_usage`	Connector-fixed at 2 GiB (always set)
`hdx_query_max_attempts`	Connector-fixed at `1` (always set)
`hdx_query_max_execution_time`	`timeout` option (only when supplied)
`hdx_query_max_result_rows`	`maxrows` option (only when supplied)
`max_block_size`	`fetchsize` option (only when supplied)

hdx_query_comment isn't reserved, because it carries user attribution rather than a safety guarantee. An inline SETTINGS hdx_query_comment value and the comment option each pass through, and when both are set they're joined into one entry (hdx_query_comment='X; Y').

Example queries⚓︎

Every query that scans a table must bound the primary timestamp column so Hydrolix can prune partitions. An unbounded query scans the entire table. These examples query the akamai.logs table and use a relative WHERE reqTimeSec >= now() - INTERVAL 1 HOUR bound for brevity; to bind a query to the Splunk time picker instead, use the addinfo or params patterns. The SELECT version() query is the exception. It reads cluster metadata and touches no table.

Report the percentage of requests served from cache and the average time to first byte for each request host.

Cache Offload per Host

| hdxsql query="SELECT reqHost,
                       count() AS requests,
                       round(100 * countIf(cacheStatus = 1) / count(), 2) AS cached_pct,
                       round(avg(timeToFirstByte), 1) AS avg_ttfb_ms
                FROM akamai.logs
                WHERE reqTimeSec >= now() - INTERVAL 1 HOUR
                GROUP BY reqHost
                HAVING requests > 1000
                ORDER BY requests DESC"

Build a per-minute request volume time series for charting by aliasing the time bucket to _time.

Per-Minute Request Volume

| hdxsql query="SELECT toUnixTimestamp(toStartOfMinute(reqTimeSec)) AS _time,
                       reqHost,
                       count() AS event_count
                FROM akamai.logs
                WHERE reqTimeSec >= now() - INTERVAL 1 HOUR
                GROUP BY reqHost, toStartOfMinute(reqTimeSec)
                ORDER BY _time"

Rank the request paths that serve the most traffic in each host using a CTE and a window function.

Top Request Paths by Traffic

| hdxsql query="WITH path_traffic AS (
                    SELECT reqHost, reqPath, sum(bytes) AS total_bytes
                    FROM akamai.logs
                    WHERE reqTimeSec >= now() - INTERVAL 1 HOUR
                    GROUP BY reqHost, reqPath
                )
                SELECT reqHost, reqPath, total_bytes,
                       row_number() OVER (PARTITION BY reqHost ORDER BY total_bytes DESC) AS rank
                FROM path_traffic
                ORDER BY reqHost, rank"

Query a named cluster and cap the result at 1,000 rows. maxrows applies both a LIMIT clamp and the hdx_query_max_result_rows server-side cap. This requires a cluster named SecondCluster defined in the Hydrolix Search app.

Named Cluster with Row Cap
1 2 3 4 5 6	`\| hdxsql cluster="SecondCluster" query="SELECT reqHost, reqPath, bytes FROM akamai.logs WHERE reqTimeSec >= now() - INTERVAL 1 HOUR ORDER BY bytes DESC" maxrows=1000`

Check connectivity and report the cluster's Hydrolix version.
Cluster Version
1
| hdxsql query="SELECT version()"

Query - the hdxsearch command for SPL-shaped log exploration
Explore the data catalog - the hdxdescribe command for listing projects, tables, and column schemas