Skip to content

Run SQL queries

The hdxsql command runs a user-authored ClickHouse SQL statement against a configured Hydrolix cluster and streams the result rows back into the Splunk pipeline.

Availability

hdxsql is available in Hydrolix Search for Splunk version 1.5.0 and later.

Use hdxsql when hdxsearch's option-based shape doesn't fit the query: JOINs, CTEs (WITH), UNION/INTERSECT/EXCEPT, window functions, system tables, EXPLAIN, or ad-hoc administrative queries such as SELECT version().

Unlike hdxsearch, hdxsql doesn't compose SQL from option slots and doesn't add the Splunk time window. You write the complete statement, and the connector validates it, merges its own safety and telemetry SETTINGS, optionally caps the result size, and sends it.

hdxsql runs only read-only, SELECT-shaped queries and rejects write and administrative statements before they reach the cluster.

Syntax⚓︎

hdxsql Syntax
1
2
3
4
5
6
7
| hdxsql query=<string>
         [cluster=<field>]
         [params=<json>]
         [maxrows=<int>]
         [timeout=<int>]
         [fetchsize=<int>]
         [comment=<string>]

Parameters⚓︎

Parameter Type Required Description
query string Yes A complete ClickHouse SELECT-shaped statement: SELECT, WITH … SELECT, a UNION/INTERSECT/EXCEPT of SELECTs, or an EXPLAIN of any of those.
cluster string No The name of the Hydrolix cluster to query. Defaults to the configured default cluster.
params JSON No A JSON object mapping parameter names to values, bound to ClickHouse parameter placeholders ({name:Type}) in query. Requires Hydrolix 6.0 and later. See Bind parameters.
maxrows integer No Overall row cap. Applied both as a hdx_query_max_result_rows setting and as a LIMIT clamp on the query. 0 (or omitted) means no cap.
timeout integer No Query execution time cap in seconds, bound to hdx_query_max_execution_time. 0 (or omitted) means no connector-applied timeout, so the Hydrolix-side default applies.
fetchsize integer No Server-side emit chunk size, bound to ClickHouse's max_block_size setting. Affects memory use and time-to-first-row. 0 (or omitted) leaves the ClickHouse default in place.
comment string No A human-readable comment forwarded as hdx_query_comment on the Hydrolix query. Useful for attributing dashboard or alert traffic in Hydrolix query telemetry.

Map results to Splunk fields⚓︎

hdxsql doesn't rename or remap result columns: each result column becomes a Splunk field with exactly the column name or alias from the SELECT. Splunk-recognized fields such as _time, host, source, sourcetype, _raw, and _indextime are populated only when you alias a column to that name.

For _time in particular, the value must be Unix-epoch seconds, typically produced with toUnixTimestamp(<column>) AS _time.

Alias Columns to Splunk Fields
1
2
3
4
| hdxsql query="SELECT toUnixTimestamp(reqTimeSec) AS _time,
                       reqHost AS host,
                       URL AS _raw
                FROM akamai.logs"

Without an alias, the result has a column-named field but no Splunk-recognized _time, and _raw is never populated by serializing the row.

Compose a time window⚓︎

hdxsql doesn't add the Splunk time picker's earliest/latest bounds. Compose the time window using one of two patterns.

SPL subsearch with addinfo⚓︎

Using addinfo to bind the time picker doesn't depend on query parameters, so this pattern works regardless of the Hydrolix version. The substitution happens at SPL's command-parsing layer, so by the time hdxsql runs, the query argument is a literal string.

addinfo exposes info_min_time and info_max_time as floating-point epoch seconds, so use toDateTime(...) (which accepts Float64) rather than fromUnixTimestamp(...) (which accepts only integers):

Bind the Time Picker with addinfo
1
2
3
4
5
6
7
8
| makeresults
| addinfo
| eval query="SELECT toUnixTimestamp(reqTimeSec) AS _time, reqHost
              FROM akamai.logs
              WHERE reqTimeSec >= toDateTime(" . info_min_time . ")
                AND reqTimeSec <  toDateTime(" . info_max_time . ")"
| return query
| hdxsql query=$query$

Parameter placeholders⚓︎

On Hydrolix version 6.0 and later, reference {earliest_time:Int64} / {latest_time:Int64} placeholders in the SQL and bind them through params. This is the preferred way to pass any user-supplied scalar value. See Bind parameters.

Bind parameters with params⚓︎

The params option accepts a JSON object that binds values to ClickHouse parameter placeholders ({name:Type}) referenced in the SQL. Values are bound at the request layer and are never interpolated into the SQL string, which keeps user-supplied values safe from injection.

Parameter Binding
| hdxsql query="SELECT count() FROM akamai.logs WHERE reqHost = {host:String}"
         params="{\"host\":\"docs.hydrolix.io\"}"

params requires Hydrolix 6.0 and later

On clusters reporting a server_version below 6.0.0, a query that supplies params exits with a user-visible error. Older clusters should use the addinfo SPL-subsearch pattern instead.

Malformed JSON, or JSON whose top-level value isn't an object, is rejected before any request is sent. Type mismatches between a value and its declared placeholder Type, and mismatches between supplied parameter names and the placeholders in the SQL, surface as Hydrolix-side errors.

Connector-controlled SETTINGS⚓︎

When hdxsql sends a query, it merges a connector-owned SETTINGS bundle into the SETTINGS clause, if you supplied one. Non-reserved keys in the clause (for example, use_query_cache) pass through unchanged:

User SETTINGS Pass Through
| hdxsql query="SELECT reqHost, bytes FROM akamai.logs WHERE reqTimeSec >= now() - INTERVAL 1 HOUR SETTINGS use_query_cache=false"
         maxrows=1000

The following keys are reserved. A user-supplied value for any of them is dropped before the connector's value is appended, regardless of whether the matching command option was supplied:

Reserved key Source
hdx_query_admin_comment Splunk-side attribution (always set)
readonly Connector-controlled read-only flag (always set)
hdx_query_max_memory_usage Connector-fixed at 2 GiB (always set)
hdx_query_max_attempts Connector-fixed at 1 (always set)
hdx_query_max_execution_time timeout option (only when supplied)
hdx_query_max_result_rows maxrows option (only when supplied)
max_block_size fetchsize option (only when supplied)

hdx_query_comment isn't reserved, because it carries user attribution rather than a safety guarantee. An inline SETTINGS hdx_query_comment value and the comment option each pass through, and when both are set they're joined into one entry (hdx_query_comment='X; Y').

Example queries⚓︎

Every query that scans a table must bound the primary timestamp column so Hydrolix can prune partitions. An unbounded query scans the entire table. These examples query the akamai.logs table and use a relative WHERE reqTimeSec >= now() - INTERVAL 1 HOUR bound for brevity; to bind a query to the Splunk time picker instead, use the addinfo or params patterns. The SELECT version() query is the exception. It reads cluster metadata and touches no table.

  • Report the percentage of requests served from cache and the average time to first byte for each request host.

    Cache Offload per Host
    1
    2
    3
    4
    5
    6
    7
    8
    9
    | hdxsql query="SELECT reqHost,
                           count() AS requests,
                           round(100 * countIf(cacheStatus = 1) / count(), 2) AS cached_pct,
                           round(avg(timeToFirstByte), 1) AS avg_ttfb_ms
                    FROM akamai.logs
                    WHERE reqTimeSec >= now() - INTERVAL 1 HOUR
                    GROUP BY reqHost
                    HAVING requests > 1000
                    ORDER BY requests DESC"
    
  • Build a per-minute request volume time series for charting by aliasing the time bucket to _time.

    Per-Minute Request Volume
    1
    2
    3
    4
    5
    6
    7
    | hdxsql query="SELECT toUnixTimestamp(toStartOfMinute(reqTimeSec)) AS _time,
                           reqHost,
                           count() AS event_count
                    FROM akamai.logs
                    WHERE reqTimeSec >= now() - INTERVAL 1 HOUR
                    GROUP BY reqHost, toStartOfMinute(reqTimeSec)
                    ORDER BY _time"
    
  • Rank the request paths that serve the most traffic in each host using a CTE and a window function.

    Top Request Paths by Traffic
    | hdxsql query="WITH path_traffic AS (
                        SELECT reqHost, reqPath, sum(bytes) AS total_bytes
                        FROM akamai.logs
                        WHERE reqTimeSec >= now() - INTERVAL 1 HOUR
                        GROUP BY reqHost, reqPath
                    )
                    SELECT reqHost, reqPath, total_bytes,
                           row_number() OVER (PARTITION BY reqHost ORDER BY total_bytes DESC) AS rank
                    FROM path_traffic
                    ORDER BY reqHost, rank"
    
  • Query a named cluster and cap the result at 1,000 rows. maxrows applies both a LIMIT clamp and the hdx_query_max_result_rows server-side cap. This requires a cluster named SecondCluster defined in the Hydrolix Search app.

    Named Cluster with Row Cap
    1
    2
    3
    4
    5
    6
    | hdxsql cluster="SecondCluster"
             query="SELECT reqHost, reqPath, bytes
                    FROM akamai.logs
                    WHERE reqTimeSec >= now() - INTERVAL 1 HOUR
                    ORDER BY bytes DESC"
             maxrows=1000
    
  • Check connectivity and report the cluster's Hydrolix version.

    Cluster Version
    | hdxsql query="SELECT version()"
    

See also⚓︎

  • Query - the hdxsearch command for SPL-shaped log exploration
  • Explore the data catalog - the hdxdescribe command for listing projects, tables, and column schemas