Splunk: Hydrolix Search for Splunk

Overview⚓︎

Use Hydrolix as a back-end datastore for your existing Splunk tables to take advantage of low-latency queries, long-term retention, and cost savings.

Hydrolix Search for Splunk can query raw data tables and summary tables for quick charting. It does this through a new hdxsearch command for Splunk SPL, which has the following features:

  • With minimal configuration, it queries your Hydrolix clusters.
  • It automatically finds the primary timestamp for the specified table.
  • It applies time range filtering from the Splunk UI.
  • It applies a configurable, removable default row limit to protect query infrastructure.

To keep queries simple, the hdxsearch command accepts only a plain list of fields for the SELECT portion of a query. This limitation doesn't apply to WHERE clauses. If you need more flexibility in the SELECT portion of your queries, see our Splunk with DB Connect method, which gives you full ClickHouse SQL capabilities.

Hydrolix Search for Splunk also includes the hdxdescribe command, which lists available projects, tables, and column schemas from within Splunk.

Required Splunk permissions⚓︎

Query permissions⚓︎

To run queries using Hydrolix Search, the Splunk user must have the search and list_storage_passwords capabilities.

Setup permissions⚓︎

To configure Hydrolix Search, the Splunk user needs access to the setup page. Use either of these options to grant the required Splunk permissions.

Option 1: Admin role⚓︎

A user with the admin or sc_admin role already has the permissions required to use the setup page. No additional configuration is needed.

Option 2: Custom role⚓︎

For more fine-grained access control:

  1. Create a Splunk role (here called hdxsearch_write) with the following permissions:

    • list_storage_passwords
    • edit_storage_passwords
    • admin_all_objects
  2. Add the following stanza to the metadata/local.meta file:

    metadata/local.meta
    [passwords/credential%3Ahdxsearch_realm%3Apassword%3A]
    access = read : [ * ], write : [ admin, hdxsearch_write, power, sc_admin ]
    

    This stanza grants the admin, hdxsearch_write, power, and sc_admin roles write access to Hydrolix credential storage. admin, sc_admin, and power are pre-defined Splunk roles; hdxsearch_write is the custom role created in step 1. Any role not listed under write can't use the setup page to add or update Hydrolix credentials.

A user assigned the hdxsearch_write role can then use the setup page to add Hydrolix cluster credentials.

Installation⚓︎

In the Splunk Enterprise UI, inside the Apps menu, select Find More Apps.

Menu item Find More Apps under Splunk Enterprise Apps menu

Type Hydrolix into the search box and select the Hydrolix Search application from the results on the right.

Result of a search for the Hydrolix Search application

Click the Install button. The installation process may require logging in to Splunkbase with your Splunk username and password.

Configuration⚓︎

Cluster Credentials⚓︎

On installation, you will be directed to the Hydrolix Search setup form.

Select an authentication method for the Hydrolix cluster: either basic authentication (username and password) or a service account token. For information on creating a service account and associated tokens, see Manage Service Accounts.

This example shows two cluster configurations, one for each authentication method.

Two cluster configurations, one using typical credentials, the other an auth token

Fill in the configuration fields:

| Field Name | Description | Example |
| --- | --- | --- |
| Cluster Name | The name used to refer to this cluster in your Splunk Search Processing Language (SPL) queries | cluster-1 |
| Host:Port | The hostname (and optional port number) of your Hydrolix cluster | ${HDX_HOSTNAME}.hydrolix.live:443 |
| Username | The username used to query your Hydrolix cluster | user@domain.tld |
| Password | The password for the above user | sdjf^wer%!k |
| API Token | The token, associated with a service account, used to authorize access to a Hydrolix cluster. See authorization tokens for more information. | eyJhbGci... |
| Default result count limit | The maximum number of records to retrieve per query (unless overridden by the query) | 5000 |

Splunk Cloud: Use port 443 or don't specify a port

The default network rules for Splunk Cloud prohibit outbound TCP connections on port 8088. Configure port 443 or omit a port entirely. Configuring port 8088 may result in 404 responses in the Splunk console when querying the Hydrolix cluster.
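
The port guidance in the note above can be checked before a value is typed into the setup form. The following is a minimal Python sketch, not part of Hydrolix Search for Splunk: the function name `check_host_port` and its return shape are hypothetical, and it handles plain hostnames and IPv4 addresses only.

```python
from typing import Optional, Tuple

# Port that Splunk Cloud's default network rules block for outbound TCP,
# according to the note above.
BLOCKED_ON_SPLUNK_CLOUD = {8088}


def check_host_port(value: str) -> Tuple[str, Optional[int], bool]:
    """Split a Host:Port value and flag ports blocked on Splunk Cloud.

    Returns (host, port or None, ok_for_splunk_cloud).
    Hypothetical helper for illustration; bracketed IPv6 is not handled.
    """
    host, sep, port_text = value.strip().rpartition(":")
    if not sep:
        # No colon found: the whole value is the hostname, no port given.
        return port_text, None, True
    port = int(port_text)  # raises ValueError if the port is not numeric
    return host, port, port not in BLOCKED_ON_SPLUNK_CLOUD
```

A value with no port, or with port 443, passes the check; a value ending in `:8088` is flagged.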

Select a default cluster to run queries against by clicking the MAKE DEFAULT CLUSTER bubble on the right-hand side of the configuration line for that cluster.

Multiple Clusters⚓︎

If you will be using more than one cluster or user account from this Splunk instance, add them to the list with the OR ADD CLUSTER option. Clicking the plus sign will open up a new row of configuration.

Save the Configuration⚓︎

Once the configuration is done, select Save Changes to apply changes. Your new cluster settings will replace any previously-saved settings, and you will be automatically directed to the query screen of the Hydrolix Search for Splunk application.

Saving changes will overwrite all settings

Proxy configuration⚓︎

If your Splunk instance connects to Hydrolix clusters through an HTTP proxy, configure per-cluster proxy settings by editing $SPLUNK_HOME/etc/apps/hdxsearch/local/hdxsearch.conf. Create the file if it doesn't exist.

Add a [proxies] stanza with a proxies key containing a JSON array. Each entry in the array maps one cluster to one proxy:

local/hdxsearch.conf
[proxies]
proxies = [{"cluster": "my-cluster", "protocol": "http", "host": "proxy.example.com", "port": 8080}]

Each proxy entry supports the following fields:

| Field | Required | Description |
| --- | --- | --- |
| cluster | Yes | The cluster name as configured on the setup page. Must match exactly. |
| protocol | Yes | The proxy protocol: http or https. |
| host | Yes | The hostname or IP address of the proxy. |
| port | Yes | The port number of the proxy. |
| user | No | Username for proxy authentication. |
| password | No | Password for proxy authentication. |

To configure proxy authentication, include the user and password fields:

local/hdxsearch.conf
[proxies]
proxies = [{"cluster": "my-cluster", "protocol": "http", "host": "proxy.example.com", "port": 8080, "user": "proxyuser", "password": "proxypass"}]

To route different clusters through different proxies, add multiple entries to the array:

local/hdxsearch.conf
[proxies]
proxies = [{"cluster": "cluster-1", "protocol": "http", "host": "proxy-us.example.com", "port": 8080}, {"cluster": "cluster-2", "protocol": "http", "host": "proxy-eu.example.com", "port": 8080}]

Clusters with no matching entry in the proxies array connect directly without a proxy.

Restart Splunk after saving changes to hdxsearch.conf for the configuration to take effect.
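
Because the proxies value is a JSON array written on a single line, a small typo can break the whole stanza. The following Python sketch generates the stanza text from a list of dictionaries and checks the required fields listed above. It is an illustration only: `proxies_stanza` is a hypothetical helper, not something shipped with the app.

```python
import json

REQUIRED = ("cluster", "protocol", "host", "port")
OPTIONAL = ("user", "password")


def proxies_stanza(entries):
    """Render a [proxies] stanza from a list of dicts, one per cluster.

    Hypothetical helper: validates field names against the table above,
    then serializes the list as single-line JSON.
    """
    for entry in entries:
        missing = [key for key in REQUIRED if key not in entry]
        if missing:
            raise ValueError(f"proxy entry missing required fields: {missing}")
        unknown = [key for key in entry if key not in REQUIRED + OPTIONAL]
        if unknown:
            raise ValueError(f"unsupported proxy fields: {unknown}")
        if entry["protocol"] not in ("http", "https"):
            raise ValueError("protocol must be http or https")
    # json.dumps writes the array on one line, like the examples above.
    return "[proxies]\nproxies = " + json.dumps(entries)


print(proxies_stanza([
    {"cluster": "cluster-1", "protocol": "http",
     "host": "proxy-us.example.com", "port": 8080},
]))
```

Paste the printed text into local/hdxsearch.conf, then restart Splunk as described above.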

Reconfigure an existing cluster⚓︎

To reconfigure Hydrolix Search for Splunk, open the Apps menu and select Manage Apps. Type Hydrolix into the search box and, under Actions, click Set up. This presents the same cluster configuration screen as on initial installation, populated with the currently configured clusters.

Explore the data catalog⚓︎

The hdxdescribe command lets you explore the Hydrolix data catalog from within Splunk. Use it to list the available projects and tables in a Hydrolix cluster, or to inspect the schema of a specific table before writing queries.

Syntax⚓︎

hdxdescribe Syntax
| hdxdescribe [cluster="<cluster_name>"] [table="<project_name>.<table_name>"] [project="<project_name>"]

Parameters⚓︎

| Parameter | Required | Description |
| --- | --- | --- |
| cluster | No | The name of the Hydrolix cluster to query. Defaults to the configured default cluster. |
| table | No | The fully-qualified table name in project.table format. Returns the schema for that table. Mutually exclusive with project. |
| project | No | A Hydrolix project name. Lists all tables within that project. Mutually exclusive with table. |

Note

table and project can't be used together in the same command.

Return values⚓︎

Without a table argument, hdxdescribe returns one row per project. When project is specified, the result is scoped to that project but the schema is the same:

| Field | Description |
| --- | --- |
| project | The name of the Hydrolix project. |
| tables | The list of table names within that project, returned as a multivalue. |

With a table argument, hdxdescribe returns one row per column:

| Field | Description |
| --- | --- |
| column_name | The name of the column. |
| column_type | The ClickHouse data type of the column (for example, DateTime, String, UInt64). |

Examples⚓︎

  • List all projects and tables on the default cluster:

    List All Projects and Tables
    | hdxdescribe
    
  • List all tables in a specific project:

    List Tables in a Project
    | hdxdescribe project="hydro"
    
  • List all tables in a project on a named cluster:

    List Tables in a Project on a Named Cluster
    | hdxdescribe cluster="cluster-1" project="hydro"
    
  • Inspect the schema of a table:

    Inspect Table Schema
    | hdxdescribe table="hydro.logs"
    

Query⚓︎

This section provides example queries, along with the parameters and query settings you can use to customize Splunk queries to Hydrolix.

Quickstart Sample Query⚓︎

Here's an example of the query screen showing a query and results.

Search results for the table hydro.logs using the Hydrolix Search for Splunk application

| hdxsearch table="hydro.logs" fields="app,rows_read,bytes_read,source_type" where="app='query-head'"

This generates the following SQL query for a cluster using the default LIMIT clause:

SELECT app, rows_read, bytes_read, source_type
FROM hydro.logs
WHERE timestamp > toUnixTimestamp($SPLUNK_MIN_TIME)
AND timestamp < toUnixTimestamp($SPLUNK_MAX_TIME)
AND app='query-head'
LIMIT 5000
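
To make the mapping from hdxsearch parameters to the SQL shown above easier to follow, here is a rough Python sketch of the translation. It is not the app's actual implementation: `sketch_sql` is a hypothetical function, the time column is passed in rather than discovered automatically, and treating limit=0 as "omit the LIMIT clause" is an assumption.

```python
def sketch_sql(table, fields, where=None, time_field="timestamp", limit=5000):
    """Approximate the SQL shown above for a raw-table hdxsearch query.

    Illustrative only. $SPLUNK_MIN_TIME and $SPLUNK_MAX_TIME stand in for
    the time range chosen in the Splunk time picker.
    """
    lines = [
        "SELECT " + ", ".join(name.strip() for name in fields.split(",")),
        "FROM " + table,
        f"WHERE {time_field} > toUnixTimestamp($SPLUNK_MIN_TIME)",
        f"AND {time_field} < toUnixTimestamp($SPLUNK_MAX_TIME)",
    ]
    if where:
        # The where parameter is appended to the time range filter.
        lines.append("AND " + where)
    if limit:
        # Assumption: limit=0 (retrieve all rows) means no LIMIT clause.
        lines.append(f"LIMIT {limit}")
    return "\n".join(lines)


print(sketch_sql(
    "hydro.logs",
    "app,rows_read,bytes_read,source_type",
    where="app='query-head'",
))
```

Called with the quickstart parameters, the sketch prints the same six lines as the generated query above.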

The following is a simple example. Replace my_project.my_table with the Hydrolix project and table of your own choosing:

| hdxsearch table="my_project.my_table" fields="*"

Though the results of this query can be large depending on how much data is in the cluster, the time picker in the upper right-hand corner of the query interface provides a time-based bound on results, and the LIMIT clause caps the number of rows returned to protect query infrastructure.

Note that Hydrolix Search for Splunk doesn't support Splunk's real-time UI, so the time picker only provides relative options.

Query Parameters⚓︎

As well as the required table and fields parameters, you can specify a WHERE clause, adjust the row limit, and adjust other settings as parameters to the hdxsearch command:

| Parameter Name | Type | Required | Description |
| --- | --- | --- | --- |
| table | string (fieldname) | Yes | The Hydrolix table to query in the form project.table. |
| fields | list of strings | Conditional | A comma-delimited list of fields to retrieve from the table, or *, which returns all the fields. Either fields or raw must be specified. * isn't supported for summary table queries; specify fields explicitly. |
| raw | string (fieldname) | Conditional | The name of a field whose raw value should be sent to the "Event" column of the SPL query output. Either fields or raw must be specified. |
| where | string | No | A SQL WHERE clause to filter the results of the query. Defaults to no filter. |
| time | string (fieldname) | No | The name of a field in table to treat as the event timestamp. Defaults to the primary timestamp of the table. |
| limit | integer | No | Maximum number of rows to retrieve from the table, or 0 to retrieve all rows. Defaults to the limit value configured for the cluster being queried. |
| cluster | string (fieldname) | No | The name of the Hydrolix cluster to query. Defaults to the configured default cluster. |
| nocache | boolean | No | If set to true, query results are excluded from caching. Defaults to false, so queries use Hydrolix query caching. |

Performance Note: Limiting Fields

Because Hydrolix is a columnar data store, the number of fields returned by the query should be limited to accelerate execution and reduce compute resources. Instead of using wildcards in the fields parameter, specify only the required columns.

Timestamp requirement

Queries using hdxsearch must return a timestamp column.
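
If you generate searches programmatically (for example, for dashboards), the parameter rules above can be enforced before the SPL string is built. The following Python sketch is a hypothetical helper, not part of the app. It assumes that parameter order doesn't matter, that the boolean is written as `nocache=true`, and that values contain no double quotes.

```python
def hdxsearch_command(table, fields=None, raw=None, where=None,
                      time=None, limit=None, cluster=None, nocache=False):
    """Build an hdxsearch SPL command string from its parameters.

    Hypothetical helper: enforces the required table parameter and the
    rule that either fields or raw must be specified.
    """
    if table.count(".") != 1:
        raise ValueError("table must be in the form project.table")
    if not fields and not raw:
        raise ValueError("either fields or raw must be specified")
    options = [("table", table), ("fields", fields), ("raw", raw),
               ("where", where), ("time", time), ("cluster", cluster)]
    parts = []
    for name, value in options:
        if value is None:
            continue
        if '"' in value:
            # Keep the sketch simple: no escaping of embedded double quotes.
            raise ValueError(f"{name} must not contain double quotes")
        parts.append(f'{name}="{value}"')
    if limit is not None:
        parts.append(f"limit={int(limit)}")  # 0 requests all rows
    if nocache:
        parts.append("nocache=true")  # assumed spelling of the boolean
    return "| hdxsearch " + " ".join(parts)


print(hdxsearch_command("my_project.my_table", fields="reqHost, reqMethod", limit=0))
```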

Example queries⚓︎

  • Return all fields from my_project.my_table, bounded by the Splunk UI's time picker and the default row limit of 5,000.

      | hdxsearch table="my_project.my_table" fields="*"
    
  • Return the reqHost and reqMethod columns from my_project.my_table.

      | hdxsearch table="my_project.my_table" fields="reqHost, reqMethod"
    
  • Bypass the 5,000-row limit and return rows where the reqHost field is my.hostname.com and the reqMethod is POST. The contents of the where parameter are passed along to Hydrolix in an SQL WHERE clause.

      | hdxsearch table="my_project.my_table" fields="reqHost, reqMethod" limit=0 where="reqHost IN ('my.hostname.com') AND reqMethod='POST'"
    
  • The simple SELECT statements available don't support aggregates, so use Splunk's SPL to count the number of rows. Use limit=0 to include all data in the aggregation.

      | hdxsearch table="my_project.my_table" fields="reqHost, reqMethod" limit=0
      | stats count by reqHost
    
  • Output the raw value of the UA field into the Event column of the SPL query output.

      | hdxsearch table="my_project.my_table" fields="reqHost, reqMethod, UA" raw="UA"
    
  • Run a query on a named cluster. This requires a defined cluster in the Hydrolix Search App with the name SecondCluster.

      | hdxsearch table="my_project.my_table" fields="reqHost, reqMethod" cluster="SecondCluster"
    

Summary table queries⚓︎

Hydrolix Search for Splunk can query summary tables in addition to raw data tables. The SPL interface works the same way as for raw tables, but the SQL translation differs: Hydrolix Search automatically infers which fields are dimensions and which are metrics, then generates GROUP BY and HAVING clauses as needed.

One restriction applies: the fields="*" wildcard isn't supported for summary table queries. You must specify fields explicitly.

Example query⚓︎

Summary Table Query
| hdxsearch table="hydro.requests_summary" fields="app, total_requests, total_bytes" where="app='query-head' AND total_bytes > 500000"

This produces SQL equivalent to:

SQL Equivalent
SELECT app, total_requests, total_bytes
FROM hydro.requests_summary
WHERE timestamp BETWEEN ...
  AND app = 'query-head'
GROUP BY app
HAVING total_bytes > 500000
LIMIT 5000
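
The split between WHERE and HAVING in the SQL above can be sketched in Python as follows. This is a simplified illustration, not how the app works internally: the real app infers dimensions and metrics automatically, while this hypothetical `sketch_summary_sql` function takes the metric names as an explicit argument, splits the filter naively on " AND ", and leaves the time range elided as in the example above.

```python
import re


def sketch_summary_sql(table, fields, where, metrics,
                       time_field="timestamp", limit=5000):
    """Approximate the summary-table SQL shown above.

    Assumptions: `metrics` lists the aggregate columns; every other field
    is a dimension. Predicates on metrics go to HAVING, the rest to WHERE.
    """
    names = [name.strip() for name in fields.split(",")]
    dimensions = [name for name in names if name not in metrics]
    where_parts, having_parts = [], []
    for predicate in (where or "").split(" AND "):
        predicate = predicate.strip()
        if not predicate:
            continue
        # The leading identifier decides where the predicate belongs.
        match = re.match(r"\w+", predicate)
        column = match.group(0) if match else ""
        (having_parts if column in metrics else where_parts).append(predicate)
    lines = [
        "SELECT " + ", ".join(names),
        "FROM " + table,
        f"WHERE {time_field} BETWEEN ...",  # time range elided, as above
    ]
    lines += ["  AND " + part for part in where_parts]
    if dimensions:
        lines.append("GROUP BY " + ", ".join(dimensions))
    if having_parts:
        lines.append("HAVING " + " AND ".join(having_parts))
    if limit:
        lines.append(f"LIMIT {limit}")
    return "\n".join(lines)


print(sketch_summary_sql(
    "hydro.requests_summary",
    "app, total_requests, total_bytes",
    "app='query-head' AND total_bytes > 500000",
    metrics={"total_requests", "total_bytes"},
))
```

The filter on app stays in the WHERE clause because app is a dimension, while the filter on total_bytes moves to HAVING because it applies to an aggregated metric.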

Default query settings⚓︎

Query settings in Hydrolix clusters implement configurable limits to protect cluster resources and the applications receiving results.

Query options⚓︎

Hydrolix Search for Splunk uses the following default value for query caching with each query.

| Query option | Value |
| --- | --- |
| use_query_cache | true |

You can use the SQL SETTINGS clause when executing queries to disable this default query option or to use additional options.

LIMIT clause⚓︎

Hydrolix Search for Splunk attaches a LIMIT 5000 clause to each query by default. This limit exists to protect query infrastructure from unbounded queries and can be adjusted when setting up the Splunk configuration by setting the Default result count limit, or overridden per query using the limit parameter.
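
The precedence between the two limit settings can be summarized in a few lines of Python. This is a sketch of the behavior described above, with a hypothetical function name. Treating a configured default of 0 as "no limit" is an assumption based on the limit parameter's documented behavior.

```python
def effective_limit(cluster_default=5000, query_limit=None):
    """Return the row cap applied to a query, or None for no cap.

    Hypothetical helper: the per-query limit parameter, when given,
    overrides the cluster's Default result count limit; 0 removes the cap.
    """
    chosen = cluster_default if query_limit is None else query_limit
    if chosen < 0:
        raise ValueError("limit must be zero or a positive integer")
    return None if chosen == 0 else chosen
```

With no per-query value, the cluster default applies; `limit=0` on a query returns all rows regardless of the configured default.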

Troubleshoot⚓︎

"Invalid Argument" messages when making queries⚓︎

  • This usually means that your table name or field name doesn't exist. Verify your table and field names.

Splunk doesn't respect the default cluster's configuration⚓︎

  • If there is a non-default cluster named default, upgrade to Hydrolix Search for Splunk v1.0.6 or above, or rename the non-default cluster.

Local Splunk UI limits results to 1,000⚓︎

  • When running Splunk locally, the UI limits results to 1,000 with subsequent pages being blank. To increase the number of results in the UI, do the following:

    • Navigate to or create the file $SPLUNK_HOME/etc/system/local/limits.conf. To increase the limit to 100,000, add the following:
    [search]
    max_events_per_bucket = 100000
    

    See Splunk's limits.conf for more information on this setting.
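
To confirm which value is currently set before editing the file, a short Python check like the following can help. It is a sketch that assumes a simple limits.conf that Python's configparser can read. The function name is hypothetical, and it only inspects the text you pass in, not Splunk's merged configuration.

```python
import configparser


def max_events_per_bucket(conf_text):
    """Return max_events_per_bucket from the [search] stanza, or None.

    Hypothetical helper: parses limits.conf text as INI. Files with lines
    outside any stanza will raise a configparser error.
    """
    parser = configparser.RawConfigParser(strict=False)
    parser.optionxform = str  # keep option names exactly as written
    parser.read_string(conf_text)
    if parser.has_option("search", "max_events_per_bucket"):
        return parser.getint("search", "max_events_per_bucket")
    return None


sample = "[search]\nmax_events_per_bucket = 100000\n"
print(max_events_per_bucket(sample))
```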