Skip to content

JSON Columns

This feature was introduced in Hydrolix version 6.0.

A JSON column stores semi-structured JSON data where the schema can vary between rows or isn't known at ingest time. Hydrolix flattens each unique path in the JSON at ingest and stores it as an independent column, so queries read only the paths they need and avoid loading unrelated fields.

Transform schema⚓︎

Define a JSON column by setting "type": "json" in the transform schema:

JSON Column Transform Schema
1
2
3
4
5
6
7
{
  "name": "json_col",
  "datatype": {
    "type": "json",
    "index": true
  }
}

Arrays and nested objects don't require additional configuration. Hydrolix detects the structure at ingest and stores each path.

Query a JSON column⚓︎

Access JSON fields using dot notation:

Access a JSON Field
SELECT json_col.name FROM my_table

For nested paths, continue chaining with dot notation:

Access a Nested Field
SELECT json_col.user.id FROM my_table

Arrays in JSON columns⚓︎

Access arrays inside a JSON column using standard ClickHouse bracket syntax with 1-based indexing.

Supported array types⚓︎

JSON columns support these array element types:

Type Description
Array(Int8), Array(Int32), Array(Int64) Signed integers
Array(Float64) Floating-point numbers
Array(String) Strings
Array(Map) Maps
Array(JSON) Arrays of JSON objects

Sample input data⚓︎

This example shows a JSON column containing an array of strings and an array of JSON objects:

Sample JSON Input
[
  {
    "json_col": {
      "tags": ["error", "timeout", "retry"],
      "events": [
        {"name": "login",  "user": "alice"},
        {"name": "logout", "user": "alice"}
      ]
    }
  }
]

Query examples⚓︎

Use 1-based indexing to access a specific element:

Access Element by Index
SELECT json_col.tags[1] FROM my_table  -- returns "error"

Use empty brackets to access a field across all elements of an array of JSON objects:

Access Field across All Elements
SELECT json_col.events[].name FROM my_table  -- returns Array(String): ["login", "logout"]

JSON column indexing⚓︎

This feature was introduced in Hydrolix version 5.10.3.

Hydrolix creates indexes at ingest for all fields in JSON columns, except floating-point values. No configuration is required.

At query time, Hydrolix uses these indexes for two patterns: homogeneous fields and simple arrays. A homogeneous field is a JSON sub-field whose values all have a single consistent type across the records in a partition. For example, data.age is always an integer, or data.country is always a string. A simple array is an array of primitive types such as Array(Int64) or Array(String).

Fields with mixed types in a single partition aren't used for index lookups, but Hydrolix still writes their indexes.

Mixed types and key changes

JSON columns support flexible schemas. If a field's type changes between partitions, each partition with a consistent type uses its index independently. If a field's type changes in a single partition, index lookups are skipped for that partition.

1
Adding or removing keys between records doesn't affect indexing. Each sub-column is indexed independently. Queries against `json_col.field` return `null` for rows where `field` isn't present.

To verify that a query uses JSON indexes, set hdx_query_debug=true. The indexes_used field lists JSON sub-columns by their full path, such as humans.name.

Exclude fields from JSON columns⚓︎

The json_skip and json_skip_regexp transform settings exclude matching fields from JSON columns entirely. Excluded fields aren't stored, indexed, or queryable.

Exclude Fields by Name
1
2
3
4
"settings": {
    "json_skip": ["trace_id", "request_body"],
    "output_columns": [...]
}
Exclude Fields by Regex Pattern
1
2
3
4
"settings": {
    "json_skip_regexp": ["^debug_.*", ".*_raw$"],
    "output_columns": [...]
}

Explicit index: true/false settings on individual output columns take precedence over automatic indexing.

Limitations⚓︎

  • Summary tables don't support JSON columns.
  • Map of JSON may cause column explosion (a large number of distinct column paths that degrades query performance) if map keys are unique across rows.
  • Hydrolix stores IP, UUID, and Datetime values inside JSON as strings. To use these values as their intended types in SQL queries, apply explicit type conversion functions.
  • Hydrolix stores UInt64 values above 9,223,372,036,854,775,807 as Float64, which may cause precision loss.
  • Accessing JSON fields in subqueries requires the hdx_allow_experimental_analyzer=1 query setting.