JSON Columns
This feature was introduced in Hydrolix version 6.0.
A JSON column stores semi-structured JSON data where the schema can vary between rows or isn't known at ingest time. Hydrolix flattens each unique path in the JSON at ingest and stores it as an independent column, so queries read only the paths they need and avoid loading unrelated fields.
Transform schema⚓︎
Define a JSON column by setting "type": "json" in the transform schema:
| JSON Column Transform Schema | |
|---|---|
Arrays and nested objects don't require additional configuration. Hydrolix detects the structure at ingest and stores each path.
Query a JSON column⚓︎
Access JSON fields using dot notation:
| Access a JSON Field | |
|---|---|
For nested paths, continue chaining with dot notation:
| Access a Nested Field | |
|---|---|
Arrays in JSON columns⚓︎
Access arrays inside a JSON column using standard ClickHouse bracket syntax with 1-based indexing.
Supported array types⚓︎
JSON columns support these array element types:
| Type | Description |
|---|---|
Array(Int8), Array(Int32), Array(Int64) |
Signed integers |
Array(Float64) |
Floating-point numbers |
Array(String) |
Strings |
Array(Map) |
Maps |
Array(JSON) |
Arrays of JSON objects |
Sample input data⚓︎
This example shows a JSON column containing an array of strings and an array of JSON objects:
| Sample JSON Input | |
|---|---|
Query examples⚓︎
Use 1-based indexing to access a specific element:
| Access Element by Index | |
|---|---|
Use empty brackets to access a field across all elements of an array of JSON objects:
| Access Field across All Elements | |
|---|---|
JSON column indexing⚓︎
This feature was introduced in Hydrolix version 5.10.3.
Hydrolix creates indexes at ingest for all fields in JSON columns, except floating-point values. No configuration is required.
At query time, Hydrolix uses these indexes for two patterns: homogeneous fields and simple arrays. A homogeneous field is a JSON sub-field whose values all have a single consistent type across the records in a partition. For example, data.age is always an integer, or data.country is always a string. A simple array is an array of primitive types such as Array(Int64) or Array(String).
Fields with mixed types in a single partition aren't used for index lookups, but Hydrolix still writes their indexes.
Mixed types and key changes
JSON columns support flexible schemas. If a field's type changes between partitions, each partition with a consistent type uses its index independently. If a field's type changes in a single partition, index lookups are skipped for that partition.
1 | |
To verify that a query uses JSON indexes, set hdx_query_debug=true. The indexes_used field lists JSON sub-columns by their full path, such as humans.name.
Exclude fields from JSON columns⚓︎
The json_skip and json_skip_regexp transform settings exclude matching fields from JSON columns entirely. Excluded fields aren't stored, indexed, or queryable.
Explicit index: true/false settings on individual output columns take precedence over automatic indexing.
Limitations⚓︎
- Summary tables don't support JSON columns.
- Map of JSON may cause column explosion (a large number of distinct column paths that degrades query performance) if map keys are unique across rows.
- Hydrolix stores IP, UUID, and Datetime values inside JSON as strings. To use these values as their intended types in SQL queries, apply explicit type conversion functions.
- Hydrolix stores UInt64 values above
9,223,372,036,854,775,807as Float64, which may cause precision loss. - Accessing JSON fields in subqueries requires the
hdx_allow_experimental_analyzer=1query setting.