
Format Options

Hydrolix can process input data structured in many different ways. Hydrolix currently supports data in CSV, JSON, or Parquet format, optionally compressed with several different algorithms.

CSV⚓︎

The CSV (Character-Separated Values) format encodes data as collections of lines containing columns separated by a specified delimiter character.

To create a transform schema that handles CSV input, set the type property to "csv" and the format_details field to an object that includes the following configuration options:

| Element | Type | Default | Description |
| --- | --- | --- | --- |
| delimiter | number, string | `,` | The delimiter substring, for example `,` or `\t`. |
| escape | number, string | `"` | The escape character. |
| skip_head | number | `0` | The number of rows to skip at the beginning of any ingestion. |
| quote | single-character string | `"` | The character that specifies the beginning and end of a string literal. |
| comment | single-character string | `#` | The character that marks the beginning of a comment line. |
| skip_comments | boolean | `false` | Whether or not to ignore lines beginning with the comment character. |
| windows_ending | boolean | `false` | Whether or not to allow Windows-style (CR-LF) line endings. |

The values of the delimiter and escape fields must be either one-byte character strings or the ASCII numeric equivalent.

The values of the quote and comment fields must be one-byte character strings.

{
    "name": "my_special_transform",
    "type": "csv",
    "settings": {
        "format_details": {
            "skip_head": 2,
            "delimiter": ","
        },
        ...
    }
}
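
Because the delimiter and escape fields also accept the ASCII numeric equivalent, a transform for tab-delimited input could specify the delimiter by its ASCII code (9 is horizontal tab). This is a sketch; the transform name is hypothetical:

```json
{
    "name": "tab_delimited_transform",
    "type": "csv",
    "settings": {
        "format_details": {
            "delimiter": 9,
            "skip_head": 1
        },
        ...
    }
}
```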

JSON⚓︎

To create a transform schema that handles JSON input, set the type property to "json". Because JSON input allows for complex nested structures, Hydrolix provides options to explicitly source output column data from nested structures and to flatten incoming data during a pre-processing step.

Datatype source⚓︎

By default, Hydrolix uses the names of the top-level keys to establish the mapping between output_columns and the source data. For example, if your source data contains a top-level property named employees that you wish to ingest, you must name the corresponding column definition in your transform employees. This convention also applies to flattened property names.

You can override this default mapping for any column definition in output_columns with the datatype.source field:

| Source | Syntax |
| --- | --- |
| Input data from a single field | `"source": { "from_input_field": "field-name" }` |
| Input data from multiple fields | `"source": { "from_input_fields": ["field-name-1", "field-name-2", ...] }` |
| Input data by index (in this example, field 3) | `"source": { "from_input_index": 3 }` |
| Input data in a nested JSON structure | `"source": { "from_json_pointers": ["/path/alternative/1", "/path/alternative/2", ...] }` |
| Variable | `"source": { "from_automatic_value": "variable-name" }` |
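
As a sketch of the multiple-field form (the column and field names here are hypothetical), a column definition sourcing data from two input fields might look like:

```json
{
    "name": "full_name",
    "datatype": {
        "type": "string",
        "index": true,
        "source": {
            "from_input_fields": ["first_name", "last_name"]
        }
    }
}
```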

Create a New Column from an Input⚓︎

The following example creates a new column named test from the data in the third column of input CSV data:

{
   "name" : "copy_example",
   "settings" : {
      "compression" : "none",
      "is_default" : true,
      "output_columns" : [
         {
            "datatype" : {
               "index" : false,
               "source" : {
                  "from_input_index" : 3
               },
               "type" : "string"
            },
            "name" : "test"
         }
      ]
   },
   "table" : "{{tableid}}",
   "type" : "csv"
}

The following example creates a new column named test from the data in a field called initial_name in input JSON data:

{
    "name": "copy_example",
    "type": "json",
    "table": "{{tableid}}",
    "settings": {
        "is_default": true,
        "compression": "none",
        "output_columns": [
            {
                "name": "test",
                "datatype": {
                    "type": "string",
                    "index": true,
                    "source": {
                        "from_input_field": "initial_name"
                    }
                }
            }
        ]
    }
}

Copy Data

By copying data into multiple columns, you can index the same data multiple times with different types.
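
For instance (a hedged sketch with hypothetical column and field names), the same input field could feed both a string column and a numeric column:

```json
"output_columns": [
    {
        "name": "status_text",
        "datatype": {
            "type": "string",
            "index": true,
            "source": { "from_input_field": "status" }
        }
    },
    {
        "name": "status_code",
        "datatype": {
            "type": "uint64",
            "index": true,
            "source": { "from_input_field": "status" }
        }
    }
]
```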

Create a New Column from Nested Input⚓︎

Expressions can also query data stored in a nested JSON structure using JSON Pointer syntax. Assume input data stored in the following structure:

  1. A JSON object called vegetables.
  2. Within vegetables, a nested JSON object called legumes.
  3. Within legumes, a field called species.

    {
      "vegetables": {
        "legumes": {
          "species": "example_data"
        }
      }
    }

Given this input data, you could use the following transform to define an output column named example_species:

{
    "name": "copy_example",
    "type": "json",
    "table": "{{tableid}}",
    "settings": {
        "is_default": true,
        "compression": "none",
        "output_columns": [
            {
                "name": "example_species",
                "datatype": {
                    "type": "string",
                    "index": true,
                    "source": {
                       "from_json_pointers": ["/vegetables/legumes/species"]
                    }
                }
            }
        ]
    }
}
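
Because from_json_pointers accepts a list of alternative paths, a single column definition can accommodate more than one possible input layout. In this sketch, the second path is hypothetical:

```json
"source": {
    "from_json_pointers": [
        "/vegetables/legumes/species",
        "/produce/legumes/species"
    ]
}
```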

JSON flattening⚓︎

JSON input can contain complex, multi-level nested objects and arrays. You can optionally flatten nested JSON data into single-level key/value pairs during ingestion, making it easier to map onto table columns. Configure flattening in the transform's format_details. See Flattening for configuration options and examples.

JSON compression⚓︎

For JSON transforms, set the compression property in the transform's settings to specify the compression algorithm applied to the input data. See Compression for supported algorithms and layering options.

During HTTP streaming ingestion, you can also use the Content-Encoding header for transport-level compression.

Parquet⚓︎

Apache Parquet is a columnar storage format used in data engineering pipelines and analytics systems such as Apache Spark, Apache Hive, and PyArrow. Parquet files carry their own schema: they're self-describing and don't require format-level configuration such as delimiters or quote characters.

Hydrolix supports ingesting Parquet data; however, it doesn't offer full Parquet table support.

To create a transform schema that handles Parquet input, set the type property to "parquet".

Parquet ingestion is supported across all Hydrolix intake methods. When ingesting via HTTP streaming, set the request header Content-Type: application/vnd.apache.parquet.

Example Parquet transform
{
    "name": "parquet_example",
    "type": "parquet",
    "settings": {
        "is_default": false,
        "output_columns": [
            {
                "name": "timestamp",
                "datatype": {
                    "type": "epoch",
                    "primary": true,
                    "format": "s",
                    "resolution": "s"
                }
            },
            {
                "name": "src_ip",
                "datatype": {
                    "type": "string",
                    "index": true
                }
            },
            {
                "name": "bytes_sent",
                "datatype": {
                    "type": "uint64",
                    "index": true
                }
            }
        ]
    }
}

Column mapping⚓︎

Parquet uses name-based column mapping. By default, the column names in the Parquet file must match the name field of each output column definition in the transform.

You can override the default mapping for any output column using the datatype.source field:

| Source | Syntax |
| --- | --- |
| Input data from a single field | `"source": { "from_input_field": "field-name" }` |
| Input data from multiple fields | `"source": { "from_input_fields": ["field-name-1", "field-name-2", ...] }` |
| Input data in a nested structure | `"source": { "from_json_pointers": ["/path/to/field"] }` |

For example, to map a Parquet column named source_address to an output column named src_ip:

Mapping a Parquet Column to an Output Column with a Different Name
{
    "name": "src_ip",
    "datatype": {
        "type": "string",
        "index": true,
        "source": {
            "from_input_field": "source_address"
        }
    }
}

To extract a value from a nested Parquet structure, use from_json_pointers with the path to the target field. Despite its name, this option isn't JSON-specific. It uses RFC 6901 JSON Pointer syntax to navigate any nested structure, including nested Parquet columns.

Extracting a Value from a Nested Parquet Column by Path
{
    "name": "region",
    "datatype": {
        "type": "string",
        "index": true,
        "source": {
            "from_json_pointers": ["/metadata/geo/region"]
        }
    }
}

Unmatched Parquet Columns

Parquet columns present in the file but not matched to any output column in the transform are silently ignored.

Nested structures and flattening⚓︎

For complex nested Parquet data, you can optionally flatten multi-level records into single-level key/value pairs during ingestion, making them easier to map onto table columns. Parquet transforms support the same format_details options as JSON, including flattening, pretransforms, and subtype. See Flattening for configuration options and examples.

Data type mapping⚓︎

Each Parquet column is initially mapped from its physical type to an intermediate representation.

| Parquet Physical Type | Intermediate Representation | Notes |
| --- | --- | --- |
| BOOLEAN | boolean | |
| INT32 | 64-bit integer | Widened from 32-bit |
| INT64 | 64-bit integer | |
| FLOAT | 64-bit float | Widened from 32-bit |
| DOUBLE | 64-bit float | |
| BYTE_ARRAY | string | |
| All other types | string | Converted through string representation |

From there, each column is converted to the Hydrolix type specified in the output column definition. See Data Types for the full list of supported Hydrolix column types.

Parquet Logical Types Aren't Interpreted

Hydrolix reads Parquet physical types only. Parquet logical type annotations such as DATE, TIME, TIMESTAMP_MILLIS, TIMESTAMP_MICROS, and DECIMAL are not interpreted during ingestion.

Parquet file compression⚓︎

Hydrolix supports all available Parquet file compression codecs except LZ4. To ingest files compressed with LZ4, re-compress them with a supported codec before sending them to Hydrolix.

Separately from Parquet's internal page-level compression, the HTTP request itself can be compressed using the Content-Encoding header. See Compression for supported request-level algorithms and layering options.

Parquet ingestion limitations⚓︎

  • BYTE_STREAM_SPLIT encoding isn't supported. This encoding is opt-in and doesn't affect files written with default settings.
  • Deprecated two-level list structures aren't supported. These are produced by older Hadoop, Hive, and Avro writers or by setting parquet.avro.write-old-list-structure=true in Apache Spark. Modern tools use the standard three-level list structure.
  • File size is constrained by available memory during both streaming and batch ingestion, as the entire file is read into memory.
  • Parquet files sent to Hydrolix with null metadata values may not be successfully ingested. The Parquet parsing library used by Hydrolix incorrectly treats the optional KeyValue.value metadata field as required. If you encounter a missing required field error in the intake job logs, rewrite the file with clean metadata before ingesting. For example, remove null metadata values or replace them with empty strings.

These limitations primarily affect files written with non-default settings or by legacy Hadoop-era tools.

Flattening⚓︎

When accepting JSON or Parquet source data, you may optionally flatten each incoming object prior to ingestion. This can transform complex, multi-level JSON structures into simple objects comprising one level of key/value pairs, ready for storage in a single table row.

To do this, define a flattening property within your transform's format_details. Set the value to an object with the following properties:

| Property | Value |
| --- | --- |
| active | If 1 (or any other true value), Hydrolix will flatten incoming records before ingesting them as rows. |
| map_flattening_strategy | Configuration for flattening any JSON objects within each row's main object. |
| slice_flattening_strategy | Configuration for flattening any JSON arrays within each row's main object. |
| depth | Configuration for specifying how "deep" flattening goes. Use a value of 0 to impose no limit. |

The two strategy properties accept an object that defines the rules that Hydrolix should follow to create new key names for the resulting, flattened JSON object.

| Property | Value |
| --- | --- |
| left | The substring to use when concatenating an element's key with its parent's key. |
| right | The substring to use when concatenating an element's key with its child's key. |

Omitting either "strategy" property (or setting it to null) disables flattening for objects or arrays, respectively.

Flattening Impacts Source Naming

Use the flattened version of a property name when defining output columns. For example, consider the following input:

{
  "employees": {
    "departments": [
      "produce",
      "meat"
    ]
  }
}

Assume the following flattening configuration:

"flattening": {
    "active": true,
    "map_flattening_strategy": {
        "left": ".",
        "right": ""
    },
    "slice_flattening_strategy": {
        "left": "[",
        "right": "]"
    }
}

To refer to the first value in the array stored in employees.departments, use the field name "employees.departments[0]".
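
Continuing the example, an output column sourced from that flattened name could be defined as follows (a sketch; the column name is hypothetical):

```json
{
    "name": "first_department",
    "datatype": {
        "type": "string",
        "index": true,
        "source": { "from_input_field": "employees.departments[0]" }
    }
}
```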

Example⚓︎

Consider the following JSON object:

{
  "date": "2020-01-01",
  "data": {
    "oranges": [ 1, 2, 3 ],
    "apples": [
      {
        "cortland": 6,
        "honeycrisp": [ 7, 8, 9 ]
      },
      [ 10, 11, 12 ]
    ]
  }
}

Assume the following flattening configuration in the ingestion transform:

"settings": {
    "format_details": {
        "flattening": {
            "active": true,
            "map_flattening_strategy": {
                "left": ".",
                "right": ""
            },
            "slice_flattening_strategy": {
                "left": "[",
                "right": "]"
            }
        }
    },
    ...
}

After applying flattening, Hydrolix ingests the following, single-level JSON object:

{
    "date": "2020-01-01",
    "data.oranges[0]": 1,
    "data.oranges[1]": 2,
    "data.oranges[2]": 3,
    "data.apples[0].cortland": 6,
    "data.apples[0].honeycrisp[0]": 7,
    "data.apples[0].honeycrisp[1]": 8,
    "data.apples[0].honeycrisp[2]": 9,
    "data.apples[1][0]": 10,
    "data.apples[1][1]": 11,
    "data.apples[1][2]": 12
}

Depth example⚓︎

Consider the following transform that specifies a flattening depth of 1:

"settings": {
    "format_details": {
        "flattening": {
            "active": true,
            "depth": 1,
            "map_flattening_strategy": {
                "left": ".",
                "right": ""
            },
            "slice_flattening_strategy": {
                "left": "[",
                "right": "]"
            }
        }
    },
    ...
}

Applying the transform produces the following JSON:

{
    "date": "2020-01-01",
    "data.apples":[
        {
            "cortland":6,
            "honeycrisp":[7,8,9]
        },
        [10,11,12]
    ],
    "data.oranges":[1,2,3]
}

This is useful when you want to leverage the map datatype and only flatten at a specific level.

Compression⚓︎

Hydrolix supports common compression algorithms and offers multiple mechanisms to influence decompression of received data.

Algorithms⚓︎

Valid values for the compression property:

| Value | Meaning |
| --- | --- |
| gzip | Content is compressed using gzip (LZ77 with 32-bit CRC). |
| zip | Content is ZIP-encoded using zlib (RFC 1950). |
| deflate | Content is encoded in zlib structure with the deflate compression algorithm. |
| bzip2 | Content is compressed with the bzip2 algorithm. |
| none | Content isn't compressed. (Equivalent to not specifying compression at all.) |

Transform-level decompression⚓︎

Use the compression property to specify the compression algorithms applied to the input. Hydrolix decompresses received data accordingly. The decompressed data is treated as the type specified in the transform definition.

For example, a JSON transform with compression set to gzip means the entire source payload is expected to be gzip-compressed and, once expanded, to contain JSON-formatted data.
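
A minimal sketch of such a transform (the name is hypothetical; other settings are elided):

```json
{
    "name": "gzip_json_transform",
    "type": "json",
    "settings": {
        "compression": "gzip",
        ...
    }
}
```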

This is honored by all ingestion systems including batch, autoingest, and HTTP Stream API.

Transform-level compression settings are independent of streaming decompression.

Streaming decompression⚓︎

Streaming decompression is only available in the HTTP Stream API. This feature is available to all clients of the HTTP Stream API, including push methods like Amazon Data Firehose and Google Pub/Sub.

The HTTP header Content-Encoding allows an HTTP client to signal the use of on-the-fly network transport compression to improve throughput efficiency or reduce costs.

Use the Content-Encoding header to decompress data before any compression settings from the transform apply.

Layering⚓︎

Define multiple layers of decompression by listing the compression algorithms in the order they were applied to the original data. Separate each algorithm name from the next with a comma and a space, whether you configure compression with the Content-Encoding header or the compression field of a transform.

Hydrolix decompresses starting from the last-applied compression algorithm.

Pseudocode Demonstration of Layered Compression Algorithms
# Client compresses data multiple times, first with A, then B, and last C
encoded_data = C(B(A(data)))

# If using HTTP Stream API only, specify compression order in Content-Encoding header
Content-Encoding: A, B, C

# If original data files are compressed, specify the order in `settings.compression`
"compression": "A, B, C"

# Decompression order is reverse compression order: C, then B, then A
decoded_data = decodeA(decodeB(decodeC(encoded_data)))

For example, the following shell session layers three compression algorithms on a JSON file:

$ gzip -c < data.json > data.json.gz
$ bzip2 -c < data.json.gz  > data.json.gz.bz2
$ echo data.json.gz.bz2 | zip  --names-stdin data.json.gz.bz2.zip
$ ls -lrt
-rw-rw-r-- 1 user group 12655 Feb 26 06:30 data.json
-rw-rw-r-- 1 user group  2100 Feb 26 06:31 data.json.gz
-rw-rw-r-- 1 user group  2505 Feb 26 06:32 data.json.gz.bz2
-rw-rw-r-- 1 user group  2687 Feb 26 06:36 data.json.gz.bz2.zip

Specify all decompression algorithms in the transform settings.

{
    "name": "decompression_multi",
    "type": "json",
    "settings": {
        "compression": "gzip, bzip2, zip",
        "output_columns": [ ... ]
    }
}

Specify all decompression algorithms in the Content-Encoding header.

curl --fail --silent \
  --header "X-Hdx-Table: news.requests" \
  --header "Content-Type: application/json" \
  --header "Authorization: Bearer ${HDX_TOKEN}" \
  --header "Content-Encoding: gzip, bzip2, zip" \
  --data-binary @data.json.gz.bz2.zip \
  -- "${HDX_HYDROLIX_URL}/ingest/event"

When using multiple compression algorithms, it's acceptable to mix the usage of Content-Encoding and the compression setting in a transform.

In the above example, the following compression settings could be used instead:

  • streaming Content-Encoding is bzip2, zip
  • transform-level compression is gzip (first)

With this configuration, the HTTP Stream API runs zip and bzip2 decompression before passing the stream to the transform, which applies gzip decompression.
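
A sketch of that split, showing only the transform fragment:

```json
"settings": {
    "compression": "gzip",
    ...
}
```

with the accompanying HTTP request carrying the header Content-Encoding: bzip2, zip.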

Streaming decompression allows independent configuration of compression algorithms to support network transport compression. This allows a single transform to be used in the batch or autoingest systems while supporting network transport compression in HTTP Stream API.

Mismatch errors⚓︎

When the incoming data can't be decompressed using the specified algorithm, the ingestion system produces one of the following error messages.

  • `bzip2 data invalid: bad magic value`: the specified compression is bzip2, but the incoming data isn't compressed with bzip2
  • `failed to create zip reader: failed to open local file: zip: not a valid zip file`: the specified compression is zip, but the incoming data isn't a zipfile
  • `unable to decode data: invalid character ',' looking for beginning of value`: an incorrect decompression algorithm was applied
  • `EOF`: end of file, another general class of error for compression mismatch

When mismatch errors are encountered in the HTTP Stream API, the response status code is HTTP 400.

Layered Order Matters

Hydrolix decompresses data by applying the listed algorithms in right-to-left order, the reverse of the order in which they were applied.