Format Options
Hydrolix can ingest several message and file formats. You define a file's encoding in the format_details section of the transform's settings object. Supported formats currently include:
- CSV
- JSON
CSV
In Hydrolix, "CSV" refers to character-separated rather than strictly comma-separated files or messages: the delimiter does not have to be a comma.
To create a transform schema that handles CSV-formatted incoming data, set its type property to "csv" and give its format_details object any of the following configuration options:
Element | Type | Default | Description |
---|---|---|---|
delimiter | number, string | | The delimiter substring. |
escape | number, string | " | The escape character. |
skip_head | number | 0 | The number of rows to skip before ingestion starts. |
quote | number, string | " | The quote character. |
comment | number, string | # | The comment character. Only single characters are supported. |
skip_comments | boolean | false | If true, the ingester skips lines that begin with the comment character. |
windows_ending | boolean | false | If true, then Hydrolix will expect incoming data to use Windows-style (CR-LF) line endings. |
(Note that Hydrolix recognizes "\t" as a tab character for the purposes of CSV configuration.)
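To make the options above more concrete, here is a rough Python emulation of how they could shape parsing, built on the standard csv module. It is a sketch, not Hydrolix's ingest code. The parse_csv helper is hypothetical, the "," default for delimiter is an assumption (the table lists no default), treating a numeric option as a character code is an assumption, and because skip_head and comment handling work line by line, quoted fields that span several lines are not supported.

```python
import csv


def _char(value):
    # The table allows a number or a string. Treating a number as a character
    # code (e.g. 59 for ";") is an assumption of this sketch.
    if isinstance(value, int):
        return chr(value)
    # Accept the two-character sequence backslash-t as a tab character.
    return "\t" if value == "\\t" else value


def parse_csv(text, fmt):
    """Hypothetical helper: parse text according to a format_details dict."""
    delimiter = _char(fmt.get("delimiter", ","))  # "," is an assumed default
    quote = _char(fmt.get("quote", '"'))
    escape = _char(fmt.get("escape", '"'))
    comment = _char(fmt.get("comment", "#"))
    line_end = "\r\n" if fmt.get("windows_ending", False) else "\n"

    lines = text.split(line_end)
    if lines and lines[-1] == "":
        lines.pop()  # drop the empty piece after a trailing line ending
    lines = lines[fmt.get("skip_head", 0):]  # skip leading rows
    if fmt.get("skip_comments", False):
        lines = [ln for ln in lines if not ln.startswith(comment)]

    if escape == quote:
        # Escape and quote are the same character: a doubled quote ("")
        # inside a quoted field stands for one literal quote.
        reader = csv.reader(lines, delimiter=delimiter, quotechar=quote,
                            doublequote=True)
    else:
        reader = csv.reader(lines, delimiter=delimiter, quotechar=quote,
                            escapechar=escape, doublequote=False)
    return list(reader)


sample = 'report v1\ngenerated today\n# a comment\nname|qty\n"a|b"|3\n'
print(parse_csv(sample, {"skip_head": 2, "delimiter": "|", "skip_comments": True}))
```

Here the two header lines are skipped, the comment line is dropped, and the quoted "a|b" survives as a single field even though it contains the delimiter.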
CSV Example
{
  "name": "my_special_transform",
  "type": "csv",
  "settings": {
    "format_details": {
      "skip_head": 2,
      "delimiter": ","
    },
    ...
  }
}
JSON
When ingesting rows formatted as JSON objects, Hydrolix uses the names of each object's top-level keys to map the source data to the transform's output_columns. That is, if your source data contains a top-level property named "employees" that you wish to ingest, you must also name the corresponding column definition in your transform "employees".
This also applies to JSON flattening: each output column must share the full name of the flattened data field whose value you wish to copy into it. So, if your flattened incoming data has a relevant property named "employees.departments[0]" and you wish to copy its values into your Hydrolix table, then one of your transform's output_columns must have its name property set to the string "employees.departments[0]".
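A minimal Python sketch of this exact-name matching, using a hypothetical map_row helper. What Hydrolix does with unmatched keys or columns is not described here; the sketch simply drops unmatched keys and fills unmatched columns with None, which is an assumption.

```python
def map_row(record, output_columns):
    """Hypothetical helper: pick each column's value by exact top-level key.

    Columns with no matching key get None, and keys with no matching column
    are dropped. Both behaviors are assumptions of this sketch.
    """
    return {col["name"]: record.get(col["name"]) for col in output_columns}


columns = [{"name": "employees"}, {"name": "employees.departments[0]"}]

# Without flattening, the nested value is only reachable via "employees".
nested = {"employees": {"departments": ["eng", "ops"]}, "office": "NYC"}
print(map_row(nested, columns))

# After flattening, the column name matches the flattened key exactly.
flat = {"employees.departments[0]": "eng", "employees.departments[1]": "ops"}
print(map_row(flat, columns))
```

The "employees.departments[0]" column only finds a value once the source has been flattened with a strategy that produces exactly that key.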
JSON Flattening
When accepting JSON-formatted source data, you may optionally flatten each incoming object as a pre-processing step prior to ingesting it. This can transform complex, multi-level JSON structures into simple objects comprising one level of key/value pairs, ready for storage in a single table row.
To do this, define a flattening property within your transform's format_details, and set its value to an object with the following properties:
Property | Value |
---|---|
active | If 1 (or any other true value), Hydrolix will flatten incoming JSON objects before ingesting them as rows. |
map_flattening_strategy | Configuration for flattening any JSON objects within each row's main object. |
slice_flattening_strategy | Configuration for flattening any JSON arrays within each row's main object. |
depth | The maximum depth to which flattening is applied. A value of 0 means no limit: every level is flattened. |
The two "strategy" properties accept an object that defines the rules that Hydrolix should follow to create new key names for the resulting, flattened JSON object.
Property | Value |
---|---|
left | The substring to use when concatenating an element's key with its parent's key. |
right | The substring to use when concatenating an element's key with its child's key. |
Omitting either "strategy" property, or setting it to null, disables flattening for objects (map_flattening_strategy) or arrays (slice_flattening_strategy), respectively.
JSON Flattening Example
Consider the following JSON object, which we wish to ingest as a single row:
{
  "date": "2020-01-01",
  "data": {
    "oranges": [ 1, 2, 3 ],
    "apples": [
      {
        "cortland": 6,
        "honeycrisp": [ 7, 8, 9 ]
      },
      [ 10, 11, 12 ]
    ]
  }
}
Suppose the transform handling it contains the following flattening configuration:
"settings": {
  "format_details": {
    "flattening": {
      "active": true,
      "map_flattening_strategy": {
        "left": ".",
        "right": ""
      },
      "slice_flattening_strategy": {
        "left": "[",
        "right": "]"
      }
    }
  },
  ...
}
After applying these JSON flattening strategies, Hydrolix would ingest the following single-level JSON object:
{
  "date": "2020-01-01",
  "data.oranges[0]": 1,
  "data.oranges[1]": 2,
  "data.oranges[2]": 3,
  "data.apples[0].cortland": 6,
  "data.apples[0].honeycrisp[0]": 7,
  "data.apples[0].honeycrisp[1]": 8,
  "data.apples[0].honeycrisp[2]": 9,
  "data.apples[1][0]": 10,
  "data.apples[1][1]": 11,
  "data.apples[1][2]": 12
}
Example including depth: 1
The following configuration is identical to the previous one, except that it also sets depth to 1:
"settings": {
  "format_details": {
    "flattening": {
      "active": true,
      "depth": 1,
      "map_flattening_strategy": {
        "left": ".",
        "right": ""
      },
      "slice_flattening_strategy": {
        "left": "[",
        "right": "]"
      }
    }
  },
  ...
}
With flattening limited to a depth of 1, Hydrolix would instead ingest the following JSON object:
{
  "date": "2020-01-01",
  "data.apples": [
    {
      "cortland": 6,
      "honeycrisp": [7, 8, 9]
    },
    [10, 11, 12]
  ],
  "data.oranges": [1, 2, 3]
}
This is useful when you want to take advantage of the map data type and flatten only down to a specific level.
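The following Python sketch reproduces both flattening examples above. The flatten function is a hypothetical illustration of the documented rules, not Hydrolix's implementation. Its assumptions: top-level keys keep their names, each nested key is built as parent key + left + child key (or array index) + right, empty objects and arrays are kept as values, and depth counts levels below the top-level keys.

```python
import json


def flatten(record, map_strategy=None, slice_strategy=None, depth=0):
    """Hypothetical sketch of JSON flattening for a single row.

    map_strategy / slice_strategy: {"left": ..., "right": ...} dicts, or None
    to leave objects / arrays unflattened. depth=0 means no depth limit.
    """
    out = {}

    def can_descend(value, level):
        if depth and level >= depth:
            return False  # depth limit reached: keep the value as-is
        if isinstance(value, dict):
            return map_strategy is not None and len(value) > 0
        if isinstance(value, list):
            return slice_strategy is not None and len(value) > 0
        return False  # scalars are never flattened

    def walk(value, key, level):
        if not can_descend(value, level):
            out[key] = value
            return
        if isinstance(value, dict):
            strategy, children = map_strategy, value.items()
        else:
            strategy, children = slice_strategy, enumerate(value)
        for child, child_value in children:
            # New key = parent key + left + child key (or index) + right.
            child_key = f"{key}{strategy['left']}{child}{strategy['right']}"
            walk(child_value, child_key, level + 1)

    # Top-level keys keep their names; flattening starts one level below.
    for key, value in record.items():
        walk(value, key, 0)
    return out


row = {
    "date": "2020-01-01",
    "data": {
        "oranges": [1, 2, 3],
        "apples": [{"cortland": 6, "honeycrisp": [7, 8, 9]}, [10, 11, 12]],
    },
}
maps = {"left": ".", "right": ""}
slices = {"left": "[", "right": "]"}

print(json.dumps(flatten(row, maps, slices), indent=2))           # first example
print(json.dumps(flatten(row, maps, slices, depth=1), indent=2))  # depth: 1 example
```

Passing None for slice_strategy leaves arrays intact, mirroring the null behavior described for the strategy properties.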
Compression
The compression property of a transform lists one or more compression algorithms that Hydrolix should expect to have already been applied to the data as a whole, and which it must therefore decompress before working with the data.
For example, setting the transform's compression property to "gzip" means that you expect the source data, in its entirety, to have been GZIP-compressed before Hydrolix receives it.
Compression algorithms
Valid values for the compression property include the following:
Value | Meaning |
---|---|
gzip | Content is compressed via gzip (LZ77 with 32-bit CRC). |
zip | Content is ZIP-encoded via zlib (RFC 1950). |
deflate | Content is encoded using the zlib structure and the deflate compression algorithm. |
bzip2 | Content is compressed with the bzip2 algorithm. |
none | Content is not compressed. (Equivalent to not specifying compression at all.) |
Note that in streaming ingestion, the HTTP request may declare its compression in the content-encoding header, while the data itself may carry its own, separate compression.
Compression Layering
To define multiple layers of compression, specify them in a comma-and-space-separated list:
"compression": "gzip, bzip2, zip"
The order matters: Hydrolix applies the decompression algorithms in the order listed, from right to left.
In the above example, Hydrolix would first apply zlib decompression to all received data, then bzip2 decompression, and finally gzip decompression.
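To illustrate right-to-left layering, here is a Python sketch using the standard gzip, bz2, and zlib modules. The name-to-decoder mapping is an assumption: following the table, both zip and deflate are treated as zlib (RFC 1950) streams. The decompress_layers helper is hypothetical.

```python
import bz2
import gzip
import zlib

# Assumed mapping from documented compression names to stdlib decoders.
DECODERS = {
    "gzip": gzip.decompress,
    "zip": zlib.decompress,      # zlib (RFC 1950), per the table
    "deflate": zlib.decompress,  # zlib structure + deflate, per the table
    "bzip2": bz2.decompress,
    "none": lambda data: data,
}


def decompress_layers(payload: bytes, compression: str) -> bytes:
    """Hypothetical helper: undo a value such as "gzip, bzip2, zip".

    Layers are removed right to left, so the last name listed is undone first.
    """
    if not compression:
        return payload  # no compression property: same as "none"
    layers = [name.strip() for name in compression.split(",")]
    for name in reversed(layers):
        payload = DECODERS[name](payload)
    return payload


# A sender builds the payload left to right: gzip first, zlib last.
original = b'{"date": "2020-01-01"}\n'
packed = zlib.compress(bz2.compress(gzip.compress(original)))
print(decompress_layers(packed, "gzip, bzip2, zip"))
```

Because the outermost layer (zlib) is the rightmost entry, reversing the list yields the order in which the layers must be peeled off.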