Message/File Encoding
Hydrolix can import a number of different message and file types into the platform. The current list includes:
- CSV
- JSON
- Parquet
To define the encoding of a file this is done in the format_details
section within the settings
object.
CSV
CSV in Hydrolix terms is defined as "Character" rather than Comma Separated encoded files or messages.
Format details
To create a transform schema that handles CSV-formatted incoming data, Set its type
property to "csv"
, and its format_details
object to include the following configuration options:
(Note that Hydrolix recognizes "\t"
as a tab character for the purposes of CSV configuration.)
example
{
"name": "my_special_transform",
"type": "csv",
"settings": {
"format_details": {
"skip_head": 2,
"delimiter": ","
},
...
}
}
JSON
When ingesting rows formatted as JSON objects, Hydrolix uses the names of the objects' top-level keys to establish the mapping between output_columns
and the source data. That is, if your source data contains a top-level property named "employees"
that you wish to ingest, then you must name corresponding column definition in your transform "employees"
as well.
This also applies to JSON flattening: your output columns must share the full names of any flattened data field whose value you wish to copy into them. So, if your flattened incoming data structure has a relevant property named "employees.departments[0]"
, and you wish to copy its values into your Hydrolix table, then one of your transform's output_columns
must also have its name
property set to the string "employees.departments[0]"
.
JSON Flattening
When accepting JSON-formatted source data, you may optionally flatten each incoming object as a pre-processing step prior to ingesting it. This can transform complex, multi-level JSON structures into simple objects comprising one level of key/value pairs, ready for storage in a single table row.
To do this, define a flattening
property within your transform's format_details
. Set its value an object with the following properties:
Property | Value |
---|---|
active | If 1 (or any other true value), Hydrolix will flatten incoming JSON objects before ingesting them as rows. |
map_flattening_strategy | Configuration for flattening any JSON objects within each row's main object. |
slice_flattening_strategy | Configuration for flattening any JSON arrays within each row's main object. |
The two "strategy" properties accept an object that defines the rules that Hydrolix should follow to create new key names for the resulting, flattened JSON object.
Property | Value |
---|---|
left | The substring to use when concatenating an element's key with its parent's key. |
right | The substring to use when concatenating an element's key with its child's key. |
Not defining (or defining as null
) either of the "strategy" properties will deactivate flattening for either objects or arrays, respectively.
An example of JSON flattening
Consider the following JSON object, which we wish to ingest as a single row:
{
"date": "2020-01-01",
"data": {
"oranges": [ 1, 2, 3 ],
"apples": [
{
"cortland": 6,
"honeycrisp": [ 7, 8, 9 ]
},
[ 10, 11, 12 ]
]
}
}
Imagine that the transform handling it contains the following flattening
configuration:
"settings": {
"format_details": {
"flattening": {
"active": true,
"map_flattening_strategy": {
"left": ".",
"right": ""
},
"slice_flattening_strategy": {
"left": "[",
"right": "]"
}
}
},
...
}
After applying these JSON flattening strategies, Hydrolix would end up ingesting the following, single-level JSON object:
{
"date": "2020-01-01"
"data.oranges[0]": 1
"data.oranges[1]": 2
"data.oranges[2]": 3
"data.apples[0].cortland": 6
"data.apples[0].honeycrisp[0]": 7
"data.apples[0].honeycrisp[1]": 8
"data.apples[0].honeycrisp[2]": 9
"data.apples[1][0]": 10
"data.apples[1][1]": 11
"data.apples[1][2]": 12
}
Parquet
Soon to be released!
Updated about 2 months ago