Data Types (Import)
To configure the data type of a column, set a type in the datatype object of that column's entry in Output Columns.
"settings": {
  "output_columns": [
    {
      "name": "a_string_column",
      "datatype": {
        "type": "string",
        .........
      }
    },
    {
      "name": "a_uint64_column",
      "datatype": {
        "type": "uint64",
        .........
      }
    },
    {
      ..........
The following types are supported:
Type | Indexed | Nullable | Description | Stored as |
---|---|---|---|---|
boolean | Yes (Default) | Yes | Converted to a | |
datetime | Yes (Default) | Yes (except Primary Column) | A string-based representation of a moment in time, e.g. | |
double | No (cannot be indexed) | Yes | A 64-bit floating-point number. | |
epoch | Yes (Default) | Yes | Converted to a | |
int8 | Yes (Default) | Yes | A signed 8-bit integer ( | |
int32 | Yes (Default) | Yes | A signed 32-bit integer ( | |
int64 | Yes (Default) | Yes | A signed 64-bit integer ( | |
string | Yes (Default) | Yes | A variable-length string. Equivalent to | |
uint8 | Yes (Default) | Yes | An unsigned 8-bit integer ( | |
uint32 | Yes (Default) | Yes | An unsigned 32-bit integer ( | |
uint64 | Yes (Default) | Yes | An unsigned 64-bit integer ( | |
array | Yes (Default, unless double) | Yes | An array of any one of the primitive types that Hydrolix supports. | |
map | Yes (Default, unless double) | Yes | A map of any one of the primitive types that Hydrolix supports. | |
Two of these datatypes, boolean and epoch, translate to more primitive types and are shorthand that makes a transform easier to describe. Every column except the primary datetime column can also be null or have no value.
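The integer types in the table differ only in width and signedness, so the range each one can hold follows directly from its bit count. The Python sketch below computes those ranges and checks whether a value fits. The helper names int_range and fits are illustrative and are not part of Hydrolix.

```python
import re

def int_range(type_name):
    """Return (min, max) for an integer type name such as 'int8' or 'uint64'."""
    match = re.fullmatch(r"(u?)int(8|32|64)", type_name)
    if match is None:
        raise ValueError(f"not an integer type from the table: {type_name}")
    unsigned = match.group(1) == "u"
    bits = int(match.group(2))
    if unsigned:
        return 0, 2**bits - 1
    return -(2 ** (bits - 1)), 2 ** (bits - 1) - 1

def fits(type_name, value):
    """True when value can be represented by the given integer type."""
    low, high = int_range(type_name)
    return low <= value <= high

for name in ("int8", "int32", "int64", "uint8", "uint32", "uint64"):
    print(name, int_range(name))
```

A check like this can be run over sample data before choosing a column type, for example to confirm that a counter never goes negative before declaring it uint32.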
Datetime
Datetime columns can be complex to configure. More details can be found in DateTime, Epoch's and timestamps.
Complex Types
Hydrolix supports two complex types: arrays and maps. These types have an additional object within datatype called elements.
The elements
object defines the structure of the map or array and the subsequent datatypes that are contained with them. It uses the same settings as the parent object ( type
, format
, index
fields etc) to define how data should be treated.
Maps
Hydrolix supports maps as a data type. Maps are defined as { key : string } definitions, with the key and the value each requiring their own configuration. In the example below, the key is an indexed string and the value is an epoch millisecond timestamp.
{
  "name": "map_column_name",
  "datatype": {
    "type": "map",
    "index": false,
    "elements": [
      {
        "type": "string",
        "index": true
      },
      {
        "type": "epoch",
        "index": true,
        "format": "us",
        "resolution": "ms"
      }
    ]
  }
}
In a query, a map value is read by its key. For example, for a uint64 value:
select mymap['uint64'] .... from ....... where mymap['uint64'] = 6288
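The map column definition above always has the same shape: a datatype of type map whose elements list holds the key settings first and the value settings second. A small Python sketch can generate that shape. The function name map_column is hypothetical, and the ordering of elements is taken from the example above, not from a formal specification.

```python
import json

def map_column(name, key, value, index=False):
    """Build an output column of type map from key and value datatype settings."""
    return {
        "name": name,
        "datatype": {
            "type": "map",
            "index": index,
            # Following the example: first element is the key, second is the value.
            "elements": [key, value],
        },
    }

column = map_column(
    "map_column_name",
    key={"type": "string", "index": True},
    value={"type": "epoch", "index": True, "format": "us", "resolution": "ms"},
)
print(json.dumps(column, indent=2))
```

The printed JSON matches the map example in this section and can be pasted into the output_columns list of a transform.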
Array of Maps
Hydrolix supports arrays of maps as a data type.
An array of maps is defined as:
"array":
[
  {
    "key": "string"
  },
  {
    "key2": "other_string"
  }
]
Arrays of maps only support a single value datatype
Mixed datatypes are not supported in an array of maps: every value must be of the same type.
In the previous example, that type is string.
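Because every value in an array of maps must share one type, it can help to check sample data before ingesting it. The Python sketch below does this locally using Python's own type names as a stand-in for Hydrolix types. It is an illustrative pre-check with hypothetical helper names, not a description of how Hydrolix validates data.

```python
def value_types(array_of_maps):
    """Collect the Python type name of every value in an array of maps."""
    return {type(v).__name__ for m in array_of_maps for v in m.values()}

def has_single_value_type(array_of_maps):
    """True when all values across all maps share one type."""
    return len(value_types(array_of_maps)) <= 1

good = [{"key": "string"}, {"key2": "other_string"}]
bad = [{"key": "string"}, {"key2": 42}]  # mixes a string and an integer

print(has_single_value_type(good))  # True
print(has_single_value_type(bad))   # False
```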
The transform for an array of maps is the following:
{
  "name": "array",
  "datatype": {
    "type": "array",
    "elements": [
      {
        "type": "map",
        "elements": [
          {
            "type": "string"
          },
          {
            "type": "string"
          }
        ]
      }
    ]
  }
}
Indexing
Hydrolix is designed to index as many columns as possible without the penalties traditionally found in older data platforms. Indexing should therefore only be turned off for a column when absolutely necessary. By default, Hydrolix indexes all data types except double.
Indexing is turned on and off with the index setting in the datatype descriptor of each Output Columns entry.
"settings": {
  "output_columns": [
    {
      "name": "indexed_string_column",
      "datatype": {
        "type": "string",
        "index": true
      }
    },
    {
      "name": "not_indexed_string_column",
      "datatype": {
        "type": "string",
        "index": false
      }
    }
  ]
}
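The indexing rules above (indexed by default, never for double, otherwise controlled by the index setting) can be summarised in a few lines of Python. This is a sketch for auditing a transform locally. The function is_indexed and the last two columns are hypothetical, and the sketch only considers primitive types, not the elements of arrays or maps.

```python
def is_indexed(datatype):
    """Effective index setting for a primitive column datatype."""
    if datatype["type"] == "double":
        return False  # the type table lists double as not indexable
    # An explicit "index" setting wins; otherwise indexing is on by default.
    return datatype.get("index", True)

settings = {
    "output_columns": [
        {"name": "indexed_string_column", "datatype": {"type": "string", "index": True}},
        {"name": "not_indexed_string_column", "datatype": {"type": "string", "index": False}},
        {"name": "default_uint64_column", "datatype": {"type": "uint64"}},  # hypothetical
        {"name": "a_double_column", "datatype": {"type": "double"}},  # hypothetical
    ]
}

indexed = [c["name"] for c in settings["output_columns"] if is_indexed(c["datatype"])]
print(indexed)
```

Listing the indexed columns this way makes it easy to spot a column where indexing was switched off unintentionally.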
Nullables
Nullability in Hydrolix is not defined directly in the type. Instead, it is defined by the default setting within the datatype object. By default, all columns (except primary columns) are nullable. To make a field non-nullable, set a default, which forces a value when one is not provided.
The default setting must be appropriate to the type. For example, setting "" (empty string) as the default for the type uint64 will cause unexpected and likely unwanted behaviour.
"settings": {
  "output_columns": [
    {
      "name": "nullable_string",
      "datatype": {
        "type": "string",
        "default": null
      }
    },
    {
      "name": "non_nullable_string",
      "datatype": {
        "type": "string",
        "default": ""
      }
    },
    {
      "name": "nullable_uint64",
      "datatype": {
        "type": "uint64",
        "default": null
      }
    },
    {
      "name": "non_nullable_uint64",
      "datatype": {
        "type": "uint64",
        "default": 0
      }
    },
    ..................
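The two rules in this section, that a column is nullable unless it has a non-null default and that the default should suit the type, can be checked before a transform is submitted. The Python sketch below is a rough local check under those assumptions. The helper names and the bad_default_uint64 column are hypothetical, primary columns are not handled, and only string and integer types are checked.

```python
INTEGER_TYPES = {"int8", "int32", "int64", "uint8", "uint32", "uint64"}

def is_nullable(datatype):
    """A column stays nullable when its default is missing or null."""
    return datatype.get("default") is None

def default_matches_type(datatype):
    """Rough check that a default is a sensible value for the column type."""
    default = datatype.get("default")
    if default is None:
        return True
    kind = datatype["type"]
    if kind == "string":
        return isinstance(default, str)
    if kind in INTEGER_TYPES:
        # bool is a subclass of int in Python, so exclude it explicitly.
        is_int = isinstance(default, int) and not isinstance(default, bool)
        return is_int and (default >= 0 or not kind.startswith("u"))
    return True  # other types are not checked in this sketch

columns = [
    {"name": "nullable_string", "datatype": {"type": "string", "default": None}},
    {"name": "non_nullable_string", "datatype": {"type": "string", "default": ""}},
    {"name": "non_nullable_uint64", "datatype": {"type": "uint64", "default": 0}},
    {"name": "bad_default_uint64", "datatype": {"type": "uint64", "default": ""}},  # hypothetical
]

for column in columns:
    dt = column["datatype"]
    print(column["name"], is_nullable(dt), default_matches_type(dt))
```

The last column is the mismatch described above, an empty string default on a uint64 column, and the check reports it as not matching its type.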
Performance and Nulls
Hydrolix has seen improved query times when columns do not need to hold nulls. For example, set the default of a string column to "" (empty string).