Data Types (Import)

To set the data type of a column, specify a type in the datatype object of each entry in Output Columns.

"settings": {
   "output_columns": [
        {
            "name": "a_string_column",
            "datatype": {
                "type": "string",
                .........
            }
        },
        {
            "name": "a_uint64_column",
            "datatype": {
                "type": "uint64",
                .........
            }
        },
        {
        ..........

The following types are supported:

| Type | Indexed | Nullable | Description | Stored as |
| --- | --- | --- | --- | --- |
| boolean | Yes (Default) | Yes | Converted to a uint8 prior to storage. The case-insensitive strings "false" or "0" get converted to 0. Any other non-0 value gets converted to 1. | uint8 |
| datetime | Yes (Default) | Yes (except Primary Column) | A string-based representation of a moment in time, e.g. Mon Jan 2 15:04:05 -0700 2006. | datetime or datetime64 based on resolution. See Timestamps for more information. |
| double | No (can not be indexed). | Yes | A 64-bit floating-point number. | float64 |
| epoch | Yes (Default) | Yes | Converted to a datetime or datetime64, using this mapping requires additional formatting information. | datetime or datetime64 based on resolution. See Timestamps for more information. |
| int8 | Yes (Default) | Yes | A signed 8-bit integer (-128 : 127). | int8 |
| int32 | Yes (Default) | Yes | A signed 32-bit integer (-2147483648 : 2147483647). | int32 |
| int64 | Yes (Default) | Yes | A signed 64-bit integer (-9223372036854775808 : 9223372036854775807). | int64 |
| string | Yes (Default) | Yes | A variable-length string. Equivalent to VARCHAR or CLOB in other data systems. | string |
| uint8 | Yes (Default) | Yes | An unsigned 8-bit integer (0 : 255). | uint8 |
| uint32 | Yes (Default) | Yes | An unsigned 32-bit integer (0 : 4294967295). | uint32 |
| uint64 | Yes (Default) | Yes | An unsigned 64-bit integer (0 : 18446744073709551615). | uint64 |
| array | Yes (Default, unless Double) | Yes | An array of any one of the primitive types that Hydrolix supports. | array |
| map | Yes (Default, unless double) | Yes | A map of any one of the primitive types that Hydrolix supports. | map |
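The integer ranges listed above follow directly from each type's bit width and signedness. The Python sketch below derives those ranges and checks whether a value fits a given type, which can be useful before sending data for ingest. The names INTEGER_BITS, integer_range, and fits_type are hypothetical helpers for illustration only and are not part of Hydrolix.

```python
# Hypothetical helper, not part of Hydrolix: bit widths of the integer
# types listed in the table above.
INTEGER_BITS = {
    "int8": 8, "int32": 32, "int64": 64,
    "uint8": 8, "uint32": 32, "uint64": 64,
}


def integer_range(type_name):
    """Return the (minimum, maximum) value for an integer type name."""
    bits = INTEGER_BITS[type_name]
    if type_name.startswith("uint"):
        # Unsigned: all bits hold magnitude.
        return 0, 2 ** bits - 1
    # Signed (two's complement): one bit is used for the sign.
    return -(2 ** (bits - 1)), 2 ** (bits - 1) - 1


def fits_type(type_name, value):
    """Return True if value lies within the range of the given type."""
    low, high = integer_range(type_name)
    return low <= value <= high
```

For example, fits_type("uint8", 256) is false because 256 is one past the uint8 maximum of 255.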

Two data types, boolean and epoch, are shorthand expressions: they translate to more primitive types and exist to make a transform easier to describe. Any column can be null or missing a value, except the primary datetime column.
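As an illustration of the boolean shorthand, the following Python sketch mirrors the conversion rule given in the type table: the case-insensitive strings "false" and "0" become 0, and other values become 1. The function name to_stored_boolean is hypothetical, and passing a null through unchanged is an assumption based on boolean columns being nullable. Hydrolix's actual handling of edge cases may differ.

```python
def to_stored_boolean(value):
    """Sketch of the documented boolean-to-uint8 rule (hypothetical helper)."""
    if value is None:
        # Assumption: a null stays null, since boolean columns are nullable.
        return None
    # Compare the text form so that False, 0, "FALSE" and "0" all map to 0.
    if str(value).lower() in ("false", "0"):
        return 0
    # Every other value is stored as 1.
    return 1
```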

📘

Datetime

Datetime columns can be complex to configure. For more details, see DateTime, Epoch's and timestamps.

Complex Types

Hydrolix supports two complex types: arrays and maps. Both take an additional object within datatype called elements.

The elements object defines the structure of the map or array and the datatypes contained within it. It uses the same settings as the parent object (type, format, index, and so on) to define how the data should be treated.

Maps

Hydrolix supports maps as a data type. A map is defined as a set of { key : string } pairs, and the key and the value each require their own configuration. In the example below, the first element configures the key (an indexed string) and the second configures the value (an epoch timestamp with millisecond resolution).

{
    "name": "map_column_name",
    "datatype": {
        "type": "map",
        "index": false,
        "elements": [
            {
                "type": "string",
                "index": true
            },
            {
                "type": "epoch",
                "index": true,
                "format": "us",
                "resolution": "ms"
            }
        ]
    }
}

In a query, map values are accessed by key. For example, for a uint64:

select mymap['uint64'] .... from ....... where mymap['uint64'] = 6288

Array of Maps

Hydrolix supports arrays of maps as a data type.

An array of maps is defined as:

"array":
[
  {
    "key": "string"
  },
  {
    "key2": "other_string"
  }
]

🚧

Arrays of maps support only one value datatype

Mixed datatypes are not supported in an array of maps: the values must always be the same type.
In the previous example, that type is string.

The transform for an array of maps is:

{
    "name": "array",
    "datatype": {
        "type": "array",
        "elements": 
        [
            {
                "type": "map",
                "elements":
                [
                    {
                        "type": "string"
                    },
                    {
                        "type": "string"
                    }
                ]
            }
        ]
    }
}
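Because every value in an array of maps must share one type, it can help to check incoming data before ingest. The Python sketch below is one way to do that. The function name map_values_share_type is hypothetical, and Python types are only a rough stand-in for Hydrolix types (for example, Python cannot tell a uint8 from a uint64).

```python
def map_values_share_type(array_of_maps):
    """Return True if every value in every map has the same Python type."""
    value_types = {
        type(value)
        for single_map in array_of_maps
        for value in single_map.values()
    }
    # An empty array has no values, so nothing can be mixed.
    return len(value_types) <= 1
```

Applied to the earlier sample, [{"key": "string"}, {"key2": "other_string"}], the check passes because both values are strings. Replacing one value with a number would make it fail.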

Indexing

Hydrolix is specifically designed to allow indexing of as many columns as possible, without the traditional penalties found in older data platforms. We therefore strongly suggest turning off indexing for a column only when absolutely necessary. By default, Hydrolix indexes all data types except double.

Indexing is turned on and off within the datatype descriptor in Output Columns.

"settings": {
   "output_columns": [
        {
            "name": "indexed_string_column",
            "datatype": {
                "type": "string",
                "index": true
            }
        },
        {
            "name": "not_indexed_string_column",
            "datatype": {
                "type": "string",
                "index": false
            }
        }
    ]
}
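To review which columns in a transform will not be indexed, the defaults described above can be applied in a few lines of Python. This is a sketch under stated assumptions: is_indexed and unindexed_columns are hypothetical helper names, the code treats a missing index setting as true for every type except double, and it does not inspect the elements of arrays or maps.

```python
def is_indexed(datatype):
    """Return whether a datatype descriptor ends up indexed (sketch)."""
    if datatype["type"] == "double":
        # The type table states that double cannot be indexed.
        return False
    # Every other type is indexed unless "index" is explicitly false.
    return datatype.get("index", True)


def unindexed_columns(output_columns):
    """List the names of columns that will not be indexed."""
    return [
        column["name"]
        for column in output_columns
        if not is_indexed(column["datatype"])
    ]
```

Run against the output_columns list from the example above, unindexed_columns returns only not_indexed_string_column.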

Nullables

Nullability in Hydrolix is not defined directly in the type. Instead, it is controlled by the default setting within the datatype object. By default, all columns (except primary columns) are nullable. To make a column non-nullable, set a default, which forces a value when one is not provided.

The default should be appropriate to the type. For example, setting "" (empty string) for the type uint64 causes unexpected and likely unwanted behavior.

"settings": {
   "output_columns": [
     {
            "name": "nullable_string",
            "datatype": {
                "type": "string",
                "default": null
            }
        },
        {
            "name": "non_nullable_string",
            "datatype": {
                "type": "string",
                "default": ""
            }
        },
        {
            "name": "nullable_uint64",
            "datatype": {
                "type": "uint64",
                "default": null
            }
        },
        {
            "name": "non_nullable_uint64",
            "datatype": {
                "type": "uint64",
                "default": 0
            }
        },
     ..................
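The nullability rule and the warning about mismatched defaults can be expressed as two small checks. The Python sketch below is illustrative only: is_nullable and default_matches_type are hypothetical names, primary columns are not handled, and only string and integer types are validated.

```python
def is_nullable(datatype):
    """A column is nullable when it has no default or a null default."""
    return datatype.get("default") is None


def default_matches_type(datatype):
    """Check that a non-null default suits the declared type (sketch)."""
    default = datatype.get("default")
    if default is None:
        return True
    type_name = datatype["type"]
    if type_name == "string":
        return isinstance(default, str)
    if type_name.startswith(("int", "uint")):
        # bool is a subclass of int in Python, so exclude it explicitly.
        return isinstance(default, int) and not isinstance(default, bool)
    # Other types are outside the scope of this sketch.
    return True
```

With these checks, a uint64 column with a default of "" is flagged as a mismatch, while the same column with a default of 0 passes and is reported as non-nullable.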

📘

Performance and Nulls

Hydrolix has seen improved query times when columns do not need to be nullable, for example when the default for a string column is set to "" (empty string).

