Custom Dictionaries

Hydrolix has the ability to leverage in-memory dictionaries for accelerated look-up functions.

Dictionaries

Dictionaries are a Clickhouse capability that provides a convenient mechanism to store and utilize an additional key value data. For example this could be used with Geographic lookups or additional metadata that maybe commonly used across tables.

To deploy a dictionary there are two steps:

  1. Upload your dictionary file using the API

  2. Create a new dictionary definition using the API.
    The dictionary definition can be specified within the dictionary specification below. It should follow one of the formats supported by Clickhouse Formats

The dictionary is specified with the following settings, using a similar format as can be found within a Transforms Output Columns.

{
    "name":"country_dict",
    "settings": {
        "filename": "country_dict.csv",
            "layout": "complex_key_hashed",
            "lifetime_seconds": 5,
            "output_columns": [
                {
                .............
                }
            ],
            "primary_key": [ "id" ],
            "format": "CSVWithNames"
    }
}

Dictionary Configuration details:

📘

NOTE: This configuration tries to match 1-to-1 with ClickHouse DDL Creation of Dictionaries, so it may help to read up on that first.

NameDefinition
filenameName of the dictionary file
layoutWill depend on if you want lookups to use a composite or singular column as a key. Recommended hashed for single key to single value, or complex_key_hashed for multiple keys to a single value. More information can be found here - https://clickhouse.com/docs/en/sql-reference/dictionaries/external-dictionaries/external-dicts-dict-layout/
lifetime_secondsHow often ClickHouse refreshes dictionary from local file. Recommended set to 5. Reason: regardless of setting, Hydrolix checks every 5 seconds if the dictionary file in the cloud bucket has changed, and synchronises it locally.
primary_keyThe column/s that should be used for the primary key (note primary should not be used in the output_columns definition).
FormatThe format for the file being used as a dictionary. Supported Clickhouse Formats.

Output columns is similar to Transforms (Write Schema). The main differences are that a primary is not specified within the output_columns, a timestamp isn't required, and only name and datatype.type are specified.

Example

For example a lookup "Dictionary" for the country name where the country is stored as id can be implemented as follows:

  1. First, upload your dictionary file into Hydrolix
curl --request POST \
  --url 'https://{hostname}.hydrolix.live/config/v1/orgs/{org_id}/projects/{project_id}/dictionaries/files/' 
  --header 'Authorization: Bearer {{access_token}}' \
  -F file=@<filename>
  --form-string 'name=<name to reference the file>'
  1. Then create your dictionary definition:
POST https://{hostname}.hydrolix.live/config/v1/orgs/{org_id}/projects/{project_id}/dictionaries/
Authorization: Bearer {{access_token}}
Content-Type: application/json

{
    "name":"country_dict",
    "settings": {
        "filename": "country_dict.csv",
            "layout": "complex_key_hashed",
            "lifetime_seconds": 5,
            "output_columns": [
                {
                    "datatype": {
                        "type": "string"
                    },
                    "name": "country"
                },
                {
                    "datatype": {
                        "type": "uint64"
                    },
                    "name": "id"
                }
            ],
            "primary_key": [ "id" ],
            "format": "CSVWithNames"
    }
}
  1. Verify that your dictionary is present:
EXISTS {{projectname}}_country_dict

┌─result─┐
│      1 │
└────────┘

Or, to see all dictionaries available:

SHOW DICTIONARIES

📘

Using dictionaries

It should be noted that dictionaries naming in the select statement is different to a regular table. If for example you wanted to select * from a dictionary you would prepend the dictionary name with the project name separated by an _. For example

select * from myproject_mydictionary

The dictionary is now available for use both at time of data ingest, but also at Query time.

Using Dictionaries

To use dictionaries you can use the dictGet function.

More information about these functions can be found here https://clickhouse.com/docs/en/sql-reference/functions/ext-dict-functions/#dictget.