Custom Dictionaries

Hydrolix can use in-memory dictionaries for accelerated lookup functions.

Overview

Dictionaries are a ClickHouse capability that provides a convenient mechanism to store and utilize additional key-value data. For example, this could be used with geographic lookups or other metadata commonly used across tables.

You will need the correct RBAC permissions to use dictionaries.

Creating a Dictionary

To create a dictionary, there are two steps:

  1. Upload your dictionary file using the API.

  2. Create a new dictionary definition using the API.
    The dictionary definition can be specified within the dictionary specification below. It should follow one of the formats supported by ClickHouse Formats

Definition Format

Specify the dictionary via JSON using a similar format to Transforms Output Columns.

{
    "name":"country_dict",
    "settings": {
        "filename": "country_dict.csv",
            "layout": "complex_key_hashed",
            "lifetime_seconds": 5,
            "output_columns": [
                {
                .............
                }
            ],
            "primary_key": [ "id" ],
            "format": "CSVWithNames"
    }
}

Configuration

📘

ClickHouse DDL Queries

NOTE: This configuration is designed to match with the ClickHouse DDL Creation of Dictionaries

NameDefinition
filenameName of the dictionary file
layoutWill depend on if you want lookups to use a composite or singular column as a key. We recommend hashed for a single key to a single value, or complex_key_hashed for multiple keys to a single value. More information can be found in the ClickHouse documentation
lifetime_secondsHow often ClickHouse refreshes the dictionary from the local file. We recommend 5 seconds, since Hydrolix checks every 5 seconds if the dictionary file in the cloud bucket has changed, and synchronizes it locally.
primary_keyThe column/s that should be used for the primary key (note primary should not be used in the output_columns definition).
formatThe format of the dictionary file. This must be one of the supported Clickhouse Formats.

Output column definition is similar to Transforms (Write Schema). The main differences are:

  • a primary is not specified within the output_columns
  • a timestamp isn't required
  • only name and datatype.type are specified

Permissions

Make sure you have these RBAC permissions to perform operations on dictionaries:

add_dictionary
change_dictionary
delete_dictionary
view_dictionary
dictGet_sql

If you need to modify the dictionary files, these permissions are needed:

add_dictionaryfile
change_dictionaryfile
delete_dictionaryfile
view_dictionaryfile

For more information, see User Permissions (RBAC).

Example

Consider a lookup dictionary for country names where the country is stored as id. This can be implemented as follows:

  1. Upload the dictionary file into Hydrolix:
curl --request POST \
  --url 'https://{hostname}.hydrolix.live/config/v1/orgs/{org_id}/projects/{project_id}/dictionaries/files/' 
  --header 'Authorization: Bearer {{access_token}}' \
  -F file=@<filename>
  --form-string 'name=<name to reference the file>'
  1. Create the dictionary definition:
POST https://{hostname}.hydrolix.live/config/v1/orgs/{org_id}/projects/{project_id}/dictionaries/
Authorization: Bearer {{access_token}}
Content-Type: application/json

{
    "name":"country_dict",
    "settings": {
        "filename": "country_dict.csv",
            "layout": "complex_key_hashed",
            "lifetime_seconds": 5,
            "output_columns": [
                {
                    "datatype": {
                        "type": "string"
                    },
                    "name": "country"
                },
                {
                    "datatype": {
                        "type": "uint64"
                    },
                    "name": "id"
                }
            ],
            "primary_key": [ "id" ],
            "format": "CSVWithNames"
    }
}
  1. Verify that your dictionary is present:
EXISTS {{projectname}}_country_dict

┌─result─┐
│      1 │
└────────┘

Or, to see all dictionaries available:

SHOW DICTIONARIES

The dictionary is now available for use both at time of data ingest, but also at Query time.

Querying Dictionaries

📘

Dictionary Names Are Different From Table Names

To refer to dictionaries in SQL statements, prepend the dictionary name with the project name separated by an underscore. For example:

select * from myproject_mydictionary

To use dictionaries from SQL, use queries like the above callout, or you can use the dictGet function, allowing you to specify the dictionary, attribute, and key directly. See the ClickHouse documentation for a good description of the dictGet function and its variants.