Custom Dictionaries

Use custom dictionaries in Hydrolix

Overview

Dictionaries are a ClickHouse feature that provides a convenient mechanism to store and use additional key-value data. For example, use dictionaries with geographic lookups or other metadata commonly used across tables.

Prerequisites

Make sure you have these RBAC permissions to perform operations on dictionaries:

add_dictionary
change_dictionary
delete_dictionary
view_dictionary
dictGet_sql

If you need to modify the dictionary files, these permissions are needed:

add_dictionaryfile
change_dictionaryfile
delete_dictionaryfile
view_dictionaryfile

For more information, see User Permissions (RBAC).

Create a dictionary

  1. Upload the dictionary file using the API.
  2. Create a new dictionary definition using the API.
    • Specify the dictionary definition in the dictionary specification below.
    • Follow one of the formats supported by ClickHouse Formats.
    • The dictionary definition column names must match the field names in the input file exactly, including casing. Mismatched names result in an empty dictionary.

Dictionary format

Specify the dictionary in JSON using a similar format to Transform Output Columns.

In this example, a dictionary called country_dict is configured to use a local file named country_dict.csv. It uses the complex_key_hashed layout, which supports multiple keys, and refreshes every five seconds. The file format is CSVWithNames, and the dictionary uses the id field as its primary key. The output_columns section defines the structure and data types expected in the file.

{
    "name":"country_dict",
    "settings": {
        "filename": "country_dict.csv",
            "layout": "complex_key_hashed",
            "lifetime_seconds": 5,
            "output_columns": [
    {
        "name": "id",
        "datatype": { "type": "uint64" }
    },
    {
        "name": "country",
        "datatype": { "type": "string" }
    }
],
            "primary_key": [ "id" ],
            "format": "CSVWithNames"
    }
}

Dictionary file or dictionary format

This example shows a dictionary source file as a CSV:

id,country
1,Spain
2,France
3,Germany

This file uses the CSVWithNames format, which includes headers in the first row.

To match this, the dictionary format (output_columns) must define column names and types exactly:

{
  "name": "country_dict",
  "settings": {
    "filename": "country_dict.csv",
    "layout": "hashed",
    "lifetime_seconds": 5,
    "primary_key": ["id"],
    "format": "CSVWithNames",
    "output_columns": [
      {
        "name": "id",
        "datatype": { "type": "uint64" }
      },
      {
        "name": "country",
        "datatype": { "type": "string" }
      }
    ]
  }
}

📘

The dictionary file does not define types. Types are specified in the output_columns section of the dictionary JSON.

For more supported types and format details, see the ClickHouse Formats documentation.

Set custom delimiters

This feature was added in version 5.3.

SQL in Hydrolix supports comma-separated values (CSV) by default. Custom delimiters can be set to support multiple types of delimiters.

  1. Choose a file format:
  • CustomSeparated
  • CustomSeparatedWithNames
  • CustomSeparatedWithNamesAndTypes
  1. Use these fields to specify custom delimiters for columns and rows:
  • custom_column_delimiter: Characters that separate fields in each row.
  • custom_row_delimiter: Characters that separate rows.

In this example, the file format is CustomSeparatedWithNames, and the custom_column_delimiter is; and the custom_row_delimiter is |.

"settings": {
    "filename": "country_dict.csv",
    "layout": "hashed",
    "lifetime_seconds": 5,
    "primary_key": ["id"],
    "format": "CustomSeparatedWithNames",
    "custom_column_delimiter": ";",
    "custom_row_delimiter": "|",
    "output_columns": [
        {
            "datatype": {
                "type": "string"
            },
            "name": "country"
        },
        {
            "datatype": {
                "type": "uint64"
            },
            "name": "id"
        }
    ]
}

⚠️

Double quotes "" are not supported as delimiters.

Configure queries

📘

ClickHouse DDL Queries

NOTE: This configuration is designed to match with the ClickHouse DDL Creation of Dictionaries

NameDefinition
filenameName of the dictionary file
layoutDepends on whether lookups use a composite or singular column as a key. Hydrolix recommends usinghashed for a single key to a single value, or complex_key_hashed for multiple keys to a single value. More information can be found at ClickHouse Dictionaries.
lifetime_secondsHow often ClickHouse refreshes the dictionary from the local file. 5 seconds is the recommended time, since Hydrolix checks every 5 seconds if the dictionary file in the cloud bucket has changed, and synchronizes it locally.
primary_keyThe columns that for the primary key. Note primary should not be used in the output_columns definition.
formatThe format of the dictionary file. This must be one of the supported Clickhouse Formats.

Output column definition is similar to Transforms (Write Schema). The main differences are:

  • A primary is not specified within the output_columns.
  • A timestamp isn't required.
  • Only name and datatype.type are specified.

Example dictionary lookup

This example shows how to set up a lookup dictionary for country names where the country is stored as id.

  1. Upload the dictionary file into Hydrolix:
curl --request POST \
  --url 'https://{hostname}.hydrolix.live/config/v1/orgs/{org_id}/projects/{project_id}/dictionaries/files/' 
  --header 'Authorization: Bearer {{access_token}}' \
  -F file=@<filename>
  --form-string 'name=<name to reference the file>'
  1. Create the dictionary definition:
POST https://{hostname}.hydrolix.live/config/v1/orgs/{org_id}/projects/{project_id}/dictionaries/
Authorization: Bearer {{access_token}}
Content-Type: application/json

{
    "name":"country_dict",
    "settings": {
        "filename": "country_dict.csv",
            "layout": "complex_key_hashed",
            "lifetime_seconds": 5,
            "output_columns": [
                {
                    "datatype": {
                        "type": "string"
                    },
                    "name": "country"
                },
                {
                    "datatype": {
                        "type": "uint64"
                    },
                    "name": "id"
                }
            ],
            "primary_key": [ "id" ],
            "format": "CSVWithNames"
    }
}
  1. Verify that your dictionary is present:
EXISTS {{projectname}}_country_dict

┌─result─┐
│      1 │
└────────┘

Or, to see all dictionaries available:

SHOW DICTIONARIES

The dictionary is now available for use both at the time of data ingest and query.

Query dictionaries

📘

Dictionary Names are Different from Table Names

To refer to dictionaries in SQL statements, prepend the dictionary name with the project name separated by an underscore. For example:

select * from myproject_mydictionary

To use dictionaries from SQL, use queries like this example, or the dictGet function. This allows specification of the dictionary, attribute, and key directly. See the ClickHouse documentation for a description of the dictGet function and its variants.

See also