Custom Dictionaries
Overview⚓︎
Dictionaries are a ClickHouse feature that provides a convenient mechanism to store and use additional key-value data. For example, use dictionaries with geographic lookups or other metadata commonly used across tables.
Prerequisites⚓︎
Make sure you have these RBAC permissions to perform operations on dictionaries:
- add_dictionary
- change_dictionary
- delete_dictionary
- view_dictionary
- dictGet_sql
If you need to modify the dictionary files, these permissions are needed:
- add_dictionaryfile
- change_dictionaryfile
- delete_dictionaryfile
- view_dictionaryfile
For more information, see User Permissions (RBAC).
Create a dictionary⚓︎
- Upload the dictionary file using the API.
- Create a new dictionary definition using the API.
- Specify the dictionary definition in the
dictionary specificationbelow. - Follow one of the formats supported by ClickHouse Formats.
- The dictionary definition column names must match the field names in the input file exactly, including casing. Mismatched names result in an empty dictionary.
- Specify the dictionary definition in the
Dictionary format⚓︎
Specify the dictionary in JSON using a similar format to Transform Output Columns.
In this example, a dictionary called country_dict is configured to use a local file named country_dict.csv. It uses the complex_key_hashed layout, which supports multiple keys, and refreshes every five seconds. The file format is CSVWithNames, and the dictionary uses the id field as its primary key. The output_columns section defines the structure and data types expected in the file.
Dictionary file or dictionary format⚓︎
This example shows a dictionary source file as a CSV:
This file uses the CSVWithNames format, which includes headers in the first row.
To match this, the dictionary format (output_columns) must define column names and types exactly:
The dictionary file doesn't define types. Types are specified in the output_columns section of the dictionary JSON.
For more supported types and format details, see the ClickHouse Formats documentation.
Set custom delimiters⚓︎
This feature was added in version 5.3.
SQL in Hydrolix supports comma-separated values (CSV) by default. Custom delimiters can be set to support multiple types of delimiters.
-
Choose a file format:
CustomSeparatedCustomSeparatedWithNamesCustomSeparatedWithNamesAndTypes
-
Use these fields to specify custom delimiters for columns and rows:
custom_column_delimiter: Characters that separate fields in each row.custom_row_delimiter: Characters that separate rows.
In this example, the file format is CustomSeparatedWithNames, and the custom_column_delimiter is; and the custom_row_delimiter is |.
Double quotes "" aren't supported as delimiters.
Configure queries⚓︎
ClickHouse DDL Queries
NOTE: This configuration is designed to match with the ClickHouse DDL Creation of Dictionaries
| Name | Definition |
|---|---|
filename |
Name of the dictionary file |
layout |
Depends on whether lookups use a composite or singular column as a key. Hydrolix recommends usinghashed for a single key to a single value, or complex_key_hashed for multiple keys to a single value. More information can be found at ClickHouse Dictionaries. |
lifetime_seconds |
How often ClickHouse refreshes the dictionary from the local file. 5 seconds is the recommended time, since Hydrolix checks every 5 seconds if the dictionary file in the cloud bucket has changed, and synchronizes it locally. |
primary_key |
The columns that for the primary key. Note primary should not be used in the output_columns definition. |
format |
The format of the dictionary file. This must be one of the supported Clickhouse Formats. |
Output column definition is similar to Transforms (Write Schema). The main differences are:
- A
primaryis not specified within theoutput_columns. - A timestamp isn't required.
- Only
nameanddatatype.typeare specified.
Example dictionary lookup⚓︎
This example shows how to set up a lookup dictionary for country names where the country is stored as id.
-
Upload the dictionary file into Hydrolix:
-
Create the dictionary definition:
-
Verify that your dictionary is present:
Or, to see all dictionaries available:
Show Dictionaries
The dictionary is now available for use both at the time of data ingest and query.
Query dictionaries⚓︎
Dictionary Names are Different from Table Names
To refer to dictionaries in SQL statements, prepend the dictionary name with the project name separated by an underscore. For example:
select * from myproject_mydictionary
To use dictionaries from SQL, use queries like this example, or the dictGet function. This allows specification of the dictionary, attribute, and key directly. See the ClickHouse documentation for a description of the dictGet function and its variants.