Custom Dictionaries
Use custom dictionaries in Hydrolix
Overview
Dictionaries are a ClickHouse feature that provides a convenient mechanism to store and use additional key-value data. For example, use dictionaries with geographic lookups or other metadata commonly used across tables.
Prerequisites
Make sure you have these RBAC permissions to perform operations on dictionaries:
add_dictionary
change_dictionary
delete_dictionary
view_dictionary
dictGet_sql
If you need to modify the dictionary files, these permissions are needed:
add_dictionaryfile
change_dictionaryfile
delete_dictionaryfile
view_dictionaryfile
For more information, see User Permissions (RBAC).
Create a dictionary
- Upload the dictionary file using the API.
- Create a new dictionary definition using the API.
- Specify the dictionary definition in the
dictionary specification
below. - Follow one of the formats supported by ClickHouse Formats.
- The dictionary definition column names must match the field names in the input file exactly, including casing. Mismatched names result in an empty dictionary.
- Specify the dictionary definition in the
Dictionary format
Specify the dictionary in JSON using a similar format to Transform Output Columns.
In this example, a dictionary called country_dict
is configured to use a local file named country_dict.csv
. It uses the complex_key_hashed
layout, which supports multiple keys, and refreshes every five seconds. The file format is CSVWithNames
, and the dictionary uses the id
field as its primary key. The output_columns
section defines the structure and data types expected in the file.
{
"name":"country_dict",
"settings": {
"filename": "country_dict.csv",
"layout": "complex_key_hashed",
"lifetime_seconds": 5,
"output_columns": [
{
"name": "id",
"datatype": { "type": "uint64" }
},
{
"name": "country",
"datatype": { "type": "string" }
}
],
"primary_key": [ "id" ],
"format": "CSVWithNames"
}
}
Dictionary file or dictionary format
This example shows a dictionary source file as a CSV:
id,country
1,Spain
2,France
3,Germany
This file uses the CSVWithNames
format, which includes headers in the first row.
To match this, the dictionary format (output_columns
) must define column names and types exactly:
{
"name": "country_dict",
"settings": {
"filename": "country_dict.csv",
"layout": "hashed",
"lifetime_seconds": 5,
"primary_key": ["id"],
"format": "CSVWithNames",
"output_columns": [
{
"name": "id",
"datatype": { "type": "uint64" }
},
{
"name": "country",
"datatype": { "type": "string" }
}
]
}
}
The dictionary file does not define types. Types are specified in the
output_columns
section of the dictionary JSON.
For more supported types and format details, see the ClickHouse Formats documentation.
Set custom delimiters
This feature was added in version 5.3.
SQL in Hydrolix supports comma-separated values (CSV) by default. Custom delimiters can be set to support multiple types of delimiters.
- Choose a file format:
CustomSeparated
CustomSeparatedWithNames
CustomSeparatedWithNamesAndTypes
- Use these fields to specify custom delimiters for columns and rows:
custom_column_delimiter
: Characters that separate fields in each row.custom_row_delimiter
: Characters that separate rows.
In this example, the file format is CustomSeparatedWithNames
, and the custom_column_delimiter
is;
and the custom_row_delimiter
is |
.
"settings": {
"filename": "country_dict.csv",
"layout": "hashed",
"lifetime_seconds": 5,
"primary_key": ["id"],
"format": "CustomSeparatedWithNames",
"custom_column_delimiter": ";",
"custom_row_delimiter": "|",
"output_columns": [
{
"datatype": {
"type": "string"
},
"name": "country"
},
{
"datatype": {
"type": "uint64"
},
"name": "id"
}
]
}
Double quotes
""
are not supported as delimiters.
Configure queries
ClickHouse DDL Queries
NOTE: This configuration is designed to match with the ClickHouse DDL Creation of Dictionaries
Name | Definition |
---|---|
filename | Name of the dictionary file |
layout | Depends on whether lookups use a composite or singular column as a key. Hydrolix recommends usinghashed for a single key to a single value, or complex_key_hashed for multiple keys to a single value. More information can be found at ClickHouse Dictionaries. |
lifetime_seconds | How often ClickHouse refreshes the dictionary from the local file. 5 seconds is the recommended time, since Hydrolix checks every 5 seconds if the dictionary file in the cloud bucket has changed, and synchronizes it locally. |
primary_key | The columns that for the primary key. Note primary should not be used in the output_columns definition. |
format | The format of the dictionary file. This must be one of the supported Clickhouse Formats. |
Output column definition is similar to Transforms (Write Schema). The main differences are:
- A
primary
is not specified within theoutput_columns
. - A timestamp isn't required.
- Only
name
anddatatype.type
are specified.
Example dictionary lookup
This example shows how to set up a lookup dictionary for country names where the country is stored as id
.
- Upload the dictionary file into Hydrolix:
curl --request POST \
--url 'https://{hostname}.hydrolix.live/config/v1/orgs/{org_id}/projects/{project_id}/dictionaries/files/'
--header 'Authorization: Bearer {{access_token}}' \
-F file=@<filename>
--form-string 'name=<name to reference the file>'
- Create the dictionary definition:
POST https://{hostname}.hydrolix.live/config/v1/orgs/{org_id}/projects/{project_id}/dictionaries/
Authorization: Bearer {{access_token}}
Content-Type: application/json
{
"name":"country_dict",
"settings": {
"filename": "country_dict.csv",
"layout": "complex_key_hashed",
"lifetime_seconds": 5,
"output_columns": [
{
"datatype": {
"type": "string"
},
"name": "country"
},
{
"datatype": {
"type": "uint64"
},
"name": "id"
}
],
"primary_key": [ "id" ],
"format": "CSVWithNames"
}
}
- Verify that your dictionary is present:
EXISTS {{projectname}}_country_dict
┌─result─┐
│ 1 │
└────────┘
Or, to see all dictionaries available:
SHOW DICTIONARIES
The dictionary is now available for use both at the time of data ingest and query.
Query dictionaries
Dictionary Names are Different from Table Names
To refer to dictionaries in SQL statements, prepend the dictionary name with the project name separated by an underscore. For example:
select * from myproject_mydictionary
To use dictionaries from SQL, use queries like this example, or the dictGet
function. This allows specification of the dictionary, attribute, and key directly. See the ClickHouse documentation for a description of the dictGet
function and its variants.
See also
Updated 23 days ago