Custom Dictionaries
Hydrolix can use in-memory dictionaries for accelerated lookup functions.
Overview
Dictionaries are a ClickHouse capability that provides a convenient mechanism to store and utilize additional key-value data. For example, this could be used with geographic lookups or other metadata commonly used across tables.
You will need the correct RBAC permissions to use dictionaries.
Creating a Dictionary
To create a dictionary, there are two steps:
-
Upload your dictionary file using the API.
-
Create a new dictionary definition using the API.
The dictionary definition can be specified within thedictionary specification
below. It should follow one of the formats supported by ClickHouse Formats
Definition Format
Specify the dictionary via JSON using a similar format to Transforms Output Columns.
{
"name":"country_dict",
"settings": {
"filename": "country_dict.csv",
"layout": "complex_key_hashed",
"lifetime_seconds": 5,
"output_columns": [
{
.............
}
],
"primary_key": [ "id" ],
"format": "CSVWithNames"
}
}
Configuration
ClickHouse DDL Queries
NOTE: This configuration is designed to match with the ClickHouse DDL Creation of Dictionaries
Name | Definition |
---|---|
filename | Name of the dictionary file |
layout | Will depend on if you want lookups to use a composite or singular column as a key. We recommend hashed for a single key to a single value, or complex_key_hashed for multiple keys to a single value. More information can be found in the ClickHouse documentation |
lifetime_seconds | How often ClickHouse refreshes the dictionary from the local file. We recommend 5 seconds, since Hydrolix checks every 5 seconds if the dictionary file in the cloud bucket has changed, and synchronizes it locally. |
primary_key | The column/s that should be used for the primary key (note primary should not be used in the output_columns definition). |
format | The format of the dictionary file. This must be one of the supported Clickhouse Formats. |
Output column definition is similar to Transforms (Write Schema). The main differences are:
- a
primary
is not specified within the output_columns - a timestamp isn't required
- only
name
anddatatype.type
are specified
Permissions
Make sure you have these RBAC permissions to perform operations on dictionaries:
add_dictionary
change_dictionary
delete_dictionary
view_dictionary
dictGet_sql
If you need to modify the dictionary files, these permissions are needed:
add_dictionaryfile
change_dictionaryfile
delete_dictionaryfile
view_dictionaryfile
For more information, see User Permissions (RBAC).
Example
Consider a lookup dictionary for country names where the country is stored as id
. This can be implemented as follows:
- Upload the dictionary file into Hydrolix:
curl --request POST \
--url 'https://{hostname}.hydrolix.live/config/v1/orgs/{org_id}/projects/{project_id}/dictionaries/files/'
--header 'Authorization: Bearer {{access_token}}' \
-F file=@<filename>
--form-string 'name=<name to reference the file>'
- Create the dictionary definition:
POST https://{hostname}.hydrolix.live/config/v1/orgs/{org_id}/projects/{project_id}/dictionaries/
Authorization: Bearer {{access_token}}
Content-Type: application/json
{
"name":"country_dict",
"settings": {
"filename": "country_dict.csv",
"layout": "complex_key_hashed",
"lifetime_seconds": 5,
"output_columns": [
{
"datatype": {
"type": "string"
},
"name": "country"
},
{
"datatype": {
"type": "uint64"
},
"name": "id"
}
],
"primary_key": [ "id" ],
"format": "CSVWithNames"
}
}
- Verify that your dictionary is present:
EXISTS {{projectname}}_country_dict
┌─result─┐
│ 1 │
└────────┘
Or, to see all dictionaries available:
SHOW DICTIONARIES
The dictionary is now available for use both at time of data ingest, but also at Query time.
Querying Dictionaries
Dictionary Names Are Different From Table Names
To refer to dictionaries in SQL statements, prepend the dictionary name with the project name separated by an underscore. For example:
select * from myproject_mydictionary
To use dictionaries from SQL, use queries like the above callout, or you can use the dictGet
function, allowing you to specify the dictionary, attribute, and key directly. See the ClickHouse documentation for a good description of the dictGet
function and its variants.
Updated 4 months ago