Custom Dictionaries
Hydrolix has the ability to leverage in-memory dictionaries for accelerated look-up functions.
Dictionaries
Dictionaries are a Clickhouse capability that provides a convenient mechanism to store and utilize an additional key value data. For example this could be used with Geographic lookups or additional metadata that maybe commonly used across tables.
To deploy a dictionary there are two steps:
-
Upload your dictionary file using the API
-
Create a new dictionary definition using the API.
The dictionary definition can be specified within thedictionary specification
below. It should follow one of the formats supported by Clickhouse Formats
The dictionary is specified with the following settings, using a similar format as can be found within a Transforms Output Columns.
{
"name":"country_dict",
"settings": {
"filename": "country_dict.csv",
"layout": "complex_key_hashed",
"lifetime_seconds": 5,
"output_columns": [
{
.............
}
],
"primary_key": [ "id" ],
"format": "CSVWithNames"
}
}
Dictionary Configuration details:
NOTE: This configuration tries to match 1-to-1 with ClickHouse DDL Creation of Dictionaries, so it may help to read up on that first.
Name | Definition |
---|---|
filename | Name of the dictionary file |
layout | Will depend on if you want lookups to use a composite or singular column as a key. Recommended hashed for single key to single value, or complex_key_hashed for multiple keys to a single value. More information can be found here - https://clickhouse.com/docs/en/sql-reference/dictionaries/external-dictionaries/external-dicts-dict-layout/ |
lifetime_seconds | How often ClickHouse refreshes dictionary from local file. Recommended set to 5 . Reason: regardless of setting, Hydrolix checks every 5 seconds if the dictionary file in the cloud bucket has changed, and synchronises it locally. |
primary_key | The column/s that should be used for the primary key (note primary should not be used in the output_columns definition). |
Format | The format for the file being used as a dictionary. Supported Clickhouse Formats. |
Output columns is similar to Transforms (Write Schema). The only difference is a primary
is not specified within the output_columns and a timestamp isn't required.
Example
For example a lookup "Dictionary" for the country name where the country is stored as id
can be implemented as follows:
- First, upload your dictionary file into Hydrolix
curl --request POST \
--url 'https://{hostname}.hydrolix.live/config/v1/orgs/{org_id}/projects/{project_id}/dictionaries/files/'
--header 'Authorization: Bearer {{access_token}}' \
-F file=@<filename>
--form-string 'name=<name to reference the file>'
- Then create your dictionary definition:
POST https://{hostname}.hydrolix.live/config/v1/orgs/{org_id}/projects/{project_id}/dictionaries/
Authorization: Bearer {{access_token}}
Content-Type: application/json
{
"name":"country_dict",
"settings": {
"filename": "country_dict.csv",
"layout": "complex_key_hashed",
"lifetime_seconds": 5,
"output_columns": [
{
"datatype": {
"type": "string"
},
"name": "country"
},
{
"datatype": {
"type": "uint64"
},
"name": "id"
}
],
"primary_key": [ "id" ],
"format": "CSVWithNames"
}
}
- Verify that your dictionary is present:
EXISTS {{projectname}}_country_dict
┌─result─┐
│ 1 │
└────────┘
Or, to see all dictionaries available:
SHOW DICTIONARIES
Using dictionaries
It should be noted that dictionaries naming in the select statement is different to a regular table. If for example you wanted to
select *
from a dictionary you would prepend the dictionary name with the project name separated by an_
. For example
select * from myproject_mydictionary
The dictionary is now available for use both at time of data ingest, but also at Query time.
Using Dictionaries
To use dictionaries you can use the dictGet function.
More information about these functions can be found here https://clickhouse.com/docs/en/sql-reference/functions/ext-dict-functions/#dictget.
Updated 14 days ago