Custom Dictionaries
Hydrolix has the ability to leverage in-memory dictionaries for accelerated look-up functions.
Dictionaries
Dictionaries are a Clickhouse capability that provides a convenient mechanism to store and utilize an additional key value data. For example this could be used with Geographic lookups or additional metadata that maybe commonly used across tables.
To deploy a dictionary there are two steps:
-
Upload your dictionary file using the API
-
Create a new dictionary definition using the API.
The dictionary definition can be specified within thedictionary specification
below. It should follow one of the formats supported by Clickhouse Formats
The dictionary is specified with the following settings, using a similar format as can be found within a Transforms Output Columns.
{
"name":"country_dict",
"settings": {
"filename": "country_dict.csv",
"layout": "complex_key_hashed",
"lifetime_seconds": 5,
"output_columns": [
{
.............
}
],
"primary_key": [ "id" ],
"format": "CSVWithNames"
}
}
Dictionary Configuration details:
NOTE: This configuration tries to match 1-to-1 with ClickHouse DDL Creation of Dictionaries, so it may help to read up on that first.
Name | Definition |
---|---|
filename | Name of the dictionary file |
layout | Will depend on if you want lookups to use a composite or singular column as a key. Recommended |
lifetime_seconds | How often ClickHouse refreshes dictionary from local file. Recommended set to |
primary_key | The column/s that should be used for the primary key (note |
Format | The format for the file being used as a dictionary. Supported Clickhouse Formats. |
Output columns is similar to Transforms (Write Schema). The only difference is a primary
is not specified within the output_columns and a timestamp isn't required.
Example
For example a lookup "Dictionary" for the country name where the country is stored as id
can be implemented as follows:
- First, upload your dictionary file into Hydrolix
curl --request POST \
--url 'https://{hostname}.hydrolix.live/config/v1/orgs/{org_id}/projects/{project_id}/dictionaries/files/'
--header 'Authorization: Bearer {{access_token}}' \
-F [email protected]<filename>
--form-string 'name=<name to reference the file>'
- Then create your dictionary definition:
POST https://{hostname}.hydrolix.live/config/v1/orgs/{org_id}/projects/{project_id}/dictionaries/
Authorization: Bearer {{access_token}}
Content-Type: application/json
{
"name":"country_dict",
"settings": {
"filename": "country_dict.csv",
"layout": "complex_key_hashed",
"lifetime_seconds": 5,
"output_columns": [
{
"datatype": {
"type": "string"
},
"name": "country"
},
{
"datatype": {
"type": "uint64"
},
"name": "id"
}
],
"primary_key": [ "id" ],
"format": "CSVWithNames"
}
}
- Verify that your dictionary is present:
EXISTS {{projectname}}_country_dict
ββresultββ
β 1 β
ββββββββββ
The dictionary is now available for use both at time of data ingest, but also at Query time.
Using Dictionaries
To use dictionaries you can use the dictGet function.
More information about these functions can be found here https://clickhouse.com/docs/en/sql-reference/functions/ext-dict-functions/#dictget.
Updated 17 days ago