Getting Started with Dictionary

Dictionaries are a convenient method for looking up and translating values on the fly. Hydrolix loads these lookup values into memory for speedy ingestion processing of high volume event data.

The best kind of data to store in a dictionary is reference data which, for most use cases, doesn’t change too often.

In our scenario below, we need to load in a dictionary which stores a list of servers so that we can look up which customer owns the server and whether the server is even owned by a known customer. Below, we will:

  1. Load data in a dictionaries
  2. View the data
  3. Work with the data in a transform

1. Load Data into a Dictionary

To load data into a dictionary, you can either use the UI to upload a file and create a dictionary which refers to the file (if you need to update the data, you can upload a new file to replace the previously uploaded version).

In our case, we’ll use the following commands with HDXCLI since this is really easy to do:

hdxcli --profile devplay --project gb_test_logs dictionary files upload serverlist.csv serverlist -t verbatim
hdxcli --profile devplay --project gb_test_logs dictionary create server_lookup.json serverlist

The first line uploads the list of servers into the gb_test_logs project of the Hydrolix cluster identified by the devplay HDXCLI profile. The second line creates the dictionary into the same project

Here is the sample serverlist.csv file for the above commands:

serverid,customername
111,ACME
222,Globex
333,Umbrella

Here is the simple dictionary create server_lookup.json JSON file:

{
	"name": "serverlist",
	"settings": {
		"filename": "serverlist",
		"layout": "hashed",
		"lifetime_seconds": 5,
		"output_columns": [
			{
				"name": "serverid",
				"datatype": {
					"type": "string",
					"denullify": true
				}
			},
			{
				"name": "customername",
				"datatype": {
					"type": "string",
					"denullify": true
				}
			}
		],
		"primary_key": [
			"serverid"
		],
		"format": "CSVWithNames"
	}
}

Figure 1 illustrates creating a dictionary using HDXCLI in VS Code.

Figure 1: Upload a Dictionary File and Create a Dictionary Using VS Code

Having created the dictionary, you can see your dictionary in the data section of the UI (see Figure 2).

Figure 2: View Dictionary in the Data Section of the UI

If you click on the dictionary name, you can look at the dictionary's meta-data including schema (see Figure 3).

Figure 3: View Dictionary Details Including Data Model

2. View Data in the Dictionary

SQL in can be used to look at rows in a dictionary. The syntax will be slightly different than tables. To refer to a dictionary as a table in a SQL SELECT statement, combine the project name with the dictionary name with a “_”. Please see Figure 4 is showing a sample query for rows of the serverlist dictionary we just created within the gb_test_logs project.

Figure 4: Query Table Rows Using the UI

3. Work with Data While Ingesting Records

In Figure 5, we are creating a new table transform which will take the incoming serverid and set the following using a Transform SQL (the middle box):

  • Look up a customername in the dictionary using the serverid with the dictGetOrNull function. If the serverid does not exist the value is set to null.
  • Check to see if the serverid exists in the dictionary and set a boolean to true or false accordingly (0 or 1) using the dictHas function.

Figure 5: Use Dictionaries to Transform Data When Ingesting Data

If you'd like to try out this example, using the UI to create a new table and then create a new CSV transform and then you can enter the following into the edit transform screen.

Here is the sample Output Columns (left box):

[
  {
    "name": "primary",
    "datatype": {
      "type": "datetime",
      "index": false,
      "primary": true,
      "source": null,
      "format": "02/01/2006 15:04:05",
      "resolution": "seconds",
      "default": null,
      "script": null
    }
  },
    {
    "name": "serverid",
    "datatype": {
      "type": "string",
      "index": true,
      "source": null
    }
  },
  {
    "name": "customername",
    "datatype": {
      "type": "string",
      "index": true,
      "source": {
              "from_input_field": "sql_transform"     
      }
    }
  },
  {
    "name": "dropped",
    "datatype": {
      "type": "boolean",
      "index": true,
      "source": {
              "from_input_field": "sql_transform"

Here is the sample Transform SQL (middle box):

SELECT
	primary,
	serverid,
	dictGetOrNull('gb_test_logs_serverlist', 'customername', serverid) as customername,
	if(dictHas('gb_test_logs_serverlist', serverid), false, true) as dropped
FROM {STREAM}

Here is the sample data (right box):

14/06/2022 13:06:47,111
14/06/2022 13:06:48,444

Once you've entered the data into the transforms boxes, you can click on the Validate transform button to see the outcome of ingesting the sample data and, when ready, can click the Publish button, further below on the edit transform page, to persist the changes.

If you’d like to go deeper, check out: