Scripting

Hydrolix transforms can execute scripts to manipulate incoming data.

Add the script property to a column definition to execute a JavaScript expression for every row of data ingested. The expression's output becomes the stored value for that column. Within the expression, you can access other columns from the same row with the following syntax:

"datatype": { "script": "row['field-name']" }

Scripts run:

  • after assigning non-scripted values (including defaults) to the row
  • in the order defined in the transform

Therefore, script expressions have access to any non-scripted value in the same row as well as the expression output for any script defined earlier within the transform.
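The ordering rules above can be modeled in plain JavaScript. This is only an illustrative sketch, not Hydrolix's implementation: the function `applyTransform`, the simplified column shape (`name`, `default`, `script`), and the sample column names are all hypothetical.

```javascript
// Illustrative model only. `applyTransform` and the column shape used here
// are hypothetical and do not mirror Hydrolix internals.
function applyTransform(input, columns) {
  const row = {};

  // Pass 1: assign every non-scripted value, falling back to the default.
  for (const col of columns) {
    if (col.script === undefined) {
      row[col.name] = input[col.name] !== undefined ? input[col.name] : col.default;
    }
  }

  // Pass 2: evaluate scripts in definition order. Each result is stored on
  // the row, so scripts defined later can read it.
  for (const col of columns) {
    if (col.script !== undefined) {
      const evaluate = new Function('row', 'return (' + col.script + ');');
      row[col.name] = evaluate(row);
    }
  }
  return row;
}

const columns = [
  // Defined first, yet it can read 'first' and 'last' because
  // non-scripted values are assigned before any script runs.
  { name: 'full_name', script: "row['first'] + ' ' + row['last']" },
  { name: 'first' },
  { name: 'last', default: 'Unknown' },
  // Defined after 'full_name', so it can read that script's output.
  { name: 'greeting', script: "'Hello, ' + row['full_name']" },
];

const result = applyTransform({ first: 'Ada' }, columns);
console.log(result);
```

In this model, swapping the order of `full_name` and `greeting` would make `greeting` read an undefined value, which is why the definition order of scripted columns matters.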

🚧

Performance

Avoid excessive scripting in transforms. Every script runs once for each row Hydrolix ingests, so the cost grows with both the number of scripts and the ingest volume.

Create a New Column from an Existing Column

The following example creates a new column named ts_millis from data in the existing column named timestamp:

....
"settings": {
    "output_columns": [
        {
            "name": "timestamp",
            "datatype": {
                "type": "epoch",
                "primary": true,
                "format": "s"
            }
        },
        {
            "name": "ts_millis",
            "datatype": {
                "type": "uint64",
                "virtual": true,
                "script": "new Date(row['timestamp']).getMilliseconds()"
            }
        }
    ]
}
....
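When writing a script like this one, it may help to recall how the standard JavaScript `Date` methods behave. The sketch below runs outside Hydrolix and assumes the value is a plain number of epoch seconds. How Hydrolix actually exposes an `epoch` column inside `row` is not stated here, so treat the sample value and variable names as hypothetical.

```javascript
// Assumption: the value is a plain number of epoch seconds.
const epochSeconds = 1700000123;

// The Date constructor interprets a number as milliseconds since the epoch,
// and getMilliseconds() returns only the 0-999 millisecond component.
const msComponent = new Date(epochSeconds).getMilliseconds();

// Multiplying by 1000 converts epoch seconds to epoch milliseconds.
const epochMillis = epochSeconds * 1000;

// getTime() returns the full epoch-millisecond value of a Date.
const viaDate = new Date(epochSeconds * 1000).getTime();

console.log(msComponent, epochMillis, viaDate);
```

Under that assumption, `getMilliseconds()` yields the sub-second component rather than a full epoch-millisecond timestamp, so the choice of method depends on which of the two values the new column should store.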

Create a New Column from Multiple Existing Columns

A script can also read more than one column. For example, logs in the W3C extended log file format store the date and the time in two separate, tab-delimited fields.

Use the script property to create a virtual column that combines the date and time fields into a single timestamp, which can then serve as the primary timestamp column:

{
  "name": "aws_cloudfront_transform",
  "type": "csv",
  "table": "{{tableid}}",
  "settings": {
      "is_default": true,
      "compression": "gzip",
      "output_columns": [
          {
              "name": "timestamp",
              "source": { "from_input_index": 0 },
              "datatype": {
                  "type": "datetime",
                  "script": "new Date(row['date'] + ' ' + row['hour'])",
                  "format": "2006-01-02 15:04:05",
                  "virtual": true,
                  "primary": true
              }
          },
          {
              "name": "date",
              "source": { "from_input_index": 1 },
              "datatype": {
                  "type": "string"
              }
          },
          {
              "name": "hour",
              "source": { "from_input_index": 2 },
              "datatype": {
                  "type": "string"
              }
          }
      ]
  }
...
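The concatenation performed by the script can be tried in plain JavaScript before it goes into a transform. The sketch below uses hypothetical sample values for the `date` and `hour` columns. It joins them with `T` and appends `Z` instead of using a space, because ISO 8601 is the only string format that JavaScript guarantees `Date` will parse consistently; other formats are parsed in implementation-defined ways.

```javascript
// Hypothetical sample values for the `date` and `hour` columns.
const row = { date: '2024-01-15', hour: '13:45:10' };

// 'T' between the parts and a trailing 'Z' produce an ISO 8601 string,
// which Date parses as UTC on every conforming engine.
const combined = new Date(row['date'] + 'T' + row['hour'] + 'Z');

console.log(combined.toISOString()); // 2024-01-15T13:45:10.000Z
```

Whether the ISO form or the space-separated form is appropriate in a real transform depends on how Hydrolix applies the column's `format` to the script output, which this sketch does not cover.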