Scripting
Hydrolix transforms can execute scripts to manipulate incoming data.
Add the script property to a column definition to execute a JavaScript expression for every row of data ingested. The expression's output becomes the stored value for that column. Within the expression, you can access other columns from the same row with the following syntax:
"datatype": { "script": "row['field-name']" }
Scripts run:
- after assigning non-scripted values (including defaults) to the row
- in the order defined in the transform
Therefore, script expressions have access to any non-scripted value in the same row as well as the expression output for any script defined earlier within the transform.
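The ordering rules above can be sketched in plain JavaScript. This is only an illustration, not Hydrolix's implementation: the applyColumns helper, the shape of its arguments, and the column names are all assumptions made up for the example. It also shows why bracket syntax is used for column access, since names such as unit-price cannot be written with dot notation.

```javascript
// Hypothetical mini-evaluator that mimics the ordering rules described above.
// NOT Hydrolix's implementation; `applyColumns` and its input shapes are assumptions.
function applyColumns(columns, input) {
  const row = {};
  // Pass 1: assign non-scripted values, falling back to a default when present.
  for (const col of columns) {
    if (col.script === undefined) {
      row[col.name] = input[col.name] !== undefined ? input[col.name] : col.default;
    }
  }
  // Pass 2: run scripts in definition order, so later scripts see earlier outputs.
  for (const col of columns) {
    if (col.script !== undefined) {
      const evaluate = new Function('row', 'return (' + col.script + ');');
      row[col.name] = evaluate(row);
    }
  }
  return row;
}

// Illustrative column definitions; the first script is defined before the
// columns it reads, which works because non-scripted values are assigned first.
const exampleColumns = [
  { name: 'total', script: "row['unit-price'] * row['qty']" },
  { name: 'unit-price' },
  { name: 'qty', default: 2 },
  { name: 'total_with_shipping', script: "row['total'] + 5" } // reads an earlier script's output
];

const exampleResult = applyColumns(exampleColumns, { 'unit-price': 10 });
console.log(exampleResult.total);               // 20
console.log(exampleResult.total_with_shipping); // 25
```

In this sketch, swapping the two scripted columns would leave row['total'] undefined when total_with_shipping runs, which is the practical consequence of scripts executing in definition order.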
Performance
Avoid excessive scripting in transforms. Each script executes once for every row Hydrolix ingests, so its cost grows with ingest volume.
Create a New Column from an Existing Column
The following example creates a new column named ts_millis from data in the existing column named timestamp:
....
"settings": {
  "output_columns": [
    {
      "name": "timestamp",
      "datatype": {
        "type": "epoch",
        "primary": true,
        "format": "s"
      }
    },
    {
      "name": "ts_millis",
      "datatype": {
        "type": "uint64",
        "virtual": true,
        "script": "new Date(row['timestamp']).getMilliseconds()"
      }
    }
  ]
}
....
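For reference, this is how the script expression above evaluates in plain JavaScript, assuming row['timestamp'] reaches the script as a number of epoch seconds (an assumption; how epoch values are exposed to scripts is not stated here). The Date constructor interprets a numeric argument as milliseconds since the epoch, and getMilliseconds() returns only the 0 to 999 millisecond component, so a variant that scales seconds to milliseconds is shown alongside for comparison. The sampleRow object and its value are illustrative.

```javascript
// Assumed input: epoch seconds as a plain number (illustrative value).
const sampleRow = { timestamp: 1700000123 };

// The expression from the transform: Date treats the number as milliseconds,
// and getMilliseconds() returns only the 0-999 millisecond component.
const msComponent = new Date(sampleRow['timestamp']).getMilliseconds();

// Alternative when a full epoch-milliseconds value is wanted:
// scale the seconds value to milliseconds directly.
const epochMillis = sampleRow['timestamp'] * 1000;

console.log(msComponent); // 123
console.log(epochMillis); // 1700000123000
```

If the goal is a full millisecond timestamp, an expression along the lines of row['timestamp'] * 1000 may be closer to the intent, provided the value is numeric at script time.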
Create a New Column from Multiple Existing Columns
Logs in the W3C extended log file format, for example, store the date and the time in two separate fields, separated by a tab.
You can use the script property to create a virtual column that combines the date and hour columns into a single timestamp to use as a primary key:
{
  "name": "aws_cloudfront_transform",
  "type": "csv",
  "table": "{{tableid}}",
  "settings": {
    "is_default": true,
    "compression": "gzip",
    "output_columns": [
      {
        "name": "timestamp",
        "source": { "from_input_index": 0 },
        "datatype": {
          "type": "datetime",
          "script": "new Date(row['date'] + ' ' + row['hour'])",
          "format": "2006-01-02 15:04:05",
          "virtual": true,
          "primary": true
        }
      },
      {
        "name": "date",
        "source": { "from_input_index": 1 },
        "datatype": {
          "type": "string"
        }
      },
      {
        "name": "hour",
        "source": { "from_input_index": 2 },
        "datatype": {
          "type": "string"
        }
      }
    ]
  }
...
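The combining expression in the transform above can be tried in plain JavaScript with illustrative values. Note that a string of the form YYYY-MM-DD HH:MM:SS is not part of the date-time string format defined by ECMAScript, so parsing it is engine-dependent; which engine runs transform scripts is not stated here. Building an ISO 8601 string with an explicit UTC designator is a more predictable alternative, shown for comparison. The logRow object and its values are assumptions for the example.

```javascript
// Illustrative values as they might appear in the `date` and `hour` columns.
const logRow = { date: '2023-11-14', hour: '22:13:20' };

// Same shape as the transform's script: join with a space, then parse.
// This form is engine-dependent; V8 (Node.js) reads it as local time.
const combined = new Date(logRow['date'] + ' ' + logRow['hour']);

// ISO 8601 variant with an explicit UTC designator, which the
// ECMAScript specification requires every engine to parse the same way.
const combinedUtc = new Date(logRow['date'] + 'T' + logRow['hour'] + 'Z');

console.log(Number.isNaN(combined.getTime())); // false when the engine accepts the format
console.log(combinedUtc.toISOString());        // 2023-11-14T22:13:20.000Z
```

Whether the source logs record local time or UTC determines which of the two forms matches the data, so the UTC variant is only appropriate when the log timestamps are known to be in UTC.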