Input Scripting

Hydrolix offers the ability within transforms to execute a script to manipulate incoming data.

The script property, when present in a column definition, contains a JavaScript expression that executes for every row of data ingested. The expression's output becomes the stored value for that column.

The expression may, as part of its computation, read the values of other fields in the same row. Hydrolix runs each new row's JavaScript expressions in the order the transform defines them, and after it has set all of the row's non-scripted values, including defaults. A script-defined expression, then, has access to any non-scripted value in the same row, as well as the row's value for any scripted column defined earlier within the transform.

In the following example, the field named "ts_millis" derives its value based on data in the field named "timestamp":

.....
"settings": {
   "output_columns": [
        {
            "name": "timestamp",
            "datatype": {
                "type": "epoch",
                "primary": true,
                "format": "s"
            }
        },
        {
            "name": "ts_millis",
            "datatype": {
                "type": "uint64",
                "virtual": true,
                "script": "new Date(row['timestamp']).getMilliseconds()"
            }
        }
    ]
}
......

Another example if you have logs in W3C extended log file format those have 2 separate fields for date and time separated by a TAB.
We can use the script function to create a virtual field which combined the date and time field into a single timestamp which will be used as our primary key:

{
  "name": "aws_cloudfront_transform",
  "type": "csv",
  "table": "{{tableid}}",
  "settings": {
      "is_default": true,
      "compression": "gzip",
      "output_columns": [
          {
              "name": "timestamp",
              "position": 0,
              "datatype": {
                  "type": "datetime",
                  "script": "new Date(row['date'] +' '+ row['hour'])",
                  "format": "2006-01-02 15:04:05",
                  "virtual": true,
                  "primary": true
              }
          },
          {
              "name": "date",
              "position": 0,
              "datatype": {
                  "type": "string"
              }
          },
          {
              "name": "hour",
              "position": 1,
              "datatype": {
                  "type": "string"
              }
          }

Did this page help you?