Input Scripting

Hydrolix transforms can execute a script to manipulate incoming data as it is ingested.

The script property, when present in a column definition, contains a JavaScript expression that executes for every row of data ingested. The expression's output becomes the stored value for that column.

The expression may, as part of its computation, read the values of other fields in the same row. Hydrolix runs each new row's JavaScript expressions in the order the transform defines them -- after it has set all of the row's non-scripted values (including defaults). A script-defined expression thus has access to any non-scripted value in the same row, as well as the row's value for any scripted column defined earlier within the transform.
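The evaluation order described above can be sketched in plain JavaScript. This is a conceptual illustration, not Hydrolix's actual ingest engine, and the column names are hypothetical:

```javascript
// Conceptual sketch: each scripted column's expression is evaluated with the
// current row in scope, in the order the transform defines them.
function evaluateScriptedColumns(row, scriptedColumns) {
  for (const col of scriptedColumns) {
    // Earlier scripted results are already on `row`, so later
    // expressions can read them.
    row[col.name] = col.expression(row);
  }
  return row;
}

// Non-scripted values (including defaults) are already set on the row.
const row = { bytes_sent: 512, bytes_received: 1024 };
evaluateScriptedColumns(row, [
  { name: "bytes_total", expression: r => r.bytes_sent + r.bytes_received },
  // This later script reads the earlier scripted column "bytes_total".
  { name: "kb_total", expression: r => r.bytes_total / 1024 },
]);
console.log(row.bytes_total, row.kb_total); // → 1536 1.5
```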

In the following example, the field named "ts_millis" derives its value based on data in the field named "timestamp":

```json
"settings": {
   "output_columns": [
        {
            "name": "timestamp",
            "datatype": {
                "type": "epoch",
                "primary": true,
                "format": "s"
            }
        },
        {
            "name": "ts_millis",
            "datatype": {
                "type": "uint64",
                "virtual": true,
                "script": "new Date(row['timestamp']).getMilliseconds()"
            }
        }
    ]
}
```
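The script expression can be evaluated standalone to see what it produces. Note that a JavaScript `Date` built from a number treats it as milliseconds since the epoch, and `getMilliseconds()` returns only the millisecond component (0-999). The sample epoch value below is hypothetical:

```javascript
// Standalone evaluation of the expression above, outside Hydrolix.
const row = { timestamp: 1697040123 }; // hypothetical epoch value
const tsMillis = new Date(row['timestamp']).getMilliseconds();
console.log(tsMillis); // → 123
```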

Another example: logs in W3C extended log file format store the date and time in two separate fields, separated by a TAB.
We can use the script property to create a virtual field that combines the date and time fields into a single timestamp, which then serves as the primary key:

```json
{
  "name": "aws_cloudfront_transform",
  "type": "csv",
  "table": "{{tableid}}",
  "settings": {
      "is_default": true,
      "compression": "gzip",
      "output_columns": [
          {
              "name": "timestamp",
              "position": 0,
              "datatype": {
                  "type": "datetime",
                  "script": "new Date(row['date'] +' '+ row['hour'])",
                  "format": "2006-01-02 15:04:05",
                  "virtual": true,
                  "primary": true
              }
          },
          {
              "name": "date",
              "position": 0,
              "datatype": {
                  "type": "string"
              }
          },
          {
              "name": "hour",
              "position": 1,
              "datatype": {
                  "type": "string"
              }
          }
      ]
  }
}
```

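The combining expression can likewise be checked on its own: concatenating the two fields yields a string that JavaScript's `Date` constructor can parse. The sample field values below are hypothetical:

```javascript
// Standalone check of the script expression above.
const row = { date: "2023-05-01", hour: "13:45:30" };
const ts = new Date(row['date'] + ' ' + row['hour']);
console.log(ts.getFullYear(), ts.getHours());
```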
You can also create a new column that copies an existing one from your input data.
The following transform creates a new column named test containing a copy of the CSV column at input index 3:

```json
{
  "name": "copy_example",
  "type": "csv",
  "table": "{{tableid}}",
  "settings": {
      "is_default": true,
      "compression": "none",
      "output_columns": [
          {
              "name": "test",
              "datatype": {
                  "type": "string",
                  "index": false,
                  "source": { "from_input_index": 3 }
              }
          }
      ]
  }
}
```
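Conceptually, from_input_index selects one value from each parsed input record. A rough sketch of that selection, assuming zero-based indexing so that index 3 picks the fourth comma-separated value:

```javascript
// Illustration only: a simplistic split stands in for real CSV parsing.
const line = "alpha,beta,gamma,delta";
const fields = line.split(',');
const test = fields[3]; // value stored in the "test" column
console.log(test); // → "delta"
```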

Finally, here is the same example for JSON input. This copies the field initial_name into a new column called test.

```json
{
    "name": "copy_example",
    "type": "json",
    "table": "{{tableid}}",
    "settings": {
        "is_default": true,
        "compression": "none",
        "output_columns": [
            {
                "name": "test",
                "datatype": {
                    "type": "string",
                    "index": true,
                    "source": {
                        "from_input_field": "initial_name"
                    }
                }
            }
        ]
    }
}
```

This approach also allows you to index the same data multiple times with different types.
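For instance, two output columns could both source the same input field with different types. This is a sketch in the same style as the examples above; the field and column names are hypothetical:

```json
"output_columns": [
    {
        "name": "status_str",
        "datatype": {
            "type": "string",
            "source": { "from_input_field": "status" }
        }
    },
    {
        "name": "status_num",
        "datatype": {
            "type": "uint32",
            "source": { "from_input_field": "status" }
        }
    }
]
```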

