Pre-transforms

Pre-transforms are a transform configuration option that chains JSON subtypes together before parsing. By listing multiple subtypes in your transform's format_details, you can process data from sources like Amazon CloudWatch logs delivered through Amazon Data Firehose without AWS Lambda functions for preprocessing. Each subtype in the chain unwraps one layer of your data, in sequence.

This feature was introduced in Hydrolix v5.6. UI support for pre-transforms is available in Hydrolix v5.8 and later.

Pre-transform subtypes

Pre-transforms support chaining the following JSON subtypes:

  • firehose - Unwraps Amazon Data Firehose message format
  • firehose/gzip - Unwraps gzip-compressed Amazon Data Firehose messages
  • cloudwatch - Unwraps Amazon CloudWatch log format
  • cloudtrail - Unwraps Amazon CloudTrail log format
  • mPulse - Unwraps Akamai mPulse data format

Chaining rules

List subtypes in the order the data needs to be unwrapped. The outermost layer comes first. For example, if Firehose wraps CloudWatch data, use ["firehose", "cloudwatch"], not ["cloudwatch", "firehose"].

Each subtype can be used on its own. The firehose and firehose/gzip subtypes expect an Amazon Data Firehose envelope, so when used they must come at the beginning of the chain. The cloudtrail and mPulse subtypes must be the last entry in the chain.
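
The chaining rules can be expressed as a small pre-flight check. The following is a minimal sketch in Python, not part of Hydrolix: the function name check_chain and the wording of its messages are hypothetical, and it encodes only the rules stated in this section (Firehose subtypes first, cloudtrail and mPulse last).

```python
# Subtype groups taken from the rules above. This is an illustrative
# checker, not Hydrolix's own validation logic.
FIREHOSE_SUBTYPES = {"firehose", "firehose/gzip"}
TERMINAL_SUBTYPES = {"cloudtrail", "mPulse"}
KNOWN_SUBTYPES = FIREHOSE_SUBTYPES | TERMINAL_SUBTYPES | {"cloudwatch"}


def check_chain(chain):
    """Return a list of rule violations for a pretransforms chain (empty if none)."""
    problems = []
    seen_non_firehose = False
    for index, subtype in enumerate(chain):
        if subtype not in KNOWN_SUBTYPES:
            problems.append(f"unknown subtype: {subtype!r}")
            continue
        if subtype in FIREHOSE_SUBTYPES:
            # Firehose envelopes are the outermost layer, so they come first.
            if seen_non_firehose:
                problems.append(f"{subtype!r} must come before non-Firehose subtypes")
        else:
            seen_non_firehose = True
        # cloudtrail and mPulse must close the chain.
        if subtype in TERMINAL_SUBTYPES and index != len(chain) - 1:
            problems.append(f"{subtype!r} must be the last entry")
    return problems


print(check_chain(["firehose", "cloudwatch"]))  # []
print(check_chain(["cloudwatch", "firehose"]))  # one violation reported
```

Running a check like this before submitting a transform catches ordering mistakes locally instead of at ingest time.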

Examples

Pre-transform chain                         Description
["cloudtrail"]                              CloudTrail logs ingested directly
["firehose", "cloudwatch"]                  CloudWatch logs delivered through Firehose
["firehose/gzip", "cloudtrail"]             CloudTrail logs delivered through compressed Firehose
["firehose", "firehose/gzip", "mPulse"]     mPulse data through multiple Firehose layers
["firehose/gzip", "cloudwatch", "mPulse"]   mPulse data through CloudWatch through compressed Firehose
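
To make the layering concrete, the sketch below imitates in Python what a ["firehose", "cloudwatch"] chain undoes. It is an illustration under assumptions, not Hydrolix's implementation: the helper names unwrap_firehose and unwrap_cloudwatch are hypothetical, the sample payload is synthetic, and the envelope shapes are simplified from the AWS formats (a Firehose body with a records list of base64-encoded data, and a CloudWatch Logs payload with a logEvents list). The error text is borrowed from the troubleshooting section of this page.

```python
import base64
import json


def unwrap_firehose(envelope):
    """Yield the decoded JSON payload of each record in a Firehose envelope."""
    if "records" not in envelope:
        raise ValueError("object is missing expected 'records' key")
    for record in envelope["records"]:
        yield json.loads(base64.b64decode(record["data"]))


def unwrap_cloudwatch(payload):
    """Yield each log event in a CloudWatch Logs payload."""
    for event in payload.get("logEvents", []):
        yield event


# Synthetic sample: one CloudWatch payload wrapped in one Firehose record.
cloudwatch_payload = {
    "messageType": "DATA_MESSAGE",
    "logGroup": "example-group",
    "logStream": "example-stream",
    "logEvents": [
        {"id": "1", "timestamp": 1700000000000, "message": "first"},
        {"id": "2", "timestamp": 1700000001000, "message": "second"},
    ],
}
encoded = base64.b64encode(json.dumps(cloudwatch_payload).encode("utf-8")).decode("ascii")
firehose_envelope = {
    "requestId": "example",
    "timestamp": 1700000002000,
    "records": [{"data": encoded}],
}

# Apply the layers outermost first, as in ["firehose", "cloudwatch"].
events = [
    event
    for payload in unwrap_firehose(firehose_envelope)
    for event in unwrap_cloudwatch(payload)
]
print([event["message"] for event in events])  # ['first', 'second']
```

Passing cloudwatch_payload straight to unwrap_firehose raises the "missing expected 'records' key" error, which is the kind of failure a reversed chain produces.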

Configure pre-transforms

To create or update a transform with pre-transforms, include the pretransforms array in the format_details section of your transform definition.

The following examples show transforms configured with pre-transforms for different data sources.

CloudWatch logs delivered through gzip-compressed Firehose:

{
  "name": "cloudwatch_firehose_transform",
  "type": "json",
  "table": "your_table_name",
  "settings": {
    "output_columns": [
      {
        "name": "timestamp",
        "datatype": {
          "type": "datetime",
          "format": "2006-01-02T15:04:05.000Z",
          "primary": true
        }
      },
      {
        "name": "message",
        "datatype": {
          "type": "string"
        }
      }
    ],
    "format_details": {
      "pretransforms": ["firehose/gzip", "cloudwatch"]
    }
  }
}

CloudTrail logs ingested directly:

{
  "name": "cloudtrail_transform",
  "type": "json",
  "table": "your_table_name",
  "settings": {
    "output_columns": [
      {
        "name": "eventTime",
        "datatype": {
          "type": "datetime",
          "format": "2006-01-02T15:04:05Z",
          "primary": true
        }
      },
      {
        "name": "eventSource",
        "datatype": {
          "type": "string"
        }
      },
      {
        "name": "eventName",
        "datatype": {
          "type": "string"
        }
      },
      {
        "name": "awsRegion",
        "datatype": {
          "type": "string"
        }
      },
      {
        "name": "sourceIPAddress",
        "datatype": {
          "type": "string"
        }
      }
    ],
    "format_details": {
      "pretransforms": ["cloudtrail"]
    }
  }
}

Troubleshooting

Data ingestion fails with "object is missing expected 'records' key"

Cause: Your subtypes are in the wrong order inside your pretransforms array, or you're using an invalid combination.

Solution:

  • Ensure the order matches how your data was wrapped: the outermost wrapper (the layer applied last) must be first in the array.
  • Review the chaining rules above.

Can't create transform with both "pretransforms" and "subtype"

Cause: The pretransforms and subtype fields are mutually exclusive.

Solution: Remove the subtype field from your transform definition. Use only pretransforms.
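
A transform definition can be checked for this conflict before it is submitted. The following Python sketch is hypothetical: the helper names has_subtype_conflict and drop_subtype are not part of any Hydrolix tooling, and the code assumes that subtype sits alongside pretransforms inside settings.format_details, which this page does not state explicitly.

```python
import copy


def has_subtype_conflict(transform):
    """Return True if format_details sets both pretransforms and subtype."""
    # Assumption: both fields live under settings.format_details.
    details = transform.get("settings", {}).get("format_details", {})
    return "pretransforms" in details and "subtype" in details


def drop_subtype(transform):
    """Return a copy of the transform with subtype removed from format_details."""
    fixed = copy.deepcopy(transform)
    fixed.get("settings", {}).get("format_details", {}).pop("subtype", None)
    return fixed


transform = {
    "name": "cloudtrail_transform",
    "type": "json",
    "settings": {
        "format_details": {
            "subtype": "cloudtrail",
            "pretransforms": ["cloudtrail"],
        }
    },
}

print(has_subtype_conflict(transform))                # True
print(has_subtype_conflict(drop_subtype(transform)))  # False
```

drop_subtype works on a deep copy, so the original definition is left unchanged for comparison.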

Data from mPulse or CloudTrail is silently dropped

Cause: Once data is unwrapped to mPulse or cloudtrail format, any further subtype in the chain causes the data to be dropped.

Solution: This is expected behavior. Ensure these subtypes are always last in your pre-transform chain.

Additional resources