Pre-transforms for JSON subtype chaining

Pre-transforms are a transform configuration option that chains JSON subtypes together before parsing. By listing multiple subtypes in your transform's format_details, you can process data such as Amazon CloudWatch logs delivered through Amazon Data Firehose without AWS Lambda functions for preprocessing. Each subtype in the chain unwraps one layer of your data, in order.

This feature is available in Hydrolix v5.6 and later. UI support for pre-transforms is available in Hydrolix v5.8 and later.

Pre-transform subtypes

Pre-transforms support chaining the following JSON subtypes:

  • firehose - Unwraps Amazon Data Firehose message format
  • firehose/gzip - Unwraps gzip-compressed Amazon Data Firehose messages
  • cloudwatch - Unwraps Amazon CloudWatch log format
  • mPulse - Unwraps Akamai mPulse data format
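The following Python sketch makes "unwrapping layers" concrete. It models each subtype as a function that peels off one layer and returns the payloads inside it, then applies a chain in list order. This is an illustration under assumptions, not Hydrolix's implementation. It assumes the following:

  • Firehose data arrives as an HTTP endpoint delivery request with a records array of base64-encoded data fields.
  • Each record holds one JSON document.
  • firehose/gzip differs only in that the data is gzip-compressed.
  • CloudWatch payloads carry their events in logEvents.

The names UNWRAPPERS and apply_pretransforms are hypothetical. mPulse is left out because its format isn't described here.

```python
import base64
import gzip
import json
from typing import Any, Callable

# Illustrative model only: each subtype peels off one layer and returns
# the payloads found inside it. This is not Hydrolix's implementation.

def _firehose_records(obj: Any) -> list[bytes]:
    # Assumed shape: a Firehose HTTP endpoint request with a "records"
    # array whose "data" fields are base64-encoded.
    if not isinstance(obj, dict) or "records" not in obj:
        raise ValueError("object is missing expected 'records' key")
    return [base64.b64decode(record["data"]) for record in obj["records"]]

def unwrap_firehose(obj: Any) -> list[Any]:
    # Each record is assumed to hold one JSON document.
    return [json.loads(raw) for raw in _firehose_records(obj)]

def unwrap_firehose_gzip(obj: Any) -> list[Any]:
    # Assumption: like firehose, but each record's data is gzip-compressed.
    return [json.loads(gzip.decompress(raw)) for raw in _firehose_records(obj)]

def unwrap_cloudwatch(obj: Any) -> list[Any]:
    # CloudWatch Logs subscription payloads carry events in "logEvents".
    return list(obj["logEvents"])

UNWRAPPERS: dict[str, Callable[[Any], list[Any]]] = {
    "firehose": unwrap_firehose,
    "firehose/gzip": unwrap_firehose_gzip,
    "cloudwatch": unwrap_cloudwatch,
    # "mPulse" is omitted because its format isn't described here.
}

def apply_pretransforms(payload: Any, chain: list[str]) -> list[Any]:
    """Unwrap payload one layer per subtype, outermost layer first."""
    items = [payload]
    for subtype in chain:
        unwrap = UNWRAPPERS[subtype]
        # Each layer can fan out into several inner payloads.
        items = [inner for item in items for inner in unwrap(item)]
    return items
```

For a Firehose request carrying compressed CloudWatch data, apply_pretransforms(request, ["firehose/gzip", "cloudwatch"]) returns the individual log events. In this model, running a chain in the wrong order fails as soon as a layer doesn't have the shape its subtype expects.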

Valid subtype combinations

List subtypes in the order the data needs to be unwrapped, starting with the outermost layer. For example, if Firehose wraps CloudWatch data, use ["firehose", "cloudwatch"], not ["cloudwatch", "firehose"].

A pre-transform list with more than one subtype must end with cloudwatch or mPulse.

Subtype chain | Description
["firehose"] | Firehose data only
["firehose/gzip"] | Compressed Firehose data only
["cloudwatch"] | CloudWatch logs only
["mPulse"] | mPulse data only
["firehose", "cloudwatch"] | CloudWatch logs through Firehose
["firehose", "firehose/gzip"] | Compressed Firehose wrapped in Firehose
["firehose", "mPulse"] | mPulse data through Firehose
["firehose/gzip", "firehose"] | Firehose wrapped in compressed Firehose
["firehose/gzip", "cloudwatch"] | Compressed CloudWatch logs through Firehose
["firehose/gzip", "mPulse"] | Compressed mPulse data through Firehose
["firehose", "firehose/gzip", "cloudwatch"] | CloudWatch through multiple Firehose layers
["firehose", "firehose/gzip", "mPulse"] | mPulse through multiple Firehose layers
["firehose/gzip", "firehose", "cloudwatch"] | CloudWatch through compressed and standard Firehose
["firehose/gzip", "firehose", "mPulse"] | mPulse through compressed and standard Firehose
["firehose/gzip", "cloudwatch", "mPulse"] | mPulse data inside CloudWatch logs through compressed Firehose

Configure pre-transforms

To create or update a transform with pre-transforms, include the pretransforms array in the format_details object under settings in your transform definition.

The transform in the following example ingests compressed CloudWatch logs delivered through Firehose.

{
  "name": "cloudwatch_firehose_transform",
  "type": "json",
  "table": "your_table_name",
  "settings": {
    "output_columns": [
      {
        "name": "timestamp",
        "datatype": {
          "type": "datetime",
          "format": "2006-01-02T15:04:05.000Z",
          "primary": true
        }
      },
      {
        "name": "message",
        "datatype": {
          "type": "string"
        }
      }
    ],
    "format_details": {
      "pretransforms": ["firehose/gzip", "cloudwatch"]
    }
  }
}
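To try the example transform, you need sample data wrapped the same way the chain unwraps it: a Firehose request whose records contain gzip-compressed CloudWatch Logs data. The following Python sketch builds such a payload under these assumptions:

  • The payload follows the Amazon Data Firehose HTTP endpoint request shape: requestId, timestamp, and a records array of base64-encoded data.
  • firehose/gzip means each record's data is gzip-compressed before base64 encoding.
  • Every field value is a placeholder.

The function name is hypothetical. Sending the payload to Hydrolix is not covered here.

```python
import base64
import gzip
import json
import time

def build_cloudwatch_firehose_request(messages: list[str], log_group: str = "example-log-group") -> dict:
    """Hypothetical helper: wrap log messages as gzip-compressed CloudWatch data in a Firehose request."""
    now_ms = int(time.time() * 1000)
    # Innermost layer: a CloudWatch Logs subscription-style payload (placeholder values).
    cloudwatch_payload = {
        "messageType": "DATA_MESSAGE",
        "owner": "123456789012",
        "logGroup": log_group,
        "logStream": "example-stream",
        "subscriptionFilters": ["example-filter"],
        "logEvents": [
            {"id": str(i), "timestamp": now_ms, "message": m}
            for i, m in enumerate(messages)
        ],
    }
    # Middle layer (assumed): gzip, then base64, as a Firehose record's "data" field.
    compressed = gzip.compress(json.dumps(cloudwatch_payload).encode("utf-8"))
    data = base64.b64encode(compressed).decode("ascii")
    # Outer layer: a Firehose HTTP endpoint delivery request.
    return {"requestId": "example-request-id", "timestamp": now_ms, "records": [{"data": data}]}

if __name__ == "__main__":
    print(json.dumps(build_cloudwatch_firehose_request(["hello", "world"]), indent=2))
```

The sample timestamp values are epoch milliseconds, as in CloudWatch Logs events. Adjust the timestamp column's datatype settings to whatever format your unwrapped data actually uses.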

Troubleshooting

Data ingestion fails with "object is missing expected 'records' key"

Cause: Your subtypes are in the wrong order inside your pretransforms array, or you're using an invalid combination.

Solution:

  • Verify that cloudwatch or mPulse is the last subtype in your chain.
  • Ensure the order matches how your data was wrapped: the outermost layer, which was applied last, comes first in the array.
  • Review the valid combinations above.
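When you aren't sure how a payload is wrapped, you can inspect a sample and infer the chain. The following Python sketch is a heuristic, not a Hydrolix tool. It works as follows:

  • An object with a records array is treated as a Firehose layer.
  • The decoded data is checked for the gzip magic bytes (1f 8b) to choose between firehose and firehose/gzip.
  • An object with logEvents stops the search and is reported as cloudwatch.

It inspects only the first record of each Firehose layer and assumes the plain Firehose HTTP endpoint request shape. It cannot recognize mPulse data. guess_chain is a hypothetical name.

```python
import base64
import gzip
import json
from typing import Any

GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of any gzip stream

def guess_chain(payload: Any, max_depth: int = 5) -> list[str]:
    """Heuristically infer a pre-transform chain by peeling layers off a sample payload.

    Only the first record of each Firehose layer is inspected; mPulse is not detected.
    """
    chain: list[str] = []
    obj = payload
    for _ in range(max_depth):  # guard against unexpectedly deep nesting
        records = obj.get("records") if isinstance(obj, dict) else None
        if isinstance(records, list) and records:
            raw = base64.b64decode(records[0]["data"])
            if raw[:2] == GZIP_MAGIC:
                chain.append("firehose/gzip")
                raw = gzip.decompress(raw)
            else:
                chain.append("firehose")
            obj = json.loads(raw)
        elif isinstance(obj, dict) and "logEvents" in obj:
            chain.append("cloudwatch")
            break
        else:
            break  # unrecognized layer: stop and report what was found
    return chain
```

Compare the result against the valid combinations above before updating your pretransforms array.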

Cannot create transform with both "pretransforms" and "subtype"

Cause: The pretransforms and subtype fields are mutually exclusive.

Solution: Remove the subtype field from your transform definition. Use only pretransforms.
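If you are converting an existing transform, the following Python sketch moves a legacy subtype value into a one-element pretransforms list. It refuses to guess when both fields are present. It assumes that the legacy subtype field sits in settings.format_details, where pretransforms goes. It also assumes that a single subtype behaves the same as a one-element chain. Neither assumption is confirmed here, so verify the result against your Hydrolix version. migrate_subtype is a hypothetical name.

```python
import copy
from typing import Any

def migrate_subtype(transform: dict[str, Any]) -> dict[str, Any]:
    """Hypothetical helper: return a copy of transform that uses pretransforms instead of subtype.

    Assumption: the legacy subtype lives in settings.format_details, and a
    single subtype is equivalent to a one-element pretransforms chain.
    """
    result = copy.deepcopy(transform)  # never modify the caller's dict
    details = result.get("settings", {}).get("format_details", {})
    if "subtype" not in details:
        return result  # nothing to migrate
    if "pretransforms" in details:
        # The two fields are mutually exclusive; don't guess which one is intended.
        raise ValueError("format_details sets both 'subtype' and 'pretransforms'; keep only 'pretransforms'")
    details["pretransforms"] = [details.pop("subtype")]
    return result
```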

Data from mPulse or CloudWatch cannot be further processed

Cause: Once data is unwrapped to mPulse or cloudwatch format, it cannot have additional subtypes applied.

Solution: This is expected behavior. Ensure these subtypes are always last in your pre-transform chain.

Additional resources