Shadow Transforms
Test data transforms safely before publishing
Overview
Shadow transforms are a safe way to test transforms without putting production data at risk.
This feature mirrors a sample of incoming data from a production table into a separate testing table to validate changes before publishing.
Shadow transform use cases
- Validate changes before deployment. See how a new or modified transform behaves under real conditions before publishing it to production.
- Catch anomalies. Test with real data to find edge cases.
- Compare side-by-side results. Production and test paths run in parallel, so you can query both tables to see the differences.
- Measure performance impact. Test how your changes affect resource usage, output size, or processing time under actual load.
Limitations
- Shadow transforms consume roughly double the CPU and memory resources while active.
- Sampling is random per record, with no way to guarantee an identical set between runs.
- Continuous data mirroring on high-volume tables can generate large amounts of data.
- The shadow transform must use the same file type and compression as the production transform.
- Shadow transforms are only available for HTTP Stream API intake.
- The shadow table must be in the same project as the production table.
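To size the data-volume limitation, a back-of-envelope estimate helps. With per-record random sampling, the shadow table receives about rate × production records. The helper name and the example volume below are hypothetical, and the sketch assumes each record is sampled independently.

```python
def expected_shadow_rows(production_rows: int, rate: float) -> float:
    """Expected number of records mirrored to the shadow table.

    Assumes each incoming record is sampled independently with probability `rate`,
    so the expectation is simply production_rows * rate.
    """
    if not 0.0 <= rate <= 0.05:
        raise ValueError("rate must be between 0 and 0.05")
    return production_rows * rate


# Hypothetical load: 1 billion records per day at the maximum 5% rate.
daily = expected_shadow_rows(1_000_000_000, 0.05)
print(f"{daily:,.0f} shadow records per day")  # 50,000,000 shadow records per day
```

This only estimates record counts in the shadow table; the CPU and memory doubling listed above applies separately while the shadow transform is active.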
Shadow transform process
Intake uses data mirroring to split incoming data into two processing tasks:
- One task processes all data with the production transform and writes it to the production table.
- The other task randomly samples a small percentage, up to 5%, of the same incoming data and sends it to a target transform and table for testing.
Both tasks run at the same time, so tests use live production input without affecting production results. While active, this parallel processing roughly doubles CPU and memory load.
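The split described above can be sketched as a sequential Python loop rather than two parallel tasks. The hypothetical `mirror` function models only the sampling behavior (an independent random draw per record). It does not describe how Intake actually schedules or runs the two tasks.

```python
import random
from typing import Iterable, List, Optional, Tuple


def mirror(
    records: Iterable[dict], rate: float, seed: Optional[int] = None
) -> Tuple[List[dict], List[dict]]:
    """Route every record to production and a random sample to the shadow path.

    Each record is an independent Bernoulli trial with probability `rate`, so the
    sampled set differs between runs unless the seed is fixed.
    """
    rng = random.Random(seed)
    production: List[dict] = []
    shadow: List[dict] = []
    for record in records:
        production.append(record)  # the production transform sees all data
        if rng.random() < rate:  # roughly `rate` of records are mirrored
            shadow.append(record)
    return production, shadow


batch = [{"id": i} for i in range(10_000)]
prod, shadow = mirror(batch, rate=0.05, seed=42)
print(len(prod), len(shadow))  # 10000 and a shadow count near 500
```

Because each record is drawn independently, the shadow count varies around `rate × total`. This is also why the sampled set cannot be guaranteed to match between runs.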
Shadow transforms with summary tables
When you enable a shadow transform, any summary tables built on that table also generate shadow versions. These shadow summaries are built from the sampled data, ingested as if it were a new request, so their results reflect only that subset of data.
Summary values may not match production summaries exactly, and shadow summaries should only be used for testing or comparison.
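The sketch below illustrates why the values differ. It simulates a summary table as a per-status record count and computes it over all records and over a roughly 5% sample. As one rough comparison technique, it then divides the shadow counts by the rate. That scaling step is an assumption for illustration, not a product feature. The status codes and helper names are hypothetical.

```python
import random
from collections import Counter
from typing import Dict, List


def summarize(statuses: List[int]) -> Dict[int, int]:
    """Stand-in for a summary table: number of records per status code."""
    return dict(Counter(statuses))


def scale_up(shadow_summary: Dict[int, int], rate: float) -> Dict[int, float]:
    """Rough production estimate from a shadow summary: divide each count by the rate."""
    return {key: count / rate for key, count in shadow_summary.items()}


rng = random.Random(7)
production = [rng.choice([200, 200, 200, 404, 500]) for _ in range(20_000)]
shadow = [status for status in production if rng.random() < 0.05]  # ~5% sample

prod_summary = summarize(production)
shadow_summary = summarize(shadow)
estimate = scale_up(shadow_summary, 0.05)
for status in sorted(prod_summary):
    print(status, prod_summary[status], shadow_summary.get(status, 0), round(estimate.get(status, 0)))
```

Raw shadow counts are about 5% of production counts. The scaled estimates land near the production values but not exactly on them, which matches the caveat above.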
Enable shadow transforms
Shadow transforms require enabling the shadow_table setting.
To enable a shadow transform, add the shadow_table configuration block with a non-zero rate. Set the rate to a value from 0.01 (one percent) to 0.05 (five percent).
- Choose the production transform to use as the baseline.
- Choose or create a target table in the same project. It must be different from the source table.
- (Optional) Choose or create a target transform on that table. Otherwise, the table's default transform is used.
- Update the production transform's settings to include this block, replacing the table ID, transform ID, and rate values:
  "shadow_table": {
    "table_id": "<shadow table id>",
    "transform_id": "<shadow transform id>",
    "rate": <rate from 0.01 (1%) to 0.05 (5%)>
  }
- Save and deploy the updated transform configuration.
- Verify that ingest requests to the production transform now produce data in the target table.
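The configuration steps can be sketched as a function that adds the block to a transform config held as a Python dict. Several details are assumptions rather than documented behavior: the block sits directly under `settings`, omitting `transform_id` falls back to the target table's default transform, and the function name and all IDs are hypothetical. The checks mirror the rules stated above (a different target table and a rate from 0.01 to 0.05).

```python
import copy
import json
from typing import Any, Dict, Optional

MIN_RATE, MAX_RATE = 0.01, 0.05  # 1% to 5%


def add_shadow_table(
    transform: Dict[str, Any],
    source_table_id: str,
    table_id: str,
    rate: float,
    transform_id: Optional[str] = None,
) -> Dict[str, Any]:
    """Return a copy of `transform` with a shadow_table block in its settings."""
    if table_id == source_table_id:
        raise ValueError("the shadow table must differ from the source table")
    if not MIN_RATE <= rate <= MAX_RATE:
        raise ValueError(f"rate must be between {MIN_RATE} and {MAX_RATE}")
    shadow: Dict[str, Any] = {"table_id": table_id, "rate": rate}
    if transform_id is not None:  # assumption: omit to use the table's default transform
        shadow["transform_id"] = transform_id
    updated = copy.deepcopy(transform)  # leave the caller's config untouched
    updated.setdefault("settings", {})["shadow_table"] = shadow
    return updated


# Hypothetical IDs and a minimal transform config.
prod_transform = {"name": "web_logs", "settings": {}}
updated = add_shadow_table(
    prod_transform,
    source_table_id="prod-table-id",
    table_id="shadow-table-id",
    rate=0.02,
    transform_id="shadow-transform-id",
)
print(json.dumps(updated["settings"]["shadow_table"], indent=2))
```

Returning a modified copy keeps the original config available for comparison or rollback before you save and deploy.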
Set rate to enable or disable shadow transforms
A shadow transform is enabled when its rate is greater than 0.
To leave it inactive, set the rate to 0. This disables the shadow transform until the rate is set above zero again.
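The enable/disable rule is a simple threshold on rate. The sketch below operates on a settings dict and assumes the shadow_table block from the enable steps sits directly in the production transform's settings. Both function names are hypothetical.

```python
import copy
from typing import Any, Dict


def shadow_is_active(settings: Dict[str, Any]) -> bool:
    """A shadow transform is active only when its rate is greater than 0."""
    shadow = settings.get("shadow_table") or {}
    return float(shadow.get("rate", 0)) > 0


def pause_shadow(settings: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of `settings` with the shadow rate set to 0 (block kept in place)."""
    updated = copy.deepcopy(settings)
    if "shadow_table" in updated:
        updated["shadow_table"]["rate"] = 0
    return updated


settings = {"shadow_table": {"table_id": "shadow-table-id", "rate": 0.03}}
print(shadow_is_active(settings))                # True
print(shadow_is_active(pause_shadow(settings)))  # False
```

Pausing this way keeps the table and transform IDs in place, so resuming only requires setting the rate back to a value from 0.01 to 0.05.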
Disable shadow transforms
Remove the shadow_table block from the production transform's settings to disable the shadow transform.
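Unlike setting the rate to 0, this removes the configuration entirely. The sketch below applies the edit to a transform config held as a Python dict and assumes the block sits under `settings`. The block key is a parameter, and the usage passes shadow_table, the key used in the enable steps. `remove_block` and the IDs are hypothetical.

```python
import copy
from typing import Any, Dict


def remove_block(transform: Dict[str, Any], block_key: str) -> Dict[str, Any]:
    """Return a copy of `transform` with `block_key` dropped from its settings, if present."""
    updated = copy.deepcopy(transform)
    updated.get("settings", {}).pop(block_key, None)  # no error if already absent
    return updated


prod_transform = {
    "name": "web_logs",
    "settings": {"shadow_table": {"table_id": "shadow-table-id", "rate": 0.02}},
}
disabled = remove_block(prod_transform, "shadow_table")
print(disabled["settings"])  # {}
```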