Decay and Reaper
Data lifecycle management operates as a two-step process involving two services:
- Decay identifies partitions containing data that has exceeded the configured retention period and marks them as "deactivated" in the Hydrolix catalog. Once deactivated, these partitions are hidden from new queries but remain available to complete any in-flight queries. The decay service runs on a schedule (daily by default), checking partition ages based on the primary timestamp of the data itself.
- Reaper permanently deletes partitions that have been deactivated for a specified grace period. It removes these partitions from both the catalog and object storage. The reaper service is also used by other Hydrolix services like merge-cleanup to handle partition deletion.
Configure data retention policies at the table level to manage how long data should remain alive before Decay and Reaper delete it.
Decay
Decay is a CronJob that identifies partitions containing old data and marks them for deletion based on the configured data retention policies. It prevents unbounded storage consumption by deactivating partitions that exceed the configured age thresholds, and it sends deletion tasks to the Reaper service for physical removal from the catalog and object storage.
What Decay does
Partition deactivation
- Examines all tables with `age.max_age_days` configured (tables with value 0 are skipped)
- Calculates cutoff time as `current_time - age.max_age_days`
- Identifies active partitions where both `max_timestamp` and `min_timestamp` are older than the cutoff
- Marks matching partitions as `inactive` in the Catalog, which makes them unavailable to future queries
- Sets `deactivation_reason` to `decay` in partition metadata for audit purposes
Important notes:
- Age is calculated relative to the partition's primary datetime values, not the ingestion time.
- Decay only processes unlocked partitions. It skips partitions that are locked by Merge.
- Deactivated partitions continue serving any in-flight queries that were active at deactivation time.
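The deactivation rules above can be sketched in a few lines of Python. This is an illustration of the documented conditions only, not Decay's implementation: the `Partition` class, its field names, and `should_deactivate` are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass
class Partition:
    # Hypothetical stand-in for a catalog entry; field names are illustrative.
    min_timestamp: datetime
    max_timestamp: datetime
    active: bool = True
    locked: bool = False


def should_deactivate(p: Partition, max_age_days: int, now: datetime) -> bool:
    """Return True when the documented Decay rules would deactivate the partition."""
    if max_age_days == 0:  # tables with value 0 are skipped
        return False
    if not p.active or p.locked:  # only active, unlocked partitions qualify
        return False
    cutoff = now - timedelta(days=max_age_days)
    # Both ends of the partition's time range must be older than the cutoff.
    return p.min_timestamp < cutoff and p.max_timestamp < cutoff


now = datetime(2024, 6, 30, tzinfo=timezone.utc)
old = Partition(
    datetime(2024, 5, 1, tzinfo=timezone.utc),
    datetime(2024, 5, 2, tzinfo=timezone.utc),
)
straddling = Partition(
    datetime(2024, 5, 1, tzinfo=timezone.utc),
    datetime(2024, 6, 15, tzinfo=timezone.utc),
)
print(should_deactivate(old, 30, now))         # True
print(should_deactivate(straddling, 30, now))  # False
```

The second partition stays active because its `max_timestamp` is still newer than the 30-day cutoff, even though its oldest rows are not.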
Reaper queue management
- Examines all tables with `reaper.max_age_days` configured (tables with value 0 are skipped)
- Calculates cutoff time as `current_time - reaper.max_age_days`
- Identifies inactive, unlocked partitions where the modified timestamp (deactivation time) is older than the cutoff
- Creates ReapEvent objects for each partition and sends them to the Reaper queue in RabbitMQ
Important notes:
- Reaper age is calculated from the deactivation date set by Decay, not the original data timestamp.
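As a rough sketch of the selection step, the following Python filters inactive, unlocked entries by their modified timestamp and wraps each match in an event. The `CatalogEntry` and `ReapEvent` shapes are assumptions for illustration; the real ReapEvent fields are not described here, and publishing to RabbitMQ is left out.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import List


@dataclass
class CatalogEntry:
    # Hypothetical catalog row; only the fields the rules mention are modelled.
    partition: str
    active: bool
    locked: bool
    modified: datetime  # for inactive partitions, the deactivation time


@dataclass
class ReapEvent:
    # Hypothetical message shape used only for this sketch.
    partition: str


def build_reap_events(
    entries: List[CatalogEntry], reaper_max_age_days: int, now: datetime
) -> List[ReapEvent]:
    """Select partitions that the documented rules would queue for Reaper."""
    if reaper_max_age_days == 0:  # tables with value 0 are skipped
        return []
    cutoff = now - timedelta(days=reaper_max_age_days)
    return [
        ReapEvent(e.partition)
        for e in entries
        if not e.active and not e.locked and e.modified < cutoff
    ]


now = datetime(2024, 6, 30, tzinfo=timezone.utc)
entries = [
    CatalogEntry("p1", active=False, locked=False, modified=now - timedelta(days=3)),
    CatalogEntry("p2", active=False, locked=False, modified=now - timedelta(hours=2)),
    CatalogEntry("p3", active=True, locked=False, modified=now - timedelta(days=9)),
    CatalogEntry("p4", active=False, locked=True, modified=now - timedelta(days=9)),
]
print([e.partition for e in build_reap_events(entries, 1, now)])  # ['p1']
```

Only `p1` qualifies: `p2` was deactivated too recently, `p3` is still active, and `p4` is locked.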
Configuration
Cluster-Level Settings (Kubernetes tunables):
| Parameter | Description | Default |
|---|---|---|
| `decay_enabled` | Enable/disable the Decay CronJob | true |
| `decay_schedule` | CRON schedule (daily at midnight UTC) | `0 0 * * *` |
| `decay_batch_size` | Entries fetched per catalog request for deactivation phase | 5000 |
| `decay_max_deactivate_iterations` | Max deactivation loops per table. Default is unlimited | None |
| `decay_reap_batch_size` | Entries fetched per catalog request for reaping phase | 5000 |
| `decay_max_reap_iterations` | Max reaping loops per table. Default is unlimited | None |
Environment Variables:
- `BATCH_SIZE` - Maps to `decay_batch_size`
- `REAP_BATCH_SIZE` - Maps to `decay_reap_batch_size`
- `DECAY_MAX_ITERATIONS` - Maps to `decay_max_deactivate_iterations`
- `REAP_MAX_ITERATIONS` - Maps to `decay_max_reap_iterations`
- `REAPER_QUEUE` - Queue ARN for reaper jobs
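To make the mapping concrete, here is a minimal sketch of how a process might read these variables and fall back to the documented defaults. The `DecayConfig` class and `load_config` function are hypothetical, and treating an unset or empty iteration limit as "unlimited" (`None`) is an assumption, not confirmed Decay behavior.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass
class DecayConfig:
    # Hypothetical container; attribute names mirror the tunables listed above.
    batch_size: int
    reap_batch_size: int
    max_deactivate_iterations: Optional[int]
    max_reap_iterations: Optional[int]
    reaper_queue: Optional[str]


def _optional_int(value: Optional[str]) -> Optional[int]:
    # Assumption: a missing or empty value means "no limit".
    return int(value) if value not in (None, "") else None


def load_config(env: Mapping[str, str] = os.environ) -> DecayConfig:
    """Read the Decay environment variables, applying the documented defaults."""
    return DecayConfig(
        batch_size=int(env.get("BATCH_SIZE", "5000")),
        reap_batch_size=int(env.get("REAP_BATCH_SIZE", "5000")),
        max_deactivate_iterations=_optional_int(env.get("DECAY_MAX_ITERATIONS")),
        max_reap_iterations=_optional_int(env.get("REAP_MAX_ITERATIONS")),
        reaper_queue=env.get("REAPER_QUEUE"),
    )


cfg = load_config({"BATCH_SIZE": "100", "REAP_MAX_ITERATIONS": "3"})
print(cfg.batch_size, cfg.reap_batch_size, cfg.max_reap_iterations)  # 100 5000 3
```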
Table-Level Settings:
| Parameter | Description | Default |
|---|---|---|
| `age.max_age_days` | Days data remains active/queryable before deactivation | 0 (indefinite) |
Review Data Retention Policies for details on how to configure Time-to-Live (TTL) policies.
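Manual execution
Run Decay manually by creating a job from the Decay CronJob with kubectl. The general kubectl form is `kubectl create job <job-name> --from=cronjob/<decay-cronjob-name>`; the placeholder names depend on your deployment.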
In K9s, run the `:cronjobs` command, select your decay job, and press `t` to trigger it.
The Decay pod created by the manual trigger then appears in the list of pods.
Metrics
Decay does not have dedicated Prometheus metrics. Its database operations are tracked with database metrics labelled method="deactivate_entries" and db="catalog".
| Metric Name | Type | Description |
|---|---|---|
| `query_latency_summary` | Summary | Latency summary for deactivating partitions (p50, p90, p99) |
| `query_latency_histo` | Histogram | Latency histogram for deactivating partitions |
| `query_count` | Counter | Count of deactivation queries executed |
| `query_failure` | Counter | Count of failed deactivation queries |
Reaper
The Reaper service deletes partitions that have been marked for deletion by Decay. It's also used by other services like Merge-Cleanup to handle partition deletion. It runs as a separate, continuous service consuming the reaper message queue from RabbitMQ.
What Reaper does
For each ReapEvent read from RabbitMQ, the service:
- locates partition files on storage
- verifies file count
- deletes all files
- removes catalog entry
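The four steps can be sketched against in-memory stand-ins for object storage and the catalog. Everything here is hypothetical: the `reap` function, the dictionary-based storage, and in particular the choice to abort when the file count does not match, since the failure behavior is not described above.

```python
from typing import Dict, List, Set


def reap(
    partition: str,
    expected_files: int,
    storage: Dict[str, List[str]],
    catalog: Set[str],
) -> bool:
    """Apply the documented per-event steps to in-memory stand-ins."""
    files = storage.get(partition, [])  # 1. locate partition files on storage
    if len(files) != expected_files:    # 2. verify file count
        return False                    # assumption: a mismatch aborts the reap
    storage.pop(partition, None)        # 3. delete all files
    catalog.discard(partition)          # 4. remove catalog entry
    return True


storage = {"p1": ["data.hdx", "index.hdx"], "p2": ["data.hdx"]}
catalog = {"p1", "p2"}

print(reap("p1", 2, storage, catalog))  # True
print(reap("p2", 2, storage, catalog))  # False
print(sorted(catalog))                  # ['p2']
```

Removing the catalog entry last means a failed storage step leaves the entry in place, so the partition can still be found and retried.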
Configuration
Table-Level Settings:
| Parameter | Description | Default |
|---|---|---|
| `reaper.max_age_days` | Days after deactivation before physical deletion | 1 |
Review Data Retention Policies for details on how to configure Time-to-Live (TTL) policies.
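The two table-level settings combine into a two-stage timeline. The sketch below computes only the eligibility thresholds implied by the rules on this page. The function names are hypothetical, and the actual deactivation and queueing happen at whichever scheduled Decay run follows each threshold.

```python
from datetime import datetime, timedelta, timezone


def decay_eligible_at(max_timestamp: datetime, age_max_age_days: int) -> datetime:
    """Moment after which the partition's newest row is older than the Decay cutoff."""
    return max_timestamp + timedelta(days=age_max_age_days)


def reap_eligible_at(deactivated_at: datetime, reaper_max_age_days: int) -> datetime:
    """Moment after which a deactivated partition is older than the Reaper cutoff."""
    return deactivated_at + timedelta(days=reaper_max_age_days)


# Example values: age.max_age_days = 30, reaper.max_age_days = 1.
newest_row = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
print(decay_eligible_at(newest_row, 30).isoformat())  # 2024-01-31T12:00:00+00:00

# Assumed deactivation time: the next midnight UTC run after the threshold.
deactivated = datetime(2024, 2, 1, 0, 0, tzinfo=timezone.utc)
print(reap_eligible_at(deactivated, 1).isoformat())   # 2024-02-02T00:00:00+00:00
```

Only `max_timestamp` is needed for the first threshold, because a partition's `min_timestamp` is never later than its `max_timestamp`.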
How to check the Reaper queue
Open a shell in the RabbitMQ pod and list the queues with their message counts, for example with `rabbitmqctl list_queues name messages`.
Metrics
Reaper has dedicated Prometheus metrics:
| Metric Name | Type | Labels | Description |
|---|---|---|---|
| `reaper_reap_duration` | Summary | storage_id, project_id, table_id | Duration in milliseconds of a reaper's reap cycle (p50, p90, p99) |
| `reaper_failure` | Counter | storage_id, project_id, table_id | Count of reaper failures |
When Reaper deletes catalog entries, they are tracked with database metrics labelled method="delete_deactivated_entry" and db="catalog".
| Metric Name | Type | Description |
|---|---|---|
| `query_latency_summary` | Summary | Latency for deleting catalog entries |
| `query_latency_histo` | Histogram | Latency histogram for catalog deletions |
| `query_count` | Counter | Count of deletion queries |
| `query_failure` | Counter | Count of failed deletions |