Decay and Reaper

Data lifecycle management operates as a two-step process involving two services:

  • Decay identifies partitions containing data that has exceeded the configured retention period and marks them as "deactivated" in the Hydrolix catalog. Once deactivated, these partitions are hidden from new queries but remain available to complete any in-flight queries. Decay runs on a schedule as a Kubernetes CronJob and checks partition ages based on the primary timestamp of the data itself.

  • Reaper permanently deletes partitions that have been deactivated for a specified grace period. It removes these partitions from both the catalog and object storage. The reaper service is also used by other Hydrolix services like merge-cleanup to handle partition deletion.

Configure data retention policies at the table level to manage how long data should remain alive before Decay and Reaper delete it.

Decay

Decay is a CronJob that identifies partitions containing old data and marks them for deletion based on the configured data retention policies. It prevents unbounded storage consumption by deactivating partitions that exceed the configured age thresholds and by sending deletion tasks to the Reaper service, which physically removes the partitions from the catalog and object storage.

What Decay does

Partition deactivation

  1. Examines all tables with age.max_age_days configured (tables with a value of 0 are skipped)
  2. Calculates the cutoff time as current_time minus age.max_age_days days
  3. Identifies active partitions where both max_timestamp and min_timestamp are older than the cutoff
  4. Marks matching partitions as inactive in the catalog, which makes them unavailable to future queries
  5. Sets deactivation_reason to decay in the partition metadata for audit purposes

Important notes:

  • Age is calculated relative to the partition's primary datetime values, not the ingestion time.
  • Decay only processes unlocked partitions. It skips partitions that are locked by Merge.
  • Deactivated partitions continue serving any in-flight queries that were active at deactivation time.
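
The selection rules above can be sketched in a few lines of Python. This is only an illustration of the documented criteria, not the Decay implementation: the Partition class, its field names, and the decay_candidates function are hypothetical stand-ins for catalog entries.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass
class Partition:
    # Hypothetical stand-in for a catalog entry; field names are illustrative.
    min_timestamp: datetime
    max_timestamp: datetime
    active: bool = True
    locked: bool = False


def decay_candidates(partitions, max_age_days, now):
    """Return the partitions that the deactivation rules above would select."""
    if max_age_days == 0:
        # A value of 0 means the table is skipped (indefinite retention).
        return []
    cutoff = now - timedelta(days=max_age_days)
    return [
        p for p in partitions
        if p.active
        and not p.locked
        and p.min_timestamp < cutoff
        and p.max_timestamp < cutoff
    ]


def utc(year, month, day):
    return datetime(year, month, day, tzinfo=timezone.utc)


now = utc(2024, 6, 30)  # with max_age_days=30 the cutoff is 2024-05-31
partitions = [
    # Entirely older than the cutoff: selected.
    Partition(utc(2024, 5, 1), utc(2024, 5, 2)),
    # Straddles the cutoff: max_timestamp is still inside the retention window.
    Partition(utc(2024, 5, 30), utc(2024, 6, 2)),
    # Old enough, but locked (for example by Merge), so it is skipped.
    Partition(utc(2024, 4, 1), utc(2024, 4, 2), locked=True),
]
selected = decay_candidates(partitions, max_age_days=30, now=now)
print(len(selected))  # 1
```

Only the first partition is selected. The second shows why both timestamps matter: a partition is kept until its newest row has aged out.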

Reaper queue management

  1. Examines all tables with reaper.max_age_days configured (tables with a value of 0 are skipped)
  2. Calculates the cutoff time as current_time minus reaper.max_age_days days
  3. Identifies inactive, unlocked partitions where the modified timestamp (the deactivation time) is older than the cutoff
  4. Creates a ReapEvent object for each partition and sends it to the Reaper queue in RabbitMQ

Important notes:

  • Reaper age is calculated from the deactivation date set by Decay, not the original data timestamp.
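
As a rough illustration of that note, the check below depends only on the partition's state and its modified timestamp, never on the age of the data. The is_reapable function and its parameters are hypothetical. The ReapEvent payload is not modeled because its fields are not described here.

```python
from datetime import datetime, timedelta, timezone


def is_reapable(active, locked, modified, reaper_max_age_days, now):
    """Apply the queueing criteria above to a single partition."""
    if reaper_max_age_days == 0:
        # Tables with a value of 0 are skipped.
        return False
    cutoff = now - timedelta(days=reaper_max_age_days)
    return (not active) and (not locked) and modified < cutoff


now = datetime(2024, 6, 30, 12, 0, tzinfo=timezone.utc)

# Deactivated six hours ago: still inside the one-day grace period,
# no matter how old the data in the partition is.
print(is_reapable(False, False, now - timedelta(hours=6), 1, now))  # False

# Deactivated two days ago: past the grace period, so it would be queued.
print(is_reapable(False, False, now - timedelta(days=2), 1, now))   # True
```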

Configuration

Cluster-Level Settings (Kubernetes tunables):

| Parameter | Description | Default |
| --- | --- | --- |
| decay_enabled | Enable or disable the Decay CronJob | true |
| decay_schedule | Cron schedule for Decay runs; the default runs daily at midnight UTC | 0 0 * * * |
| decay_batch_size | Entries fetched per catalog request during the deactivation phase | 5000 |
| decay_max_deactivate_iterations | Maximum deactivation loops per table | None (unlimited) |
| decay_reap_batch_size | Entries fetched per catalog request during the reaping phase | 5000 |
| decay_max_reap_iterations | Maximum reaping loops per table | None (unlimited) |

Environment Variables:

  • BATCH_SIZE - Maps to decay_batch_size
  • REAP_BATCH_SIZE - Maps to decay_reap_batch_size
  • DECAY_MAX_ITERATIONS - Maps to decay_max_deactivate_iterations
  • REAP_MAX_ITERATIONS - Maps to decay_max_reap_iterations
  • REAPER_QUEUE - Queue ARN for reaper jobs
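
The sketch below shows how the batch size and iteration limit might combine. It assumes each loop iteration issues one catalog request of at most the batch size, and that an unset variable means no limit. Neither assumption is confirmed here, and the helper names are hypothetical.

```python
import os


def read_decay_settings(env=os.environ):
    """Read the Decay environment variables, applying the documented defaults."""
    def optional_int(name):
        raw = env.get(name)
        # Assumption: an unset or empty variable means "no limit" (None).
        return int(raw) if raw else None

    return {
        "decay_batch_size": int(env.get("BATCH_SIZE", 5000)),
        "decay_reap_batch_size": int(env.get("REAP_BATCH_SIZE", 5000)),
        "decay_max_deactivate_iterations": optional_int("DECAY_MAX_ITERATIONS"),
        "decay_max_reap_iterations": optional_int("REAP_MAX_ITERATIONS"),
    }


def max_entries_per_table(batch_size, max_iterations):
    """Upper bound on entries handled per table in one run, or None if unlimited."""
    return None if max_iterations is None else batch_size * max_iterations


settings = read_decay_settings({"BATCH_SIZE": "1000", "DECAY_MAX_ITERATIONS": "20"})
limit = max_entries_per_table(
    settings["decay_batch_size"],
    settings["decay_max_deactivate_iterations"],
)
print(limit)  # 20000
```

Under these assumptions, capping iterations bounds how much work a single run does per table. Any partitions beyond the bound would wait for the next scheduled run.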

Table-Level Settings:

| Parameter | Description | Default |
| --- | --- | --- |
| age.max_age_days | Days data remains active and queryable before deactivation | 0 (indefinite) |

Review Data Retention Policies for details on how to configure Time-to-Live (TTL) policies.

Manual execution

Run Decay manually by creating a job with kubectl.

Kubectl command to run Decay
kubectl create job --from=cronjob/decay {decay-manual-job-name} -n $HDX_KUBERNETES_NAMESPACE

In K9s, run the :cronjobs command, select the decay CronJob, and press t to trigger it.

The decay pod relating to the manual trigger will appear in the list of pods.
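
When manual runs are scripted, each job needs a name that is not already in use. The helper below is a minimal sketch that builds the kubectl command shown earlier with a timestamped job name. The decay-manual- prefix and the hydrolix namespace are placeholders, not values defined by Hydrolix.

```python
import shlex
from datetime import datetime, timezone


def decay_job_command(namespace, now=None):
    """Build the kubectl command with a timestamped, unique job name."""
    now = now or datetime.now(timezone.utc)
    # Hypothetical naming scheme; any valid, unused Job name works.
    job_name = "decay-manual-" + now.strftime("%Y%m%d-%H%M%S")
    return [
        "kubectl", "create", "job",
        "--from=cronjob/decay", job_name,
        "-n", namespace,
    ]


cmd = decay_job_command(
    "hydrolix",  # placeholder namespace
    datetime(2024, 6, 30, 12, 0, 0, tzinfo=timezone.utc),
)
print(shlex.join(cmd))
# kubectl create job --from=cronjob/decay decay-manual-20240630-120000 -n hydrolix
```

To run the command, pass the list to subprocess.run(cmd, check=True).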

Metrics

Decay does not have dedicated Prometheus metrics. Its database operations are tracked with database metrics labelled method="deactivate_entries" and db="catalog".

| Metric Name | Type | Description |
| --- | --- | --- |
| query_latency_summary | Summary | Latency summary for deactivating partitions (p50, p90, p99) |
| query_latency_histo | Histogram | Latency histogram for deactivating partitions |
| query_count | Counter | Count of deactivation queries executed |
| query_failure | Counter | Count of failed deactivation queries |

Reaper

The Reaper service deletes partitions that have been marked for deletion by Decay. It's also used by other services like Merge-Cleanup to handle partition deletion. It runs as a separate, continuous service consuming the reaper message queue from RabbitMQ.

What Reaper does

For each ReapEvent read from RabbitMQ, the service:

  1. Locates the partition's files in storage
  2. Verifies the file count
  3. Deletes all of the files
  4. Removes the catalog entry
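
The four steps can be sketched against in-memory stand-ins for object storage and the catalog. Everything here is hypothetical: the event fields (path, expected_files, partition_id), the comparison used for the file-count check, and the choice to raise on a mismatch are assumptions made for illustration, not documented Reaper behavior.

```python
def reap(event, storage, catalog):
    """Apply the four steps above to one reap event; returns files deleted."""
    # 1. Locate the partition's files in storage.
    files = [key for key in storage if key.startswith(event["path"])]

    # 2. Verify the file count before deleting anything.
    #    Assumption: the event carries the expected number of files.
    if len(files) != event["expected_files"]:
        raise RuntimeError(
            f"expected {event['expected_files']} files, found {len(files)}"
        )

    # 3. Delete all of the files.
    for key in files:
        del storage[key]

    # 4. Remove the catalog entry.
    catalog.pop(event["partition_id"], None)
    return len(files)


storage = {
    "db/table/p1/data.hdx": b"",
    "db/table/p1/manifest.hdx": b"",
    "db/table/p2/data.hdx": b"",
}
catalog = {"p1": "inactive", "p2": "active"}
event = {"partition_id": "p1", "path": "db/table/p1/", "expected_files": 2}

deleted = reap(event, storage, catalog)
print(deleted, sorted(storage), sorted(catalog))
# 2 ['db/table/p2/data.hdx'] ['p2']
```

In this sketch the catalog entry goes last, so a failure partway through leaves the partition still referenced in the catalog rather than leaving orphaned files that nothing points to.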

Configuration

Table-Level Settings:

| Parameter | Description | Default |
| --- | --- | --- |
| reaper.max_age_days | Days after deactivation before physical deletion | 1 |

Review Data Retention Policies for details on how to configure Time-to-Live (TTL) policies.

How to check the Reaper queue

Open a shell in the RabbitMQ pod and run:

RabbitMQ command for queue checks
rabbitmqctl --node rabbit@rabbitmq-0.rabbit list_queues
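
To act on the result in a script, the command's output can be parsed into queue depths. This sketch assumes the default list_queues columns (name and messages) separated by tabs, with banner and header lines before the data rows. The queue names in the sample are placeholders, because the actual Reaper queue name is not given here.

```python
def parse_list_queues(output):
    """Map queue name to message count from `rabbitmqctl list_queues` output."""
    depths = {}
    for line in output.splitlines():
        parts = line.split("\t")
        # Keep only "<name>\t<integer>" rows; banner and header lines are skipped.
        if len(parts) == 2 and parts[1].strip().isdigit():
            depths[parts[0]] = int(parts[1])
    return depths


# Sample output with placeholder queue names.
sample = (
    "Timeout: 60.0 seconds ...\n"
    "Listing queues for vhost / ...\n"
    "name\tmessages\n"
    "reaper\t12\n"
    "merge\t0\n"
)
depths = parse_list_queues(sample)
print(depths)  # {'reaper': 12, 'merge': 0}
```

A message count that keeps growing between Decay runs would suggest Reaper is not keeping up with the deletion backlog.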

Metrics

Reaper has dedicated Prometheus metrics:

| Metric Name | Type | Labels | Description |
| --- | --- | --- | --- |
| reaper_reap_duration | Summary | storage_id, project_id, table_id | Duration of a reap cycle in milliseconds (p50, p90, p99) |
| reaper_failure | Counter | storage_id, project_id, table_id | Count of reaper failures |

When Reaper deletes catalog entries, they are tracked with database metrics labelled method="delete_deactivated_entry" and db="catalog".

| Metric Name | Type | Description |
| --- | --- | --- |
| query_latency_summary | Summary | Latency summary for deleting catalog entries |
| query_latency_histo | Histogram | Latency histogram for catalog deletions |
| query_count | Counter | Count of deletion queries |
| query_failure | Counter | Count of failed deletions |