Partition Cleaner

The partition cleaner deletes partitions no longer referenced by the Hydrolix catalog.

When Hydrolix optimizes storage usage, it sometimes merges data from multiple partitions or time periods into a single partition. During this optimization step, Hydrolix doesn't automatically delete unused partitions or their data. Instead, the catalog removes references or pointers to the unused partitions.

The partition cleaner deletes partitions that have been redundant for at least 24 hours. By default, the partition cleaner runs weekly on Monday at 12:00 AM UTC.
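
The 24-hour rule can be pictured as a simple filter over unreferenced partitions. The following is a minimal sketch, not Hydrolix's implementation: the `UnreferencedPartition` record, its `unreferenced_since` field, the example paths, and the `eligible_for_cleanup` helper are all hypothetical names introduced for illustration.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass(frozen=True)
class UnreferencedPartition:
    """Hypothetical record: a partition path and when the catalog stopped referencing it."""
    path: str
    unreferenced_since: datetime


def eligible_for_cleanup(partitions, now, grace_period=timedelta(hours=24)):
    """Return the partitions that have been unreferenced for at least the grace period."""
    return [p for p in partitions if now - p.unreferenced_since >= grace_period]


# A Monday at 12:00 AM UTC, matching the default schedule.
now = datetime(2024, 1, 8, 0, 0, tzinfo=timezone.utc)
candidates = [
    UnreferencedPartition("db/table/part-a", now - timedelta(hours=30)),  # old enough
    UnreferencedPartition("db/table/part-b", now - timedelta(hours=2)),   # too recent
]
eligible_paths = [p.path for p in eligible_for_cleanup(candidates, now)]
print(eligible_paths)  # ['db/table/part-a']
```

In this sketch, a partition that became unreferenced two hours before a run is skipped and would only be picked up by a later run.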

The default partition cleaning schedule changed from daily to weekly in v5.6.

Configuration

The partition cleaner runs as a scheduled job and can be configured using environment variables on the partition-cleaner deployment. To enable the partition cleaner, set the partition-cleaner replica count to 1.

Schedule configuration:

  • Environment variable: PARTITION_CLEANER_SCHEDULE
  • Format: UNIX cron notation
  • Default: weekly on Monday at 12:00 AM UTC
  • Example: 0 0 * * 1 (default)
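
To see how the default expression maps to "Monday at 12:00 AM", the sketch below checks a timestamp against a five-field cron expression. It is a deliberately simplified illustration: the `matches_cron` helper is hypothetical, supports only `*` or a single integer per field, and is not how the partition cleaner evaluates its schedule.

```python
from datetime import datetime, timezone


def matches_cron(expr, when):
    """Check a five-field cron expression against a datetime.

    Simplified: each field must be '*' or a single integer (no ranges,
    lists, or steps). All five fields must match, which differs from
    standard cron when both day-of-month and day-of-week are restricted.
    """
    fields = expr.split()
    if len(fields) != 5:
        raise ValueError("expected five cron fields")
    # Cron counts days of the week from Sunday = 0, so Monday = 1.
    actual = (when.minute, when.hour, when.day, when.month, when.isoweekday() % 7)
    return all(f == "*" or int(f) == a for f, a in zip(fields, actual))


schedule = "0 0 * * 1"  # minute hour day-of-month month day-of-week
monday_midnight = datetime(2024, 1, 8, 0, 0, tzinfo=timezone.utc)
tuesday_midnight = datetime(2024, 1, 9, 0, 0, tzinfo=timezone.utc)
print(matches_cron(schedule, monday_midnight))   # True
print(matches_cron(schedule, tuesday_midnight))  # False
```

In standard cron notation, replacing the final field with `*` (`0 0 * * *`) would match midnight on every day of the week.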

Dry run:

  • Environment variable: PARTITION_CLEANER_DRY_RUN
  • Values: true (default) or false

Grace period:

  • Environment variable: PARTITION_CLEANER_GRACE_PERIOD
  • Format: Duration string (for example, 24h)
  • Default: 24 hours
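
The three variables and their documented defaults can be summarized in a short sketch. This is illustrative only: `load_settings` and `parse_duration` are hypothetical helpers, and the accepted duration syntax (hours, minutes, and seconds suffixes) is an assumption extrapolated from the single `24h` example.

```python
import os
import re
from datetime import timedelta

_UNITS = {"h": "hours", "m": "minutes", "s": "seconds"}


def parse_duration(text):
    """Parse a simple duration such as '24h' or '1h30m' (assumed syntax)."""
    parts = re.findall(r"(\d+)([hms])", text)
    # Reject input containing anything other than number+unit pairs.
    if not parts or "".join(n + u for n, u in parts) != text:
        raise ValueError(f"unsupported duration: {text!r}")
    return sum((timedelta(**{_UNITS[u]: int(n)}) for n, u in parts), timedelta())


def load_settings(env=os.environ):
    """Read the three documented variables, falling back to the documented defaults."""
    return {
        "schedule": env.get("PARTITION_CLEANER_SCHEDULE", "0 0 * * 1"),
        "dry_run": env.get("PARTITION_CLEANER_DRY_RUN", "true").strip().lower() == "true",
        "grace_period": parse_duration(env.get("PARTITION_CLEANER_GRACE_PERIOD", "24h")),
    }


# Only the dry-run flag is overridden here; the other values use the defaults.
settings = load_settings({"PARTITION_CLEANER_DRY_RUN": "false"})
print(settings)
```

Because the dry run default is `true`, the sketch only reports `dry_run` as disabled when the variable is explicitly set to `false`.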

Metrics

The partition-cleaner service exports the following metrics for monitoring its behavior:

| Metric name | Purpose | Example |
|---|---|---|
| bulk_delete_bytes | How much data was deleted | bulk_delete_bytes{app="partition-cleaner",bucket="hdx-test-test",storage="hdx_primary"} 1.520361e+06 |
| bulk_delete_duration | How long deletes take (percentiles) | bulk_delete_duration{app="partition-cleaner",bucket="hdx-test-test",storage="hdx_primary",quantile="0.99"} 466 |
| bulk_delete_duration_sum | Total time spent on deletes | bulk_delete_duration_sum{app="partition-cleaner",bucket="hdx-test-test",storage="hdx_primary"} 603 |
| bulk_delete_duration_count | Number of delete operations measured | bulk_delete_duration_count{app="partition-cleaner",bucket="hdx-test-test",storage="hdx_primary"} 24 |
| bulk_delete_success_count | Number of deletes that worked | bulk_delete_success_count{app="partition-cleaner",bucket="hdx-test-test",storage="hdx_primary"} 33 |
| bulk_delete_failure_count | Number of deletes that failed | bulk_delete_failure_count{app="partition-cleaner",bucket="hdx-test-test",storage="hdx_primary"} 0 |
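
These counters can be combined into derived values such as the mean delete duration and the failure ratio. The sketch below does this for the example values shown above. It is a rough illustration with hypothetical helper names; it ignores labels, so it only works when each metric name appears once, and the unit of the duration metrics is not stated here.

```python
import re

# Sample lines copied from the examples above.
SAMPLE = """\
bulk_delete_duration_sum{app="partition-cleaner",bucket="hdx-test-test",storage="hdx_primary"} 603
bulk_delete_duration_count{app="partition-cleaner",bucket="hdx-test-test",storage="hdx_primary"} 24
bulk_delete_success_count{app="partition-cleaner",bucket="hdx-test-test",storage="hdx_primary"} 33
bulk_delete_failure_count{app="partition-cleaner",bucket="hdx-test-test",storage="hdx_primary"} 0
"""

# metric_name{labels} value
LINE = re.compile(r"^(\w+)\{[^}]*\}\s+(\S+)$")


def parse_samples(text):
    """Map metric name to value, ignoring labels (later lines overwrite earlier ones)."""
    values = {}
    for line in text.splitlines():
        match = LINE.match(line.strip())
        if match:
            values[match.group(1)] = float(match.group(2))
    return values


metrics = parse_samples(SAMPLE)
mean_duration = metrics["bulk_delete_duration_sum"] / metrics["bulk_delete_duration_count"]
attempts = metrics["bulk_delete_success_count"] + metrics["bulk_delete_failure_count"]
failure_ratio = metrics["bulk_delete_failure_count"] / attempts if attempts else 0.0

print(mean_duration)  # 25.125
print(failure_ratio)  # 0.0
```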

Differences between Decay/Reaper and the partition cleaner

Both Decay/Reaper and the partition cleaner delete partitions from object storage, but they serve different purposes. Decay is part of the Hydrolix data lifecycle management system and selects old data for Reaper to delete based on configured data retention policies. The partition cleaner performs maintenance by removing orphaned partitions: partitions that exist in object storage but aren't referenced in the catalog, for example because of system errors like catalog write failures or failed merge operations.

  • Decay and Reaper manage data retention. These services also delete data from the catalog.
  • The partition cleaner maintains storage integrity. This service deletes data only from object storage.

Differences between the partition cleaner and the periodic service

The periodic service manages operational files generated during cluster operations (service logs, rejected data files, Keycloak backups) and runs its own cleaning vacuums to delete these items from object storage.

The periodic service cleans up ancillary operational data in the primary storage bucket. It does not look at ingested data or partitions.