Vacuum
The Hydrolix partition cleaner and vacuums free up storage space by deleting unneeded and redundant data from your cluster. The partition cleaner removes unreferenced partitions, while the log vacuum and rejects vacuum delete old logs and reject files.
The default partition cleaning schedule changed from daily to weekly in v5.6.
Partition cleaner
The partition cleaner deletes partitions no longer referenced by the Hydrolix catalog. Each partition corresponds to a particular period of timestamped data. When Hydrolix optimizes storage usage, it sometimes merges data from multiple partitions or time periods into a single partition. During optimization, Hydrolix doesn't automatically delete unused partitions or their data. Instead, the catalog removes references or pointers to the unused partitions.
The partition cleaner deletes partitions that have been redundant for at least 24 hours. By default, the partition cleaner runs weekly on Monday at 12:00 AM UTC, unless otherwise specified with the partition_cleaner_schedule tunable.
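To show how the default weekly schedule and the 24-hour rule interact, the following Python sketch computes the next default run (Monday 00:00 UTC) and the first run at which a partition orphaned at a given moment would be old enough to delete. It assumes the 24-hour clock starts when the catalog drops its reference and that a run only deletes partitions already past that threshold. The function names are hypothetical and not part of Hydrolix.

```python
from datetime import datetime, timedelta, timezone

GRACE_PERIOD = timedelta(hours=24)  # documented minimum time a partition must be redundant


def next_default_run(after: datetime, inclusive: bool = False) -> datetime:
    """Next Monday 00:00 UTC after `after` (or exactly at it, if inclusive=True).

    Hypothetical helper modeling the documented default weekly schedule;
    `after` must be timezone-aware.
    """
    after = after.astimezone(timezone.utc)
    midnight = after.replace(hour=0, minute=0, second=0, microsecond=0)
    # datetime.weekday() returns 0 for Monday, so this counts days until Monday.
    candidate = midnight + timedelta(days=(-midnight.weekday()) % 7)
    if candidate < after or (candidate == after and not inclusive):
        candidate += timedelta(days=7)
    return candidate


def first_eligible_run(orphaned_at: datetime) -> datetime:
    """First default run at which a partition orphaned at `orphaned_at`
    has been redundant for at least GRACE_PERIOD (assumed semantics)."""
    return next_default_run(orphaned_at + GRACE_PERIOD, inclusive=True)


if __name__ == "__main__":
    # 2024-01-07 is a Sunday. The grace period ends Monday at noon, after
    # that Monday's midnight run, so the partition waits another week.
    orphaned = datetime(2024, 1, 7, 12, 0, tzinfo=timezone.utc)
    print(first_eligible_run(orphaned))  # 2024-01-15 00:00:00+00:00
```

If you override the schedule with partition_cleaner_schedule, the same reasoning applies using that schedule's run times instead of Monday midnight.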
The partition cleaner is a Kubernetes CronJob resource that cleans up object storage by looking for partitions that exist in object storage but not in the catalog.
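Conceptually, this cleanup is a set difference between what object storage holds and what the catalog references, filtered by the 24-hour rule. The Python sketch below models that with plain sets of partition paths. It is not Hydrolix's implementation: the partition paths are made up, and the orphaned_since dictionary is a hypothetical record of when each partition was first seen without a catalog reference, which is one way a periodic job could enforce a minimum age.

```python
from datetime import datetime, timedelta, timezone

GRACE_PERIOD = timedelta(hours=24)  # documented minimum redundancy age


def select_for_deletion(
    in_storage: set[str],
    in_catalog: set[str],
    orphaned_since: dict[str, datetime],
    now: datetime,
) -> set[str]:
    """Return partitions in storage that have lacked a catalog reference
    for at least GRACE_PERIOD.

    `orphaned_since` is hypothetical bookkeeping, updated in place: new
    orphans start their clock at `now`, and entries for partitions that are
    referenced again (or already gone from storage) are dropped.
    """
    orphans = in_storage - in_catalog  # exists in storage but not in the catalog
    for path in list(orphaned_since):
        if path not in orphans:
            del orphaned_since[path]
    for path in orphans:
        orphaned_since.setdefault(path, now)
    return {p for p in orphans if now - orphaned_since[p] >= GRACE_PERIOD}


if __name__ == "__main__":
    now = datetime(2024, 1, 8, tzinfo=timezone.utc)
    seen = {"part/old": now - timedelta(days=2), "part/recent": now - timedelta(hours=1)}
    storage = {"part/live", "part/old", "part/recent", "part/new"}
    catalog = {"part/live"}
    print(sorted(select_for_deletion(storage, catalog, seen, now)))  # ['part/old']
```

Tracking the first sighting, rather than deleting every orphan immediately, keeps a partition that was only just dereferenced (here part/new and part/recent) around until a later run.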
To configure the partition cleaner in the hydrolixcluster.yaml file, search the Configuration Options Reference page for settings that begin with partition_cleaner.
Log vacuum
Hydrolix logs data about various components, user interactions, and events that occur within your cluster. Different clusters store different kinds of logs for different lengths of time, based on the needs of your application. By default, Hydrolix deletes all logs after seven days. You can configure this using the log_vacuum_max_age tunable.
The log vacuum deletes logs older than your cluster's configured maximum log age (in days). By default, the log vacuum runs nightly unless otherwise specified with the log_vacuum_schedule tunable.
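The log vacuum's age rule amounts to comparing each item's timestamp against a cutoff of "now minus the maximum age." The Python sketch below shows that comparison for a hypothetical mapping of log names to creation times. It assumes "older than" means strictly before the cutoff and that a day is 24 hours; how Hydrolix actually timestamps logs and rounds ages is not specified on this page.

```python
from datetime import datetime, timedelta, timezone

DEFAULT_MAX_AGE_DAYS = 7  # documented default for log_vacuum_max_age


def expired(created: dict[str, datetime], now: datetime,
            max_age_days: int = DEFAULT_MAX_AGE_DAYS) -> list[str]:
    """Sorted names whose timestamp is strictly older than max_age_days."""
    cutoff = now - timedelta(days=max_age_days)
    return sorted(name for name, ts in created.items() if ts < cutoff)


if __name__ == "__main__":
    now = datetime(2024, 1, 10, tzinfo=timezone.utc)
    logs = {  # hypothetical log names and creation times
        "nine-days": datetime(2024, 1, 1, tzinfo=timezone.utc),
        "exactly-seven": datetime(2024, 1, 3, tzinfo=timezone.utc),
        "five-days": datetime(2024, 1, 5, tzinfo=timezone.utc),
    }
    print(expired(logs, now))                  # ['nine-days']
    print(expired(logs, now, max_age_days=3))  # ['exactly-seven', 'five-days', 'nine-days']
```

The rejects vacuum's rejects_vacuum_max_age tunable describes the same kind of age limit for reject files, also with a seven-day default.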
Rejects vacuum
Hydrolix can selectively ignore ingested data that meets custom criteria. Hydrolix calls these ignored rows rejects. When Hydrolix ignores an ingested row, it records that reject as a JSON object in a reject file.
The reject file includes all the ingested data, the originating project and table, and the reason for the rejection. Over time, clusters that reject large quantities of data can accumulate large volumes of reject files. To keep reject files from consuming an ever-growing amount of space in your cluster, we introduced the rejects vacuum, which cleans up old, no-longer-needed reject files.
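To make the contents of a reject more concrete, here is a Python sketch that serializes reject records as one JSON object per line and tallies them by reason. This kind of tally can show what is driving reject volume before the files pile up. The key names (project, table, reason, row), the one-object-per-line layout, and the example reason strings are all hypothetical: this page describes what a reject contains, not how its fields are named.

```python
import json
from collections import Counter


def make_reject(project: str, table: str, reason: str, row: dict) -> str:
    """Serialize one reject as a JSON object (hypothetical field names)."""
    return json.dumps(
        {"project": project, "table": table, "reason": reason, "row": row},
        sort_keys=True,
    )


def count_by_reason(lines: list[str]) -> Counter:
    """Tally reject records by their (hypothetical) "reason" field, skipping blank lines."""
    return Counter(json.loads(line)["reason"] for line in lines if line.strip())


if __name__ == "__main__":
    rejects = [
        make_reject("web", "access_logs", "missing primary timestamp", {"status": 200}),
        make_reject("web", "access_logs", "missing primary timestamp", {"status": 404}),
        make_reject("web", "access_logs", "type mismatch", {"status": "ok"}),
    ]
    print(count_by_reason(rejects).most_common(1))  # [('missing primary timestamp', 2)]
```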
By default, the rejects vacuum deletes rejects older than seven days. You can configure this using the rejects_vacuum_max_age tunable. By default, the rejects vacuum runs nightly unless otherwise specified with the rejects_vacuum_schedule tunable.
Vacuum metrics
The partition-cleaner now exports the following metrics for monitoring vacuum behavior:
Metric Name | Purpose | Example |
---|---|---|
bulk_delete_bytes | How much data was deleted | bulk_delete_bytes{app="partition-cleaner",bucket="hdx-test-test",storage="hdx_primary"} 1.520361e+06 |
bulk_delete_duration | How long deletes take (percentiles) | bulk_delete_duration{app="partition-cleaner",bucket="hdx-test-test",storage="hdx_primary",quantile="0.99"} 466 |
bulk_delete_duration_sum | Total time spent on deletes | bulk_delete_duration_sum{app="partition-cleaner",bucket="hdx-test-test",storage="hdx_primary"} 603 |
bulk_delete_duration_count | Number of delete operations measured | bulk_delete_duration_count{app="partition-cleaner",bucket="hdx-test-test",storage="hdx_primary"} 24 |
bulk_delete_success_count | Number of deletes that succeeded | bulk_delete_success_count{app="partition-cleaner",bucket="hdx-test-test",storage="hdx_primary"} 33 |
bulk_delete_failure_count | Number of deletes that failed | bulk_delete_failure_count{app="partition-cleaner",bucket="hdx-test-test",storage="hdx_primary"} 0 |
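The example values above appear to use the Prometheus text exposition format, and the _sum/_count pair with a quantile label looks like a Prometheus summary. From those series you can derive a mean delete duration (sum divided by count) and a failure ratio (failures divided by all deletes). The Python sketch below does this with a deliberately simplified line parser, applied to the sample lines from the table. The page does not state the duration unit, so the mean is in whatever unit the exporter uses.

```python
import re
from typing import Optional

# Simplified sample-line pattern: name{labels} value. It ignores comments,
# timestamps, and escaped quotes inside label values.
SAMPLE = re.compile(
    r'^(?P<name>[a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(?P<labels>[^}]*)\})?\s+(?P<value>\S+)$'
)
LABEL = re.compile(r'(\w+)="([^"]*)"')


def parse_sample(line: str) -> tuple[str, dict[str, str], float]:
    """Split one sample line into (metric name, labels, value)."""
    m = SAMPLE.match(line.strip())
    if m is None:
        raise ValueError(f"not a sample line: {line!r}")
    labels = dict(LABEL.findall(m.group("labels") or ""))
    return m.group("name"), labels, float(m.group("value"))


def summarize(lines: list[str]) -> dict[str, Optional[float]]:
    """Mean delete duration and failure ratio from one scrape.

    Assumes a single label set per metric; quantile samples are skipped.
    """
    values: dict[str, float] = {}
    for line in lines:
        name, labels, value = parse_sample(line)
        if "quantile" not in labels:
            values[name] = value
    total = values.get("bulk_delete_duration_sum", 0.0)
    count = values.get("bulk_delete_duration_count", 0.0)
    ok = values.get("bulk_delete_success_count", 0.0)
    failed = values.get("bulk_delete_failure_count", 0.0)
    return {
        "mean_duration": total / count if count else None,
        "failure_ratio": failed / (ok + failed) if ok + failed else None,
    }


if __name__ == "__main__":
    labels = '{app="partition-cleaner",bucket="hdx-test-test",storage="hdx_primary"}'
    scrape = [
        f"bulk_delete_duration_sum{labels} 603",
        f"bulk_delete_duration_count{labels} 24",
        f"bulk_delete_success_count{labels} 33",
        f"bulk_delete_failure_count{labels} 0",
    ]
    print(summarize(scrape))  # {'mean_duration': 25.125, 'failure_ratio': 0.0}
```

If these series are scraped into Prometheus as cumulative values, a windowed average is usually written in PromQL as rate(bulk_delete_duration_sum[5m]) / rate(bulk_delete_duration_count[5m]) rather than dividing the raw totals.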