Vacuum

The Hydrolix vacuums and partition cleaner delete unneeded and redundant data from your cluster to free up storage space. The partition cleaner removes partitions, and the vacuums delete logs and rejects.

Partition cleaner

The partition cleaner deletes partitions no longer referenced by the Hydrolix catalog. Each partition corresponds to a particular period of timestamped data. When Hydrolix optimizes storage usage, it sometimes merges data from multiple partitions or time periods into a single partition. During optimization, Hydrolix doesn't automatically delete unused partitions or their data. Instead, the catalog removes references or pointers to the unused partitions.

The partition cleaner deletes partitions that have been redundant for at least 24 hours. By default, the partition cleaner runs nightly at 1:00 AM cluster local time, unless otherwise specified with the partition_cleaner_schedule tunable.

The partition cleaner is a Kubernetes CronJob resource that cleans up object storage, looking for partitions that exist in object storage, but not in the catalog.
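The selection rule described above (present in object storage, absent from the catalog, and redundant for at least 24 hours) can be sketched as a set difference with an age filter. This is a minimal illustration, not Hydrolix's implementation: the function name, the partition paths, and the use of a per-partition timestamp as a stand-in for "when the partition became redundant" are all assumptions.

```python
from datetime import datetime, timedelta, timezone


def find_orphaned_partitions(stored, cataloged, now, grace=timedelta(hours=24)):
    """Return stored partition paths that are safe to delete.

    stored:    dict of partition path -> timestamp (hypothetical stand-in
               for the moment the partition became redundant)
    cataloged: set of partition paths the catalog still references
    grace:     minimum time a partition must have been redundant
    """
    return sorted(
        path
        for path, since in stored.items()
        if path not in cataloged and now - since >= grace
    )


now = datetime(2024, 1, 2, 1, 0, tzinfo=timezone.utc)
stored = {
    "db/table/part-a": now - timedelta(hours=30),  # unreferenced, old enough
    "db/table/part-b": now - timedelta(hours=2),   # unreferenced, too recent
    "db/table/part-c": now - timedelta(days=3),    # still in the catalog
}
cataloged = {"db/table/part-c"}

print(find_orphaned_partitions(stored, cataloged, now))
```

Only part-a qualifies: part-b has not yet been redundant for 24 hours, and part-c is still referenced by the catalog.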

To configure the partition cleaner in hydrolixcluster.yaml, search the Configuration Options Reference page for settings that begin with partition_cleaner.

Log vacuum

Hydrolix logs data about the components, user interactions, and events in your cluster. Different clusters store different kinds of logs for different lengths of time, based on the needs of your application. By default, Hydrolix deletes all logs after 7 days. You can configure this using the log_vacuum_max_age tunable.

The log vacuum deletes logs older than your cluster's configured maximum log age (in days). By default, the log vacuum runs nightly at 4:00 AM cluster local time, unless otherwise specified with the log_vacuum_schedule tunable.
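The age-based rule can be illustrated with a simple cutoff calculation. This is a sketch under stated assumptions: the function name, the log file names, and the idea that each log carries a single write timestamp are hypothetical. Only the default of 7 days comes from the text above.

```python
from datetime import datetime, timedelta, timezone


def logs_to_delete(log_times, now, max_age_days=7):
    """Return the names of logs written before the age cutoff.

    log_times:    dict of log name -> time the log was written (hypothetical)
    max_age_days: mirrors the idea of log_vacuum_max_age, expressed in days
    """
    cutoff = now - timedelta(days=max_age_days)
    return sorted(name for name, written in log_times.items() if written < cutoff)


run_time = datetime(2024, 1, 10, 4, 0, tzinfo=timezone.utc)
log_times = {
    "query-head.log": run_time - timedelta(days=9),
    "intake-api.log": run_time - timedelta(days=7, hours=1),
    "merge-peer.log": run_time - timedelta(days=6),
}

print(logs_to_delete(log_times, run_time))
```

With the default maximum age, the first two logs fall before the cutoff and the third is kept. Raising max_age_days to 8 would leave only the 9-day-old log eligible.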

Rejects vacuum

Hydrolix can selectively ignore ingested data that meets custom criteria. Hydrolix calls these ignored rows rejects. Whenever Hydrolix ignores an ingested row, it records that reject as a JSON object in a reject file. The reject file includes the full piece of ingested data, the originating project and table, and the reason for the rejection. Over time, clusters that reject large quantities of data can accumulate large volumes of reject files. To keep reject files from consuming an ever-growing amount of space in your cluster, the rejects vacuum cleans up old reject files that are no longer needed.
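To make the shape of a reject concrete, the following sketch builds a JSON object carrying the pieces of information listed above: the ingested data, the originating project and table, and the rejection reason. The field names, the sample row, and the reason text are hypothetical. The actual reject file schema is not given here and may differ.

```python
import json


def make_reject(project, table, row, reason):
    """Serialize one rejected row as a JSON object (hypothetical field names)."""
    record = {
        "project": project,  # originating project
        "table": table,      # originating table
        "reason": reason,    # why the row was rejected
        "data": row,         # the full ingested row, unchanged
    }
    return json.dumps(record, sort_keys=True)


reject_line = make_reject(
    project="sample_project",
    table="sample_table",
    row={"timestamp": "not-a-date", "status": 200},
    reason="timestamp could not be parsed",
)
print(reject_line)
```

Because each reject keeps the full original row, reject files grow in proportion to the volume of rejected data, which is why age-based cleanup matters.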

By default, the rejects vacuum deletes rejects older than 7 days. You can configure this using the rejects_vacuum_max_age tunable. The rejects vacuum runs nightly at 12:00 AM (midnight) cluster local time, unless otherwise specified with the rejects_vacuum_schedule tunable.