Hydrolix's vacuum feature deletes unneeded and redundant data from your cluster to free up storage space. Vacuum cleans up three different kinds of unneeded data: partitions, logs, and rejects.

Partition Vacuum

The partition vacuum deletes partitions no longer referenced by the Hydrolix catalog. Each partition corresponds to a particular period of timestamped data. When Hydrolix optimizes storage usage, it sometimes merges data from multiple partitions (or time periods) into a single partition. During optimization, Hydrolix doesn't automatically delete unused partitions or the data stored within. Instead, the catalog removes references (or pointers) to the unused partitions. Vacuum cleans up those unused partitions.

The partition vacuum deletes only those partitions that have been redundant (no longer referenced by the catalog) for at least 24 hours. By default, partition vacuum runs nightly at 1 AM cluster local time. You can change the schedule with the partition_vacuum_schedule tunable.
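
As a rough illustration of the selection rule described above, the following Python sketch picks out partitions that the catalog no longer references and that have been in that state for at least 24 hours. The names StoredPartition, unreferenced_since, and select_for_vacuum are hypothetical and chosen for this example only; they are not Hydrolix's internal implementation.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import List, Optional

# Grace period taken from the text above: 24 hours.
GRACE_PERIOD = timedelta(hours=24)


@dataclass(frozen=True)
class StoredPartition:
    """Hypothetical record describing one partition in storage."""
    path: str
    # When the catalog dropped its reference; None means still referenced.
    unreferenced_since: Optional[datetime]


def select_for_vacuum(partitions: List[StoredPartition], now: datetime) -> List[str]:
    """Return paths of partitions unreferenced for at least GRACE_PERIOD."""
    return [
        p.path
        for p in partitions
        if p.unreferenced_since is not None
        and now - p.unreferenced_since >= GRACE_PERIOD
    ]


if __name__ == "__main__":
    now = datetime.now(timezone.utc)
    sample = [
        StoredPartition("live", None),
        StoredPartition("merged-yesterday", now - timedelta(hours=30)),
        StoredPartition("merged-just-now", now - timedelta(minutes=5)),
    ]
    print(select_for_vacuum(sample, now))
```

The grace period means that a partition merged away shortly before a vacuum run survives until a later run.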

Log Vacuum

Hydrolix logs data about the components, user interactions, and events in your cluster. Different clusters store different kinds of logs for different lengths of time, based on the needs of your application. By default, Hydrolix deletes logs once they are more than 7 days old. You can change this retention period using the log_vacuum_max_age tunable.

The log vacuum deletes logs older than your cluster's configured maximum log age (in days). By default, log vacuum runs nightly at 4 AM cluster local time. You can change the schedule with the log_vacuum_schedule tunable.
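
The age-based rule can be sketched as a simple cutoff calculation. This is a minimal illustration that assumes "older than" means strictly older than the maximum age. The function names and the mapping of log names to timestamps are hypothetical and are not part of Hydrolix.

```python
from datetime import datetime, timedelta, timezone
from typing import Dict, List

# Default maximum log age stated in the text above, in days.
DEFAULT_LOG_MAX_AGE_DAYS = 7


def log_cutoff(now: datetime, max_age_days: int = DEFAULT_LOG_MAX_AGE_DAYS) -> datetime:
    """Return the timestamp before which logs count as expired."""
    return now - timedelta(days=max_age_days)


def expired_logs(
    written_at: Dict[str, datetime],
    now: datetime,
    max_age_days: int = DEFAULT_LOG_MAX_AGE_DAYS,
) -> List[str]:
    """Return the names of logs written strictly before the cutoff, sorted."""
    cutoff = log_cutoff(now, max_age_days)
    return sorted(name for name, ts in written_at.items() if ts < cutoff)


if __name__ == "__main__":
    now = datetime.now(timezone.utc)
    logs = {
        "recent.log": now - timedelta(days=1),
        "stale.log": now - timedelta(days=10),
    }
    print(expired_logs(logs, now))
```

The same cutoff logic would apply to any age-based cleanup with a different maximum age.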

Rejects Vacuum

Hydrolix can selectively ignore ingested data that meets custom criteria. Within Hydrolix, we call these ignored rows rejects. Whenever Hydrolix ignores an ingested row, it records that reject as a JSON object in a reject file. The reject file includes the full piece of ingested data, the originating project and table, and the reason for the rejection. Over time, clusters that reject large quantities of data can accumulate large volumes of reject files. To keep reject files from consuming an ever-growing amount of space in your cluster, we introduced the rejects vacuum, which cleans up old reject files that are no longer needed.
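
To make the contents of a reject concrete, here is a small Python sketch that serializes one reject as a JSON object. The field names (data, project, table, reason) are assumptions made for illustration. The text above states what information a reject contains, but not the actual key names or file layout Hydrolix uses.

```python
import json
from dataclasses import asdict, dataclass
from typing import Any


@dataclass
class RejectRecord:
    """Hypothetical shape of one reject; key names are illustrative only."""
    data: Any      # the full piece of ingested data
    project: str   # originating project
    table: str     # originating table
    reason: str    # why the row was rejected


def to_reject_line(record: RejectRecord) -> str:
    """Serialize one reject as a single JSON object."""
    return json.dumps(asdict(record), sort_keys=True)


if __name__ == "__main__":
    example = RejectRecord(
        data={"timestamp": "not-a-date", "status": 200},
        project="example_project",
        table="example_table",
        reason="timestamp could not be parsed",
    )
    print(to_reject_line(example))
```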

By default, the rejects vacuum deletes rejects older than 7 days. You can change this retention period using the rejects_vacuum_max_age tunable. The rejects vacuum runs nightly at 12 AM (midnight) cluster local time by default. You can change the schedule with the rejects_vacuum_schedule tunable.
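
For a quick view of how the three default schedules interleave, the sketch below computes the next nightly run of each vacuum from the default hours given in this document (12 AM for rejects, 1 AM for partitions, 4 AM for logs). It treats naive datetimes as cluster local time and ignores daylight saving transitions. The helper name next_nightly_run is hypothetical, and the sketch does not read or parse the actual schedule tunables.

```python
from datetime import datetime, timedelta

# Default nightly run hours from this document, in cluster local time.
DEFAULT_RUN_HOUR = {"rejects": 0, "partition": 1, "log": 4}


def next_nightly_run(now: datetime, hour: int) -> datetime:
    """Return the next time strictly after `now` that falls on hour:00."""
    candidate = now.replace(hour=hour, minute=0, second=0, microsecond=0)
    if candidate <= now:
        # Today's run time has already passed, so use tomorrow's.
        candidate += timedelta(days=1)
    return candidate


if __name__ == "__main__":
    now = datetime.now()  # assumed to already be in cluster local time
    for name, hour in DEFAULT_RUN_HOUR.items():
        print(f"{name} vacuum: {next_nightly_run(now, hour).isoformat()}")
```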