Vacuum
Hydrolix provides automated cleanup services to delete unneeded and redundant data from your cluster, freeing up storage space. These services run on scheduled intervals to maintain optimal cluster performance.
This page describes the periodic service, which consolidates multiple cleanup operations: deleting rejected data, deleting cluster logs, and pruning Keycloak backups.
See data lifecycle for an overview of the decay, reaper, and partition cleaner services, which operate on table partitions.
Periodic Service
The periodic service manages operational data cleanup tasks and runs daily scheduled jobs for three vacuums:
- Rejects vacuum
- Log vacuum
- Keycloak vacuum
The periodic service operates only on the cluster's default storage. It doesn't support multi-bucket configurations because logs, rejects, and Keycloak backups are saved in the cluster's default storage.
To enable the periodic service, set the periodic-service replica to 1.
Rejects vacuum
Hydrolix can selectively ignore ingested data that meets custom criteria. Hydrolix calls these ignored rows rejects. When Hydrolix ignores an ingested row, it records that reject as a JSON object in a reject file.
The reject file includes all the ingested data, the originating project and table, and the reason for the rejection. Over time, clusters that reject large quantities of data can accumulate large volumes of reject files.
The rejects vacuum is part of the periodic service, which consolidates several maintenance operations into a single service. The rejects vacuum cleans up old, no-longer-needed reject files from object storage to prevent them from consuming an ever-growing amount of space.
Configuration
The rejects vacuum runs as a scheduled job within the periodic service and can be configured using environment variables on the periodic-service deployment. To enable the periodic service, set the periodic-service replica to 1.
Schedule configuration:
- Environment variable: `REJECTS_CLEAN_SCHED`
- Format: Tokio-cron notation (see tokio-cron syntax)
- Default: Daily at 02:00 UTC
- Example: `0 0 2 * * *` (every day at 2 AM)
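The schedule examples on this page use six fields, which implies that seconds come first and the hour is the third field. The following minimal sketch names each field of such an expression. The helper `describe_schedule` and the field labels are hypothetical, are not part of Hydrolix, and assume the six-field form only.

```python
# Field order assumed from the examples on this page: seconds come first.
FIELD_NAMES = ("second", "minute", "hour", "day_of_month", "month", "day_of_week")


def describe_schedule(expression: str) -> dict:
    """Split a six-field cron expression into named fields (hypothetical helper)."""
    fields = expression.split()
    if len(fields) != len(FIELD_NAMES):
        raise ValueError(f"expected {len(FIELD_NAMES)} fields, got {len(fields)}")
    return dict(zip(FIELD_NAMES, fields))


if __name__ == "__main__":
    # Print each field of the default rejects schedule on its own line.
    for name, value in describe_schedule("0 0 2 * * *").items():
        print(f"{name:>12}: {value}")
```

Reading the expression this way makes it easier to spot a schedule that fires hourly instead of daily: a daily schedule needs a fixed value in the `hour` field, not `*`.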
Enable/Disable:
- Environment variable: `REJECTS_CLEAN_ENABLED`
- Values: `true` (default) or `false`
Max age configuration:
- Only available as a command-line override: `--max-age-duration <DURATION>`
- Format: Duration string (`168h`, `7d`)
- Default: 7 days (168 hours)
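The two duration strings shown above describe the same interval. The following sketch illustrates how such a string could map to a cutoff timestamp, assuming only the `h` and `d` suffixes that appear on this page. The functions `parse_duration` and `cutoff_for` are hypothetical, and the service's own parser may accept other units.

```python
import re
from datetime import datetime, timedelta, timezone

# Only the two suffixes shown on this page are handled (assumption).
_UNITS = {"h": "hours", "d": "days"}


def parse_duration(text: str) -> timedelta:
    """Convert a string such as '168h' or '7d' to a timedelta (hypothetical helper)."""
    match = re.fullmatch(r"(\d+)([hd])", text.strip())
    if match is None:
        raise ValueError(f"unsupported duration string: {text!r}")
    value, unit = match.groups()
    return timedelta(**{_UNITS[unit]: int(value)})


def cutoff_for(now: datetime, max_age: str) -> datetime:
    """Return the timestamp before which a file counts as older than max_age."""
    return now - parse_duration(max_age)


if __name__ == "__main__":
    now = datetime.now(timezone.utc)
    print("168h equals 7d:", parse_duration("168h") == parse_duration("7d"))
    print("cutoff for 7d:", cutoff_for(now, "7d").isoformat())
```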
Manual execution
You can manually trigger the rejects vacuum by executing commands within the periodic-service pod. By default, manual execution runs in dry-run mode and shows what would be deleted without actually deleting anything.
Available options:
- `--perform-delete` - Execute deletions (without this flag, the vacuum runs in dry-run mode)
- `--max-age-duration <DURATION>` - Override the default max age (`168h` or `7d`)
- `--rpc-port <PORT>` - RPC port (default: 9000)
- `--rpc-host <HOST>` - RPC host (default: `http://localhost`)
- `-h, --help` - Display help information
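The interaction between the max age and the dry-run default can be pictured with a short sketch. This is not the service's implementation: `StoredFile`, `plan_vacuum`, and the use of a modification timestamp to decide a file's age are all assumptions made for illustration.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass(frozen=True)
class StoredFile:
    """Hypothetical stand-in for an object in the cluster's default storage."""
    path: str
    modified: datetime
    size_bytes: int


def plan_vacuum(files, now, max_age, perform_delete=False):
    """Select files older than max_age; report only, unless perform_delete is set."""
    cutoff = now - max_age
    expired = [f for f in files if f.modified < cutoff]
    return {
        # Mirrors the documented default: no flag means dry-run.
        "mode": "delete" if perform_delete else "dry-run",
        "visited": len(files),
        "candidates": [f.path for f in expired],
        "bytes": sum(f.size_bytes for f in expired),
    }


if __name__ == "__main__":
    now = datetime.now(timezone.utc)
    files = [
        StoredFile("rejects/old.json", now - timedelta(days=8), 1024),
        StoredFile("rejects/new.json", now - timedelta(days=1), 2048),
    ]
    print(plan_vacuum(files, now, timedelta(days=7)))
```

Running a dry run first and comparing its candidate list against expectations is a cheap safeguard before passing `--perform-delete`.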
Metrics
The periodic service exports metrics to monitor rejects vacuum behavior. You can filter these metrics by the labels `organization_id`, `project_id`, and `table_id`.
| Metric Name | Purpose |
|---|---|
| `reject_deletes_total` | Total number of reject files deleted |
| `reject_visited_total` | Total number of reject files examined |
| `reject_failures_total` | Number of failed deletion attempts |
| `reject_delete_size_total` | Total size in bytes of deleted reject files |
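These counters can be combined into a few health indicators. The sketch below assumes that a failed deletion attempt is not also counted in `reject_deletes_total`. The function `summarize_rejects_vacuum` is hypothetical, and the sample values are invented for illustration only.

```python
def summarize_rejects_vacuum(counters: dict) -> dict:
    """Derive simple ratios from the rejects vacuum counters (hypothetical helper)."""
    deleted = counters["reject_deletes_total"]
    failed = counters["reject_failures_total"]
    visited = counters["reject_visited_total"]
    size = counters["reject_delete_size_total"]
    # Assumption: every deletion attempt ends as either a delete or a failure.
    attempts = deleted + failed
    return {
        "failure_ratio": failed / attempts if attempts else 0.0,
        "deleted_share": deleted / visited if visited else 0.0,
        "bytes_per_delete": size / deleted if deleted else 0.0,
    }


if __name__ == "__main__":
    # Invented sample values, not real cluster output.
    sample = {
        "reject_deletes_total": 90,
        "reject_visited_total": 400,
        "reject_failures_total": 10,
        "reject_delete_size_total": 9000,
    }
    print(summarize_rejects_vacuum(sample))
```

A rising failure ratio usually deserves attention before the total storage figure does, since failed deletions leave files in place for the next run.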
Log vacuum
Hydrolix logs data regarding various components, user interactions, and events that occur within your cluster. Different clusters store different kinds of logs for different lengths of time, based on the needs of your application. Visit Logging Configuration for information on how to set up logs for your cluster.
The log vacuum is part of the periodic service, which consolidates several maintenance operations into a single service. The log vacuum deletes log files from object storage that are older than the configured maximum age. By default, Hydrolix deletes all logs after 7 days.
Configuration
The log vacuum runs as a scheduled job within the periodic service and can be configured using environment variables on the periodic-service deployment. To enable the periodic service, set the periodic-service replica to 1.
Schedule configuration:
- Environment variable: `LOG_CLEAN_SCHED`
- Format: Tokio-cron notation (see tokio-cron syntax)
- Default: Daily at 00:00 UTC
- Example: `0 0 0 * * *` (every day at midnight)
Enable/Disable:
- Environment variable: `LOG_CLEAN_ENABLED`
- Values: `true` (default) or `false`
Max age configuration:
- Only available as a command-line override: `--log_vacuum_max_age <DURATION>`
- Format: Duration string (`168h`, `7d`)
- Default: 7 days (168 hours)
Manual execution
Manually trigger the log vacuum by executing commands within the periodic-service pod. By default, manual execution runs in dry-run mode and shows what would be deleted without actually deleting anything.
Available options:
- `--perform-delete` - Execute deletions (without this flag, the vacuum runs in dry-run mode)
- `--log_vacuum_max_age <DURATION>` - Override the default max age (`168h` or `7d`)
- `--rpc-port <PORT>` - RPC port (default: 9000)
- `--rpc-host <HOST>` - RPC host (default: `http://localhost`)
- `-h, --help` - Display help information
Metrics
The periodic service exports metrics to monitor log vacuum behavior.
| Metric Name | Purpose |
|---|---|
| `log_deletes_total` | Total number of log files deleted |
| `log_visited_total` | Total number of log files examined |
| `log_failures_total` | Number of failed deletion attempts |
| `log_delete_size_total` | Total size in bytes of deleted log files |
Keycloak vacuum
Hydrolix uses Keycloak for authentication and authorization services. The system automatically creates backup files of Keycloak data to ensure configuration and user data can be recovered if needed. Over time, these backup files can accumulate and consume storage space.
The Keycloak vacuum is part of the periodic service, which consolidates several maintenance operations into a single service. The Keycloak vacuum automatically deletes Keycloak backup files that are older than the configured retention period.
Configuration
The Keycloak vacuum runs as a scheduled job within the periodic service and can be configured using environment variables on the periodic-service deployment. To enable the periodic service, set the periodic-service replica to 1.
Schedule configuration:
- Environment variable: `KEYCLOAK_CLEAN_SCHED`
- Format: Tokio-cron notation (see tokio-cron syntax)
- Default: Daily at 01:00 UTC
- Example: `0 0 1 * * *` (every day at 1 AM)
Enable/Disable:
- Environment variable: `KEYCLOAK_CLEAN_ENABLED`
- Values: `true` (default) or `false`
Max age configuration:
- Only available as a command-line override: `--max-age-duration <DURATION>`
- Format: Duration string (`168h`, `7d`)
- Default: 7 days (168 hours)
Manual execution
You can manually trigger the Keycloak vacuum by executing commands within the periodic-service pod. By default, manual execution runs in dry-run mode and shows what would be deleted without actually deleting anything.
Available options:
- `--perform-delete` - Execute deletions (without this flag, the vacuum runs in dry-run mode)
- `--max-age-duration <DURATION>` - Override the default max age (`168h` or `7d`)
- `--rpc-port <PORT>` - RPC port (default: 9000)
- `--rpc-host <HOST>` - RPC host (default: `http://localhost`)
- `-h, --help` - Display help information
Metrics
The periodic service exports metrics to monitor Keycloak vacuum behavior.
| Metric Name | Purpose |
|---|---|
| `keycloak_deletes_total` | Total number of Keycloak backup files deleted |
| `keycloak_visited_total` | Total number of Keycloak backup files examined |
| `keycloak_failures_total` | Number of failed deletion attempts |
| `keycloak_delete_size_total` | Total size in bytes of deleted Keycloak backup files |