Intake Spill
Overview
This feature was introduced in Hydrolix version 4.20.1 and is disabled by default.
See Configure intake spill to enable it.
The intake-head service can automatically offload raw and catalog data to a cluster's default external object storage if there is a delay in transferring incoming data buckets to the indexer. When the delay is resolved, the offloaded data will be fetched back into the processing pipeline so no data is lost.
This feature ensures that ingest operates continuously under slow response-time conditions, during unusually heavy load, and when there are temporary outages to the catalog.
In Hydrolix version 5.2, the spill architecture was redesigned with a central spill-controller that coordinates spilled data distribution. This controller reduces operational costs and storage contention by having intake-head nodes poll the controller instead of independently scanning object storage.
Key features
The spill-controller architecture in version 5.2 and later provides:
- Centralized coordination: The spill-controller manages spilled data distribution, preventing race conditions and reducing overhead from independent object storage scanning.
- Reduced operational costs: intake-head pods poll the controller instead of scanning object storage, lowering storage API costs and contention.
- Partition-specific storage: Spilled data is stored in the storage target defined for each partition, not always in the default storage.
- Local buffering: Buffering before spills reduces the number of catalog-add operations during normal operation.
- Automatic cleanup: The controller cleans up any remaining spilled data hourly, preventing orphaned files in object storage.
When to use intake spill
Intake spill can be helpful if your ingest pipeline periodically emits 429 responses resulting in data loss, or experiences backpressure from any of the following events:
- Ingest spikes
- High-latency partition creation
- Slow catalog inserts
Spill raw or catalog data
Intake spill of raw or catalog data can be enabled independently. When both are enabled, ingested data may be pushed to object storage prior to indexing (raw data) and when populating the database catalog (catalog data). In version 5.2 and later, intake-head pods poll a central spill-controller that coordinates retrieval of spilled data. In earlier versions, intake-heads independently scanned object storage.
Spilled files are fetched back into the processing pipeline once an indexer is able to create partitions again, or once the catalog is accepting insertions. Files fetched and successfully re-processed by an intake-head pod are subsequently cleaned from object storage.
When data is spilled, it's written to object storage. In version 5.2 and later, spilled data is stored in the storage target defined for each partition. In earlier versions, data is always written to the cluster's default storage. Once written, the spilled data is shared among intake-head pods. This means that an intake-head pod that isn't experiencing delays can fetch and process raw data received by a different intake-head pod.
Spilled data is retrieved from object storage in Last In, First Out (LIFO) order, so the newest data is processed first.
Resolve 429 errors
If your ingest pipeline generates 429 Too Many Requests errors, first identify the source of the errors. There are typically two reasons why this error occurs:
- Latency in parsing and applying transforms to the ingested data
- Latency when creating partitions with the ingested data
The hdx_sink_bucket_maint_duration_ns metric is not high
Use the hdx_sink_bucket_maint_duration_ns Prometheus metric to identify the root cause of the error.
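Assuming the histogram is exported with the standard Prometheus _bucket series (an assumption; label selectors depend on your scrape setup), a PromQL sketch for checking the 99th percentile over the last five minutes is:

```promql
# Hedged sketch: 99th-percentile bucket maintenance duration (in nanoseconds),
# aggregated across intake-head pods over a 5-minute window.
histogram_quantile(0.99, sum by (le) (rate(hdx_sink_bucket_maint_duration_ns_bucket[5m])))
```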
For example, if the metric is only showing several seconds in the 99th percentile, and the intake-head is returning 429 errors, it may be due to a bottleneck while parsing or applying transforms to the incoming data.
This can mean there are too many requests for the intake-head to process. Rather than use the intake spill feature, scale up the intake-head to allow more requests.
The hdx_sink_bucket_maint_duration_ns metric is high
If the hdx_sink_bucket_maint_duration_ns metric is higher than several seconds, it indicates the partition creation process is the bottleneck. This is typically caused by slowness in one of the downstream services. For example, latency in object storage can slow down the partition creation and ingestion process and generate 429 errors.
Enable the intake spill feature for this type of error.
Architecture diagram
Spilled data file paths
The following details the file paths in which raw and catalog data are buffered. This can be helpful for locating spilled data, but isn't required to use this feature.
When raw data is spilled to object storage, it's saved in the path:
Each raw data bucket is organized within these folders using the following structured path:
Unprocessed catalog data is spilled to the path:
Catalog buckets are organized using the same path but in .json format.
The <partition folder> structure reduces contention on retrieval. Each fetcher can only scan one partition folder at a time, in randomized order.
Configure intake spill
The intake spill feature is disabled by default. You can enable it in your hydrolixcluster.yaml config file with the intake_head_raw_data_spill_config and intake_head_catalog_spill_config tunables.
Both the intake_head_catalog_spill_config.enabled and intake_head_raw_data_spill_config.enabled tunables expect string values ("true" and "false"), though boolean values (true/false) are also accepted for historical reasons.
These tunables are dictionaries containing the following identical fields and default values:
These options apply to each individual intake-head pod.
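As a minimal sketch, assuming the tunables sit under spec in hydrolixcluster.yaml and showing only the enabled field (the other fields in these dictionaries keep their defaults), enabling both spill types might look like:

```yaml
# Hedged fragment of hydrolixcluster.yaml: only the enabled field is shown;
# the remaining fields in these dictionaries keep their default values.
spec:
  intake_head_raw_data_spill_config:
    enabled: "true"   # string form preferred; booleans also accepted
  intake_head_catalog_spill_config:
    enabled: "true"
```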
Metrics
These metrics have been added to support this feature:
| Metric Name | Description |
|---|---|
| hdx_sink_spill_catalog_accepted_count | Count of spilled catalog entries accepted into the indexing pipeline for processing |
| hdx_sink_spill_catalog_duration_ns | A histogram of the duration in nanoseconds taken to spill catalog adds |
| hdx_sink_spill_catalog_failure_count | Count of spill failures for catalog adds resulting in lost data |
| hdx_sink_spill_catalog_race_lost_count | Count of lost races attempting to add spilled data entry to catalog |
| hdx_sink_spill_raw_accepted_count | Count of spilled raw data entries accepted into the indexing pipeline for processing |
| hdx_sink_spill_raw_duration_ns | A histogram of the duration in nanoseconds taken to spill raw data buckets |
| hdx_sink_spill_raw_failure_count | Count of spill failures for raw data resulting in lost data |
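The failure counters indicate spilled data that was lost. As a hedged PromQL sketch (aggregation across intake-head pods is an assumption; adjust to your setup), you can check for recent spill failures with:

```promql
# Hedged sketch: non-zero results mean spill attempts failed (and data was lost)
# in the past hour, summed across intake-head pods.
sum(increase(hdx_sink_spill_raw_failure_count[1h]))
sum(increase(hdx_sink_spill_catalog_failure_count[1h]))
```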
Risks and limitations
Intake spill has the following limitations and potential issues.
Temporary data gaps
Temporary gaps in queryable data can occur when data has been spilled but hasn't yet been fetched and processed. Spilled data is fetched and processed most recent first.
To determine whether raw or catalog data buckets remain unprocessed in object storage, take the difference between the counter metric for the number of buckets spilled and the counter metric for the number of buckets fetched. A duration histogram metric is captured for each spill occurrence.
For raw and catalog data:
- hdx_sink_spill_raw_duration_ns
- hdx_sink_spill_catalog_duration_ns
A histogram metric generates a corresponding counter metric in addition to the duration it captures. To access the counter, append _count to the original histogram metric name. The counts of raw and catalog data spilled can therefore be obtained from the metrics:
- hdx_sink_spill_raw_duration_ns_count
- hdx_sink_spill_catalog_duration_ns_count
The difference between the counts of spilled and fetched data is then:
- Raw: hdx_sink_spill_raw_duration_ns_count - hdx_sink_spill_raw_accepted_count
- Catalog: hdx_sink_spill_catalog_duration_ns_count - hdx_sink_spill_catalog_accepted_count
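As a PromQL sketch (assuming you sum the counters across intake-head pods; label selectors depend on your setup), the outstanding spill backlog can be approximated with:

```promql
# Hedged sketch: approximate count of spilled buckets not yet accepted
# back into the processing pipeline, summed across intake-head pods.
sum(hdx_sink_spill_raw_duration_ns_count) - sum(hdx_sink_spill_raw_accepted_count)
sum(hdx_sink_spill_catalog_duration_ns_count) - sum(hdx_sink_spill_catalog_accepted_count)
```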
Count of fetched data may be higher than count of spilled data
The "accepted" count (number of fetched buckets) can be higher than the count of buckets spilled because there may be a race fetching a given spilled item which can result in it being fetched by more than one intake-head pod. However, this race won't result in duplicated data. Due to this scenario, a positive result from this calculation indicates at least one bucket of spilled data hasn't been fetched; however, a zero or negative result doesn't guarantee that all spilled data has been fetched and processed.
Because spilled data is cleaned up after it's fetched for processing, you can manually check object storage at the file paths described above to determine whether any spilled data remains that hasn't yet backfilled gaps in your queryable data.
Increased CPU usage during ingest spikes
Tarballing and gzipping large amounts of data may notably increase CPU usage on intake-head pods.
Fetching doesn't respect worker pool isolation
If you have set up resource pools to keep ingest workloads isolated, enabling this feature means that raw or catalog data can be ingested by an intake-head in one pool, spilled to shared object storage, and then picked up and processed to completion by an intake-head in another worker pool.
No multi-bucket targeting
Data can only be spilled to the default object store for the Hydrolix cluster. It's not currently possible to configure a secondary storage for this feature.
Wasted resources due to race conditions on fetch
It's possible that multiple intake-head pods will try to claim the same spilled data for processing. For catalog data, to ensure that data is written exactly once, an intake-head will write to the locks table in the catalog database in PostgreSQL using the unique ID for the spilled data bucket it's attempting to process as part of the primary key. Upon success, it will then write the generated partition to the catalog. Any subsequent intake-head will encounter this lock and be unable to process the same spilled data bucket, ensuring that only one writer can win any potential race. The metric to measure how often this race condition occurs for catalog data is hdx_sink_spill_catalog_race_lost_count. A corresponding metric doesn't exist for raw data.