Spread List

The spread_list feature writes partition data across multiple buckets, choosing a bucket at random for each partition. Spreading partitions this way can help you avoid limits when querying data.

If your organization encounters bucket limits when reading or writing data, or operation limits when retrieving data, spread_list can reduce how often those limits are triggered.

You can use spread_list in any table, including summary tables. It randomly distributes partitions created in the table across the specified buckets.

If a summary table has no spread_list defined but summarizes a table that has one, the summary table inherits the underlying table's spread_list settings.
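The inheritance rule can be sketched in a few lines of Python. This is only an illustration of the rule as described above, not Hydrolix's implementation: the function name `effective_spread_list`, the two dictionaries, and the table names are hypothetical, and the sketch follows a single level of inheritance because the text doesn't say what happens with chained summary tables.

```python
def effective_spread_list(table_name, spread_lists, summarizes):
    """Return the spread_list in effect for a table (illustrative only).

    spread_lists: hypothetical dict mapping table name -> list of storage IDs
    summarizes:   hypothetical dict mapping summary table -> underlying table
    """
    own = spread_lists.get(table_name)
    if own:
        # The table defines its own spread_list, so that one is used.
        return own
    parent = summarizes.get(table_name)
    if parent is not None:
        # No spread_list of its own: inherit from the summarized table.
        return spread_lists.get(parent)
    return None


# Hypothetical tables: only the raw table defines a spread_list.
spread_lists = {"raw_logs": ["storage-uuid-1", "storage-uuid-2"]}
summarizes = {"logs_summary": "raw_logs"}

print(effective_spread_list("logs_summary", spread_lists, summarizes))
```

In this sketch, a summary table that defines its own spread_list keeps it, and a table with neither its own list nor an underlying table gets `None`.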

Merge behavior isn't affected when using spread_list.

Learn more about improving Query Efficiency.

Enable spread_list

This example enables the spread_list feature in the cluster configuration. It specifies two storage locations across which to distribute data: storage-uuid-1 and storage-uuid-2.

Enable spread_list example
spec:
  settings:
    storage_map:
      spread_list:
        - storage-uuid-1
        - storage-uuid-2
        - '...'

When you generate partitions in this table, spread_list assigns each one randomly to one of the two storage locations listed in the example.
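To make the random assignment concrete, here is a minimal Python sketch. It assumes a uniform random choice per partition, which the text implies but doesn't state outright. The function `assign_partition` is a hypothetical name used for illustration and isn't part of Hydrolix.

```python
import random
from collections import Counter


def assign_partition(spread_list, rng=random):
    """Pick one storage location at random for a new partition (assumed uniform)."""
    if not spread_list:
        raise ValueError("spread_list must contain at least one storage location")
    return rng.choice(spread_list)


spread_list = ["storage-uuid-1", "storage-uuid-2"]
rng = random.Random(42)  # seeded so that repeated runs give the same result

# Simulate 10,000 new partitions and count where each one lands.
counts = Counter(assign_partition(spread_list, rng) for _ in range(10_000))
print(counts)
```

Under this assumption, each of the two storage locations receives roughly half of the partitions over a large number of writes.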

Missing buckets

A bucket must exist and be usable when you create a storage object. This configuration-time check prevents the use of invalid or unavailable buckets.

If a bucket becomes unavailable during operation, Hydrolix continues to write data to the other, available buckets. For example, if one of the three storage locations configured in a spread list becomes unreachable, your data is written randomly across the two remaining storage locations.
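The fallback behavior can be sketched as a filter followed by a random choice. This is an illustration under stated assumptions rather than Hydrolix's code: `choose_available_storage` and the `is_available` callback are hypothetical names, and raising an error when no location is reachable is an assumption, since the text doesn't describe that case.

```python
import random


def choose_available_storage(spread_list, is_available, rng=random):
    """Pick a random storage location among those currently reachable."""
    candidates = [storage for storage in spread_list if is_available(storage)]
    if not candidates:
        # Assumption: behavior with no reachable location isn't documented.
        raise RuntimeError("no storage location in the spread list is reachable")
    return rng.choice(candidates)


spread_list = ["storage-uuid-1", "storage-uuid-2", "storage-uuid-3"]
unreachable = {"storage-uuid-2"}  # simulate one location going down
rng = random.Random(0)

# Over many writes, only the two reachable locations are ever selected.
picks = {
    choose_available_storage(spread_list, lambda s: s not in unreachable, rng)
    for _ in range(1000)
}
print(sorted(picks))
```

Passing the availability check in as a callback keeps the selection logic separate from however reachability is actually determined.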

Hydrolix generates a warning log in the intake-head pods for missing or unreachable storages. These logs contain the table, table_uuid, and storage_id fields.

WARN "Nonexistent storage found in spread list."

Exceptions

Other storage mapping settings, like column_value_mapping, can be included in the hydrolixcluster.yaml file, but won't be active if spread_list is enabled.
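If you want to catch this situation before deploying, a small check on the parsed settings can help. The Python sketch below is hypothetical: it assumes other mapping settings such as column_value_mapping sit alongside spread_list under storage_map, and that an empty spread_list counts as disabled. Neither assumption is confirmed by this page, and `inactive_mapping_settings` isn't a Hydrolix tool.

```python
def inactive_mapping_settings(storage_map):
    """List the mapping settings that spread_list would override.

    storage_map: dict parsed from the storage_map section of the config
    (the sibling-key layout is an assumption made for this sketch).
    """
    if not storage_map.get("spread_list"):
        # spread_list is absent or empty, so nothing is overridden.
        return []
    return sorted(key for key in storage_map if key != "spread_list")


# Hypothetical parsed settings with both kinds of mapping present.
storage_map = {
    "spread_list": ["storage-uuid-1", "storage-uuid-2"],
    "column_value_mapping": {},  # contents omitted; the structure isn't shown on this page
}

print(inactive_mapping_settings(storage_map))
```

A non-empty result means the listed settings are present in the file but won't take effect while spread_list is enabled.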

See Storage Settings for more about other storage settings you can enable.