Distribute Data to Multiple Buckets

Use this write-time feature to distribute large amounts of data randomly across multiple buckets.

If your organization frequently hits bucket Reads Per Second (RPS), Writes Per Second (WPS), or operation limits when reading and querying data, the spread_list feature can reduce how often those limits are reached.

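The spread_list feature uses a write-time process to distribute partition data randomly across multiple buckets. Because read requests are then spread over several buckets instead of concentrated in one, this can significantly improve read-time tasks such as queries.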
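As a rough illustration of why spreading partitions helps, the arithmetic below assumes reads are divided evenly across buckets. The query rate and per-bucket limit are hypothetical numbers chosen for illustration, not Hydrolix or storage-provider values, and expected_rps_per_bucket is a hypothetical helper.

```python
# Illustrative arithmetic only: HYPOTHETICAL_LIMIT_RPS and TOTAL_QUERY_RPS
# are made-up numbers, and even distribution of reads is an assumption.

def expected_rps_per_bucket(total_rps: float, num_buckets: int) -> float:
    """Expected requests per second each bucket sees, assuming reads are
    spread evenly across num_buckets."""
    if num_buckets < 1:
        raise ValueError("num_buckets must be at least 1")
    return total_rps / num_buckets


HYPOTHETICAL_LIMIT_RPS = 5000  # assumed per-bucket read limit
TOTAL_QUERY_RPS = 12000        # assumed read rate generated by queries

for n in (1, 2, 3):
    load = expected_rps_per_bucket(TOTAL_QUERY_RPS, n)
    status = "over" if load > HYPOTHETICAL_LIMIT_RPS else "within"
    print(f"{n} bucket(s): {load:.0f} RPS per bucket ({status} the limit)")
```

Under these assumed numbers, one or two buckets stay over the limit while three bring each bucket under it. Real traffic is rarely perfectly even, so treat this as an upper bound on the benefit.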

Merge behavior is not affected when using spread_list.

Use spread_list in a table

You can use spread_list in any table, including summary tables. It randomly distributes partitions created in the table across the specified buckets.

Enable spread_list example

In this example, we use the API to enable the spread_list feature in a table. We specify two buckets to distribute data across: storage-uuid-1 and storage-uuid-2.


{
  "settings": {
    "storage_map": {
      "spread_list": [
        "storage-uuid-1",
        "storage-uuid-2",
        ...
      ]
    }
  }
}
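If you generate the table settings from a script, the sketch below builds a body shaped like the JSON above for any list of storage IDs. build_spread_list_settings is a hypothetical helper: it only assembles the JSON, does not call the Hydrolix API, and its empty-list and duplicate checks are the sketch's own sanity checks, not documented Hydrolix validation rules.

```python
import json


def build_spread_list_settings(storage_ids: list[str]) -> dict:
    """Hypothetical helper: build a settings body with a storage_map.spread_list.

    It does not contact Hydrolix or confirm that the storage IDs exist.
    """
    # Sketch-only sanity checks, not Hydrolix validation rules.
    if not storage_ids:
        raise ValueError("spread_list needs at least one storage ID")
    if len(set(storage_ids)) != len(storage_ids):
        raise ValueError("duplicate storage IDs in spread_list")
    return {"settings": {"storage_map": {"spread_list": list(storage_ids)}}}


payload = build_spread_list_settings(["storage-uuid-1", "storage-uuid-2"])
print(json.dumps(payload, indent=2))
```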

When partitions are created in this table, spread_list assigns each one randomly to one of the two buckets listed in the example.
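To picture the resulting spread, the simulation below models the random assignment. It assumes each partition picks a bucket independently and uniformly at random; the source says only that assignment is random, so the uniform model and the assign_partitions helper are assumptions for illustration, not Hydrolix's implementation.

```python
import random
from collections import Counter


def assign_partitions(partition_ids, spread_list, rng):
    """Illustrative model: pick one bucket per partition uniformly at random."""
    return {pid: rng.choice(spread_list) for pid in partition_ids}


rng = random.Random(42)  # fixed seed so the run is repeatable
buckets = ["storage-uuid-1", "storage-uuid-2"]
assignment = assign_partitions(range(10_000), buckets, rng)

# Count how many simulated partitions landed in each bucket.
counts = Counter(assignment.values())
print(counts)
```

With many partitions, each bucket ends up holding roughly half of them, which is what spreads read load across buckets.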

Troubleshooting

If spread_list contains a mix of valid storage and missing, nonexistent, or unreachable storage, data is written only to the valid storage. For example, if you specify three buckets but one is unreachable, your data is written only to the two functional buckets.

Hydrolix generates an error log for missing or unreachable storage. Use the error message to determine which storage is missing.
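The fallback behavior described above can be modeled as "filter out unusable storage, then pick randomly among what remains." The sketch below is a hypothetical model, not Hydrolix code: the is_reachable check, the storage-uuid-3 bucket standing in for an unreachable one, and the log message text are all invented for illustration, and raising an error when no storage is usable is this sketch's choice because the source does not describe that case.

```python
import logging
import random

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("spread_list_sketch")


def usable_buckets(spread_list, is_reachable):
    """Keep storage IDs that pass the reachability check; log the rest."""
    usable = []
    for storage_id in spread_list:
        if is_reachable(storage_id):
            usable.append(storage_id)
        else:
            # Hypothetical message; not the text Hydrolix actually logs.
            log.error("storage %s is missing or unreachable; skipping it", storage_id)
    return usable


def pick_bucket(spread_list, is_reachable, rng):
    candidates = usable_buckets(spread_list, is_reachable)
    if not candidates:
        # The source doesn't cover this case; the sketch simply raises.
        raise RuntimeError("no usable storage in spread_list")
    return rng.choice(candidates)


# storage-uuid-3 is hypothetical and simulates an unreachable bucket.
reachable = {"storage-uuid-1", "storage-uuid-2"}
configured = ["storage-uuid-1", "storage-uuid-2", "storage-uuid-3"]
chosen = pick_bucket(configured, lambda sid: sid in reachable, random.Random(0))
print(chosen)
```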

Exceptions

Other storage mapping settings, such as column_value_mapping, can be included in the hydrolixcluster.yaml file, but they are not active while spread_list is enabled.
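The precedence rule above can be summarized as a small check. This is a hypothetical helper for illustration: it assumes, only for the sake of the example, that both settings sit side by side in one storage_map dictionary, and the column_value_mapping value is a placeholder because its real shape is not described here.

```python
from typing import Optional


def active_storage_setting(storage_map: dict) -> Optional[str]:
    """Hypothetical helper: report which storage setting takes effect."""
    # A non-empty spread_list wins; other mapping settings stay but are inactive.
    if storage_map.get("spread_list"):
        return "spread_list"
    if "column_value_mapping" in storage_map:
        return "column_value_mapping"
    return None


# Placeholder value: the real column_value_mapping format is not shown here.
example = {
    "spread_list": ["storage-uuid-1", "storage-uuid-2"],
    "column_value_mapping": {"placeholder": "storage-uuid-3"},
}
print(active_storage_setting(example))
```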

See Storage Settings for more about other storage settings you can enable.