Distribute Data to Multiple Buckets
Use this write-time feature to distribute large amounts of data to multiple randomized buckets.
If your organization frequently hits bucket Reads Per Second (RPS), Writes Per Second (WPS), or operation limits when reading and querying data, the spread_list feature can reduce how often those limits are hit.
The spread_list feature uses a write-time process to store partition data in multiple randomized buckets. This can significantly improve read-time tasks like queries.
Merge behavior is not affected when using spread_list.
Use spread_list in a table
You can use spread_list in any table, including summary tables. It randomly distributes partitions created in the table across the specified buckets.
Enable spread_list
In this example, we enable the spread_list feature in the hydrolixcluster.yaml file. We specify two buckets to distribute data: storage-uuid-1 and storage-uuid-2.
Enable spread_list example
In this example, we use the API to enable the spread_list feature in a table. We specify two buckets to distribute data: storage-uuid-1 and storage-uuid-2.
{
  "settings": {
    "storage_map": {
      "spread_list": [
        "storage-uuid-1",
        "storage-uuid-2",
        ...
      ]
    }
  }
}
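The settings body shown above can also be assembled programmatically. The following is a minimal sketch in Python; the helper name build_spread_list_settings is hypothetical, and sending the body (endpoint, authentication) is left out because it depends on your deployment.

```python
import json


def build_spread_list_settings(bucket_ids):
    """Return a table-settings body that enables spread_list for the given storage IDs."""
    if not bucket_ids:
        raise ValueError("spread_list needs at least one storage ID")
    # Drop duplicate IDs while keeping the caller's order.
    unique_ids = list(dict.fromkeys(bucket_ids))
    return {"settings": {"storage_map": {"spread_list": unique_ids}}}


# The two storage IDs from the example above.
body = build_spread_list_settings(["storage-uuid-1", "storage-uuid-2"])
print(json.dumps(body, indent=2))
```

Building the body from a list makes it easy to add or remove buckets without hand-editing nested JSON.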
When you generate partitions in this table, spread_list assigns them randomly to one of the two buckets added in the example.
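To get a feel for how random assignment spreads load, the sketch below simulates it in Python. This is an illustration only, not Hydrolix's implementation: it assumes each partition is placed with a uniform random choice, and the function name assign_partitions and the partition names are made up for the example.

```python
import random
from collections import Counter


def assign_partitions(partition_ids, spread_list, seed=None):
    """Map each partition ID to one randomly chosen bucket from spread_list."""
    rng = random.Random(seed)  # seeded so the simulation is repeatable
    return {pid: rng.choice(spread_list) for pid in partition_ids}


# Hypothetical partition names; only the bucket IDs come from the example above.
partitions = [f"partition-{i}" for i in range(10_000)]
placement = assign_partitions(partitions, ["storage-uuid-1", "storage-uuid-2"], seed=42)

# Count how many partitions landed in each bucket.
counts = Counter(placement.values())
for bucket, count in sorted(counts.items()):
    print(bucket, count)
```

Under this assumption each bucket receives roughly half of the partitions, so read and write requests that would have targeted a single bucket are split across two.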
Troubleshooting
If you specify a combination of valid and missing or nonexistent storage, data will go to the valid storage only. For example, if you specify three buckets, but one is unreachable, your data is only written to the two functional buckets.
Hydrolix generates an error log for missing or unreachable storage. Use the error message to determine what's missing.
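The fallback behavior described above can be pictured with a short Python sketch. It is a simplified model, not Hydrolix code: the function usable_buckets, the reachable set, the third bucket ID storage-uuid-3, and the log message text are all assumptions made for illustration.

```python
import logging

logger = logging.getLogger("spread_list_sketch")


def usable_buckets(spread_list, reachable):
    """Split spread_list into buckets that can take writes and buckets that cannot."""
    usable = [b for b in spread_list if b in reachable]
    missing = [b for b in spread_list if b not in reachable]
    for bucket in missing:
        # Illustrative message only; the real error log text will differ.
        logger.error("storage %s is missing or unreachable; skipping it", bucket)
    return usable, missing


# Three buckets configured, one of them unreachable.
usable, missing = usable_buckets(
    ["storage-uuid-1", "storage-uuid-2", "storage-uuid-3"],
    reachable={"storage-uuid-1", "storage-uuid-3"},
)
print(usable)
print(missing)
```

Writes continue against the usable list, while the missing list corresponds to the entries you would look for in the error log.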
Exceptions
Other storage mapping settings, like column_value_mapping, can be included in the hydrolixcluster.yaml file, but will not be active if spread_list is enabled.
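A quick check like the following can flag settings that will be ignored. It is a sketch under assumptions: the storage settings are taken to be already loaded into a Python dictionary, the function name inactive_mappings is hypothetical, and only column_value_mapping is checked by default because it is the only other mapping named here.

```python
def inactive_mappings(storage_settings, other_mappings=("column_value_mapping",)):
    """Return the mapping settings that are present but inactive because spread_list is set."""
    if not storage_settings.get("spread_list"):
        return []  # spread_list is not enabled, so nothing is overridden
    return [name for name in other_mappings if name in storage_settings]


settings = {
    "spread_list": ["storage-uuid-1", "storage-uuid-2"],
    # Placeholder value; the real shape of this setting is not described here.
    "column_value_mapping": {},
}
ignored = inactive_mappings(settings)
print(ignored)
```

Running a check like this before applying configuration helps avoid expecting a mapping to take effect when spread_list overrides it.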
See Storage Settings for more about other storage settings you can enable.