Table Settings
Tables use a number of options to define how data is stored. The sections below cover rejecting future data, ingestion sort priority, and column-based sharding.
Reject Future Data
The max_future_days option filters out rows with a primary timestamp in the future. By default, max_future_days is 0, so Hydrolix rejects any row with a primary timestamp later than now().
If your application stores future primary timestamps, set max_future_days to a value above 0 that covers how many days ahead those timestamps can be.
If you prefer not to reject any incoming future data, you can set max_future_days to null.
For example, setting max_future_days to 14 accepts primary timestamps up to 14 days in the future.
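The acceptance rule can be modeled in a few lines of Python. This is a minimal sketch, not Hydrolix's ingest code. accepts_row is a hypothetical helper, and it assumes that a limit of N days means a fixed window of N times 24 hours after now(), not a calendar-day boundary.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def accepts_row(
    primary_ts: datetime,
    max_future_days: Optional[int],
    now: Optional[datetime] = None,
) -> bool:
    """Model of the max_future_days check (hypothetical helper).

    Assumption: N days means a fixed window of N * 24 hours after now().
    """
    if max_future_days is None:
        # null disables the check, so no future timestamp is rejected
        return True
    if now is None:
        now = datetime.now(timezone.utc)
    limit = now + timedelta(days=max_future_days)
    # Rows whose primary timestamp is later than the limit are rejected
    return primary_ts <= limit


now = datetime(2024, 1, 1, tzinfo=timezone.utc)
print(accepts_row(now + timedelta(days=13), 14, now))     # True
print(accepts_row(now + timedelta(days=15), 14, now))     # False
print(accepts_row(now + timedelta(seconds=1), 0, now))    # False: the default
print(accepts_row(now + timedelta(days=365), None, now))  # True
```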
Ingestion Sort Priority
Hydrolix sorts data during ingestion by column cardinality. For most use cases, the default sorting is sufficient. However, some query patterns benefit from custom sorting.
To change the sort priority:
- Determine the column or columns you would like to sort by.
- Add the column names to the sort_keys array, in the order you want them applied.
Hydrolix sorts incoming data first by the declared order of the sort_keys array, then by other columns in order of cardinality.
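The ordering rule can be sketched as a composite sort key. The declared sort_keys come first, in order, followed by every other column ranked by its number of distinct values. column_order and sort_rows are hypothetical helpers. The sketch also assumes the remaining columns run from lowest to highest cardinality, which the text above doesn't specify.

```python
from typing import Any, Dict, List

Row = Dict[str, Any]


def column_order(rows: List[Row], sort_keys: List[str]) -> List[str]:
    """Declared sort_keys first, then other columns by distinct-value count.

    Assumption: remaining columns go from lowest to highest cardinality.
    """
    columns = list(rows[0].keys()) if rows else []
    remaining = [c for c in columns if c not in sort_keys]
    cardinality = {c: len({row[c] for row in rows}) for c in remaining}
    # sorted() is stable, so columns with equal cardinality keep their order
    return list(sort_keys) + sorted(remaining, key=lambda c: cardinality[c])


def sort_rows(rows: List[Row], sort_keys: List[str]) -> List[Row]:
    """Sort rows lexicographically by the computed column order."""
    order = column_order(rows, sort_keys)
    return sorted(rows, key=lambda row: tuple(row[c] for c in order))


rows = [
    {"host": "b.example", "status": 200, "path": "/a"},
    {"host": "a.example", "status": 200, "path": "/b"},
    {"host": "a.example", "status": 404, "path": "/c"},
]
print(column_order(rows, ["host"]))  # ['host', 'status', 'path']
print([r["path"] for r in sort_rows(rows, ["host"])])  # ['/b', '/c', '/a']
```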
Consider an application where many queries sort first by customer_id, then by locale. If Hydrolix stores data sorted by these columns, these queries can benefit from spatial locality, because rows with the same customer_id and locale are stored next to each other. To sort data first by customer_id, then by locale, set sort_keys to customer_id followed by locale.
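The customer_id and locale example can be written as a settings fragment. It's shown here as a Python dict serialized to JSON. The enclosing "settings" key is an assumption about the table object's shape, so check the table API reference for your version. The second half of the sketch shows the locality benefit. Once rows are sorted by (customer_id, locale), each customer's rows form one contiguous run.

```python
import json
from itertools import groupby

sort_keys = ["customer_id", "locale"]
# Assumption: sort_keys sits inside the table's "settings" object
payload = {"settings": {"sort_keys": sort_keys}}
print(json.dumps(payload))

rows = [
    {"customer_id": "c2", "locale": "en-US"},
    {"customer_id": "c1", "locale": "fr-FR"},
    {"customer_id": "c2", "locale": "de-DE"},
    {"customer_id": "c1", "locale": "en-US"},
]
# Sort by the declared keys in order: customer_id, then locale
ordered = sorted(rows, key=lambda r: tuple(r[k] for k in sort_keys))
# Each customer_id now appears as a single contiguous run
runs = [cid for cid, _ in groupby(ordered, key=lambda r: r["customer_id"])]
print(runs)  # ['c1', 'c2']
```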
Sorting isn't retroactive
Changes to the sort_keys field only affect data ingested after the change is applied. Hydrolix doesn't change the sort order of existing data.
Shard key
The shard_key_algo, enable_sharding, and legacy_sharding settings were introduced in Hydrolix version 5.10.3.
Hydrolix always shards data by time. You can apply additional column-based sharding to improve query performance through partition pruning. When a query filters on the shard key column, Hydrolix reads only the partitions matching the filter value instead of scanning all partitions in the time range.
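Partition pruning can be sketched as two filters. The time-range filter always applies. The shard key filter applies only when the query filters on the shard key column. The (time bucket, hash) partition layout, the CRC32 hashing over UTF-8 bytes, and the helper names are assumptions for illustration. They don't describe Hydrolix's storage format.

```python
import zlib
from typing import List, Optional, Set, Tuple

# Hypothetical model: each partition is identified by (time_bucket, shard_hash)
Partition = Tuple[str, int]


def shard_hash(value: str) -> int:
    # Assumption: CRC32 over the UTF-8 bytes of the shard key value
    return zlib.crc32(value.encode("utf-8"))


def partitions_to_read(
    partitions: List[Partition],
    time_buckets: Set[str],
    shard_value: Optional[str] = None,
) -> List[Partition]:
    """Time pruning always applies; shard pruning needs a shard key filter."""
    in_range = [p for p in partitions if p[0] in time_buckets]
    if shard_value is None:
        return in_range  # no shard key filter: read every partition in range
    wanted = shard_hash(shard_value)
    return [p for p in in_range if p[1] == wanted]


partitions = [
    (hour, shard_hash(cid))
    for hour in ("2024-01-01T00", "2024-01-01T01")
    for cid in ("acme", "globex", "initech")
]
hours = {"2024-01-01T00", "2024-01-01T01"}
print(len(partitions_to_read(partitions, hours)))          # 6
print(len(partitions_to_read(partitions, hours, "acme")))  # 2
```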
Sharding settings
| Setting | Type | Default | Description |
|---|---|---|---|
| shard_key | string | null | The column to shard by. Must be a string column. Can't be changed or cleared once set. |
| enable_sharding | boolean | null | Enables or disables column-based sharding. When true, both shard_key and shard_key_algo must be set. |
| shard_key_algo | string | null | The sharding algorithm: strict or performance. Can't be cleared once set, but can be switched between the two. |
| legacy_sharding | boolean | null | Read-only. Set to true during migration for tables that used shard_key before v5.10. |
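The constraints in the settings table can be expressed as a validation function. sharding_errors is hypothetical and checks a settings object in isolation. In the sketch, column_types stands in for the table's schema, and the type names in it are illustrative.

```python
from typing import Any, Dict, List

VALID_ALGOS = {"strict", "performance"}


def sharding_errors(settings: Dict[str, Any], column_types: Dict[str, str]) -> List[str]:
    """Check the documented sharding constraints (hypothetical helper).

    column_types maps column names to type names, e.g. {"customer_id": "string"}.
    """
    errors = []
    key = settings.get("shard_key")
    algo = settings.get("shard_key_algo")
    # shard_key must name a string column
    if key is not None and column_types.get(key) != "string":
        errors.append(f"shard_key {key!r} must be a string column")
    # shard_key_algo accepts only the two documented algorithms
    if algo is not None and algo not in VALID_ALGOS:
        errors.append(f"shard_key_algo must be strict or performance, not {algo!r}")
    # enable_sharding: true requires both shard_key and shard_key_algo
    if settings.get("enable_sharding") is True and (key is None or algo is None):
        errors.append("enable_sharding: true requires shard_key and shard_key_algo")
    return errors


schema = {"customer_id": "string", "status": "uint32"}
valid = {"shard_key": "customer_id", "shard_key_algo": "performance", "enable_sharding": True}
print(sharding_errors(valid, schema))  # []
print(len(sharding_errors({"shard_key": "status", "enable_sharding": True}, schema)))  # 2
```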
Sharding algorithms
Hydrolix supports two sharding algorithms:
- strict computes a CRC32 hash of each shard key value, so each distinct value gets its own partition directory. This provides the best query pruning for low-cardinality columns but can create too many directories for high-cardinality columns.
- performance distributes shard key values across 7 fixed buckets using CRC32 mod 7. This reduces the number of partition directories and cloud storage LIST API calls, making it better suited for high-cardinality columns like customer_id or session_id. The tradeoff is that queries filtering on a single value may read partitions containing other values that hash to the same bucket.
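The difference between the two algorithms can be sketched with Python's zlib.crc32. The sketch assumes the hash is computed over the value's UTF-8 bytes, which the text doesn't state. shard_id is a hypothetical helper. The sketch counts how many distinct partition identifiers each algorithm produces for 10,000 high-cardinality values.

```python
import zlib

PERFORMANCE_BUCKETS = 7


def shard_id(value: str, algo: str) -> int:
    """Map a shard key value to a partition identifier (hypothetical helper).

    Assumption: CRC32 is computed over the value's UTF-8 bytes.
    """
    digest = zlib.crc32(value.encode("utf-8"))
    if algo == "strict":
        return digest  # one identifier per distinct value, barring CRC32 collisions
    if algo == "performance":
        return digest % PERFORMANCE_BUCKETS  # one of 7 fixed buckets
    raise ValueError(f"unknown algorithm: {algo!r}")


values = [f"session-{i}" for i in range(10_000)]
strict_dirs = {shard_id(v, "strict") for v in values}
performance_dirs = {shard_id(v, "performance") for v in values}
print(len(strict_dirs))       # about 10,000: roughly one directory per value
print(len(performance_dirs))  # at most 7

# Tradeoff: filtering on one value also reads every value in its bucket
target = shard_id("session-0", "performance")
same_bucket = [v for v in values if shard_id(v, "performance") == target]
print(len(same_bucket))  # roughly 10,000 / 7
```

With strict, the number of directories grows with the number of distinct values. With performance, the number is capped at 7. The cost is that a filter on one value reads roughly a seventh of all values.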
Enable sharding on a new table
To add a shard key to a table, set shard_key to the name of a string column, set shard_key_algo to strict or performance, and set enable_sharding to true.
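A request body for enabling sharding might look like the following, built as a Python dict and serialized to JSON. The enclosing "settings" key is an assumption about the table object's shape. customer_id is an example column. The endpoint that receives the body isn't shown. Confirm the shape and the endpoint in your Hydrolix API reference.

```python
import json

# Hypothetical table update body; the "settings" nesting is an assumption
enable_sharding_update = {
    "settings": {
        "shard_key": "customer_id",       # must be a string column
        "shard_key_algo": "performance",  # or "strict" for low-cardinality columns
        "enable_sharding": True,          # requires both settings above
    }
}

body = json.dumps(enable_sharding_update, indent=2)
print(body)
```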
Switch sharding algorithms
You can switch between strict and performance on an existing table. New partitions use the new algorithm, and queries check both hash formats.
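The following sketch shows why switching is safe for reads. A query that filters on one value computes that value's partition identifier under every algorithm the table has used. Partitions written before and after the switch therefore both match. The identifier tagging, the CRC32 hashing over UTF-8 bytes, and the helper names are hypothetical.

```python
import zlib
from typing import Iterable, Set, Tuple

PERFORMANCE_BUCKETS = 7


def partition_id(value: str, algo: str) -> Tuple[str, int]:
    """Hypothetical identifier, tagged with its algorithm so that strict
    hashes and performance buckets never compare equal."""
    digest = zlib.crc32(value.encode("utf-8"))  # assumption: UTF-8 bytes
    if algo == "strict":
        return ("strict", digest)
    if algo == "performance":
        return ("performance", digest % PERFORMANCE_BUCKETS)
    raise ValueError(f"unknown algorithm: {algo!r}")


def ids_to_check(value: str, algos_seen: Iterable[str]) -> Set[Tuple[str, int]]:
    """Candidate partition IDs for a filter on one shard key value."""
    return {partition_id(value, algo) for algo in algos_seen}


# A table written with strict, then switched to performance
old = partition_id("acme", "strict")
new = partition_id("acme", "performance")
candidates = ids_to_check("acme", ["strict", "performance"])
print(old in candidates and new in candidates)  # True
```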
Disable and re-enable sharding
Set enable_sharding to false to stop column-based sharding for new data. Time-based sharding remains in effect. Existing sharded partitions remain queryable. You can re-enable column-based sharding later by setting enable_sharding back to true.
Sharding configuration is permanent
The shard_key column and shard_key_algo can't be cleared once set. You can disable column-based sharding with enable_sharding: false and switch between strict and performance, but you can't remove the shard key column or reset the algorithm to null.
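The permanence rules can be sketched as a check that compares an update with the current settings. update_errors is a hypothetical helper. It also treats legacy_sharding as read-only, as the settings table describes.

```python
from typing import Any, Dict, List


def update_errors(current: Dict[str, Any], update: Dict[str, Any]) -> List[str]:
    """Check a settings update against the permanence rules (hypothetical)."""
    errors = []
    if "legacy_sharding" in update:
        errors.append("legacy_sharding is read-only")
    old_key = current.get("shard_key")
    # Once set, shard_key can't point to another column or be cleared to None
    if old_key is not None and "shard_key" in update and update["shard_key"] != old_key:
        errors.append("shard_key can't be changed or cleared once set")
    # Once set, shard_key_algo can switch between strict and performance only
    if current.get("shard_key_algo") is not None and "shard_key_algo" in update:
        new_algo = update["shard_key_algo"]
        if new_algo is None:
            errors.append("shard_key_algo can't be reset to null")
        elif new_algo not in ("strict", "performance"):
            errors.append(f"unknown shard_key_algo: {new_algo!r}")
    # enable_sharding isn't restricted here: toggling it is allowed
    return errors


table = {"shard_key": "customer_id", "shard_key_algo": "strict", "enable_sharding": True}
print(update_errors(table, {"enable_sharding": False}))         # []
print(update_errors(table, {"shard_key_algo": "performance"}))  # []
print(update_errors(table, {"shard_key": None}))
# ["shard_key can't be changed or cleared once set"]
```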
Upgrade migration from pre-v5.10
Tables that had shard_key set before v5.10 are migrated on upgrade. The migration sets enable_sharding to true, shard_key_algo to strict, and legacy_sharding to true. No action is required.
The legacy_sharding flag ensures backward compatibility. Queries on migrated tables check partitions created under the old hashing algorithm alongside partitions created under the new CRC32 algorithm.
For more details on the upgrade migration, see Upgrade to v5.10.
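The migration amounts to a small settings transformation. migrate_pre_510 is a hypothetical model of the documented outcome, not Hydrolix's upgrade code. It assumes that tables without a shard_key are left unchanged.

```python
from typing import Any, Dict


def migrate_pre_510(settings: Dict[str, Any]) -> Dict[str, Any]:
    """Model of the documented upgrade outcome (hypothetical helper).

    Tables with shard_key set get enable_sharding=True, shard_key_algo="strict",
    and legacy_sharding=True. Assumption: other tables are left unchanged.
    """
    migrated = dict(settings)  # copy, so the input isn't mutated
    if settings.get("shard_key") is not None:
        migrated.update(
            enable_sharding=True,
            shard_key_algo="strict",
            legacy_sharding=True,
        )
    return migrated


print(migrate_pre_510({"shard_key": "customer_id"}))
# {'shard_key': 'customer_id', 'enable_sharding': True, 'shard_key_algo': 'strict', 'legacy_sharding': True}
print(migrate_pre_510({"max_future_days": 0}))
# {'max_future_days': 0}
```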
Fragmentation
Shard with care. Use performance mode for high-cardinality columns to avoid creating excessive partition directories. Individual shards should contain billions of rows. Small shards degrade system performance.
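A back-of-the-envelope check for this guidance is to divide the rows in one time window by the number of column shards each algorithm creates. avg_rows_per_shard is hypothetical, and the row and value counts are illustrative. The sketch assumes rows spread evenly across values. Real data is usually skewed, so some shards end up much smaller than the average.

```python
def avg_rows_per_shard(rows_in_window: int, distinct_values: int, algo: str) -> float:
    """Average rows per column shard in one time window (hypothetical helper).

    Assumes rows are spread evenly across shard key values.
    """
    if algo == "strict":
        shards = distinct_values  # one shard per distinct value
    elif algo == "performance":
        shards = min(distinct_values, 7)  # values share at most 7 buckets
    else:
        raise ValueError(f"unknown algorithm: {algo!r}")
    return rows_in_window / shards


# Illustrative: 10 billion rows in a window across 1 million customer_id values
for algo in ("strict", "performance"):
    print(algo, f"{avg_rows_per_shard(10_000_000_000, 1_000_000, algo):,.0f}")
# strict 10,000
# performance 1,428,571,429
```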
See also
- Reduce Columnar Database Costs by Analyzing Cardinality - How to analyze column cardinality to optimize storage and query performance
- Solving the Challenge of High-Cardinality Data with SQL Transformation - Techniques for handling high-cardinality columns without degrading system performance