Table Settings Reference
Tables use a number of options to define how data is stored:
{
"name": "mytable",
"settings": {
"sort_keys": [],
"shard_key": null,
"max_future_days": 0,
"expected_tb_per_day": 1
}
}
Reject Future Data
The max_future_days option filters out rows with a primary timestamp in the future. By default, Hydrolix uses a value of 0 and rejects any events with a timestamp greater than now().
If your application stores future primary timestamps, you can configure a value above 0 that meets your needs.
If you prefer not to reject any incoming future data, you can set max_future_days to null.
For example, the following settings accept primary timestamps up to 14 days in the future:
{
"name": "mytable",
"settings": {
"max_future_days": 14
}
...
}
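The cutoff described above can be sketched in a few lines of Python. This is only an illustration of the rule as documented here, not Hydrolix's implementation. The function name accepts_timestamp is hypothetical, and the sketch assumes the cutoff is now() plus max_future_days whole days, with null (None in Python) disabling the check.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def accepts_timestamp(ts: datetime, max_future_days: Optional[int], now: datetime) -> bool:
    """Return True if a row with primary timestamp `ts` passes the cutoff.

    Assumption: the cutoff is `now` plus `max_future_days` whole days.
    None (JSON null) means no future data is rejected.
    """
    if max_future_days is None:
        return True
    return ts <= now + timedelta(days=max_future_days)


# Hypothetical reference time and an event stamped ten days ahead of it.
now = datetime(2024, 1, 1, tzinfo=timezone.utc)
ten_days_ahead = now + timedelta(days=10)

print(accepts_timestamp(ten_days_ahead, 0, now))     # default setting: rejected
print(accepts_timestamp(ten_days_ahead, 14, now))    # within 14 days: accepted
print(accepts_timestamp(ten_days_ahead, None, now))  # null: accepted
```

Under these assumptions, an event ten days ahead is rejected by the default of 0 but accepted when max_future_days is 14 or null.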
Ingestion Sort Priority
Hydrolix automatically sorts data during ingestion by column cardinality. For most use cases, the default sorting is sufficient. However, some query patterns benefit from custom sorting.
To change the sort priority:
- Determine a column or columns you would like to sort by.
- Add the column name to the sort_keys array.
Hydrolix sorts incoming data first by the declared order of the sort_keys array, then by other columns in order of cardinality.
Consider an application where many queries sort first by customer_id, then by locale. If Hydrolix stores data sorted by these columns, those queries can benefit from spatial locality. The following settings example sorts data first by customer_id, then by locale:
{
"name": "mytable",
"settings": {
"sort_keys": ["customer_id", "locale"]
}
...
}
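To make the resulting column order concrete, the following Python sketch derives it from a set of per-column cardinalities. It is a rough model rather than Hydrolix's actual sorting code. The function ingestion_sort_order, the status and url columns, and all cardinality figures are hypothetical. The text above does not say whether the remaining columns are ordered by ascending or descending cardinality, so the sketch assumes lowest cardinality first.

```python
from typing import Dict, List


def ingestion_sort_order(cardinality: Dict[str, int], sort_keys: List[str]) -> List[str]:
    """Order columns: declared sort_keys first, then the rest by cardinality.

    Assumption: the remaining columns are ordered lowest cardinality first.
    """
    remaining = [col for col in cardinality if col not in sort_keys]
    remaining.sort(key=lambda col: cardinality[col])
    return list(sort_keys) + remaining


# Hypothetical distinct-value counts for each column in the table.
cardinality = {"locale": 40, "customer_id": 12000, "status": 5, "url": 900000}

order = ingestion_sort_order(cardinality, ["customer_id", "locale"])
print(order)
```

With an empty sort_keys array, the same function falls back to ordering by cardinality alone, which mirrors the default behavior described at the start of this section.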
Sorting is not Retroactive
Changes to the sort_keys field only affect data ingested after the change is applied. Hydrolix does not change the sort order of existing data.
Shard Key
Hydrolix automatically shards data based on time. You can apply additional sharding based on other columns. Hydrolix shards data based on unique column values, so columns with high cardinality generate lots of shards.
This feature can improve query time if used properly. However, you must ensure that data is evenly distributed among your shard key values, and that most, if not all, queries include a predicate on the shard key.
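One way to check a candidate shard key before committing to it is to measure how evenly a sample of its values is distributed. The Python sketch below is a generic helper and not part of any Hydrolix API. The function shard_key_skew and the sample region values are hypothetical, and what counts as an acceptable skew is left to your judgment.

```python
from collections import Counter
from typing import Iterable, Tuple


def shard_key_skew(values: Iterable[str]) -> Tuple[int, float]:
    """Return (distinct value count, skew) for a sample of shard key values.

    Skew is the largest value's row count divided by the count each value
    would have under a perfectly even split. 1.0 means perfectly even.
    """
    counts = Counter(values)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sample contains no values")
    return len(counts), max(counts.values()) * len(counts) / total


# Hypothetical sample of a "region" column: 50, 30, and 20 rows per value.
sample = ["us-east"] * 50 + ["us-west"] * 30 + ["eu-west"] * 20

distinct, skew = shard_key_skew(sample)
print(distinct, skew)
```

A large distinct count suggests many shards, and a skew well above 1.0 suggests that some shards would hold far more rows than others.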
To add a shard key to a table:
- Determine a column or columns you would like to shard by.
- Use the column name as the value of the shard_key setting.
- Shard key values MUST be strings.
The following settings example shards data by region:
{
"name": "mytable",
"settings": {
"shard_key": "region"
}
...
}
Fragmentation
Shard with care. Individual shards should contain billions of rows. Small shards degrade system performance.