Table Settings Reference
Tables use a number of options to define how data is stored:
{
  "name": "mytable",
  "settings": {
    "sort_keys": [],
    "shard_key": null,
    "max_future_days": 0,
    "scale": {
      "expected_tb_per_day": 1
    }
  }
}
Reject Future Data
The max_future_days option filters out rows with a primary timestamp in the future. By default, Hydrolix uses a value of 0 and rejects any events with a timestamp greater than now().
If your application stores future primary timestamps, set max_future_days to the number of days ahead that you need to accept.
If you prefer not to reject any incoming future data, set max_future_days to null.
For example, the following settings accept primary timestamps up to 14 days in the future:
{
  "name": "mytable",
  "settings": {
    "max_future_days": 14
  }
  ...
}
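The filtering rule can be modeled as a simple predicate. The following is a minimal Python sketch, not Hydrolix's implementation: the function name accepts_timestamp is hypothetical, and it assumes the cutoff is now() plus max_future_days days, with a row exactly at the cutoff kept (matching "greater than now()" for the default of 0).

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def accepts_timestamp(ts: datetime, max_future_days: Optional[int],
                      now: datetime) -> bool:
    """Return True if a row with primary timestamp `ts` would be kept.

    Simplified model (assumption): the cutoff is `now` plus
    `max_future_days` days. None (JSON null) disables the check.
    """
    if max_future_days is None:
        # null: never reject future data
        return True
    cutoff = now + timedelta(days=max_future_days)
    # Rows strictly later than the cutoff are rejected.
    return ts <= cutoff


now = datetime(2024, 1, 1, tzinfo=timezone.utc)
print(accepts_timestamp(now + timedelta(hours=1), 0, now))     # False
print(accepts_timestamp(now + timedelta(days=10), 14, now))    # True
print(accepts_timestamp(now + timedelta(days=30), None, now))  # True
```

With the default of 0, even a timestamp one hour ahead is rejected; with 14, anything up to two weeks ahead passes.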
Ingestion Sort Priority
Hydrolix automatically sorts data during ingestion by column cardinality. For most use cases, the default sorting is sufficient. However, some query patterns benefit from custom sorting.
To change the sort priority:
- Determine a column or columns you would like to sort by.
- Add the column names to the sort_keys array.
Hydrolix sorts incoming data first by the columns in the sort_keys array, in their declared order, then by the remaining columns in order of cardinality.
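The resulting column priority can be sketched as follows. This is a hypothetical Python model of the rule above, not Hydrolix code; the function name, the sample cardinalities, and the choice to order the remaining columns from lowest to highest cardinality are all assumptions, since the direction is not specified here.

```python
from typing import Dict, List


def ingestion_sort_order(sort_keys: List[str],
                         cardinality: Dict[str, int]) -> List[str]:
    """Return the column order used to sort incoming rows.

    Declared sort_keys come first, in declared order. The remaining
    columns follow by cardinality, lowest first (an assumption).
    """
    remaining = [c for c in cardinality if c not in sort_keys]
    remaining.sort(key=lambda c: cardinality[c])
    return list(sort_keys) + remaining


# Hypothetical distinct-value counts per column.
cards = {"request_path": 1_000_000, "locale": 40, "customer_id": 5_000, "status": 3}
print(ingestion_sort_order(["customer_id", "locale"], cards))
# ['customer_id', 'locale', 'status', 'request_path']
```

With an empty sort_keys array, the same model orders every column purely by cardinality.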
Consider an application where many queries sort first by customer_id, then by locale. If Hydrolix stores data sorted by these columns, those queries can benefit from spatial locality, because rows with the same customer_id and locale are stored close together. The following settings sort data first by customer_id, then by locale:
{
  "name": "mytable",
  "settings": {
    "sort_keys": ["customer_id", "locale"]
  }
  ...
}
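To illustrate the locality benefit, the following Python sketch sorts a few hypothetical rows by customer_id and then locale, mirroring the sort_keys example above. It models only the resulting row order, not how Hydrolix lays data out on disk.

```python
from itertools import groupby

# Hypothetical sample rows in arrival order.
rows = [
    {"customer_id": "c2", "locale": "fr-FR"},
    {"customer_id": "c1", "locale": "en-US"},
    {"customer_id": "c2", "locale": "de-DE"},
    {"customer_id": "c1", "locale": "en-GB"},
]

# Sort first by customer_id, then by locale,
# mirroring "sort_keys": ["customer_id", "locale"].
ordered = sorted(rows, key=lambda r: (r["customer_id"], r["locale"]))

# Each customer's rows now form one contiguous run.
for customer, group in groupby(ordered, key=lambda r: r["customer_id"]):
    print(customer, [r["locale"] for r in group])
# c1 ['en-GB', 'en-US']
# c2 ['de-DE', 'fr-FR']
```

A query that scans one customer's rows in locale order reads a single contiguous run instead of rows scattered across the input.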
Sorting is not Retroactive
Changes to the sort_keys field only affect data ingested after the change is applied. Hydrolix does not change the sort order of existing data.
Shard Key
Hydrolix automatically shards data based on time. You can apply additional sharding based on other columns. Hydrolix shards data by the unique values of the shard key column, so columns with high cardinality generate many shards.
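The effect of cardinality on shard count can be shown with a simplified model. In this Python sketch, which is an assumption rather than Hydrolix's actual partitioning logic, each distinct (time bucket, shard key value) pair becomes one shard; the bucket field and the count_shards helper are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Optional


def count_shards(rows: List[Dict[str, object]],
                 shard_key: Optional[str] = None) -> Counter:
    """Group rows into shards under a simplified model.

    Assumption: one shard per (time bucket, shard key value) pair, where
    the time bucket is a precomputed "bucket" field. With no shard key,
    only the time bucket is used. Returns rows per shard.
    """
    shards: Counter = Counter()
    for row in rows:
        value = row[shard_key] if shard_key is not None else None
        shards[(row["bucket"], value)] += 1
    return shards


# 24 time buckets, 100 customers, one row per customer per bucket.
rows = [{"bucket": h, "customer_id": f"c{i}"} for h in range(24) for i in range(100)]
print(len(count_shards(rows)))                 # 24 shards, 100 rows each
print(len(count_shards(rows, "customer_id")))  # 2400 shards, 1 row each
```

Sharding by customer_id multiplies the shard count by the number of distinct customers and divides the rows per shard by the same factor, which is why high-cardinality shard keys produce many small shards.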
To add a shard key to a table:
- Determine a column or columns you would like to shard by.
- Use the column name as the value of the shard_key setting. The shard_key value must be a string.
The following settings example shards data by customer_id:
{
  "name": "mytable",
  "settings": {
    "shard_key": "customer_id"
  }
  ...
}
Fragmentation
Shard with care. Individual shards should contain billions of rows. Small shards degrade system performance.
Expected Data Volume
The expected_tb_per_day setting controls the scale of ingestion components. The following example configures scaling settings for a table expected to ingest 2 terabytes of data per day:
{
  "name": "mytable",
  "settings": {
    "scale": {
      "expected_tb_per_day": 2
    }
  }
  ...
}
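To relate expected_tb_per_day to a sustained ingest rate, divide the daily volume by the 86,400 seconds in a day: 2 TB/day is roughly 23 MB/s on average. This small Python helper does the conversion; it assumes decimal units (1 TB = 1,000,000 MB), and whether Hydrolix interprets the setting in decimal or binary units is an assumption. Real traffic is rarely uniform, so peak rates can exceed this average.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400


def tb_per_day_to_mb_per_second(tb_per_day: float) -> float:
    """Convert a daily volume in terabytes to an average rate in MB/s.

    Uses decimal units (1 TB = 1,000,000 MB), which is an assumption.
    """
    return tb_per_day * 1_000_000 / SECONDS_PER_DAY


print(round(tb_per_day_to_mb_per_second(2), 2))  # 23.15
```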
Scale Settings Override
This setting overrides manual Kubernetes scale settings for table components.