Table Settings Reference

Tables use a number of options to define how data is stored:

{
   "name": "mytable",
   "settings": {
      "sort_keys": [],
      "shard_key": null,
      "max_future_days": 0,
      "scale": {
         "expected_tb_per_day": 1
      }
   }
}
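To generate a definition like this programmatically, a small builder can assemble the settings object. The sketch below is hypothetical: table_settings is not part of any Hydrolix client, only the field names come from this page, and it nests expected_tb_per_day under a scale object, following the Expected Data Volume example later on this page.

```python
import json
from typing import Optional


def table_settings(
    name: str,
    sort_keys: Optional[list] = None,
    shard_key: Optional[str] = None,
    max_future_days: Optional[int] = 0,
    expected_tb_per_day: float = 1,
) -> dict:
    """Hypothetical helper: builds a table definition from the settings on this page."""
    if max_future_days is not None and max_future_days < 0:
        raise ValueError("max_future_days must be 0 or more, or None")
    return {
        "name": name,
        "settings": {
            "sort_keys": list(sort_keys or []),
            "shard_key": shard_key,
            # None becomes JSON null, which disables future-data rejection.
            "max_future_days": max_future_days,
            # Assumed nesting, taken from the Expected Data Volume example.
            "scale": {"expected_tb_per_day": expected_tb_per_day},
        },
    }


print(json.dumps(table_settings("mytable", max_future_days=14), indent=3))
```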

Reject Future Data

The max_future_days option filters out rows with a primary timestamp in the future. By default, the value is 0, so Hydrolix rejects any row whose primary timestamp is later than now().

If your application stores future primary timestamps, set max_future_days to the number of days ahead you need to accept.

If you prefer not to reject any incoming future data, you can set max_future_days to null.

For example, the following settings accept primary timestamps up to 14 days in the future:

{
   "name": "mytable",
   "settings": {
      "max_future_days": 14
   }
   ...
}
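The filtering rule can be sketched as a simple predicate. This illustrates the behavior described above and is not Hydrolix's implementation; the helper name accepts is hypothetical, and the sketch assumes one day means exactly 24 hours counted from the current time.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def accepts(ts: datetime, max_future_days: Optional[int], now: datetime) -> bool:
    """Hypothetical check: would a row with primary timestamp ts be kept?"""
    if max_future_days is None:
        # null disables the check, so future timestamps are always accepted.
        return True
    # Assumption: the window is now plus exactly N * 24 hours.
    return ts <= now + timedelta(days=max_future_days)


now = datetime(2024, 1, 1, tzinfo=timezone.utc)
print(accepts(now + timedelta(days=1), 0, now))      # False: rejected by default
print(accepts(now + timedelta(days=10), 14, now))    # True: within 14 days
print(accepts(now + timedelta(days=30), None, now))  # True: check disabled
```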

Ingestion Sort Priority

Hydrolix automatically sorts data during ingestion by column cardinality. For most use cases, the default sorting is sufficient. However, some query patterns benefit from custom sorting.

To change the sort priority:

  1. Determine a column or columns you would like to sort by.
  2. Add the column names to the sort_keys array, in the order you want Hydrolix to sort by.

Hydrolix sorts incoming data first by the declared order of the sort_keys array, then by other columns in order of cardinality.

Consider an application where many queries sort first by customer_id, then by locale. If Hydrolix stores data in that same order, these queries benefit from data locality, because related rows sit next to each other in storage. The following settings sort data first by customer_id, then by locale:

{
   "name": "mytable",
   "settings": {
      "sort_keys": ["customer_id", "locale"]
   }
   ...
}
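The ordering described above can be sketched in Python. The function ingest_sort is hypothetical, and it assumes the remaining columns are ordered from lowest to highest cardinality, because this page does not state which direction Hydrolix uses.

```python
from typing import Any


def ingest_sort(rows: list, sort_keys: list) -> list:
    """Hypothetical sketch: order rows by sort_keys, then by the other columns."""
    columns = list(rows[0].keys()) if rows else []
    rest = [c for c in columns if c not in sort_keys]
    # Assumption: lower-cardinality columns come first (distinct values in this batch).
    rest.sort(key=lambda c: len({r[c] for r in rows}))
    order = list(sort_keys) + rest
    return sorted(rows, key=lambda r: tuple(r[c] for c in order))


rows: list[dict[str, Any]] = [
    {"customer_id": 2, "locale": "fr", "status": 200},
    {"customer_id": 1, "locale": "en", "status": 500},
    {"customer_id": 1, "locale": "de", "status": 200},
]
for row in ingest_sort(rows, ["customer_id", "locale"]):
    print(row)  # customer_id 1/de, then 1/en, then 2/fr
```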

🚧

Sorting is not Retroactive

Changes to the sort_keys field only affect data ingested after the change is applied. Hydrolix does not change the sort order of existing data.

Shard Key

Hydrolix automatically shards data by time. You can add sharding on other columns. Because Hydrolix shards data by unique column values, high-cardinality columns generate many shards.

Sharding can reduce query time when used properly. However, data must be distributed evenly across shard key values, and most, if not all, queries must include a predicate on the shard key.

To add a shard key to a table:

  1. Determine a column or columns you would like to shard by.
  2. Use the column name as the value of the shard_key setting.
  3. Shard key values must be strings.

The following settings shard data by region:

{
   "name": "mytable",
   "settings": {
      "shard_key": "region"
   }
   ...
}
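Conceptually, each row lands in a partition identified by its time bucket and its shard key value, so every distinct value of region multiplies the number of shards. The sketch below assumes hourly time buckets and a column named timestamp, both illustrative choices rather than Hydrolix's actual behavior. Counting rows per partition is one way to check whether data is evenly distributed across shard key values before choosing a key.

```python
from collections import Counter
from datetime import datetime


def shard_of(row: dict, shard_key: str) -> tuple:
    # Assumption: hourly time buckets and a "timestamp" column, for illustration only.
    hour = row["timestamp"].strftime("%Y-%m-%dT%H")
    return (hour, row[shard_key])


def shard_sizes(rows: list, shard_key: str) -> Counter:
    """Count rows per (time bucket, shard key value) partition."""
    return Counter(shard_of(r, shard_key) for r in rows)


rows = [
    {"timestamp": datetime(2024, 1, 1, 9, 5), "region": "us-east"},
    {"timestamp": datetime(2024, 1, 1, 9, 40), "region": "us-east"},
    {"timestamp": datetime(2024, 1, 1, 9, 50), "region": "eu-west"},
]
sizes = shard_sizes(rows, "region")
# Skew ratio: largest partition relative to smallest.
print(max(sizes.values()) / min(sizes.values()))  # 2.0
```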

❗️

Fragmentation

Shard with care. Individual shards should contain billions of rows. Small shards degrade system performance.
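A back-of-the-envelope check can help before choosing a shard key: divide the rows in a time window by the number of distinct shard key values. The sketch assumes rows spread evenly across values, the row counts are hypothetical, and it reads "billions of rows" as a floor of one billion.

```python
# Assumption: "billions of rows" read as at least one billion per shard.
MIN_ROWS = 1_000_000_000


def rows_per_shard(rows_per_window: int, distinct_shard_values: int) -> float:
    """Estimate rows per shard, assuming an even spread across shard key values."""
    if distinct_shard_values < 1:
        raise ValueError("need at least one shard key value")
    return rows_per_window / distinct_shard_values


# Hypothetical volume: 20 billion rows in one time window.
print(rows_per_shard(20_000_000_000, 4))        # 5000000000.0
print(rows_per_shard(20_000_000_000, 100_000))  # 200000.0, far below MIN_ROWS
```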

Expected Data Volume

The expected_tb_per_day setting controls the scale of ingestion components. The following example configures scaling settings for a table expected to ingest 2 terabytes of data per day:

{
   "name": "mytable",
   "settings": {
      "scale": {
         "expected_tb_per_day": 2
      }
   }
   ...
}
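To sanity-check the value you choose, it can help to convert a daily volume into an average ingest rate. This is plain arithmetic, not a description of how Hydrolix scales its components; it assumes decimal terabytes (10^12 bytes), and real traffic usually peaks above the daily average.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86400


def avg_bytes_per_second(tb_per_day: float) -> float:
    """Average rate implied by expected_tb_per_day (assumes decimal TB)."""
    return tb_per_day * 10**12 / SECONDS_PER_DAY


rate = avg_bytes_per_second(2)
print(f"{rate / 10**6:.1f} MB/s")  # 23.1 MB/s
```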

🚧

Scale Settings Override

This setting overrides manual Kubernetes scale settings for table components.