Additional Table Settings
Within a table's settings there are a number of options that define how data is stored, how it is sorted at time of storage, and the maximum acceptable timestamp.
{
"name": "mytable",
"settings": {
"sort_keys": [],
"shard_key": null,
"max_future_days": 0
}
}
Max future days
Max future days is used to filter out rows whose primary timestamp is later than now(). By default it is set to 0, i.e. any timestamp later than now() is rejected.
If your application needs to store primary timestamps in the future, set this to the maximum number of days beyond now() that can be accepted.
For example, to allow rows to be added that are up to 14 days in advance of now(), you would use:
{
"name": "mytable",
"settings": {
...........
"max_future_days": 14
}
}
Rejected rows are placed within Rejects.
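The filtering rule described above can be sketched in a few lines of Python. This is only an illustration of the rule as documented, not Hydrolix's implementation: the function names, the "timestamp" field name, and the inclusive boundary (a timestamp exactly max_future_days ahead is accepted) are all assumptions made for the example.

```python
from datetime import datetime, timedelta, timezone


def is_accepted(primary_ts, now, max_future_days=0):
    """Return True if primary_ts is no later than now + max_future_days.

    Assumption: the boundary is inclusive; the real behaviour at the exact
    limit is not stated in this document.
    """
    return primary_ts <= now + timedelta(days=max_future_days)


def partition_rows(rows, now, max_future_days=0):
    """Split rows into (accepted, rejected) using the hypothetical rule above."""
    accepted, rejected = [], []
    for row in rows:
        # "timestamp" is a hypothetical name for the primary timestamp column.
        if is_accepted(row["timestamp"], now, max_future_days):
            accepted.append(row)
        else:
            rejected.append(row)
    return accepted, rejected
```

With max_future_days left at 0, a row stamped one second after now lands in the rejected list; with max_future_days set to 14, a row stamped 10 days ahead is accepted and one stamped 15 days ahead is rejected.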
Sorting
Be Careful
Incorrectly sorting data can cause performance degradation. It is suggested that you contact Hydrolix support if you are considering using this function.
Hydrolix automatically sorts data as it is ingested, and this is sufficient (and optimal) for the majority of datasets and use cases. There may, however, be occasions where you want to influence how the data is sorted to suit the queries you expect to run.
Changing the sort priority is achieved by adding the column's name to the sort_keys array. This forces the ingesting system to sort incoming data by the sort_keys columns first, with the remaining columns then sorted in order of cardinality.
For example, if the majority of your queries use customer_id, it may be prudent to include this column in sort_keys.
{
"name": "mytable",
"settings": {
"sort_keys": ["locale", "customer_id"]
}
}
The benefit of doing this is better-performing queries, as data blocks are more likely to be contiguous during retrieval. Note, however, that if used incorrectly this can make queries slower, as more requests will be needed to retrieve data.
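To make the ordering rule concrete, here is a minimal Python sketch of "sort_keys columns first, then the remaining columns by cardinality". It is a toy model, not Hydrolix's ingest code. The function names are hypothetical, and the sketch assumes the remaining columns are ordered from lowest to highest cardinality, which this document does not specify.

```python
def sort_column_order(rows, sort_keys):
    """Return the column order used for sorting: sort_keys, then the rest.

    Assumption: remaining columns are ordered by ascending cardinality
    (number of distinct values); the real direction is not documented here.
    """
    columns = list(rows[0].keys()) if rows else []
    cardinality = {c: len({row[c] for row in rows}) for c in columns}
    remaining = sorted(
        (c for c in columns if c not in sort_keys),
        key=lambda c: cardinality[c],
    )
    return list(sort_keys) + remaining


def sort_rows(rows, sort_keys):
    """Sort rows lexicographically by the column order computed above."""
    order = sort_column_order(rows, sort_keys)
    return sorted(rows, key=lambda row: tuple(row[c] for c in order))
```

Rows that share the same sort_keys values end up next to each other in the output, which is the contiguity effect described above.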
Sorting is not retroactive
Data is only sorted from the time at which sort_keys is applied. The new sort order is not applied retroactively.
Shard Key
Usage guidance
Shard keys can cause high levels of stored data fragmentation and should be used with care. Where possible, individual shards should still equate to billions of rows. Any shard key that fragments data into shards of only thousands or millions of rows should be avoided. Hydrolix suggests contacting us before applying a shard key.
As a time-series database, Hydrolix automatically shards data based on time. Additional sharding can be applied if it is required for performance or regulatory reasons; however, this should be done carefully.
Adding a shard key causes data to be sorted at time of ingest, with a column's values being used to generate an additional prefix within object storage. This means that if a column's values have a high amount of variability, stored data can become overly fragmented, potentially causing slower query performance and sub-optimal levels of compression.
To add a shard key, adjust the table settings by setting shard_key to the column you want to shard by. For example, to shard by customer_id:
{
"name": "mytable",
"settings": {
"shard_key": "customer_id"
}
}
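Before choosing a shard key, it can help to check how many rows each distinct value of the candidate column would receive. The Python sketch below is one rough way to do that against your own row counts. It is not a Hydrolix tool: the function names are hypothetical, and the 1,000,000,000 threshold is just one reading of the "billions of rows" guidance above.

```python
from collections import Counter

# Assumption: "billions of rows" is interpreted as at least 1e9 rows per shard.
MIN_ROWS_PER_SHARD = 1_000_000_000


def rows_per_shard(shard_values):
    """Count rows for each distinct value of the candidate shard column."""
    return Counter(shard_values)


def undersized_shards(counts, min_rows=MIN_ROWS_PER_SHARD):
    """Return the shard values whose row count falls below min_rows."""
    return sorted(value for value, n in counts.items() if n < min_rows)
```

If undersized_shards returns many values for a candidate column, that column is likely to cause the kind of fragmentation described above.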
Scale: Expected TB per day
Users can now tell Hydrolix the load they expect each table to generate.
{
"name": "mytable",
"settings": {
"scale": {
"expected_tb_per_day": 2
}
}
}
We use this information to determine the scale of each component related to that table.
Everything autoscales, but setting expected_tb_per_day helps establish a good minimum scale.
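If you do not already know your daily volume, a back-of-the-envelope estimate can be derived from your event rate and average event size. The Python sketch below shows the arithmetic only. The function name is hypothetical, and it assumes a decimal terabyte (10^12 bytes) and raw, uncompressed size; this document does not say which definition expected_tb_per_day uses.

```python
SECONDS_PER_DAY = 86_400
BYTES_PER_TB = 10**12  # assumption: decimal terabytes


def estimate_tb_per_day(events_per_second, avg_event_bytes):
    """Estimate raw ingest volume in TB per day from rate and event size."""
    bytes_per_day = events_per_second * avg_event_bytes * SECONDS_PER_DAY
    return bytes_per_day / BYTES_PER_TB
```

For example, 100,000 events per second at an average of 250 bytes each works out to about 2.16 TB per day, so a value of around 2 would be a reasonable starting point under these assumptions.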