Merge is a data lifecycle service that reorganizes Hydrolix data into an optimal state by combining partitions. Merge runs periodically in Hydrolix clusters.

Hydrolix can ingest data out of order. Because Hydrolix makes newly ingested data available quickly, out-of-order ingestion can initially produce sub-optimal partitions. This partition structure can reduce compression efficiency and query performance.
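
To illustrate why this matters, the following Python sketch uses a deliberately simplified model that is not Hydrolix's actual storage format. Each partition is a sorted tuple of event timestamps (in minutes), and a query must read every partition whose time range overlaps its window. The hypothetical `merge_by_hour` function stands in for Merge by pooling rows and re-cutting them into one partition per hour. The names `Partition`, `partitions_touched`, and `merge_by_hour` are illustrative assumptions.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Partition:
    """Simplified model (assumption): a partition is a sorted tuple of timestamps."""
    timestamps: tuple

    @property
    def min_ts(self):
        return self.timestamps[0]

    @property
    def max_ts(self):
        return self.timestamps[-1]


def make_partition(timestamps):
    return Partition(tuple(sorted(timestamps)))


def partitions_touched(partitions, start, end):
    """Partitions a query on [start, end] must read: those whose range overlaps it."""
    return [p for p in partitions if p.min_ts <= end and p.max_ts >= start]


def merge_by_hour(partitions):
    """Hypothetical merge: pool all rows and re-cut them into one partition per hour."""
    buckets = defaultdict(list)
    for p in partitions:
        for ts in p.timestamps:
            buckets[ts // 60].append(ts)
    return [make_partition(buckets[hour]) for hour in sorted(buckets)]


# Each ingest batch becomes its own partition as soon as it arrives.
# The last two batches mix late hour-0 events with hour-1 events.
batches = [[0, 10, 20], [60, 70, 80], [5, 65], [15, 75]]
ingested = [make_partition(b) for b in batches]
merged = merge_by_hour(ingested)

print(len(ingested), len(partitions_touched(ingested, 0, 30)))  # 4 3
print(len(merged), len(partitions_touched(merged, 0, 30)))      # 2 1
```

Under these assumptions, the batches that mix late and current events stretch their partitions' time ranges, so a query on the first 30 minutes reads three partitions before merging and only one afterward.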

Merge uses the following architecture:

| Component | Description | Scales to 0 |
|---|---|---|
| Merge-Head | Uses the Catalog to determine which partitions to combine. Sends messages describing these combine tasks to the Queue. | Yes |
| Queue | Contains the list of partition combine tasks waiting to be worked on. | No |
| Merge Peer | A group of workers that take partition combine tasks from the Queue. Each worker reads partitions from storage, combines them into new partitions, and writes the new partitions to the Hydrolix Database storage bucket. Finally, it updates the Catalog with the new partitions and removes the old ones. | Yes |
| Hydrolix Database Storage Bucket | Contains the partitions that make up the database. Part of the core infrastructure. | No |
| Catalog | Contains metadata about the data stored in Hydrolix. Part of the core infrastructure. | No |
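
The interaction between these components can be sketched as a producer/consumer pipeline. The Python below is a hypothetical, in-memory model, not Hydrolix code: plain dicts stand in for the Catalog and the storage bucket, a `queue.Queue` stands in for the Queue, and the selection rule (combine partitions smaller than a hypothetical `MAX_ROWS` threshold, `BATCH` at a time) is an assumption made only for illustration.

```python
import queue
import threading

MAX_ROWS = 100  # hypothetical threshold: partitions below this are "small"
BATCH = 2       # hypothetical: number of partitions combined per task


def merge_head(catalog, tasks):
    """Use Catalog metadata to pick partitions to combine; enqueue one task per group."""
    small = sorted(pid for pid, rows in catalog.items() if rows < MAX_ROWS)
    for i in range(0, len(small) - BATCH + 1, BATCH):
        tasks.put(small[i:i + BATCH])


def merge_peer(catalog, storage, tasks, lock):
    """Worker loop: take a task, combine its partitions, publish the result."""
    while True:
        group = tasks.get()
        if group is None:  # sentinel: no more work for this worker
            return
        rows = [r for pid in group for r in storage[pid]]  # read old partitions
        new_pid = "+".join(group)
        storage[new_pid] = sorted(rows)  # write the combined partition
        with lock:  # swap Catalog entries: add the new partition, drop the old ones
            catalog[new_pid] = len(rows)
            for pid in group:
                del catalog[pid]


# Hypothetical starting state: four small partitions and one large one.
storage = {"p1": [3, 1], "p2": [2], "p3": [5, 4], "p4": [6], "p5": list(range(200))}
catalog = {pid: len(rows) for pid, rows in storage.items()}  # metadata only
tasks = queue.Queue()
lock = threading.Lock()

merge_head(catalog, tasks)
peers = [
    threading.Thread(target=merge_peer, args=(catalog, storage, tasks, lock))
    for _ in range(2)
]
for t in peers:
    t.start()
for _ in peers:
    tasks.put(None)  # one sentinel per worker, queued after all real tasks
for t in peers:
    t.join()

print(sorted(catalog))  # ['p1+p2', 'p3+p4', 'p5']
```

Because the head only enqueues work and the peers only consume it, the two sides are decoupled by the Queue, which fits the table's note that Merge-Head and Merge Peer can scale to 0 while the Queue, storage bucket, and Catalog remain available.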


Configure Merge

To configure Merge in your Hydrolix cluster, see Merge.

