Merge is a data lifecycle service that organizes Hydrolix data into an optimal state. Merge runs periodically in Hydrolix clusters.
Hydrolix can ingest data out of order. Because Hydrolix makes data available quickly, out-of-order ingestion can initially create sub-optimal partitions. This sub-optimal partition structure can lead to inefficient compression and reduced performance. Merge addresses this by combining partitions into new, better-organized ones.
Merge uses the following architecture:
|Component|Description|Scale to 0|
|---|---|---|
|Merge-Head|Uses the Catalog to determine which partitions to combine. Sends messages describing these combine tasks to the Queue.|Yes|
|Queue|Contains a list of partition combine tasks to be worked on.|No|
|Merge Peer|A group of workers that take partition combine tasks from the Queue. Each worker reads the partitions from storage, combines them to create a new partition, and writes the new combined partition to the Hydrolix Database Storage Bucket. Finally, it updates the Catalog with the new partition and removes the old partitions.|Yes|
|Hydrolix Database Storage Bucket|Contains the partitions that comprise the database. Part of the core infrastructure.|No|
|Catalog|Contains metadata regarding data stored in Hydrolix. Part of the core infrastructure.|No|
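The flow between these components can be sketched as a small in-memory simulation. This is only an illustration of the roles described in the table, not how Hydrolix implements Merge. Every name in it (`Cluster`, `Partition`, `merge_head`, `merge_peer`, the `max_rows` threshold, and the pairing policy) is hypothetical and chosen for the example.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Partition:
    """Hypothetical stand-in for a partition: a name plus its rows."""
    name: str
    rows: tuple


@dataclass
class Cluster:
    """Toy model of the core infrastructure plus the task queue."""
    catalog: dict = field(default_factory=dict)  # partition name -> row count (metadata)
    storage: dict = field(default_factory=dict)  # partition name -> Partition (the bucket)
    queue: deque = field(default_factory=deque)  # pending combine tasks

    def ingest(self, name, rows):
        """Write a partition to storage and register it in the catalog."""
        part = Partition(name, tuple(rows))
        self.storage[name] = part
        self.catalog[name] = len(part.rows)


def merge_head(cluster, max_rows=4):
    """Use the catalog to choose partitions to combine, then enqueue tasks.

    The selection rule (pair up partitions with fewer than max_rows rows)
    is an assumption made for this sketch.
    """
    small = sorted(n for n, count in cluster.catalog.items() if count < max_rows)
    for i in range(0, len(small) - 1, 2):
        cluster.queue.append((small[i], small[i + 1]))


def merge_peer(cluster):
    """Take one task from the queue and carry it out.

    Reads the source partitions, combines them, writes the new partition,
    then updates the catalog and removes the old partitions.
    """
    if not cluster.queue:
        return None
    task = cluster.queue.popleft()
    parts = [cluster.storage[name] for name in task]
    rows = tuple(sorted(r for p in parts for r in p.rows))
    merged = Partition("+".join(task), rows)
    cluster.storage[merged.name] = merged
    cluster.catalog[merged.name] = len(rows)
    for name in task:
        del cluster.catalog[name]
        del cluster.storage[name]  # simplification: the sketch also drops stored data
    return merged


if __name__ == "__main__":
    cluster = Cluster()
    cluster.ingest("p1", [3, 1])           # small, out-of-order partition
    cluster.ingest("p2", [2])              # small partition
    cluster.ingest("p3", [9, 8, 7, 6, 5])  # already large enough, left alone
    merge_head(cluster)
    merged = merge_peer(cluster)
    print(merged.rows)              # (1, 2, 3)
    print(sorted(cluster.catalog))  # ['p1+p2', 'p3']
```

In this sketch the head and the peer share only the queue and the catalog, which mirrors why the table lists Merge-Head and Merge Peer as able to scale to 0 while the Queue, Catalog, and storage bucket stay up.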
To configure Merge in your Hydrolix cluster, see Merge.