Merge

Overview

Merge is a data lifecycle service that organizes Hydrolix data into an optimal state. Merge runs periodically in Hydrolix clusters.

Hydrolix can ingest data out of order. Because Hydrolix makes newly ingested data available quickly, out-of-order ingestion can initially create sub-optimal partitions. This partition structure can reduce compression efficiency and query performance. Merge addresses this by combining these smaller partitions into new, larger ones.
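The following Python sketch illustrates the idea under a simplifying assumption: each partition is summarized only by a time range and a row count, and partitions whose time ranges overlap are combined into one. The `Partition` class and the `group_overlapping` and `combine` functions are hypothetical. This interval-merge sweep is an illustration, not Hydrolix's actual partition selection logic.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Partition:
    """Hypothetical partition summary: name, time range, and row count."""
    name: str
    min_ts: int  # earliest timestamp in the partition
    max_ts: int  # latest timestamp in the partition
    rows: int


def group_overlapping(partitions: list[Partition]) -> list[list[Partition]]:
    """Sort by start time, then sweep, grouping partitions whose ranges overlap."""
    groups: list[list[Partition]] = []
    group_end = 0
    for p in sorted(partitions, key=lambda p: p.min_ts):
        if groups and p.min_ts <= group_end:
            groups[-1].append(p)  # overlaps the current group: extend it
            group_end = max(group_end, p.max_ts)
        else:
            groups.append([p])  # no overlap: start a new group
            group_end = p.max_ts
    return groups


def combine(group: list[Partition], name: str) -> Partition:
    """Replace a group with one partition spanning the union of its time ranges."""
    return Partition(
        name=name,
        min_ts=min(p.min_ts for p in group),
        max_ts=max(p.max_ts for p in group),
        rows=sum(p.rows for p in group),
    )


# Three partitions ingested out of order; p1 and p3 overlap in time.
parts = [
    Partition("p1", 100, 200, 10),
    Partition("p2", 500, 600, 5),
    Partition("p3", 150, 250, 8),
]
merged = [combine(g, f"m{i}") for i, g in enumerate(group_overlapping(parts))]
# merged == [Partition("m0", 100, 250, 18), Partition("m1", 500, 600, 5)]
```

Three small partitions become two, and no two remaining partitions cover the same time range.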

Architecture

Merge uses the following architecture:

Component	Description	Scale to 0
Merge head	Uses the Catalog to determine which partitions to combine. Sends messages describing these combine tasks to a queue.	Yes
Queue (RabbitMQ)	Contains a list of partition combine tasks to be worked on.	No
Merge peer	A group of workers that take partition combine tasks from the queue. Each worker reads partitions from storage, combines them into new partitions, writes the new partitions to the Hydrolix database storage bucket, and then updates the catalog to add the new partitions and remove the old ones.	Yes
Hydrolix database storage bucket	Contains the partitions that comprise the database. Part of the core infrastructure.	No
Catalog	Contains metadata regarding data stored in Hydrolix. Part of the core infrastructure.	No
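The flow through these components can be sketched in Python with in-memory stand-ins: a `queue.Queue` in place of RabbitMQ, a set in place of the catalog, and a dict in place of the storage bucket. All names here (`CombineTask`, `merge_head`, `merge_peer`) are hypothetical, and the rule that pairs partitions in name order is an arbitrary placeholder for the merge head's real selection logic.

```python
import queue
import threading
from dataclasses import dataclass

# Hypothetical in-memory stand-ins for the catalog and the storage bucket.
catalog: set[str] = {"p1", "p2", "p3", "p4"}
storage: dict[str, list[int]] = {"p1": [1, 2], "p2": [3], "p3": [4, 5], "p4": [6]}
lock = threading.Lock()  # guards catalog and storage updates


@dataclass
class CombineTask:
    inputs: list[str]  # partitions to combine
    output: str        # name of the new partition


def merge_head(tasks: queue.Queue, batch_size: int = 2) -> None:
    """Read the catalog, pick partitions to combine, and enqueue one task per batch."""
    with lock:
        names = sorted(catalog)
    # Placeholder selection rule: pair partitions in name order.
    for i in range(0, len(names) - batch_size + 1, batch_size):
        batch = names[i:i + batch_size]
        tasks.put(CombineTask(inputs=batch, output="+".join(batch)))


def merge_peer(tasks: queue.Queue) -> None:
    """Take tasks from the queue until it is empty."""
    while True:
        try:
            task = tasks.get_nowait()
        except queue.Empty:
            return
        # Read the input partitions and combine their rows.
        rows = [r for name in task.inputs for r in storage[name]]
        with lock:
            # Write the new partition to storage first...
            storage[task.output] = rows
            # ...then update the catalog: add the new partition, drop the old ones.
            catalog.add(task.output)
            catalog.difference_update(task.inputs)
        tasks.task_done()


tasks: queue.Queue = queue.Queue()
merge_head(tasks)
peers = [threading.Thread(target=merge_peer, args=(tasks,)) for _ in range(2)]
for t in peers:
    t.start()
for t in peers:
    t.join()
# catalog == {"p1+p2", "p3+p4"}
```

The sketch keeps the order of operations from the table: a peer writes the combined partition before it updates the catalog, so the catalog never lists a partition that isn't in storage yet. Unlike real merge peers, these workers exit once the queue is empty, and they remove old partitions from the catalog only.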

Configure Merge
To configure Merge in your Hydrolix cluster, see Merge.

Merge controller (v5.3.0+)

See Enable merge controller to enable the merge controller.

Merge controller is a recommended drop-in replacement for merge head. Architecturally, merge controller and merge head differ only in how they communicate with merge peers.

The merge-head uses RabbitMQ as an intermediate communication channel to the merge-peer pools. Each pool has its own queue, and merge-head dispatches each message to a pool's queue, where a merge-peer in that pool consumes it.
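A minimal sketch of per-pool dispatch, with one `queue.Queue` per pool standing in for the RabbitMQ queues. The pool names, the size limits, and the rule of routing tasks by total input size are all assumptions for illustration; this section doesn't say how merge-head chooses a pool. `choose_pool` and `dispatch` are hypothetical.

```python
import queue
from typing import Optional

# Hypothetical pools and per-task size limits in bytes (None = no limit).
POOL_LIMITS: list[tuple[str, Optional[int]]] = [
    ("small", 1_000_000),
    ("medium", 100_000_000),
    ("large", None),
]

# One queue per pool, standing in for the RabbitMQ queues.
pool_queues: dict[str, queue.Queue] = {name: queue.Queue() for name, _ in POOL_LIMITS}


def choose_pool(total_bytes: int) -> str:
    """Return the first pool whose limit fits the task's total input size."""
    for name, limit in POOL_LIMITS:
        if limit is None or total_bytes <= limit:
            return name
    raise ValueError("no pool accepts this task")


def dispatch(task_id: str, total_bytes: int) -> str:
    """Publish the task to its pool's queue; any peer in that pool may consume it."""
    pool = choose_pool(total_bytes)
    pool_queues[pool].put(task_id)
    return pool


placements = {
    task_id: dispatch(task_id, size)
    for task_id, size in [("t1", 500_000), ("t2", 5_000_000), ("t3", 2_000_000_000)]
}
# placements == {"t1": "small", "t2": "medium", "t3": "large"}
```

In this model, the dispatcher only publishes to a queue, so it learns nothing about which peer picks up a task or when that task finishes.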

The merge-controller doesn't use RabbitMQ. Instead, each merge-peer connects directly to the merge-controller over gRPC. This direct line of communication gives merge-controller greater visibility into, and control over, the system as a whole, which eliminates some of the largest inefficiencies in the merge system.
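The following sketch models what direct connections make possible. It assumes, for illustration only, that the controller records which peer holds which task and which partitions are in flight, so it can send work only to idle peers and skip tasks whose partitions are already being merged. `MergeController`, `PeerState`, and their methods are hypothetical, and plain method calls stand in for the gRPC channel. This section doesn't state which specific inefficiencies the real merge controller removes.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class PeerState:
    """What the controller tracks for one connected merge peer (hypothetical)."""
    peer_id: str
    current_task: Optional[tuple[str, ...]] = None  # partitions being combined, if any


@dataclass
class MergeController:
    peers: dict[str, PeerState] = field(default_factory=dict)
    in_flight: set[str] = field(default_factory=set)  # partitions currently being merged

    def connect(self, peer_id: str) -> None:
        """A peer opens its connection (a gRPC channel in the real system)."""
        self.peers[peer_id] = PeerState(peer_id)

    def assign(self, partitions: tuple[str, ...]) -> Optional[str]:
        """Send a task straight to an idle peer; return its id, or None if not assigned."""
        if self.in_flight.intersection(partitions):
            return None  # an input is already being merged elsewhere
        for peer in self.peers.values():
            if peer.current_task is None:
                peer.current_task = partitions
                self.in_flight.update(partitions)
                return peer.peer_id
        return None  # every connected peer is busy

    def complete(self, peer_id: str) -> None:
        """A peer reports that its task finished; free it and its partitions."""
        peer = self.peers[peer_id]
        if peer.current_task is not None:
            self.in_flight.difference_update(peer.current_task)
        peer.current_task = None


controller = MergeController()
controller.connect("peer-a")
controller.connect("peer-b")
results = [
    controller.assign(("p1", "p2")),  # "peer-a"
    controller.assign(("p2", "p3")),  # None: p2 is already in flight
    controller.assign(("p3", "p4")),  # "peer-b"
    controller.assign(("p5", "p6")),  # None: no idle peer
]
controller.complete("peer-a")
results.append(controller.assign(("p2", "p5")))  # "peer-a"
```

Compared with the queue-based sketch of merge-head, the controller here knows every peer's state at the moment it hands out work, which is the kind of visibility a shared queue doesn't provide.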

Architecture

Component	Description	Scale to 0
Merge controller	A drop-in replacement for merge head. Determines which partitions should be combined to improve query performance and storage costs. Communicates merge tasks to merge peers using gRPC channels.	Yes
gRPC channel	Intermediate component between the merge controller which issues merge tasks and the merge peers which combine smaller partitions into larger partitions.	Yes
Merge peer	A group of workers that receive partition combine tasks from the merge controller over gRPC. Each worker reads partitions from storage, combines them into new partitions, writes the new partitions to the Hydrolix database storage bucket, and then updates the catalog to add the new partitions and remove the old ones.	Yes
Hydrolix database storage bucket	Contains the partitions that comprise the database. Part of the core infrastructure.	No
Catalog	Contains metadata regarding data stored in Hydrolix. Part of the core infrastructure.	No
