Because data can arrive out of order, or may need to be made accessible quickly, the initial load process can create small partitions. This sub-optimal partition structure can lead to inefficient compression and reduced performance.

The Merge service helps ensure that a database's tables are in an optimal state. Once scaled, the service runs automatically: it periodically looks for partitions to combine, combines them, and then updates the database, all transparently to the Query service.


Merge: Configure it!

For information on configuring Merge for your platform, see Merge.


The following components are used within Merge:

| Component | Description | Scale to 0 |
| --- | --- | --- |
| Merge Head | A deployed pod. The Merge Head uses the Catalog to work out which partitions should be combined, then sends jobs listing the files to be combined to a queue for processing. | Yes |
| Queue | A queue holding the list of jobs to be worked on. | No |
| Merge Peer | A group of workers that take the source partitions listed in the queue and combine them to create new partitions. The new partitions are then uploaded to the HDX Database bucket. The last step is to update the Catalog with the new partition and remove the old ones from the listing. | Yes |
| HDX DB Storage Bucket | Contains the database (including partitions), configuration, and other state information concerning the platform. Forms part of the core infrastructure. | No |
| Catalog | Contains metadata on the database, partitions, and job tasks. Forms part of the core infrastructure. | No |
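
The flow between these components can be pictured with a small simulation. This is only an illustrative sketch, not the Merge service's actual implementation: the `Partition` type, the `plan_merge_jobs` and `run_merge_peer` functions, the row-count threshold, and the in-memory list and deque standing in for the Catalog and Queue are all hypothetical, and the upload to the storage bucket is omitted.

```python
from collections import deque
from dataclasses import dataclass


@dataclass(frozen=True)
class Partition:
    """Hypothetical stand-in for a partition entry in the Catalog."""
    name: str
    rows: int


def plan_merge_jobs(catalog, max_rows):
    """Merge Head role: group runs of adjacent small partitions into jobs.

    A job is a list of two or more partitions whose combined row count
    stays at or below max_rows (an assumed selection rule).
    """
    jobs, batch, batch_rows = [], [], 0

    def close_batch():
        nonlocal batch, batch_rows
        if len(batch) > 1:  # a single partition has nothing to merge with
            jobs.append(batch)
        batch, batch_rows = [], 0

    for part in catalog:
        if part.rows >= max_rows:
            close_batch()  # already large enough; leave it untouched
            continue
        if batch_rows + part.rows > max_rows:
            close_batch()
        batch.append(part)
        batch_rows += part.rows
    close_batch()
    return jobs


def run_merge_peer(queue, catalog):
    """Merge Peer role: combine each queued job and update the catalog."""
    while queue:
        job = queue.popleft()
        merged = Partition(
            name="+".join(p.name for p in job),
            rows=sum(p.rows for p in job),
        )
        # Swap the source partitions for the merged one in a single step,
        # so a reader of the catalog never sees both old and new entries.
        start = catalog.index(job[0])
        catalog[start:start + len(job)] = [merged]


catalog = [
    Partition("a", 100), Partition("b", 200), Partition("c", 5000),
    Partition("d", 300), Partition("e", 400), Partition("f", 900),
]
queue = deque(plan_merge_jobs(catalog, max_rows=1000))
run_merge_peer(queue, catalog)
print([p.name for p in catalog])
```

In this sketch the total row count in the catalog is unchanged by a merge; only the number of partitions drops, which mirrors the idea that merging is transparent to queries.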