Because data can arrive out of order or must be made accessible quickly, the initial load process can create smaller partitions than is ideal. This sub-optimal partition structure can lead to inefficient compression and degraded performance.

The merge service helps ensure that a database's tables are in an optimal state. Once scaled, the service runs automatically: it periodically looks for partitions to combine, combines them, and then updates the database, all transparently to the Query service.

Merge: Configure it!

To configure Merge for your platform, see the Merge page.

The following components are used within Merge:

| Component | Description | Scale to 0 |
| --- | --- | --- |
| Intake-Misc | A deployed server running the Merge Head microservice. The Merge Head uses the Catalog to work out which partitions should be combined, then sends jobs listing the files to be combined to a queue for processing. | Yes |
| Queues | Two queues are operated: a listing queue holding the jobs to be worked on, and a dead-letter queue to which jobs are moved if they are not accessed within a period of time. | No |
| Merge Peer Server Pool | A group of worker servers that take the source partitions listed in the queue and combine them to create new partitions. The new partitions are then uploaded to the HDX Database bucket. The last step in the process is to update the Catalog with the new partitions and remove the old ones from the listing. | Yes |
| HDX DB Storage Bucket | Contains the database (including partitions), configuration, and other state information concerning the platform. Forms part of the core infrastructure. | No |
| Catalog | Contains metadata on the database, partitions, and job tasks. Forms part of the core infrastructure. | No |
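
The flow between these components can be illustrated with a small, self-contained Python sketch. This is not the platform's implementation: the `Partition` class, the `plan_merge_jobs` and `run_merge_peer` functions, the `max_rows` threshold, and the row-count grouping rule are all hypothetical names and assumptions chosen for illustration. The catalog is modelled as an in-memory list, the queue as a `deque`, and the upload to the storage bucket and the dead-letter queue are left out.

```python
from collections import deque
from dataclasses import dataclass


@dataclass(frozen=True)
class Partition:
    # Hypothetical catalog entry: a partition name and its row count.
    name: str
    rows: int


def plan_merge_jobs(catalog, max_rows):
    """Merge Head role: group small partitions into jobs of at most max_rows rows."""
    jobs, current, total = [], [], 0
    for part in sorted(catalog, key=lambda p: p.rows):
        if part.rows >= max_rows:
            continue  # already large enough, so leave it alone
        if current and total + part.rows > max_rows:
            if len(current) > 1:  # a job needs at least two sources
                jobs.append(current)
            current, total = [], 0
        current.append(part)
        total += part.rows
    if len(current) > 1:
        jobs.append(current)
    return jobs


def run_merge_peer(queue, catalog):
    """Merge Peer role: take jobs from the queue, combine, then update the catalog."""
    while queue:
        job = queue.popleft()
        merged = Partition(
            name="merged(" + "+".join(p.name for p in job) + ")",
            rows=sum(p.rows for p in job),
        )
        # A real worker would upload the new partition to storage here.
        # The catalog update is the last step: list the new partition,
        # then stop listing the old ones.
        catalog.append(merged)
        for part in job:
            catalog.remove(part)


# Assumed example data: three small partitions and one large one.
catalog = [Partition("a", 10), Partition("b", 20), Partition("c", 500), Partition("d", 30)]
queue = deque(plan_merge_jobs(catalog, max_rows=100))
run_merge_peer(queue, catalog)
print(catalog)
```

In this sketch the three small partitions end up in a single job, and the catalog is only changed after the merged partition exists, which mirrors the ordering described in the table: readers of the catalog see either the old partitions or the new one.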
