Merge
Because data can arrive out of order, or needs to be made accessible quickly, the initial load process can create many small partitions. This sub-optimal partition structure can lead to inefficient compression and poor performance.
The Merge service helps ensure that a database's tables are in an optimal state. Once scaled up, the service runs automatically: it periodically looks for partitions to combine, combines them, and then updates the database, all transparently to the Query service.
Merge: Configure it!
Information on configuring Merge for your platform can be found here: Merge
The following components are used within Merge:
Component | Description | Scale to 0 |
---|---|---|
Merge-Head | A deployed pod. The Merge Head uses the Catalog to work out which partitions should be combined, then sends jobs listing the files to be combined to a queue for processing. | Yes |
Queue | A queue holding the list of jobs to be worked on. | No |
Merge Peer | A group of workers that take the source partitions listed in the queue and combine them to create new partitions. The new partitions are then uploaded to the HDX DB Storage Bucket. The last step in the process is to update the Catalog with the new partitions and stop listing the old ones. | Yes |
HDX DB Storage Bucket | Contains the database (including partitions), configuration and other state information concerning the platform. Forms part of the core infrastructure. | No |
Catalog | Contains metadata on the database, partitions and job tasks. Forms part of the core infrastructure. | No |
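
The flow between the components above can be sketched as a small in-memory simulation. This is only an illustration, not the service's actual implementation: the `Partition` class, the `plan_merge_jobs` and `run_merge_peer` functions, the `max_rows` threshold, and the row-count grouping rule are all hypothetical names and assumptions chosen for the example. The catalog is modelled as a list, the queue as a `deque`, and the storage bucket as a set.

```python
from collections import deque
from dataclasses import dataclass


@dataclass(frozen=True)
class Partition:
    # Hypothetical stand-in for a catalog entry.
    name: str
    rows: int


def plan_merge_jobs(catalog, max_rows):
    """Merge head (sketch): group small partitions into jobs of at most max_rows."""
    jobs, current, total = [], [], 0
    for part in sorted(catalog, key=lambda p: p.name):
        if part.rows >= max_rows:
            continue  # assumed rule: partitions this large are left alone
        if current and total + part.rows > max_rows:
            if len(current) > 1:  # a job needs at least two partitions
                jobs.append(current)
            current, total = [], 0
        current.append(part)
        total += part.rows
    if len(current) > 1:
        jobs.append(current)
    return jobs


def run_merge_peer(queue, catalog, bucket):
    """Merge peer (sketch): combine, upload, then update the catalog."""
    while queue:
        job = queue.popleft()
        merged = Partition(
            name="+".join(p.name for p in job),
            rows=sum(p.rows for p in job),
        )
        bucket.add(merged.name)  # "upload" the new partition to storage
        catalog.append(merged)   # list the new partition in the catalog
        for old in job:
            catalog.remove(old)  # stop listing the source partitions


if __name__ == "__main__":
    catalog = [Partition("p1", 10), Partition("p2", 20), Partition("p3", 500)]
    queue = deque(plan_merge_jobs(catalog, max_rows=100))
    bucket = set()
    run_merge_peer(queue, catalog, bucket)
    print([p.name for p in catalog])
```

In the sketch the catalog is updated only after the merged partition has been written to the bucket, mirroring the order of steps described for the Merge Peer, so a reader of the catalog never sees a partition that has not been uploaded yet.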