Compact & Optimize Data (Merge)

Hydrolix includes an automated compaction and optimization Merge service as part of its data lifecycle. The Merge service takes small partitions and combines them into larger, more efficient ones, optimizing data for more performant queries and a smaller storage footprint.

Below is a diagram of data partitions in storage. During the merge process, three smaller partitions on the left are combined to create one larger partition on the right with a longer time interval.

[Diagram: three smaller partitions on the left merged into one larger partition on the right]

For more information about how the Merge service fits into the rest of Hydrolix, see the Merge page in our platform documentation.

Enable Merge on Tables

All tables have merge enabled by default. You can enable and disable merge with the PATCH table API endpoint. Disabling merge will immediately stop new merge jobs from running, but any merge jobs already in the queue will still run.

🚧

Disable merge only under special circumstances

Disabling merge is not recommended, and may result in performance degradation.

For example, this API request will enable merge for a given table:

PATCH {{base_url}}orgs/{{org_id}}/projects/{{project_id}}/tables/{{table_id}}
Authorization: Bearer {{access_token}}
Content-Type: application/json
Accept: application/json

{
    "settings": {
        "merge": {
            "enabled": true
        }
    }
}
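The same request can be scripted. The following is a minimal Python sketch that uses only the standard library; it builds the PATCH request and prints it, leaving the actual send commented out. The function name `build_merge_toggle_request`, the example host, and the ID and token values are hypothetical placeholders, and the sketch assumes that `base_url` ends with a slash, as the request above implies.

```python
import json
import urllib.request


def build_merge_toggle_request(base_url, org_id, project_id, table_id, token, enabled):
    """Build (but do not send) a PATCH request that sets settings.merge.enabled."""
    # Assumption: base_url already ends with "/", matching the request shown above.
    url = f"{base_url}orgs/{org_id}/projects/{project_id}/tables/{table_id}"
    body = json.dumps({"settings": {"merge": {"enabled": enabled}}}).encode("utf-8")
    headers = {
        "Authorization": f"Bearer {token}",
        "Content-Type": "application/json",
        "Accept": "application/json",
    }
    return urllib.request.Request(url, data=body, headers=headers, method="PATCH")


# Placeholder values for illustration only.
req = build_merge_toggle_request(
    "https://hydrolix.example.invalid/", "ORG_ID", "PROJECT_ID", "TABLE_ID", "TOKEN", True
)
print(req.get_method(), req.full_url)
print(req.data.decode("utf-8"))
# To send the request: urllib.request.urlopen(req)
```

Passing `False` for `enabled` produces the request body that disables merge instead.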

You can also disable merge for a given table through the web UI. Navigate to "Data," select the table you want, then find "Merge settings" under "Advanced options." Click the three dots on the right side of that row, then select the "Disable Merge" checkbox from the menu.

Merge Pools

Hydrolix clusters create merge components in three pools: small, medium, and large. Each pool handles partitions that match a different set of criteria. This helps your cluster reach optimal partition sizes and spreads merge workloads across old and new data.

The following table shows the criteria used to assign partitions to merge pools, as of version 4.17.0.

| If the max Primary Timestamp is: | ...and the partition width is within: | Resulting Merge Pool | Target Partition Size |
| --- | --- | --- | --- |
| Less than 10 minutes old | 1 hour | small (merge-i) | 1 GB |
| From 10 to 70 minutes old | 1 hour | medium (merge-ii) | 2 GB |
| Greater than 70 minutes old | 1 hour | large (merge-iii) | 4 GB |

For example, reading across the table above from left to right: if a partition's last timestamp was 15 minutes ago, and it was 513 MB in size and 37 minutes in width, it would be sent to the medium pool.
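The age criteria can be expressed as a small lookup. This is an illustrative Python sketch of the table only, not the Merge service's actual selection logic. The function name `assign_merge_pool` is hypothetical, the sketch places the 10-minute and 70-minute boundaries in the medium pool following the table's wording, and it returns `None` for widths over one hour because the table does not cover that case. Partition size is not used, since the table lists no size criterion.

```python
def assign_merge_pool(age_minutes, width_minutes):
    """Return (pool, target size in GB) per the table, or None if no row applies."""
    # Every row of the table requires the partition width to be within 1 hour.
    if width_minutes > 60:
        return None
    if age_minutes < 10:
        return ("small", 1)
    if age_minutes <= 70:
        return ("medium", 2)
    return ("large", 4)


# The example partition: last timestamp 15 minutes ago, 37 minutes wide.
print(assign_merge_pool(age_minutes=15, width_minutes=37))  # ('medium', 2)
```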

📘

Primary timestamp

For more information on primary timestamps, see Timestamp Data Types.

Custom Merge Pools

All tables have merge enabled by default, but sometimes you might want to create additional merge pools targeted at specific tables to separate merge workloads and avoid “noisy neighbor” effects. For example, you might create a special merge pool to handle merge within a Summary Table, distancing that workload from the main merge process.

Create custom merge pools with the pools API endpoint, then apply those pools by configuring the small, medium, or large body parameters of the settings.merge.pools object when creating or updating a table.

You can read more about pools, including merge pools, on our Resource Pools page.

Creating Pools

The following Config API request creates a custom pool with the pools API endpoint:

POST {{base_url}}pools/
Authorization: Bearer {{access_token}}
Content-Type: application/json

{
     "settings": {
          "is_default": false,
          "k8s_deployment": {
               "service": "merge-peer",
               "scale_profile": "II"
          }
     },
     "name": "my-pool-name-II"
}

You can also do this in the UI: open the "Add new" menu in the upper right, then select "Resource pool."

Use the following settings to configure your pool:

| Object | Description | Value/Example |
| --- | --- | --- |
| service | The service workload the pool will utilize. For merge, this is merge-peer. | merge-peer |
| scale_profile | The merge pool size, corresponding to small, medium, or large. | I, II, or III |
| name | The name used to identify your pool. | Example: my-pool-name-II |
| cpu | The amount of CPU provided to pods. | A numeric value. Defaults are specified in Scale Profiles. Example: 2 |
| memory | The amount of memory provided to pods. | A string value. Defaults are specified in Scale Profiles. Default units are Gi. Example: 10Gi |
| replicas | The number of pods to run in the pool. | A numeric value or hyphenated range. Defaults are specified in Scale Profiles. Examples: 3 and 1-5 |
| storage | The amount of ephemeral storage provided to pods. | A string value. Defaults are specified in Scale Profiles. Default units are Gi. Example: 5Gi |
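As a rough illustration of how these settings fit together, the Python sketch below assembles a request body for the pools endpoint and rejects an unknown scale profile before anything is sent. The helper name `build_merge_pool_payload` is hypothetical. Placing the optional `cpu`, `memory`, `replicas`, and `storage` values inside `k8s_deployment`, next to `service` and `scale_profile`, is an assumption that this page does not confirm; check the pools API reference before relying on it.

```python
import json

VALID_PROFILES = {"I", "II", "III"}
OPTIONAL_KEYS = {"cpu", "memory", "replicas", "storage"}


def build_merge_pool_payload(name, scale_profile, **overrides):
    """Assemble a body for POST {{base_url}}pools/ that describes a merge pool."""
    if scale_profile not in VALID_PROFILES:
        raise ValueError(f"scale_profile must be one of {sorted(VALID_PROFILES)}")
    unknown = set(overrides) - OPTIONAL_KEYS
    if unknown:
        raise ValueError(f"unsupported settings: {sorted(unknown)}")
    deployment = {"service": "merge-peer", "scale_profile": scale_profile}
    # Assumption: resource overrides live alongside service and scale_profile.
    deployment.update(overrides)
    return {
        "settings": {"is_default": False, "k8s_deployment": deployment},
        "name": name,
    }


payload = build_merge_pool_payload("my-pool-name-II", "II", replicas="1-5", memory="10Gi")
print(json.dumps(payload, indent=2))
```

Called with only a name and a scale profile, the helper reproduces the request body shown earlier in this section.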

Assigning Pools to Tables

The following API request assigns a set of custom pools to a table with the tables API endpoint:

PATCH {{base_url}}orgs/{{org_uuid}}/projects/{{project_uuid}}/tables/{{table_uuid}}/
Authorization: Bearer {{access_token}}
Content-Type: application/json

{
    "name": "my-table",
    "settings": {
        "merge": {
            "enabled": true,
            "pools": {
                "large": "my-pool-name-III",
                "medium": "my-pool-name-II",
                "small": "my-pool-name-I"
            }
        }
    }
}
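When scripting this request, it can help to check that all three pool sizes are named before sending anything. The Python sketch below builds the same body as the request above and raises an error if a pool is missing. The helper name `build_pool_assignment` is hypothetical, and the pool names are the example values from this page.

```python
import json


def build_pool_assignment(table_name, small, medium, large):
    """Build a table PATCH body that assigns one custom pool per merge size."""
    pools = {"large": large, "medium": medium, "small": small}
    missing = [size for size, pool in pools.items() if not pool]
    if missing:
        # This sketch insists on all three sizes being provided.
        raise ValueError(f"no pool given for: {', '.join(sorted(missing))}")
    return {
        "name": table_name,
        "settings": {"merge": {"enabled": True, "pools": pools}},
    }


body = build_pool_assignment(
    "my-table",
    small="my-pool-name-I",
    medium="my-pool-name-II",
    large="my-pool-name-III",
)
print(json.dumps(body, indent=4))
```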

You can also configure this in the UI. Navigate to "Data," select the table to which you want to assign new pools, then find "Merge settings" under "Advanced options."

📘

Use all three pools

For optimal merge performance, provide a large, medium, and small pool.