Merge controller: HTTP endpoints

Prerequisites

You should have already deployed a Tooling pod by following the steps on the tooling pod page.

Merge controller: HTTP endpoints

The merge-controller pod exposes an HTTP interface on port 9001 with three endpoints. All three return the same data as JSON, scoped to different sets of merge targets:

  • /admin/efficiencies - Merge efficiencies for all projects, tables, and targets
  • /admin/efficiency/{project_id} - Merge efficiencies for all tables and targets for a given project
  • /admin/efficiency/{project_id}/{table_id} - Merge efficiencies for all targets for a given project and table
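The three endpoint paths above can be assembled programmatically. The following is a minimal sketch: `efficiency_url` is a hypothetical helper written for illustration, not part of any client library, and the host value is whatever address the merge-controller service has in your cluster.

```python
from typing import Optional

# Port taken from the text above.
MERGE_CONTROLLER_PORT = 9001


def efficiency_url(host: str,
                   project_id: Optional[str] = None,
                   table_id: Optional[str] = None) -> str:
    """Build the URL for one of the three efficiency endpoints.

    Hypothetical helper: the narrower the scope, the more IDs are passed.
    """
    base = f"http://{host}:{MERGE_CONTROLLER_PORT}/admin"
    if project_id is None:
        if table_id is not None:
            # A table can only be addressed within a project.
            raise ValueError("table_id requires project_id")
        return f"{base}/efficiencies"
    if table_id is None:
        return f"{base}/efficiency/{project_id}"
    return f"{base}/efficiency/{project_id}/{table_id}"
```

Calling `efficiency_url(host)` targets all projects, while passing a project ID, or a project ID and a table ID, narrows the scope to match the second and third endpoints.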

Each endpoint returns the same output structure, filtered to the requested scope:

[
  {
    "project": {
      "id": "{project_uuid}",
      "name": "hydro"
    },
    "table": {
      "id": "{table_uuid}",
      "name": "logs"
    },
    "pool": "merge-peer-iii",
    "efficiency": {
      "target": 4000000000,
      "ideal_partition_count": 18,
      "actual_partition_count": 751,
      "efficiency": 0.02
    },
    "distribution": {
      "buckets": [
        {
          "upper_bound": 440000000,
          "count": 7501
        },
        {
          "upper_bound": 880000000,
          "count": 9
        }
      ],
      "over_range": 0
    }
  }
]

The efficiency and distribution keys are optional. They are present only when merge-controller has enough data to calculate them for a given merge target.
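Because the efficiency and distribution keys may be absent, code that consumes the response should not assume they exist. The sketch below shows one way to handle this in Python. The `summarize` function is hypothetical, and the second entry in the sample payload is made up purely to illustrate a target without enough data.

```python
import json

# Abbreviated sample payload. The first entry follows the example response;
# the second entry is invented to show a target lacking the optional keys.
SAMPLE = """
[
  {
    "project": {"id": "{project_uuid}", "name": "hydro"},
    "table": {"id": "{table_uuid}", "name": "logs"},
    "pool": "merge-peer-iii",
    "efficiency": {
      "target": 4000000000,
      "ideal_partition_count": 18,
      "actual_partition_count": 751,
      "efficiency": 0.02
    },
    "distribution": {
      "buckets": [
        {"upper_bound": 440000000, "count": 7501},
        {"upper_bound": 880000000, "count": 9}
      ],
      "over_range": 0
    }
  },
  {
    "project": {"id": "{project_uuid}", "name": "hydro"},
    "table": {"id": "{table_uuid}", "name": "logs"},
    "pool": "merge-peer-iii"
  }
]
"""


def summarize(payload: str) -> list:
    """Flatten each entry, using None where an optional key is missing."""
    rows = []
    for entry in json.loads(payload):
        eff = entry.get("efficiency")     # optional key
        dist = entry.get("distribution")  # optional key
        rows.append({
            "project": entry["project"]["name"],
            "table": entry["table"]["name"],
            "pool": entry["pool"],
            "efficiency": eff["efficiency"] if eff else None,
            "bucket_count": len(dist["buckets"]) if dist else None,
        })
    return rows
```

Using `dict.get` keeps the missing-data case explicit, so callers can decide whether to skip such targets or report them separately.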

Example usage

  1. Get the IP for the merge-controller service by running:

    kubectl get service/merge-controller --namespace=$HDX_KUBERNETES_NAMESPACE
    
  2. Shell into the tooling pod and run:

    curl -X GET "http://{merge-controller-ip}:9001/admin/efficiencies" | jq .
    

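The same values are also exposed as Prometheus metrics. In the queries below, $Project and $Table stand for the project and table names to filter on. To get the actual and ideal partition counts: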

sum by (target, project_name, table_name) (actual_partition_count{project_name="$Project", table_name="$Table"})
sum by (target, project_name, table_name) (ideal_partition_count{project_name="$Project", table_name="$Table"})

To get the merge efficiency and the 99th percentile of the partition distribution:

sum by (target, project_name, table_name) (efficiency{project_name="$Project", table_name="$Table"})
histogram_quantile(0.99, sum by(le) ((partition_distribution_bucket{project_name="$Project", table_name="$Table"})))

Merge controller: Metrics

In addition to the HTTP endpoints, merge-controller exposes metrics that describe the merge efficiency of each merge target. A merge target is a unique triple of project, table, and era, where an era is a time bound. Each merge target's era corresponds to one of the defined Merge Pools, which cover increasingly long time ranges.

Each merge target has a configured memory target, determined by its era:

  • Era I: 1Gi
  • Era II: 2Gi
  • Era III: 4Gi

A merge target reaches maximum efficiency when its data is held in the fewest possible partitions, that is, when each partition's memory calculation is as close as possible to the configured memory target without exceeding it.

Put another way, the calculation for efficiency of a merge target is:

ideal_partition_count / actual_partition_count

where ideal_partition_count is sum({memory_calculation}) / {memory_target} and memory_calculation is mem_size for normal tables, or mem_size + uncompressed_mem_size for summary tables.

The efficiency value is typically a float between 0.0 and 1.0, where 1.0 corresponds to 100%.
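The calculation can be sketched in Python as follows. Efficiency is computed as the ideal count divided by the actual count, which keeps the result in the 0.0 to 1.0 range and reproduces the sample response shown earlier (18 ideal and 751 actual partitions give roughly 0.02). The function names and the tuple layout for partitions are assumptions made for illustration, and any rounding that merge-controller applies to the ideal count is not modeled.

```python
from typing import Iterable, Tuple


def ideal_partition_count(partitions: Iterable[Tuple[int, int]],
                          memory_target: int,
                          summary_table: bool = False) -> float:
    """Sum of memory calculations divided by the memory target.

    Each partition is a (mem_size, uncompressed_mem_size) pair in bytes.
    For normal tables only mem_size counts; for summary tables both do.
    """
    total = 0
    for mem_size, uncompressed_mem_size in partitions:
        total += mem_size
        if summary_table:
            total += uncompressed_mem_size
    return total / memory_target


def merge_efficiency(ideal_count: float, actual_count: int) -> float:
    """Ratio of ideal to actual partition count."""
    if actual_count <= 0:
        # Assumption: a target with no partitions has no defined efficiency.
        raise ValueError("actual_count must be positive")
    return ideal_count / actual_count


# Illustrative input: eight partitions of 1 GB each against a 4 GB target.
parts = [(1_000_000_000, 500_000_000)] * 8
ideal = ideal_partition_count(parts, 4_000_000_000)
print(ideal, merge_efficiency(ideal, len(parts)))
```

With these illustrative inputs the eight partitions could ideally be merged into two, so the target is a quarter of the way to fully merged.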

merge-controller provides in-depth merge target efficiency information within the following Prometheus metrics:

| Metric Name | Type | Components | Description | Labels |
|---|---|---|---|---|
| efficiency | Gauge | Merge Controller | A float between 0.0 and 1.0 reported per-merge-target (project, table, era), with a higher value indicating better merge efficiency. A value close to 0.0 means the target is almost completely unmerged (most partitions are tiny), while a value approaching 1.0 means the target is as merged as it could be (most partitions being close to the target memory). | project_id, project_name, table_id, table_name, target |
| ideal_partition_count | Gauge | Merge Controller | Given the current required memory for all partitions in the target, what's the ideal number of partitions if each partition was at the target size | project_id, project_name, table_id, table_name, target |
| actual_partition_count | Gauge | Merge Controller | Current count of partitions in the target | project_id, project_name, table_id, table_name, target |

Additional merge-related metrics can be found on the Merge Metrics page.