Merge controller: HTTP endpoints

Prerequisites

Before using the endpoints below, deploy a tooling pod by following the steps on the tooling pod page.

Merge controller: HTTP endpoints

The merge-controller pod's HTTP interface exposes three endpoints on port 9001. Each returns the same kind of merge efficiency data in JSON, scoped to a different set of merge targets:

/admin/efficiencies - Merge efficiencies for all projects, tables, and targets
/admin/efficiency/{project_id} - Merge efficiencies for all tables and targets for a given project
/admin/efficiency/{project_id}/{table_id} - Merge efficiencies for all targets for a given project and table

All three endpoints return the same output structure; only the filtering differs:

[
  {
    "project": {
      "id": "{project_uuid}",
      "name": "hydro"
    },
    "table": {
      "id": "{table_uuid}",
      "name": "logs"
    },
    "pool": "merge-peer-iii",
    "efficiency": {
      "target": 4000000000,
      "ideal_partition_count": 18,
      "actual_partition_count": 751,
      "efficiency": 0.02
    },
    "distribution": {
      "buckets": [
        {
          "upper_bound": 440000000,
          "count": 7501
        },
        {
          "upper_bound": 880000000,
          "count": 9
        }
      ],
      "over_range": 0
    }
  }
]

The `efficiency` and `distribution` keys are optional: they are present only when merge-controller has enough data to calculate them for a given merge target.
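Because `efficiency` and `distribution` may be missing, a client should not assume either is present. The following is a minimal sketch, using only Python's standard library, that parses one record of the response shape shown above and returns `None` for absent sections. The `TargetEfficiency` class and `parse_record` function are hypothetical names, and treating `over_range` as a count of partitions above the last bucket's `upper_bound` is an assumption based on the field name.

```python
import json
from dataclasses import dataclass
from typing import Optional


@dataclass
class TargetEfficiency:
    project_name: str
    table_name: str
    pool: str
    efficiency: Optional[float]            # None when merge-controller lacks data
    partitions_in_buckets: Optional[int]   # None when "distribution" is absent


def parse_record(record: dict) -> TargetEfficiency:
    """Parse one element of the JSON array returned by the /admin endpoints."""
    eff = record.get("efficiency")      # optional key
    dist = record.get("distribution")   # optional key
    total = None
    if dist is not None:
        # Assumption: over_range counts partitions beyond the last upper_bound.
        total = sum(b["count"] for b in dist.get("buckets", [])) + dist.get("over_range", 0)
    return TargetEfficiency(
        project_name=record["project"]["name"],
        table_name=record["table"]["name"],
        pool=record["pool"],
        efficiency=eff["efficiency"] if eff is not None else None,
        partitions_in_buckets=total,
    )


# A record for which merge-controller has not yet calculated efficiency data.
sample = json.loads("""
[{"project": {"id": "p1", "name": "hydro"},
  "table": {"id": "t1", "name": "logs"},
  "pool": "merge-peer-iii"}]
""")
print(parse_record(sample[0]))
```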

Example usage

Get the IP address of the merge-controller service (listed in the `CLUSTER-IP` column of the output) by running:

kubectl get service/merge-controller --namespace=$HDX_KUBERNETES_NAMESPACE

Shell into the tooling pod and run:

curl -X GET "http://{merge-controller-ip}:9001/admin/efficiencies" | jq .
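The request above can also be scripted. The following sketch, assuming a Python 3 interpreter is available where you run it, fetches `/admin/efficiencies` with the standard library and lists the targets whose efficiency falls below a threshold, worst first. It assumes the response shape shown earlier; the function names and the default threshold of `0.5` are hypothetical choices for illustration. Records without an `efficiency` key are skipped, since that key is optional.

```python
import json
import urllib.request
from typing import List, Tuple


def fetch_efficiencies(host: str, port: int = 9001, timeout: float = 10.0) -> list:
    """GET /admin/efficiencies from merge-controller and decode the JSON array."""
    url = f"http://{host}:{port}/admin/efficiencies"
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.load(resp)


def least_efficient(records: list, threshold: float = 0.5) -> List[Tuple[str, str, str, float]]:
    """Return (project, table, pool, efficiency) tuples below threshold, lowest first."""
    rows = []
    for record in records:
        eff = record.get("efficiency")
        if eff is None:
            continue  # not enough data yet for this merge target
        if eff["efficiency"] < threshold:
            rows.append((
                record["project"]["name"],
                record["table"]["name"],
                record["pool"],
                eff["efficiency"],
            ))
    rows.sort(key=lambda row: row[3])
    return rows
```

For example, call `least_efficient(fetch_efficiencies("{merge-controller-ip}"))` from inside the tooling pod, substituting the IP from the `kubectl` output.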

To get the actual and ideal partition counts in Prometheus, run the following queries, replacing `$Project` and `$Table` with a project name and table name (or supplying them as dashboard variables):

Get actual partition count:

`sum by (target, project_name, table_name) (actual_partition_count{project_name="$Project", table_name="$Table"})`

Get ideal partition count:

`sum by (target, project_name, table_name) (ideal_partition_count{project_name="$Project", table_name="$Table"})`

To get the merge efficiency of each target:

`sum by (target, project_name, table_name) (efficiency{project_name="$Project", table_name="$Table"})`

To get the 99th percentile of the partition distribution histogram across all of the table's targets:

`histogram_quantile(0.99, sum by (le) (partition_distribution_bucket{project_name="$Project", table_name="$Table"}))`
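These queries can also be run outside a dashboard through the Prometheus HTTP API's instant-query endpoint, `/api/v1/query`. The sketch below assumes a reachable Prometheus server at a base URL you supply, label values that contain no double quotes or backslashes (they are not escaped), and that each result series carries a `target` label, as the `sum by` clause implies. `build_efficiency_query`, `query_prometheus`, and `vector_by_target` are hypothetical helper names.

```python
import json
import urllib.parse
import urllib.request
from typing import Dict


def build_efficiency_query(project: str, table: str) -> str:
    """Fill the efficiency PromQL query with concrete label values (no escaping)."""
    return (
        "sum by (target, project_name, table_name) "
        f'(efficiency{{project_name="{project}", table_name="{table}"}})'
    )


def query_prometheus(base_url: str, promql: str, timeout: float = 10.0) -> dict:
    """Run an instant query against the Prometheus HTTP API and decode the JSON body."""
    url = f"{base_url.rstrip('/')}/api/v1/query?" + urllib.parse.urlencode({"query": promql})
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.load(resp)


def vector_by_target(response: dict) -> Dict[str, float]:
    """Map each instant-vector sample's `target` label to its numeric value."""
    if response.get("status") != "success":
        raise RuntimeError(f"query failed: {response.get('error')}")
    # Each sample's "value" is [timestamp, "value-as-string"].
    return {
        sample["metric"].get("target", ""): float(sample["value"][1])
        for sample in response["data"]["result"]
    }
```

For example, `vector_by_target(query_prometheus(base_url, build_efficiency_query("hydro", "logs")))` returns a dictionary from target label to efficiency, where `base_url` is your Prometheus address.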

Merge controller: Metrics

In addition to the HTTP endpoints, merge-controller tracks and exposes canonical metrics describing merge efficiency for each merge target. A merge target is a unique combination of project, table, and era, where an era is a time bound. Each merge target's era corresponds to one of the defined Merge Pools, which cover increasingly long time ranges.

Each merge target itself has a configured memory target:

Era I: 1Gi
Era II: 2Gi
Era III: 4Gi

A merge target is at maximum efficiency when its data occupies the fewest possible partitions, each with a memory calculation up to the configured memory target.

Put another way, the efficiency of a merge target is calculated as:

`ideal_partition_count / actual_partition_count`

where `ideal_partition_count` is `sum(memory_calculation) / memory_target`, and `memory_calculation` is `mem_size` for normal tables or `mem_size + uncompressed_mem_size` for summary tables.

This value is typically presented as a float between 0.0 and 1.0, which can be read as a percentage.
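The calculation can be sketched in Python as follows, dividing the ideal count by the actual count, which yields the 0.0 to 1.0 range described above (the example response, with 18 ideal and 751 actual partitions, reports `0.02`). This illustrates the formula only and is not merge-controller's implementation: the `Partition` class and helper functions are hypothetical, the per-era memory targets come from the list above, and rounding the ideal count up to a whole number (minimum 1) is an assumption, since the text does not say how fractional counts are handled.

```python
import math
from dataclasses import dataclass
from typing import Iterable, Optional

GIB = 1024 ** 3

# Per-era memory targets from the list above.
MEMORY_TARGETS = {"I": 1 * GIB, "II": 2 * GIB, "III": 4 * GIB}


@dataclass
class Partition:
    mem_size: int
    uncompressed_mem_size: int = 0  # only counted for summary tables


def memory_calculation(p: Partition, summary_table: bool) -> int:
    """mem_size for normal tables, mem_size + uncompressed_mem_size for summary tables."""
    return p.mem_size + p.uncompressed_mem_size if summary_table else p.mem_size


def ideal_partition_count(partitions: Iterable[Partition], memory_target: int,
                          summary_table: bool = False) -> int:
    total = sum(memory_calculation(p, summary_table) for p in partitions)
    # Assumption: round up to a whole number of partitions, with a minimum of 1.
    return max(1, math.ceil(total / memory_target))


def efficiency(partitions: Iterable[Partition], memory_target: int,
               summary_table: bool = False) -> Optional[float]:
    parts = list(partitions)
    if not parts:
        return None  # no partitions, so efficiency is undefined
    return ideal_partition_count(parts, memory_target, summary_table) / len(parts)


# Eight 256 MiB partitions in an Era I target (1 GiB): 2 GiB total, ideal count 2.
small = [Partition(mem_size=256 * 1024 ** 2) for _ in range(8)]
print(efficiency(small, MEMORY_TARGETS["I"]))  # 0.25
```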

merge-controller exposes detailed per-target merge efficiency information through the following Prometheus metrics:

Metric Name	Type	Components	Description	Labels
`efficiency`	Gauge	Merge Controller	A float between `0.0` and `1.0` reported per merge target (project, table, era), with a higher value indicating better merge efficiency. A value close to `0.0` means the target is almost completely unmerged (most partitions are tiny), while a value approaching `1.0` means the target is as merged as it can be (most partitions are close to the memory target).	`project_id` `project_name` `table_id` `table_name` `target`
`ideal_partition_count`	Gauge	Merge Controller	The ideal number of partitions if each partition were at the target size, given the memory currently required by all partitions in the target	`project_id` `project_name` `table_id` `table_name` `target`
`actual_partition_count`	Gauge	Merge Controller	Current count of partitions in the target	`project_id` `project_name` `table_id` `table_name` `target`
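Because the three gauges share the same labels, the `efficiency` gauge can be cross-checked by pairing `ideal_partition_count` and `actual_partition_count` samples on their label sets and dividing. The sketch below assumes samples in the Prometheus HTTP API instant-vector shape (`{"metric": {...}, "value": [timestamp, "value"]}`) and keys each merge target by the five labels in the table above; `labels_key` and `derived_efficiency` are hypothetical names.

```python
from typing import Dict, FrozenSet, List, Tuple

LabelSet = FrozenSet[Tuple[str, str]]

# Labels shared by all three merge-controller gauges, per the table above.
TARGET_LABELS = ("project_id", "project_name", "table_id", "table_name", "target")


def labels_key(metric: Dict[str, str]) -> LabelSet:
    """Identify a merge target by its shared labels, ignoring __name__ and others."""
    return frozenset((k, metric[k]) for k in TARGET_LABELS if k in metric)


def derived_efficiency(ideal: List[dict], actual: List[dict]) -> Dict[LabelSet, float]:
    """Divide ideal_partition_count by actual_partition_count for each merge target."""
    ideal_by_key = {labels_key(s["metric"]): float(s["value"][1]) for s in ideal}
    result = {}
    for sample in actual:
        key = labels_key(sample["metric"])
        count = float(sample["value"][1])
        if key in ideal_by_key and count > 0:  # skip unmatched or empty targets
            result[key] = ideal_by_key[key] / count
    return result
```

In PromQL, the same pairing happens implicitly in `ideal_partition_count / actual_partition_count`, because a binary operator between two instant vectors matches series whose label sets are identical once the metric name is dropped.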

Additional merge-related metrics can be found on the Merge Metrics page.