Metrics for Observability

Use metrics to enhance observability and build dashboards

Overview

Observability means different things depending on what you need to monitor. This guide lists key metrics to watch when building dashboards and offers suggestions for avoiding alert fatigue. It assumes a Kubernetes deployment that uses Prometheus for metrics and Grafana dashboards for visualization.

Metric types

Different areas of Hydrolix use different types of metrics. The lists below group them by the layer they describe.

Node-level metrics

  • CPU usage
  • Memory usage
  • Disk usage
  • Network traffic I/O
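
As an illustration, the node-level metrics above could be expressed as Prometheus queries. The following is a minimal sketch, assuming the cluster runs node_exporter and exposes its standard metric names; the query names, the `instant_query_url` helper, and the Prometheus address are hypothetical and chosen only for this example. It builds URLs for the Prometheus instant-query HTTP endpoint without sending any request.

```python
from urllib.parse import urlencode

# Assumed node_exporter metric names; adjust to whatever your cluster exposes.
NODE_QUERIES = {
    "cpu_busy_ratio": '1 - avg by (instance) (rate(node_cpu_seconds_total{mode="idle"}[5m]))',
    "memory_used_ratio": "1 - node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes",
    "disk_used_ratio": "1 - node_filesystem_avail_bytes / node_filesystem_size_bytes",
    "network_rx_bytes_per_s": "rate(node_network_receive_bytes_total[5m])",
    "network_tx_bytes_per_s": "rate(node_network_transmit_bytes_total[5m])",
}


def instant_query_url(base_url: str, promql: str) -> str:
    """Build a URL for the Prometheus /api/v1/query instant-query endpoint."""
    return f"{base_url.rstrip('/')}/api/v1/query?{urlencode({'query': promql})}"


if __name__ == "__main__":
    # Hypothetical Prometheus address, used only to show the resulting URLs.
    for name, promql in NODE_QUERIES.items():
        print(name, instant_query_url("http://prometheus.example:9090", promql))
```

The same expressions can be pasted into a Grafana panel that uses a Prometheus data source. The 5-minute rate window is an arbitrary starting point, not a recommendation from Hydrolix.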

Pod and container metrics

  • CPU and memory usage per pod or container
  • Pod status and restarts
  • Resource requests and limits
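
To make the pod restart metric concrete, here is a minimal sketch of how a Prometheus instant-query response could be reduced to a list of frequently restarting pods. It assumes kube-state-metrics is installed and exposes `kube_pod_container_status_restarts_total`; the function name, the threshold, and the pod names in the usage note are hypothetical.

```python
# Assumed kube-state-metrics counter; one series per container in each pod.
RESTARTS_QUERY = "kube_pod_container_status_restarts_total"


def pods_over_restart_threshold(payload: dict, threshold: float = 3) -> dict:
    """Sum container restarts per pod and keep pods at or above the threshold.

    `payload` is the decoded JSON body of a Prometheus instant query,
    which has the shape {"status": ..., "data": {"result": [...]}}.
    """
    if payload.get("status") != "success":
        raise ValueError(f"query failed: {payload.get('error', 'unknown error')}")

    totals = {}
    for sample in payload["data"]["result"]:
        labels = sample["metric"]
        pod = f'{labels.get("namespace", "?")}/{labels.get("pod", "?")}'
        # Each instant-vector sample is [timestamp, "value as string"].
        restarts = float(sample["value"][1])
        totals[pod] = totals.get(pod, 0.0) + restarts

    return {pod: count for pod, count in totals.items() if count >= threshold}
```

For example, a response containing two containers of a pod named `hydrolix/intake-0` with 2 restarts each would be reported as `{"hydrolix/intake-0": 4.0}` at the default threshold, while a pod with a single restart would be left out.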

Control plane metrics

  • Scheduler performance
  • Server latency
  • Hydrolix-specific metrics

Grafana dashboards

Hydrolix uses Prometheus for metrics and Kubernetes for logs, with visualization through Grafana dashboards. Grafana dashboards are flexible: you can filter them to reduce noise and prevent over-alerting while still surfacing issues as they happen.
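
One common way to limit over-alerting is to fire only when a value stays above a threshold for several consecutive samples, rather than on every spike. The sketch below illustrates that idea in plain Python. It is not how Grafana or Hydrolix implements alerting; the function name and the sample values are hypothetical.

```python
def sustained_breaches(samples, threshold, min_consecutive):
    """Return the indices at which an alert would fire.

    An alert fires once per run, at the sample where the value has been
    above `threshold` for `min_consecutive` samples in a row.
    """
    fired = []
    run_length = 0
    for index, value in enumerate(samples):
        # Extend the current run on a breach, otherwise reset it.
        run_length = run_length + 1 if value > threshold else 0
        if run_length == min_consecutive:
            fired.append(index)
    return fired


if __name__ == "__main__":
    # Hypothetical CPU utilization ratios sampled at a fixed interval.
    cpu = [0.50, 0.95, 0.60, 0.92, 0.93, 0.97, 0.99, 0.40]
    print(sustained_breaches(cpu, threshold=0.9, min_consecutive=3))
```

With these sample values, the single spike at index 1 is ignored and one alert fires at index 5, when the third consecutive breach occurs.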

To learn how to set this up, see Visualize Hydrolix Data in Grafana.

For a list of the metrics Hydrolix uses, see All Metrics.
