Metrics for Observability
Use metrics to enhance observability and build dashboards
Overview
Observability means different things depending on what you need to monitor. This guide lists key metrics to look for when building dashboards and offers suggestions for avoiding alert fatigue. It assumes you use Kubernetes and Prometheus, with visualization in Grafana dashboards.
Metric types
Hydrolix deployments expose metrics at several levels: nodes, pods and containers, and the control plane.
Node-level metrics
- CPU usage
- Memory usage
- Disk usage
- Network traffic I/O
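As a rough sketch of how the node-level signals above might be pulled from Prometheus, the Python below builds instant-query URLs for the Prometheus HTTP API (`/api/v1/query`). The metric names assume that node_exporter is deployed and scraped in your cluster, which this guide does not state. The base URL `http://prometheus:9090` and the names `NODE_QUERIES` and `build_query_url` are hypothetical and only for illustration.

```python
from urllib.parse import urlencode

# Example PromQL expressions. These assume node_exporter metrics are
# available in Prometheus; adjust names and labels to your environment.
NODE_QUERIES = {
    "cpu_busy_ratio": '1 - avg by (instance) (rate(node_cpu_seconds_total{mode="idle"}[5m]))',
    "memory_used_ratio": "1 - node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes",
    "disk_used_ratio": "1 - node_filesystem_avail_bytes / node_filesystem_size_bytes",
    "network_rx_bytes_per_s": "rate(node_network_receive_bytes_total[5m])",
    "network_tx_bytes_per_s": "rate(node_network_transmit_bytes_total[5m])",
}


def build_query_url(base_url: str, promql: str) -> str:
    """Return an instant-query URL for the Prometheus HTTP API."""
    return f"{base_url.rstrip('/')}/api/v1/query?{urlencode({'query': promql})}"


if __name__ == "__main__":
    # Hypothetical Prometheus address; replace with your own.
    for name, expr in NODE_QUERIES.items():
        print(name, build_query_url("http://prometheus:9090", expr))
```

Each expression returns a ratio or a rate rather than a raw counter, which tends to be easier to chart and to compare across nodes of different sizes.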
Pod and container metrics
- CPU and memory usage per pod or container
- Pod status and restarts
- Resource requests and limits
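To illustrate how pod restarts might be checked, the sketch below parses the JSON returned by a Prometheus instant query and reports pods whose restart count exceeds a limit. It assumes a query such as `kube_pod_container_status_restarts_total`, a metric exposed by kube-state-metrics, which may or may not be installed in your cluster. The sample response, namespace, and pod names are made up, and `pods_over_restart_limit` is a hypothetical helper.

```python
import json


def pods_over_restart_limit(response: dict, limit: int) -> dict:
    """Map 'namespace/pod' to total restarts for pods above `limit`."""
    if response.get("status") != "success":
        raise ValueError("Prometheus query did not succeed")
    totals = {}
    for series in response["data"]["result"]:
        labels = series["metric"]
        # Instant-vector samples arrive as [timestamp, "value-as-string"].
        restarts = int(float(series["value"][1]))
        key = f'{labels.get("namespace", "?")}/{labels.get("pod", "?")}'
        # Sum across containers that belong to the same pod.
        totals[key] = totals.get(key, 0) + restarts
    return {pod: count for pod, count in totals.items() if count > limit}


# Made-up sample in the shape of a Prometheus instant-query response.
SAMPLE = json.loads("""
{
  "status": "success",
  "data": {
    "resultType": "vector",
    "result": [
      {"metric": {"namespace": "example-ns", "pod": "example-pod-a"},
       "value": [1700000000, "7"]},
      {"metric": {"namespace": "example-ns", "pod": "example-pod-b"},
       "value": [1700000000, "1"]}
    ]
  }
}
""")

if __name__ == "__main__":
    print(pods_over_restart_limit(SAMPLE, limit=3))
```

A restart counter only grows over a container's lifetime, so in practice you may prefer to look at the increase over a recent window rather than the absolute value.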
Control plane metrics
- Scheduler performance
- Server latency
- Hydrolix-specific metrics
Grafana dashboards
Hydrolix uses Prometheus for metrics and Kubernetes for logs, with visualization through Grafana dashboards. Dashboards are flexible: you can filter them to reduce over-alerting while still surfacing issues as they happen.
See how to Visualize Hydrolix Data in Grafana.
For a list of metrics used by Hydrolix, see All Metrics.
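One common way to reduce over-alerting, as mentioned above, is to alert only when a condition persists rather than on a single spike. This is similar in spirit to the `for` clause in Prometheus alerting rules. The sketch below is a minimal illustration of that idea and not how Hydrolix or Grafana implements alerting. The function name, threshold, and sample values are hypothetical.

```python
def sustained_breach(samples, threshold, min_consecutive):
    """Return True if the latest `min_consecutive` samples all exceed `threshold`."""
    if min_consecutive <= 0:
        raise ValueError("min_consecutive must be positive")
    if len(samples) < min_consecutive:
        # Not enough history yet to call the breach sustained.
        return False
    return all(value > threshold for value in samples[-min_consecutive:])


if __name__ == "__main__":
    # Made-up CPU usage ratios, oldest first.
    brief_spike = [0.42, 0.97, 0.45, 0.50]
    sustained = [0.50, 0.93, 0.95, 0.96]
    print(sustained_breach(brief_spike, threshold=0.9, min_consecutive=3))
    print(sustained_breach(sustained, threshold=0.9, min_consecutive=3))
```

The brief spike is ignored while the sustained breach is flagged, so a dashboard can still show the spike without it paging anyone.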