Hydrolix relies on Grafana and Prometheus to monitor its infrastructure.
By default, Hydrolix deploys an alerting dashboard that you can customise to use the thresholds you need.
Log in to Grafana and open the Platform Alerts dashboard in the Hydrolix Monitoring folder.
By default, this page shows the current alerts for your platform.
The dashboard is split into the following alerting components:
- Batch Ingest
- Kafka Ingest
- Streaming Ingest
- Shared Components - such as ZooKeeper, the Web UI, Prometheus, etc.
- Visualisation - Superset and Grafana
- System Monitoring - overall CPU usage, disk usage, etc.
All charts have default thresholds that you may need to adjust to suit your needs.
For example, suppose you run 10 query peers and want to be alerted if one of them becomes unavailable.
In that case, modify the Query Server Count chart.
Click Edit on the chart to open the chart editor.
The chart monitors the number of query heads and the number of query peers (more details on the different functions here).
To customise the alert with the number of hosts you expect, click on
The rule is evaluated every minute for each query and uses the last value returned within the past 5 minutes.
In this example, an alert triggers if there are fewer than 1 query head and 1 query peer.
You can customise each chart in the same way to suit your deployment and your infrastructure needs for Hydrolix.
With the thresholds in place, the next step is to send alerts to the right endpoint.
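To make the rule concrete, the following is a minimal sketch of the condition described above: take the most recent sample from the last 5 minutes and fire if it is below the threshold. It is not Grafana's implementation; the `CountAlertRule` class, its fields, and the choice to report "no_data" when the window is empty are hypothetical and only model the logic. Calling `evaluate()` once per minute would mirror the one-minute evaluation interval.

```python
from dataclasses import dataclass
from typing import Optional, Sequence, Tuple

# A sample is (unix_timestamp_seconds, value), e.g. a query peer count scraped by Prometheus.
Sample = Tuple[float, float]


@dataclass
class CountAlertRule:
    """Hypothetical model of a 'last value below threshold' alert condition."""

    threshold: float               # alert when the latest count drops below this value
    window_seconds: float = 300.0  # only consider samples from the last 5 minutes

    def last_value(self, samples: Sequence[Sample], now: float) -> Optional[float]:
        # Keep samples inside [now - window, now] and return the most recent value.
        in_window = [s for s in samples if now - self.window_seconds <= s[0] <= now]
        if not in_window:
            return None
        return max(in_window, key=lambda s: s[0])[1]

    def evaluate(self, samples: Sequence[Sample], now: float) -> str:
        # Assumption: an empty window is reported as "no_data" rather than alerting.
        value = self.last_value(samples, now)
        if value is None:
            return "no_data"
        return "alerting" if value < self.threshold else "ok"


# With 10 query peers deployed, a threshold of 10 fires as soon as one peer disappears.
peer_rule = CountAlertRule(threshold=10)
print(peer_rule.evaluate([(0, 10), (60, 10), (120, 9)], now=150))
```

Setting `threshold` to the number of hosts you expect is the same adjustment you make in the chart's alert settings for the 10-peer example.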
You can configure your alert endpoint in the Alert menu in Grafana: click
New Channel and choose a notification mechanism from the list.
Once the notification channel is created, edit each alert on the dashboard and select the notification channel in its
Send To list, for example on the Query Server Count alert.
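If you prefer to script channel creation instead of using the UI, Grafana's legacy alerting exposes a notification channels HTTP API (`POST /api/alert-notifications`). The sketch below only builds the request for an email channel and does not send it; the Grafana URL, API key, channel name, and address are placeholders, and this API applies only to Grafana versions that still support legacy alerting.

```python
import json
import urllib.request


def build_notification_channel_request(
    grafana_url: str, api_key: str, name: str, addresses: str
) -> urllib.request.Request:
    """Build (but do not send) a request creating an email notification channel."""
    payload = {
        "name": name,
        "type": "email",                      # one of Grafana's notifier types
        "isDefault": False,                   # do not attach to every alert automatically
        "settings": {"addresses": addresses},  # semicolon-separated list of recipients
    }
    return urllib.request.Request(
        url=f"{grafana_url.rstrip('/')}/api/alert-notifications",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )


# Placeholder values; pass the request to urllib.request.urlopen() to actually send it.
req = build_notification_channel_request(
    "https://grafana.example.com/", "YOUR_API_KEY", "Hydrolix on-call", "ops@example.com"
)
print(req.get_method(), req.full_url)
```

The created channel then appears in the Send To list of each alert, just like one created through New Channel.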