System Health helps you understand the current state of your Hydrolix cluster. You can find it in the System Health tab of the Hydrolix portal, where you can visualize usage and view logs for various components in your cluster.
This page explains the basic information exposed by System Health as well as a few tips and tricks to get the most out of it.
Other Ways to Monitor Cluster Health
You can also view logs in the hydro.logs table of your cluster.
Enter a set of search criteria. At minimum, you must select a service, log level, and time range. Click the "Retrieve Logs" button to query Hydrolix for the matching logs. System Health does not automatically refresh, so you must click "Retrieve Logs" again to fetch the latest matching logs for a stale query.
By default, System Health displays a chart of log counts for your query. This provides a basic visualization of when matching logs appear in your Hydrolix cluster within the selected time period.
You can change the interval used for count totals by adjusting the time range. Time ranges up to 30 minutes in length use an interval of 1 minute. Time ranges up to 2 hours in length use an interval of 5 minutes. Time ranges longer than 2 hours use an interval of 10 minutes.
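The interval rules above can be sketched as a small lookup. This is an illustration only, not Hydrolix code: the function name `count_interval` is hypothetical, and the sketch assumes that "up to" includes the boundary value (a range of exactly 30 minutes uses 1-minute intervals).

```python
from datetime import timedelta


def count_interval(time_range: timedelta) -> timedelta:
    """Return the bucket width used for count totals for a given time range.

    Assumption: boundaries are inclusive ("up to 30 minutes" includes 30).
    """
    if time_range <= timedelta(minutes=30):
        return timedelta(minutes=1)
    if time_range <= timedelta(hours=2):
        return timedelta(minutes=5)
    return timedelta(minutes=10)


if __name__ == "__main__":
    for minutes in (15, 60, 180):
        width = count_interval(timedelta(minutes=minutes))
        print(f"{minutes} min range -> {width} buckets")
```

Under these assumptions, a 1-hour range is charted in twelve 5-minute buckets, while a 3-hour range is charted in eighteen 10-minute buckets.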
Use the Logs tab to view individual log messages.
On the System Health page, you can filter log messages in several ways:
- services within Hydrolix
- log level
- time range
- service pool within the selected service
- component of the selected service
- string match on log message contents
Within each service, you can select service resource pools and components of internal services.
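As a rough sketch of how the required and optional filters relate, the following checks a set of search criteria before a query is submitted. The key names (`service`, `level`, `time_range`, `pool`, `component`, `contains`) and the value `example-service` are hypothetical labels chosen for this illustration; they are not System Health field names.

```python
from typing import Dict, List, Optional

# Filters that must be set before logs can be retrieved (hypothetical key names).
REQUIRED_FILTERS = ("service", "level", "time_range")
# Filters that narrow the results further but may be left empty.
OPTIONAL_FILTERS = ("pool", "component", "contains")


def missing_required(criteria: Dict[str, Optional[str]]) -> List[str]:
    """Return the required filters that are absent or empty in `criteria`."""
    return [name for name in REQUIRED_FILTERS if not criteria.get(name)]


if __name__ == "__main__":
    criteria = {"service": "example-service", "contains": "timeout"}
    print(missing_required(criteria))
```

An empty result means the criteria include a service, a log level, and a time range, which is the minimum needed to retrieve logs.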
Hydrolix produces logs with the following log levels, ordered from least to most severe:
- trace: useful for tracking details of data flow throughout Hydrolix
- info: useful for tracking high-level data flow throughout Hydrolix
- warning: indicates a problem from which Hydrolix was able to recover automatically
- error: indicates a problem from which Hydrolix was not able to recover
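If you export log records and want to work with this ordering in your own tooling, one option is an ordered enumeration. This is a minimal client-side sketch: the `LogLevel` class, the `at_least` helper, and the `level` and `message` record keys are assumptions made for illustration, not part of System Health.

```python
from enum import IntEnum
from typing import Dict, List


class LogLevel(IntEnum):
    """Log levels in the order listed above, lowest first."""
    TRACE = 0
    INFO = 1
    WARNING = 2
    ERROR = 3


def at_least(records: List[Dict[str, str]], minimum: LogLevel) -> List[Dict[str, str]]:
    """Keep records whose level is `minimum` or higher.

    Assumes each record has a hypothetical "level" key holding the level name.
    """
    return [r for r in records if LogLevel[r["level"].upper()] >= minimum]


if __name__ == "__main__":
    sample = [
        {"level": "info", "message": "batch accepted"},
        {"level": "warning", "message": "retrying upload"},
        {"level": "error", "message": "upload failed"},
    ]
    for record in at_least(sample, LogLevel.WARNING):
        print(record["level"], record["message"])
```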
Use the "Timerange" dropdown to select only logs that occur within a specific window of time. System Health supports the following default ranges:
- last 5 minutes
- last 15 minutes
- last 30 minutes
- last 1 hour
- last 3 hours
- last 6 hours
- custom range
The selected time range also determines the interval used for count totals in the "Counts" tab. System Health supports time ranges of no more than 6 hours. Logs use the UTC time zone.
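When preparing a custom range, it can help to check the two constraints above (UTC timestamps, at most 6 hours) before entering the values. The following is a sketch under stated assumptions: `validate_custom_range` is a hypothetical helper, and it assumes a range of exactly 6 hours is allowed.

```python
from datetime import datetime, timedelta, timezone
from typing import Tuple

MAX_RANGE = timedelta(hours=6)  # longest time range described above


def validate_custom_range(start: datetime, end: datetime) -> Tuple[datetime, datetime]:
    """Convert both timestamps to UTC and check the range length.

    Raises ValueError for naive timestamps, empty or reversed ranges,
    and ranges longer than 6 hours.
    """
    if start.tzinfo is None or end.tzinfo is None:
        raise ValueError("timestamps must be timezone-aware")
    start_utc = start.astimezone(timezone.utc)
    end_utc = end.astimezone(timezone.utc)
    if end_utc <= start_utc:
        raise ValueError("end must be after start")
    if end_utc - start_utc > MAX_RANGE:
        raise ValueError("range must not exceed 6 hours")
    return start_utc, end_utc


if __name__ == "__main__":
    begin = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
    print(validate_custom_range(begin, begin + timedelta(hours=3)))
```

Converting local timestamps to UTC first avoids off-by-hours mismatches when comparing a custom range against the timestamps shown in the log output.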
System Health supports filtering logs by the following services (composed of the following apps, components, and containers):
|Containers (pod containers)
Hydrolix uses Kubernetes pods for components and Kubernetes pod containers for containers.