System Health
Investigate the status of your Hydrolix cluster.
System Health helps you understand the current state of your Hydrolix cluster. You can find it in the System Health tab of the Hydrolix portal, where you can visualize usage and view logs for various components in your cluster.
This page explains the basic information exposed by System Health as well as a few tips and tricks to get the most out of it.
Other Ways to Monitor Cluster Health
You can also view logs in the hydro.logs table of your cluster.
Usage
Enter a set of search criteria. At minimum, you must select a service, log level, and time range. Click the "Retrieve Logs" button to query Hydrolix for the matching logs. System Health does not automatically refresh, so you must click "Retrieve Logs" again to fetch the latest matching logs for a stale query.
Counts
By default, System Health displays a chart of log counts for your query. This provides a basic visualization of when matching logs appear in your Hydrolix cluster within the selected time period.
Set Count Interval
You can change the interval used for count totals by adjusting the time range. Time ranges up to 30 minutes in length use an interval of 1 minute. Time ranges longer than 30 minutes and up to 2 hours in length use an interval of 5 minutes. Time ranges longer than 2 hours use an interval of 10 minutes.
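The interval rules above can be expressed as a small lookup. This is a minimal sketch for reasoning about bucket sizes outside the portal. The function name `count_interval` is hypothetical and not part of any Hydrolix API, and treating the 30-minute and 2-hour boundaries as inclusive is an assumption based on the "up to" wording.

```python
from datetime import timedelta


def count_interval(time_range: timedelta) -> timedelta:
    """Return the count bucket size for a query time range.

    Hypothetical helper; boundaries are assumed to be inclusive.
    """
    if time_range <= timedelta(0):
        raise ValueError("time range must be positive")
    if time_range <= timedelta(minutes=30):
        return timedelta(minutes=1)
    if time_range <= timedelta(hours=2):
        return timedelta(minutes=5)
    return timedelta(minutes=10)
```

For example, a 1-hour range falls in the second tier, so its counts would be grouped into 5-minute buckets.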
Logs
Use the Logs tab to view individual log messages.
Filters
On the System Health page, you can filter log messages in several ways:
- services within Hydrolix
- log level
- time range
- service pool within the selected service
- component of the selected service
- container
- string match on log message contents
Within each service, you can select service resource pools and components of internal services.
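To make the relationship between required and optional filters concrete, here is a minimal sketch that models the search criteria as a data structure and applies them to log records you already hold locally (for example, rows exported from your cluster). The `LogFilter` class and the record keys (`service`, `level`, `timestamp`, `pool`, `component`, `container`, `message`) are hypothetical names chosen for illustration, not a documented Hydrolix schema. Exact-match on log level and case-sensitive substring matching are also assumptions.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass(frozen=True)
class LogFilter:
    # Required criteria: service, log level, and time range.
    service: str
    level: str
    start: datetime
    end: datetime
    # Optional criteria; None means "do not filter on this field".
    pool: Optional[str] = None
    component: Optional[str] = None
    container: Optional[str] = None
    contains: Optional[str] = None

    def matches(self, record: dict) -> bool:
        """Return True if a (hypothetical) log record satisfies every criterion."""
        if record["service"] != self.service or record["level"] != self.level:
            return False
        if not (self.start <= record["timestamp"] <= self.end):
            return False
        for key in ("pool", "component", "container"):
            wanted = getattr(self, key)
            if wanted is not None and record.get(key) != wanted:
                return False
        # Assumed: plain, case-sensitive substring match on the message.
        return self.contains is None or self.contains in record["message"]


example_filter = LogFilter(
    service="Query",
    level="error",
    start=datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc),
    end=datetime(2024, 1, 1, 13, 0, tzinfo=timezone.utc),
    contains="timeout",
)
```

Leaving an optional field as `None` mirrors leaving the corresponding filter unset in the portal.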
Log Levels
Hydrolix produces logs with the following log levels, ordered from least to most severe:
- trace: useful for tracking details of data flow throughout Hydrolix
- info: useful for tracking high-level data flow throughout Hydrolix
- warning: indicates a problem from which Hydrolix was able to recover automatically
- error: indicates a problem from which Hydrolix was not able to recover
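If you post-process logs outside the portal, the level ordering listed above can be encoded so that records compare and sort naturally. The sketch below is illustrative only: the `LogLevel` enum, the `at_least` helper, and the `level` record key are hypothetical. The source does not say whether the System Health level selector matches one level exactly or treats it as a minimum, so the "minimum level" behavior here is an assumption.

```python
from enum import IntEnum


class LogLevel(IntEnum):
    # Values follow the order in which the levels are listed above.
    TRACE = 0
    INFO = 1
    WARNING = 2
    ERROR = 3


def at_least(records, minimum: LogLevel):
    """Keep records whose (hypothetical) 'level' field is >= minimum."""
    return [r for r in records if LogLevel[r["level"].upper()] >= minimum]
```

For instance, `at_least(records, LogLevel.WARNING)` keeps only warning and error records.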
Set Time Range
Use the "Timerange" dropdown to select only logs that occur within a specific window of time. System Health supports the following default ranges:
- last 5 minutes
- last 15 minutes
- last 30 minutes
- last 1 hour
- last 3 hours
- last 6 hours
- custom range
The selected time range also determines the interval used for count totals in the "Counts" tab. System Health supports time ranges of no more than 6 hours. Logs use the UTC time zone.
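When preparing a custom range, it can help to check it against the 6-hour limit and convert it to UTC before entering it. The following is a minimal sketch under those two stated constraints. The helper names `preset_window` and `validate_custom_range` are hypothetical and do not correspond to any Hydrolix API.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional, Tuple

MAX_RANGE = timedelta(hours=6)  # upper bound stated above

# Preset ranges offered by the dropdown, as listed above.
PRESETS = {
    "last 5 minutes": timedelta(minutes=5),
    "last 15 minutes": timedelta(minutes=15),
    "last 30 minutes": timedelta(minutes=30),
    "last 1 hour": timedelta(hours=1),
    "last 3 hours": timedelta(hours=3),
    "last 6 hours": timedelta(hours=6),
}


def preset_window(preset: str, now: Optional[datetime] = None) -> Tuple[datetime, datetime]:
    """Return the (start, end) UTC window for a preset, ending at `now`."""
    end = now if now is not None else datetime.now(timezone.utc)
    return end - PRESETS[preset], end


def validate_custom_range(start: datetime, end: datetime) -> Tuple[datetime, datetime]:
    """Normalize a custom range to UTC and enforce the 6-hour maximum."""
    if start.tzinfo is None or end.tzinfo is None:
        raise ValueError("use timezone-aware datetimes so conversion to UTC is unambiguous")
    start, end = start.astimezone(timezone.utc), end.astimezone(timezone.utc)
    if end <= start:
        raise ValueError("end must be after start")
    if end - start > MAX_RANGE:
        raise ValueError("time range must not exceed 6 hours")
    return start, end
```

Requiring timezone-aware inputs is a design choice in this sketch: it avoids silently interpreting a local wall-clock time as UTC.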
Filter by Service, Component, and Container
System Health supports filtering logs by the following services, each made up of the apps, components, and containers listed below:
Service | Apps | Components (pods) | Containers (pod containers) |
---|---|---|---|
Alter | alter-head, alter-peer | rabbitmq | |
Batch | batch-head, batch-peer, autoingest, intake-api | rabbitmq | |
Core | turbine-api, ui, operator | postgres, keycloak, prometheus | |
Intake | intake-head | intake-head | intake-head, turbine |
Kafka | kafka-peer | | |
Kinesis | kinesis-peer | | |
LifeCycle | reaper, partition-vacuum, prune-locks, init-cluster, init-turbine-api, job-purge, decay, merge-cleanup, rejects-vacum, stale-job-monitor, task-monitor | | |
Logs | hydrologs | | |
Merge | merge-head, merge-peer | rabbitmq | merge-peer, merge-peer-ii, merge-peer-iii |
Query | query-head, query-peer | zookeeper, traefik | query-peer |
Stream (DEPRECATED) | stream-head, stream-peer | traefik, redpanda | stream-peer |
Summary (DEPRECATED) | stream-head, summary-peer | traefik, redpanda | |
Validator | validator | | |
Version | version | | |
In this table, components correspond to Kubernetes pods, and containers correspond to the containers running inside those pods.
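Because an app can appear under more than one service (stream-head is listed under both Stream and Summary), a reverse lookup can help you decide which service to select when you only know the app name. This is an illustrative sketch that copies a few rows from the table above. The `SERVICE_APPS` mapping and the `services_for_app` function are hypothetical names, and the mapping is deliberately incomplete.

```python
# Subset of the service-to-apps table above, copied by hand for illustration.
SERVICE_APPS = {
    "Merge": ["merge-head", "merge-peer"],
    "Query": ["query-head", "query-peer"],
    "Stream (DEPRECATED)": ["stream-head", "stream-peer"],
    "Summary (DEPRECATED)": ["stream-head", "summary-peer"],
}


def services_for_app(app: str) -> list:
    """Return every service in SERVICE_APPS that lists the given app."""
    return sorted(service for service, apps in SERVICE_APPS.items() if app in apps)
```

An app that is not in the mapping yields an empty list rather than an error, which keeps the helper safe to call with arbitrary names.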