System Health

System Health helps you understand the current state of your Hydrolix cluster. The tool is located in the System Health tab of the Hydrolix portal, where you can use it to visualize usage and view logs for various components in your cluster.

This page explains the basic information exposed by System Health as well as a few tips and tricks to get the most out of it.

Other Ways to Monitor Cluster Health

You can also view logs in the hydro.logs table of your cluster.
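As a rough illustration of querying that table directly, the sketch below builds a SQL string for a service, log level, and time window. The column names (`timestamp`, `app`, `level`, `message`) and the helper `build_logs_query` are assumptions made for this example, not a documented schema; check the actual columns of hydro.logs in your cluster before using it.

```python
from datetime import datetime, timezone


def build_logs_query(app, level, start, end, limit=100):
    """Build a SQL string that selects log rows from hydro.logs.

    Column names here are hypothetical; adjust them to the real schema.
    """
    def quote(value):
        # Escape backslashes and single quotes before wrapping in quotes.
        return "'" + value.replace("\\", "\\\\").replace("'", "\\'") + "'"

    fmt = "%Y-%m-%d %H:%M:%S"
    # Logs use UTC, so normalize both bounds to UTC before formatting.
    start_utc = start.astimezone(timezone.utc).strftime(fmt)
    end_utc = end.astimezone(timezone.utc).strftime(fmt)
    return (
        "SELECT timestamp, app, level, message "
        "FROM hydro.logs "
        f"WHERE app = {quote(app)} "
        f"AND level = {quote(level)} "
        f"AND timestamp >= {quote(start_utc)} "
        f"AND timestamp < {quote(end_utc)} "
        "ORDER BY timestamp DESC "
        f"LIMIT {int(limit)}"
    )


query = build_logs_query(
    "query-head",
    "error",
    datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc),
    datetime(2024, 1, 1, 13, 0, tzinfo=timezone.utc),
)
print(query)
```

The resulting string can be submitted through whichever SQL interface you normally use to query your cluster.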

Usage

Enter a set of search criteria. At minimum, you must select a service, log level, and time range. Click the "Retrieve Logs" button to query Hydrolix for the matching logs. System Health does not automatically refresh, so you must click "Retrieve Logs" again to fetch the latest matching logs for a stale query.

Counts

By default, System Health displays a chart of log counts for your query. This provides a basic visualization of when matching logs appear in your Hydrolix cluster within the selected time period.

The System Health Counts tab.
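To make the idea of a count chart concrete, the following sketch groups log timestamps into fixed-width buckets and counts how many fall in each one. It is an illustration of the general technique, not a description of how System Health computes its chart; the function name `count_by_interval` and the sample timestamps are invented for the example.

```python
from collections import Counter
from datetime import datetime, timedelta, timezone


def count_by_interval(timestamps, interval):
    """Count timestamps per fixed-width bucket, keyed by bucket start (UTC)."""
    width = int(interval.total_seconds())
    counts = Counter()
    for ts in timestamps:
        epoch = int(ts.timestamp())
        # Round down to the start of the bucket that contains this timestamp.
        bucket_start = epoch - (epoch % width)
        counts[datetime.fromtimestamp(bucket_start, tz=timezone.utc)] += 1
    return dict(sorted(counts.items()))


# Hypothetical sample data: three log timestamps within a few minutes.
sample = [
    datetime(2024, 1, 1, 12, 0, 10, tzinfo=timezone.utc),
    datetime(2024, 1, 1, 12, 0, 50, tzinfo=timezone.utc),
    datetime(2024, 1, 1, 12, 2, 5, tzinfo=timezone.utc),
]
buckets = count_by_interval(sample, timedelta(minutes=1))
for start, count in buckets.items():
    print(start.isoformat(), count)
```

Buckets with no matching logs are simply absent from the result; a chart would typically draw them as zero.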

Set Count Interval

You can change the interval used for count totals by adjusting the time range. Time ranges up to 30 minutes in length use an interval of 1 minute. Time ranges up to 2 hours in length use an interval of 5 minutes. Time ranges longer than 2 hours use an interval of 10 minutes.
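The mapping from time range to count interval can be sketched as a small function. This assumes that "up to" includes the boundary (so exactly 30 minutes still uses 1-minute intervals); the function name `count_interval` is invented for illustration.

```python
from datetime import timedelta


def count_interval(time_range):
    """Return the count interval for a given time range length."""
    if time_range <= timedelta(0):
        raise ValueError("time range must be positive")
    if time_range <= timedelta(minutes=30):
        return timedelta(minutes=1)
    if time_range <= timedelta(hours=2):
        return timedelta(minutes=5)
    # Anything longer than 2 hours uses 10-minute intervals.
    return timedelta(minutes=10)


for length in (timedelta(minutes=15), timedelta(hours=1), timedelta(hours=6)):
    print(length, "->", count_interval(length))
```

For example, a 1-hour range falls in the second tier, so the chart would show twelve 5-minute totals.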

Logs

Use the Logs tab to view individual log messages.

The System Health Logs tab.

Filters

On the System Health page, you can filter log messages in several ways:

service within Hydrolix
log level
time range
service pool within the selected service
component of the selected service
container
string match on log message contents

Within each service, you can select service resource pools and components of internal services.

Log Levels

Hydrolix produces logs with the following log levels, ordered from least to most severe:

trace: useful for tracking details of data flow throughout Hydrolix
info: useful for tracking high-level data flow throughout Hydrolix
warning: indicates a problem from which Hydrolix was able to recover automatically
error: indicates a problem from which Hydrolix was not able to recover
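If you export log records and want to work with this ordering yourself, one option is to model the levels as an ordered enum and keep only records at or above a threshold. This is a sketch for post-processing, not a description of how the portal's log level selector behaves; the record shape (a dict with `level` and `message` keys) and the helper `at_least` are assumptions.

```python
from enum import IntEnum


class LogLevel(IntEnum):
    """The four log levels, numbered in the order listed above."""
    TRACE = 0
    INFO = 1
    WARNING = 2
    ERROR = 3


def at_least(records, minimum):
    """Keep records whose level is at or above `minimum`."""
    return [r for r in records if LogLevel[r["level"].upper()] >= minimum]


# Hypothetical records, shaped as plain dicts for the example.
records = [
    {"level": "trace", "message": "partition scanned"},
    {"level": "info", "message": "batch job started"},
    {"level": "warning", "message": "retrying connection"},
    {"level": "error", "message": "job failed"},
]
serious = at_least(records, LogLevel.WARNING)
for record in serious:
    print(record["level"], record["message"])
```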

Set Time Range

Use the "Timerange" dropdown to select only logs that occur within a specific window of time. System Health supports the following default ranges:

last 5 minutes
last 15 minutes
last 30 minutes
last 1 hour
last 3 hours
last 6 hours
custom range

The selected time range also determines the interval used for count totals in the "Counts" tab. System Health supports time ranges of no more than 6 hours. Logs use the UTC time zone.
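The constraints above (preset windows, a 6-hour maximum, and UTC timestamps) can be sketched as two helpers. The names `PRESETS`, `preset_range`, and `validate_custom_range` are invented for this example, and the validation is only an approximation of what the portal enforces.

```python
from datetime import datetime, timedelta, timezone

MAX_RANGE = timedelta(hours=6)

# The default ranges listed above, keyed by their labels.
PRESETS = {
    "last 5 minutes": timedelta(minutes=5),
    "last 15 minutes": timedelta(minutes=15),
    "last 30 minutes": timedelta(minutes=30),
    "last 1 hour": timedelta(hours=1),
    "last 3 hours": timedelta(hours=3),
    "last 6 hours": timedelta(hours=6),
}


def preset_range(name, now=None):
    """Return (start, end) in UTC for one of the preset labels."""
    now = now or datetime.now(timezone.utc)
    return now - PRESETS[name], now


def validate_custom_range(start, end):
    """Normalize a custom range to UTC and enforce the 6-hour maximum."""
    if start.tzinfo is None or end.tzinfo is None:
        raise ValueError("use timezone-aware datetimes")
    start = start.astimezone(timezone.utc)
    end = end.astimezone(timezone.utc)
    if end <= start:
        raise ValueError("end must be after start")
    if end - start > MAX_RANGE:
        raise ValueError("time range must not exceed 6 hours")
    return start, end


fixed_now = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
print(preset_range("last 1 hour", now=fixed_now))
```

Requiring timezone-aware datetimes avoids silently treating a local time as UTC when comparing against log timestamps.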

Filter by Service, Component, and Container

System Health supports filtering logs by the following services, each made up of the listed apps, components, and containers:

Service	Apps	Components (pods)	Containers (pod containers)
Alter	alter-head alter-peer	rabbitmq
Batch	batch-head batch-peer autoingest intake-api	rabbitmq
Core	turbine-api ui operator	postgres keycloak prometheus
Intake	intake-head	intake-head	intake-head turbine
Kafka	kafka-peer
Kinesis	kinesis-peer
LifeCycle	reaper partition-vacuum prune-locks init-cluster init-turbine-api job-purge decay merge-cleanup rejects-vacuum stale-job-monitor task-monitor
Logs	hydrologs
Merge	merge-head merge-peer	rabbitmq	merge-peer merge-peer-ii merge-peer-iii
Query	query-head query-peer	zookeeper traefik	query-peer
Stream (DEPRECATED)	stream-head stream-peer	traefik redpanda	stream-peer
Summary (DEPRECATED)	stream-head summary-peer	traefik redpanda
Validator	validator
Version	version

In this table, components correspond to Kubernetes pods, and containers correspond to the containers that run inside those pods.