System Health

Investigate the status of your Hydrolix cluster.

System Health helps you understand the current state of your Hydrolix cluster. Use this tool, located in the System Health tab of the Hydrolix portal, to visualize usage and view logs for the components in your cluster.

This page explains the basic information exposed by System Health as well as a few tips and tricks to get the most out of it.

Other Ways to Monitor Cluster Health

You can also view logs in the hydro.logs table of your cluster.

Usage

Enter a set of search criteria. At minimum, you must select a service, log level, and time range. Click the "Retrieve Logs" button to query Hydrolix for the matching logs. System Health does not automatically refresh, so you must click "Retrieve Logs" again to fetch the latest matching logs for a stale query.

Counts

By default, System Health displays a chart of log counts for your query. This provides a basic visualization of when matching logs appear in your Hydrolix cluster within the selected time period.

The System Health Counts tab.

Set Count Interval

You can change the interval used for count totals by adjusting the time range. Time ranges of up to 30 minutes use a 1-minute interval. Time ranges longer than 30 minutes and up to 2 hours use a 5-minute interval. Time ranges longer than 2 hours use a 10-minute interval.
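The interval rule above can be sketched as a small Python function. This is an illustration only, not the portal's actual code: the names `count_interval` and `bucket_count` are hypothetical, and the sketch assumes that "up to" includes the boundary values (exactly 30 minutes and exactly 2 hours).

```python
from datetime import timedelta


def count_interval(time_range: timedelta) -> timedelta:
    """Return the count interval for a time range, per the rule described above.

    Assumption: the 30-minute and 2-hour boundaries are inclusive.
    """
    if time_range <= timedelta(0):
        raise ValueError("time range must be positive")
    if time_range <= timedelta(minutes=30):
        return timedelta(minutes=1)
    if time_range <= timedelta(hours=2):
        return timedelta(minutes=5)
    return timedelta(minutes=10)


def bucket_count(time_range: timedelta) -> int:
    """Return how many intervals are needed to cover the time range."""
    interval = count_interval(time_range)
    # Ceiling division: a partial trailing interval still needs its own bucket.
    return -(-time_range // interval)
```

Under these assumptions, a 15-minute range yields 15 one-minute buckets and a 6-hour range yields 36 ten-minute buckets.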

Logs

Use the Logs tab to view individual log messages.

The System Health Logs tab.

Filters

On the System Health page, you can filter log messages in several ways:

  • services within Hydrolix
  • log level
  • time range
  • service pool within the selected service
  • component of the selected service
  • container
  • string match on log message contents

Within each service, you can select service resource pools and components of internal services.

Log Levels

Hydrolix produces logs with the following log levels, ordered from least to most severe:

  • trace: useful for tracking details of data flow throughout Hydrolix
  • info: useful for tracking high-level data flow throughout Hydrolix
  • warning: indicates a problem from which Hydrolix was able to recover automatically
  • error: indicates a problem from which Hydrolix was not able to recover
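If you export log messages and want to work with them outside the portal, the ordering of the levels listed above can be encoded directly. The following Python sketch is an assumption-laden illustration: the record shape (a dictionary with a `level` key) and the function names are hypothetical, and the "this level or higher" filter shown here is a client-side convenience, not a description of how System Health applies its log level filter.

```python
# Log levels in the order listed above (first to last).
LOG_LEVELS = ("trace", "info", "warning", "error")


def level_rank(level: str) -> int:
    """Return the position of a level in the ordering; raise on unknown levels."""
    try:
        return LOG_LEVELS.index(level)
    except ValueError:
        raise ValueError(f"unknown log level: {level!r}") from None


def filter_by_min_level(records, minimum: str):
    """Keep records whose level is at or above the given minimum.

    Assumption: each record is a dict with a "level" key (hypothetical shape).
    """
    threshold = level_rank(minimum)
    return [r for r in records if level_rank(r["level"]) >= threshold]
```

For example, filtering with a minimum of `warning` keeps only `warning` and `error` records.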

Set Time Range

Use the "Timerange" dropdown to select only logs that occur within a specific window of time. System Health supports the following default ranges:

  • last 5 minutes
  • last 15 minutes
  • last 30 minutes
  • last 1 hour
  • last 3 hours
  • last 6 hours
  • custom range

The selected time range also determines the interval used for count totals in the "Counts" tab. System Health supports time ranges of no more than 6 hours. Logs use the UTC time zone.
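The time range constraints above (preset windows, a 6-hour maximum, and UTC timestamps) can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the names `PRESETS`, `preset_window`, and `custom_window` are hypothetical, and the portal's own validation may differ.

```python
from datetime import datetime, timedelta, timezone

# Maximum time range System Health supports, per the text above.
MAX_RANGE = timedelta(hours=6)

# Preset ranges from the "Timerange" dropdown.
PRESETS = {
    "last 5 minutes": timedelta(minutes=5),
    "last 15 minutes": timedelta(minutes=15),
    "last 30 minutes": timedelta(minutes=30),
    "last 1 hour": timedelta(hours=1),
    "last 3 hours": timedelta(hours=3),
    "last 6 hours": timedelta(hours=6),
}


def preset_window(name, now=None):
    """Return (start, end) in UTC for a preset range ending at `now`."""
    if now is None:
        now = datetime.now(timezone.utc)
    return now - PRESETS[name], now


def custom_window(start, end):
    """Validate a custom range and return it normalized to UTC."""
    if start.tzinfo is None or end.tzinfo is None:
        raise ValueError("timestamps must be timezone-aware")
    start = start.astimezone(timezone.utc)
    end = end.astimezone(timezone.utc)
    if end <= start:
        raise ValueError("end must be after start")
    if end - start > MAX_RANGE:
        raise ValueError("time range must not exceed 6 hours")
    return start, end
```

Normalizing to UTC before comparing keeps a custom range consistent with the timestamps shown in the logs.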

Filter by Service, Component, and Container

System Health supports filtering logs by the following services, each made up of the apps, components, and containers listed with it:

Each entry gives the service, followed by its apps, components (pods), and containers (pod containers):

  • Alter: alter-head, alter-peer, rabbitmq
  • Batch: batch-head, batch-peer, autoingest, intake-api, rabbitmq
  • Core: turbine-api, ui, operator, postgres, keycloak, prometheus
  • Intake: intake-head (app); intake-head (component); intake-head, turbine (containers)
  • Kafka: kafka-peer
  • Kinesis: kinesis-peer
  • LifeCycle: reaper, partition-vacuum, prune-locks, init-cluster, init-turbine-api, job-purge, decay, merge-cleanup, rejects-vacuum, stale-job-monitor, task-monitor
  • Logs: hydrologs
  • Merge: merge-head, merge-peer, rabbitmq (apps); merge-peer, merge-peer-ii, merge-peer-iii (components)
  • Query: query-head, query-peer, zookeeper, traefik, query-peer
  • Stream (DEPRECATED): stream-head, stream-peer, traefik, redpanda, stream-peer
  • Summary (DEPRECATED): stream-head, summary-peer, traefik, redpanda
  • Validator: validator
  • Version: version

Hydrolix uses Kubernetes pods for components and Kubernetes pod containers for containers.