System Logs

Viewing system logs

The components of your Hydrolix cluster make their logs available through several AWS services, as well as through direct SSH access.

Hydrolix components based on EC2 instances, such as SQL query handlers, regularly copy their system and application logs into their associated S3 bucket. You may alternatively view these logs directly by logging into these components via SSH.

AWS Logs

AWS Lambda-based components, such as the temporary processes that manage batch ingestion, write their logs to CloudWatch instead.

Hydrolix's catalog-data component, as a special case, publishes its logs to AWS RDS.

Logs in S3

You can find your cluster's system and other plain-text logs at the path /CLIENT_ID/logs/ within the S3 bucket that the cluster uses. For example, if your Hydrolix Client ID is "hdxcli-example123", then you would find all the system logs from all the clusters associated with that ID at /hdxcli-example123/logs/ within the S3 bucket you use with Hydrolix.

The log directory contains a number of sub-directories named after dates, in YYYY-MM-DD format. Each of these sub-directories contains all the log files that Hydrolix wrote during that day, in UTC time.
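A minimal sketch of listing one day's log files from Python follows. It assumes the leading slash in /CLIENT_ID/logs/ only marks the bucket root, so object keys look like hdxcli-example123/logs/2021-04-20/..., and it assumes you pass a client that exposes boto3's list_objects_v2 paginator, such as boto3.client("s3"). The helper names day_prefix and list_day_logs are hypothetical.

```python
from datetime import date
from typing import Any, Iterator


def day_prefix(client_id: str, day: date) -> str:
    """Build the key prefix for one UTC day of logs.

    Assumption: S3 keys have no leading slash, e.g. 'hdxcli-example123/logs/2021-04-20/'.
    """
    return f"{client_id}/logs/{day.isoformat()}/"


def list_day_logs(s3_client: Any, bucket: str, client_id: str, day: date) -> Iterator[str]:
    """Yield every object key written under one day's log directory."""
    # Assumption: s3_client behaves like boto3.client("s3").
    paginator = s3_client.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=bucket, Prefix=day_prefix(client_id, day)):
        # Pages with no matching objects omit the "Contents" key entirely.
        for obj in page.get("Contents", []):
            yield obj["Key"]
```

For example, list_day_logs(boto3.client("s3"), "my-hydrolix-bucket", "hdxcli-example123", date(2021, 4, 20)) would yield that day's keys; the bucket name here is a placeholder.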

Browsing S3 log files

Each Hydrolix log file contains exactly one minute of system log entries from your cluster's components. Hydrolix names these files according to a pattern that lets you filter for logs written during a particular hour or minute, or from a certain component:

LOG_TYPE-INSTANCE_ID-YYYY-MM-DD-HHMM.log.gz

Where:

  • LOG_TYPE is the type of information being logged, with journald for components' operating system logs, and other values for logs written by the components' Hydrolix software. See S3 log filename prefixes for a full list.

  • INSTANCE_ID is the AWS instance ID of the component that generated this log.

  • YYYY-MM-DD-HHMM identifies the specific minute that the log file covers, in UTC time.

So, for example, the file named head-i-0df413e635f4fc35b-2021-04-20-2131.log.gz would contain the Hydrolix-specific logs for a query-head component from the minute starting at 9:31 PM UTC on April 20, 2021.

(We can also see that the query head's AWS instance ID was "i-0df413e635f4fc35b", but this information is rarely relevant while browsing logs.)
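Because the naming pattern is fixed, a short parser can split a filename into its parts and turn the timestamp into a UTC datetime, which makes hour or minute filtering straightforward. This sketch assumes instance IDs are "i-" followed by lowercase hexadecimal digits, as AWS EC2 instance IDs are; the names LogFile and parse_log_name are hypothetical.

```python
import re
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional

# LOG_TYPE may itself contain hyphens (e.g. "batch-peer"), so match it lazily up to
# the "-i-" that begins the instance ID.
# Assumption: instance IDs are "i-" plus lowercase hex digits.
_LOG_NAME = re.compile(
    r"^(?P<log_type>.+?)-(?P<instance_id>i-[0-9a-f]+)-"
    r"(?P<stamp>\d{4}-\d{2}-\d{2}-\d{4})\.log\.gz$"
)


@dataclass(frozen=True)
class LogFile:
    log_type: str
    instance_id: str
    minute: datetime  # start of the covered minute, in UTC


def parse_log_name(name: str) -> Optional[LogFile]:
    """Split a LOG_TYPE-INSTANCE_ID-YYYY-MM-DD-HHMM.log.gz name; None if it doesn't match."""
    m = _LOG_NAME.match(name)
    if m is None:
        return None
    minute = datetime.strptime(m["stamp"], "%Y-%m-%d-%H%M").replace(tzinfo=timezone.utc)
    return LogFile(m["log_type"], m["instance_id"], minute)
```

To keep only query-head logs from the 21:00 UTC hour, for instance, you could write [n for n in names if (f := parse_log_name(n)) and f.log_type == "head" and f.minute.hour == 21].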

As the .gz extension implies, Hydrolix compresses all these files in Gzip format.
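Python's standard gzip module can read a downloaded log file directly, without a separate decompression step. The sketch assumes the log text is UTF-8 (undecodable bytes are replaced rather than raising an error); read_log_lines is a hypothetical helper name.

```python
import gzip
from pathlib import Path
from typing import Iterator


def read_log_lines(path: Path) -> Iterator[str]:
    """Yield each line of a gzip-compressed log file, without trailing newlines."""
    # "rt" opens the archive in text mode, decompressing transparently.
    # Assumption: the log text is UTF-8; bad bytes become U+FFFD instead of raising.
    with gzip.open(path, "rt", encoding="utf-8", errors="replace") as fh:
        for line in fh:
            yield line.rstrip("\n")
```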

S3 log filename prefixes

Hydrolix's S3 log files have prefixes that identify their purpose or component.

For a more complete exploration of these various components, see Hydrolix Components.

File prefix   Log type     Component type
batch-peer    Application  A batch-ingestion worker.
head          Application  A query head.
journald      System       Any component type.
kafka-peer    Application  A Kafka ingestion worker.
merge-peer    Application  A merge service worker.
peer          Application  A query worker.
stream-head   Application  A stream-intake head.
stream-peer   Application  A stream-intake worker.
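If you script against these files, the table above can be carried as a lookup keyed on the exact LOG_TYPE field. This is a sketch: LOG_PREFIXES and classify are hypothetical names, and the dictionary simply copies this page's table, so it would need updating if Hydrolix adds log types.

```python
from typing import Optional, Tuple

# File prefix -> (log type, component type), copied from the table above.
# Assumption: this list is complete for your Hydrolix version.
LOG_PREFIXES = {
    "batch-peer": ("Application", "A batch-ingestion worker."),
    "head": ("Application", "A query head."),
    "journald": ("System", "Any component type."),
    "kafka-peer": ("Application", "A Kafka ingestion worker."),
    "merge-peer": ("Application", "A merge service worker."),
    "peer": ("Application", "A query worker."),
    "stream-head": ("Application", "A stream-intake head."),
    "stream-peer": ("Application", "A stream-intake worker."),
}


def classify(filename: str) -> Optional[Tuple[str, str]]:
    """Return (log type, component type) for a log filename, or None if the prefix is unknown."""
    for prefix, info in LOG_PREFIXES.items():
        # Requiring "-i-" (the start of the instance ID) right after the prefix means
        # the prefix must be the entire LOG_TYPE field, not just its beginning.
        if filename.startswith(prefix + "-i-"):
            return info
    return None
```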

Note that every EC2-based component creates journald logs, even if it also produces component-specific logs. Components that create only journald system logs include the Bastion, Zookeeper, and UI components.

Logs in CloudWatch

Hydrolix components and services that run as temporary AWS Lambda instances write their logs as data into AWS CloudWatch. While you can use this data to power sophisticated monitoring dashboards within the CloudWatch web UI, you can also simply browse these logs as plain text, organized by Hydrolix service and sorted by date. This guide focuses on this simpler use case.

To see your cluster's CloudWatch logs, select Logs > Log groups from CloudWatch's left navigation bar. The resulting list contains one entry for every CloudWatch-using service across all your clusters. Hydrolix names each of these log groups according to a pattern that identifies the service writing it:

/aws/lambda/CLUSTER_ID-LOG_TYPE

Where:

  • CLUSTER_ID is the Hydrolix ID of the cluster this component belongs to.

  • LOG_TYPE is a brief text tag identifying this service's role within your cluster. See CloudWatch log group suffixes for a full list.
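Several suffixes contain hyphens, and a cluster ID might as well, so splitting a group name on its last hyphen would misread names such as /aws/lambda/my-cluster-batch-ingest-api. The sketch below instead matches against the suffixes listed under CloudWatch log group suffixes. The function names and the example cluster IDs are hypothetical.

```python
from typing import Optional, Tuple

# Suffixes from the "CloudWatch log group suffixes" table on this page.
# Assumption: Hydrolix uses no other suffixes in your deployment.
KNOWN_SUFFIXES = ("autoingest", "batch-ingest-api", "batch-ingest-head",
                  "decay", "merge-head", "reaper")
_PREFIX = "/aws/lambda/"


def log_group_name(cluster_id: str, log_type: str) -> str:
    """Compose /aws/lambda/CLUSTER_ID-LOG_TYPE."""
    return f"{_PREFIX}{cluster_id}-{log_type}"


def split_log_group(name: str) -> Optional[Tuple[str, str]]:
    """Return (cluster_id, log_type) for a Hydrolix log group name, or None."""
    if not name.startswith(_PREFIX):
        return None
    rest = name[len(_PREFIX):]
    # Try longer suffixes first so that a longer match always wins.
    for suffix in sorted(KNOWN_SUFFIXES, key=len, reverse=True):
        if rest.endswith("-" + suffix) and len(rest) > len(suffix) + 1:
            return rest[: -len(suffix) - 1], suffix
    return None
```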


Click on any log group to see all the individual logs it contains, sorted by date and time.
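The same events can also be read outside the web UI. This sketch assumes a client that exposes boto3's filter_log_events paginator, such as boto3.client("logs"); recent_messages is a hypothetical name, and the group name in the example is a placeholder built from the pattern above.

```python
from datetime import datetime, timedelta, timezone
from typing import Any, Iterator, Optional


def recent_messages(logs_client: Any, group: str, minutes: int = 60,
                    now: Optional[datetime] = None) -> Iterator[str]:
    """Yield the text of every event logged to `group` in the last `minutes` minutes."""
    # Assumption: logs_client behaves like boto3.client("logs").
    end = now if now is not None else datetime.now(timezone.utc)
    start = end - timedelta(minutes=minutes)
    paginator = logs_client.get_paginator("filter_log_events")
    # CloudWatch Logs expects timestamps in milliseconds since the Unix epoch.
    pages = paginator.paginate(
        logGroupName=group,
        startTime=int(start.timestamp() * 1000),
        endTime=int(end.timestamp() * 1000),
    )
    for page in pages:
        for event in page.get("events", []):
            yield event["message"]
```

For example, for line in recent_messages(boto3.client("logs"), "/aws/lambda/abc123-reaper"): print(line) would print the last hour of a hypothetical cluster's reaper output.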

CloudWatch log group suffixes

Hydrolix's CloudWatch log groups have suffixes that identify each one's associated Hydrolix service.

Log group suffix    Service type
autoingest          An auto-ingestion service.
batch-ingest-api    A batch-ingestion API handler.
batch-ingest-head   A batch-ingestion process head.
decay               Part of the data-aging service.
merge-head          Part of the data-merging service.
reaper              Part of the data-aging service.
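To see which of these services have log groups for a given cluster, you could list groups by name prefix and keep only the suffixes in this table. The sketch assumes a boto3-style CloudWatch Logs client with a describe_log_groups paginator; SERVICE_TYPES and cluster_log_groups are hypothetical names.

```python
from typing import Any, Dict

# Suffix -> service type, copied from the table above.
SERVICE_TYPES = {
    "autoingest": "An auto-ingestion service.",
    "batch-ingest-api": "A batch-ingestion API handler.",
    "batch-ingest-head": "A batch-ingestion process head.",
    "decay": "Part of the data-aging service.",
    "merge-head": "Part of the data-merging service.",
    "reaper": "Part of the data-aging service.",
}


def cluster_log_groups(logs_client: Any, cluster_id: str) -> Dict[str, str]:
    """Map each Hydrolix log group found for `cluster_id` to its service type."""
    # Assumption: logs_client behaves like boto3.client("logs").
    prefix = f"/aws/lambda/{cluster_id}-"
    found: Dict[str, str] = {}
    paginator = logs_client.get_paginator("describe_log_groups")
    for page in paginator.paginate(logGroupNamePrefix=prefix):
        for group in page.get("logGroups", []):
            name = group["logGroupName"]
            suffix = name[len(prefix):]
            # Skip suffixes not in the table, e.g. another cluster whose ID
            # merely starts with this one, or unrelated Lambda functions.
            if suffix in SERVICE_TYPES:
                found[name] = SERVICE_TYPES[suffix]
    return found
```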

Logs in RDS

The PostgreSQL database that powers your Hydrolix cluster's metadata catalog stores its own logs on AWS RDS.

To view these logs, follow these steps.

  1. Select Databases from RDS's left navigation bar.
  2. On the resulting screen, select the DB identifier that matches your Hydrolix Client ID.
  3. Finally, select the Logs & events tab.

The Logs section of the page you arrive at lists your catalog component's logs, sorted by time. You may select any of these logs and click the View button to see that log's text.
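The same logs are available through the RDS API. This sketch assumes a boto3-style RDS client, such as boto3.client("rds"), and that you pass the exact DB identifier you located in step 2; list_rds_logs and read_rds_log are hypothetical names. Because download_db_log_file_portion returns a file in pieces, the loop follows the returned Marker until AdditionalDataPending is false.

```python
from typing import Any, List


def list_rds_logs(rds_client: Any, db_identifier: str) -> List[str]:
    """Return the catalog database's log file names, oldest first by LastWritten."""
    # Assumption: rds_client behaves like boto3.client("rds").
    files = []
    paginator = rds_client.get_paginator("describe_db_log_files")
    for page in paginator.paginate(DBInstanceIdentifier=db_identifier):
        files.extend(page.get("DescribeDBLogFiles", []))
    files.sort(key=lambda f: f.get("LastWritten", 0))
    return [f["LogFileName"] for f in files]


def read_rds_log(rds_client: Any, db_identifier: str, log_file_name: str) -> str:
    """Download a whole RDS log file, following the Marker until no data is pending."""
    chunks = []
    marker = "0"  # "0" starts reading from the beginning of the file
    while True:
        resp = rds_client.download_db_log_file_portion(
            DBInstanceIdentifier=db_identifier,
            LogFileName=log_file_name,
            Marker=marker,
        )
        chunks.append(resp.get("LogFileData") or "")
        if not resp.get("AdditionalDataPending"):
            return "".join(chunks)
        marker = resp["Marker"]
```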

Other logs and metrics

Hydrolix features a number of ways to view your cluster's ongoing operations, including a built-in Prometheus metrics database.

For more help with viewing your system logs, please contact Hydrolix support.

Logs via SSH

As an alternative to browsing the logs copied regularly to S3, you can log into components via SSH to browse log files directly. Doing this requires that you configure your deployment for SSH access, as detailed on the page Accessing components with SSH.

Every component writes its system logs under /var/log/. Service-specific logs are in the following directories on their respective components:

Service         Location
batch           /var/log/batch/
kafka           /var/log/stream/
stream (HTTP)   /var/log/stream/
query           /var/log/turbine/
merge           /var/log/merge/
zookeeper       /var/log/zookeeper/
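If you script these checks, the table maps directly onto an ssh command line. This sketch only builds the argument list: the host address is a placeholder, it assumes SSH access is already configured as described in Accessing components with SSH, and SERVICE_LOG_DIRS and list_logs_command are hypothetical names. The stream (HTTP) row is keyed as "stream".

```python
import shlex
from typing import List

# Service -> log directory on that service's components, copied from the table above.
SERVICE_LOG_DIRS = {
    "batch": "/var/log/batch/",
    "kafka": "/var/log/stream/",
    "stream": "/var/log/stream/",  # the "stream (HTTP)" row
    "query": "/var/log/turbine/",
    "merge": "/var/log/merge/",
    "zookeeper": "/var/log/zookeeper/",
}


def list_logs_command(host: str, service: str) -> List[str]:
    """Build an ssh argv that lists a service's log directory, newest files first."""
    # Assumption: `host` is reachable with your existing SSH configuration.
    log_dir = SERVICE_LOG_DIRS[service]  # raises KeyError for unknown services
    # ssh joins remote arguments into one shell command, so quote the path.
    return ["ssh", host, "ls -lt " + shlex.quote(log_dir)]
```

You could then run it with subprocess.run(list_logs_command("10.0.0.5", "query"), check=True).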