Observability

Your Hydrolix stack includes Prometheus, an open-source metrics database. While your stack runs, Hydrolix continuously updates its Prometheus instance with metrics. You can query, view, and actively monitor these metrics through your stack's Grafana instance. Stacks older than version 2.10.12 require a one-time setup first.

Using Grafana

Since version 2.10.12, your Grafana instance comes with these default data sources built in:

  • hdx-monitoring-prometheus, which points to the Prometheus service and is used by the Hydrolix monitoring dashboards
  • hdx-monitoring-query, which points to the Hydrolix cluster and is used by the Hydrolix monitoring dashboards
  • hdx-query, the default data source you can use to create dashboards from your Hydrolix cluster

If you are using an earlier version of Hydrolix, you can either upgrade or follow the steps below to connect your stack's Prometheus instance as a data source in its Grafana instance.

Preparing the data source

  1. Visit your stack's Grafana instance at https://YOUR-HYDROLIX-HOSTNAME.hydrolix.live/grafana.

  2. Select ⚙️ Configuration from the left menu bar, and then select Data Sources.

  3. Click the Add data source button.

  4. Select Prometheus from the list of compatible data sources.

  5. Enter hdx-monitoring-prometheus in the Name field.

  6. Enter https://YOUR-HYDROLIX-HOSTNAME.hydrolix.live/prometheus as the data source's URL.

Entering your Prometheus instance's URL.

Finally, click Save & Test. The message "Data source is working" should appear immediately, completing this setup.

If you do not see that success message, confirm that the URL you provided is correct and that your stack has at least one Prometheus service running. If you still have trouble getting Grafana to connect to Prometheus, please contact Hydrolix support.
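On stacks that predate the built-in data sources, the same registration could also be scripted. The sketch below builds, but does not send, a request to Grafana's HTTP API endpoint for creating data sources (POST /api/datasources). It assumes the API is reachable under the same /grafana path as the UI and that token authentication is enabled on your Grafana instance. The helper name, hostname, and token are placeholders.

```python
import json
import urllib.request


def build_datasource_request(hostname: str, api_token: str) -> urllib.request.Request:
    """Build a POST request that registers Prometheus as a Grafana data source."""
    base = f"https://{hostname}"
    payload = {
        "name": "hdx-monitoring-prometheus",  # the name entered in step 5
        "type": "prometheus",
        "url": f"{base}/prometheus",          # the URL entered in step 6
        "access": "proxy",                    # Grafana's server performs the requests
    }
    return urllib.request.Request(
        url=f"{base}/grafana/api/datasources",  # assumption: API lives under /grafana
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_token}",  # assumption: token auth is enabled
        },
        method="POST",
    )


if __name__ == "__main__":
    req = build_datasource_request("YOUR-HYDROLIX-HOSTNAME.hydrolix.live", "YOUR-API-TOKEN")
    print(req.get_method(), req.full_url)
    # To send the request for real: urllib.request.urlopen(req)
```

Keeping the request construction separate from the network call lets you inspect the payload before anything changes in Grafana.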

Importing Monitoring Dashboards

Hydrolix publishes monitoring dashboards for its services as Grafana community dashboards.

You can add these dashboards to your Grafana deployment:

  1. Visit your stack's Grafana instance at https://YOUR-HYDROLIX-HOSTNAME.hydrolix.live/grafana.

  2. Select + Create from the left menu bar, and then select Import.

  3. Enter a Hydrolix dashboard ID and click Load.

  4. Choose from the dashboard IDs that Hydrolix publishes:

    1. 14443 for HTTP streaming ingest monitoring
    2. 14444 for Kafka ingest monitoring
    3. 14446 for merge service monitoring
    4. 14447 for overall system metrics such as CPU and disk
    5. 14442 for table / partitions monitoring
    6. 14543 for an overview of your Hydrolix cluster
    7. 14846 for setting up alerts on the different components

Entering your dashboard ID for import.

When you import a dashboard, Grafana asks you to select the data sources it should use:

  • hdx-monitoring-prometheus for Prometheus
  • hdx-monitoring-query for ClickHouse (only for the table / partitions monitoring dashboard)

Depending on your deployment, you may want only specific dashboards or all of them.

Querying Grafana

After setting up Prometheus as a data source, you can query, graph, and monitor your Hydrolix metrics through all the tools and techniques Grafana makes available.

For a simple example, select 🧭 Explore from Grafana's left menu bar, and then enter a basic metric query such as process_open_fds into the text entry field. This produces a multi-line graph showing the open file descriptors in use by several of your stack's components.

To turn this static graph into a dynamic monitor, select 5s from the pull-down menu next to the Run query button. The graph then refreshes itself every five seconds.

Viewing a simple metric query.

For a complete list of the metrics that Hydrolix makes available, see Hydrolix's metrics below.

For more information on using Grafana and Prometheus together, you may consult Prometheus's documentation on that topic.

Using Prometheus directly

Prometheus has its own web-based UI, available by visiting https://YOUR-HYDROLIX-HOSTNAME.hydrolix.live/prometheus in your web browser.

This view is far more basic than Grafana's, and is suitable for quickly entering queries and seeing simple, graphed results. Hydrolix makes this UI available immediately, without any additional setup.
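Beyond the web UI, Prometheus also serves an HTTP API, whose instant-query endpoint is /api/v1/query. The sketch below builds a query URL and extracts values from a response body in Prometheus's standard JSON format. It assumes the API is exposed under the same /prometheus prefix as the UI. The helper names are hypothetical, and the sample response is illustrative rather than captured from a real stack.

```python
import json
from urllib.parse import urlencode


def instant_query_url(hostname: str, promql: str) -> str:
    """Return the URL for a Prometheus instant query."""
    # Assumption: the API shares the /prometheus prefix with the web UI.
    return f"https://{hostname}/prometheus/api/v1/query?" + urlencode({"query": promql})


def values_by_label(response_body: str, label: str) -> dict:
    """Map one label's value to the sample value of each series in a vector result."""
    doc = json.loads(response_body)
    if doc.get("status") != "success":
        raise ValueError(f"query failed: {doc.get('error', 'unknown error')}")
    return {
        series["metric"].get(label, ""): float(series["value"][1])
        for series in doc["data"]["result"]
    }


# Illustrative response body; a real one would come from fetching the URL above.
SAMPLE = json.dumps({
    "status": "success",
    "data": {"resultType": "vector", "result": [
        {"metric": {"__name__": "process_open_fds", "service": "stream-peer"},
         "value": [1700000000, "42"]},
    ]},
})

if __name__ == "__main__":
    print(instant_query_url("YOUR-HYDROLIX-HOSTNAME.hydrolix.live", "process_open_fds"))
    print(values_by_label(SAMPLE, "service"))
```

Prometheus returns sample values as strings, which is why the helper converts them with float().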

Hydrolix's metrics

This table lists the metrics available, and which components update them.

If more than one component uses a given metric, then querying it returns results from all relevant components. You can restrict results to a specific component by adding a service label matcher to your query, e.g. process_open_fds{service="stream-peer"}.
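If you generate queries from a script, the label filter described above can be assembled programmatically. This is a minimal sketch: the selector helper is hypothetical, and it covers only exact-match (=) label matchers.

```python
def selector(metric: str, **labels: str) -> str:
    """Build a PromQL selector such as metric{label="value"}."""
    if not labels:
        return metric
    matchers = []
    for name, value in sorted(labels.items()):
        # Escape backslashes and double quotes so the value stays a valid PromQL string.
        escaped = value.replace("\\", "\\\\").replace('"', '\\"')
        matchers.append(f'{name}="{escaped}"')
    return metric + "{" + ",".join(matchers) + "}"


if __name__ == "__main__":
    print(selector("process_open_fds"))
    print(selector("process_open_fds", service="stream-peer"))
```

The resulting string can be pasted into Grafana's Explore view or the Prometheus UI.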

For more information about metric types, refer to Prometheus's documentation.

General metrics

These metrics track various counters and statistics regarding data ingestion.

Metric | Type | Components | Purpose
bytes_written | Counter | Batch peer, Stream peer | Bytes written to the indexer.
partitions_created | Counter | Batch peer, Stream peer | Count of partitions created.
process_cpu_seconds_total | Counter | Batch peer, Stream head, Stream peer | Total user and system CPU time spent in seconds.
process_max_fds | Gauge | Batch peer, Stream head, Stream peer | Maximum number of open file descriptors.
process_open_fds | Gauge | Batch peer, Stream head, Stream peer | Number of open file descriptors.
process_resident_memory_bytes | Gauge | Batch peer, Stream head, Stream peer | Resident memory size in bytes.
process_start_time_seconds | Gauge | Batch peer, Stream head, Stream peer | Start time of the process since unix epoch in seconds.
process_virtual_memory_bytes | Gauge | Batch peer, Stream head, Stream peer | Virtual memory size in bytes.
process_virtual_memory_max_bytes | Gauge | Batch peer, Stream head, Stream peer | Maximum amount of virtual memory available in bytes.
promhttp_metric_handler_requests_in_flight | Gauge | Batch peer, Stream head, Stream peer | Current number of scrapes being served.
promhttp_metric_handler_requests_total | Counter | Batch peer, Stream head, Stream peer | Total number of scrapes by HTTP status code.

Batch metrics

These metrics track activity specific to batch ingestions.

Metric | Type | Components | Purpose
processed_count | Counter | Batch peer | Count of items processed.
processed_failure | Counter | Batch peer | Count of processing failures.
processing_duration_histo | Histogram | Batch peer | Histogram of Batch processing durations in milliseconds.
processing_duration_summary | Summary | Batch peer | Summary of Batch processing durations in milliseconds.
rows_read | Counter | Batch peer | Count of rows read.

Merge metrics

These metrics correspond to Hydrolix's merge service.

Metric | Type | Components | Purpose
merge_duration_summary | Summary | Merge peer | Merge processing duration, in milliseconds.
merge_duration_histo | Histogram | Merge peer | Merge processing duration, in milliseconds.
merge_sdk_duration_summary | Summary | Merge peer | Merge SDK processing duration, in milliseconds.
merge_sdk_duration_histo | Histogram | Merge peer | Merge SDK processing duration, in milliseconds.
merge_candidate_histo | Histogram | Merge peer | Partitions per merge candidate.
merge_success | Counter | Merge peer | Count of merge successes.
merge_failure | Counter | Merge peer | Count of merge failures.
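Because merge_success and merge_failure are counters, their raw values only ever grow, so a rate over a time window is usually more informative than the totals. As a sketch, the hypothetical helper below builds a PromQL expression for the share of merges that failed over a window. The 5-minute default is an arbitrary choice, and the expression assumes the metrics are exposed under exactly the names listed in the table.

```python
def failure_ratio(success_metric: str, failure_metric: str, window: str = "5m") -> str:
    """Return a PromQL expression for failures / (successes + failures) over a window."""
    ok = f"sum(rate({success_metric}[{window}]))"
    bad = f"sum(rate({failure_metric}[{window}]))"
    return f"{bad} / ({ok} + {bad})"


if __name__ == "__main__":
    print(failure_ratio("merge_success", "merge_failure"))
```

When no merges ran during the window, both rates are zero and the division yields NaN, so a graph of this expression may show gaps during idle periods.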

Streaming metrics

HTTP Stream Ingest

These metrics are specific to the use of streaming data sources.

Metric | Type | Components | Purpose
http_source_byte_count | Counter | Stream head | Count of bytes processed.
http_source_request_count | Counter | Stream head | Count of HTTP requests.
http_source_request_duration_ns | Histogram | Stream head | A histogram of HTTP request durations in nanoseconds.
http_source_request_error_count | Counter | Stream head | Count of HTTP request failures.
http_source_row_count | Counter | Stream head | Count of rows processed.
http_source_value_count | Counter | Stream head | Count of values processed.
kinesis_source_byte_count | Counter | Stream peer | Count of bytes read from Kinesis.
kinesis_source_checkpoint_count | Counter | Stream peer | Count of Kinesis checkpoint operations.
kinesis_source_checkpoint_duration_ns | Histogram | Stream peer | Duration of Kinesis checkpoint operations.
kinesis_source_checkpoint_error_count | Counter | Stream peer | Count of errors in Kinesis checkpoint operations.
kinesis_source_error_count | Counter | Stream peer | Count of errors in Kinesis source reads.
kinesis_source_lag_ms | Gauge | Stream peer | Measure of lag in Kinesis source.
kinesis_source_operation_count | Counter | Stream peer | Count of operations on Kinesis.
kinesis_source_operation_duration_ns | Histogram | Stream peer | Histogram of duration of operations on Kinesis.
kinesis_source_record_count | Counter | Stream peer | Count of records read from Kinesis.
kinesis_source_row_count | Counter | Stream peer | Count of rows read from Kinesis.
kinesis_source_value_count | Counter | Stream peer | Count of values read from Kinesis.
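Histogram metrics such as http_source_request_duration_ns are typically queried through Prometheus's histogram_quantile function, applied to the metric's _bucket series. The sketch below builds such an expression. The helper name and the 5-minute window are assumptions, and the result stays in the metric's own unit (nanoseconds here).

```python
def latency_quantile(histogram: str, quantile: float, window: str = "5m") -> str:
    """Return a PromQL expression estimating a quantile from a histogram metric."""
    if not 0.0 <= quantile <= 1.0:
        raise ValueError("quantile must be between 0 and 1")
    # Prometheus exposes histogram buckets as <name>_bucket series with an "le" label.
    buckets = f"sum by (le) (rate({histogram}_bucket[{window}]))"
    return f"histogram_quantile({quantile}, {buckets})"


if __name__ == "__main__":
    print(latency_quantile("http_source_request_duration_ns", 0.95))
```

Summing by le merges the buckets from all stream heads into one estimate. Add other labels to the by clause if you want a separate quantile per series.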

Kafka Ingest

These metrics are specific to the use of Kafka data sources.

Metric | Type | Components | Purpose
kafka_source_byte_count | Counter | Stream peer | Count of bytes read from Kafka.
kafka_source_commit_duration_ns | Histogram | Stream peer | Kafka commit duration.
kafka_source_read_count | Counter | Stream peer | Count of Kafka reads.
kafka_source_read_duration_ns | Histogram | Stream peer | Kafka read duration.
kafka_source_read_error_count | Counter | Stream peer | Count of Kafka errors.
kafka_source_row_count | Counter | Stream peer | Count of rows processed.
kafka_source_value_count | Counter | Stream peer | Count of values processed.

Go environment metrics

These metrics track resources used by Hydrolix's Go environments.

Metric | Type | Components | Purpose
go_gc_duration_seconds | Summary | Batch peer, Stream head, Stream peer | A summary of the pause duration of garbage collection cycles.
go_goroutines | Gauge | Batch peer, Stream head, Stream peer | Number of goroutines that currently exist.
go_info | Gauge | Batch peer, Stream head, Stream peer | Information about the Go environment.
go_memstats_alloc_bytes | Gauge | Batch peer, Stream head, Stream peer | Number of bytes allocated and still in use.
go_memstats_alloc_bytes_total | Counter | Batch peer, Stream head, Stream peer | Total number of bytes allocated, even if freed.
go_memstats_buck_hash_sys_bytes | Gauge | Batch peer, Stream head, Stream peer | Number of bytes used by the profiling bucket hash table.
go_memstats_frees_total | Counter | Batch peer, Stream head, Stream peer | Total number of frees.
go_memstats_gc_cpu_fraction | Gauge | Batch peer, Stream head, Stream peer | The fraction of this program's available CPU time used by the GC since the program started.
go_memstats_gc_sys_bytes | Gauge | Batch peer, Stream head, Stream peer | Number of bytes used for garbage collection system metadata.
go_memstats_heap_alloc_bytes | Gauge | Batch peer, Stream head, Stream peer | Number of heap bytes allocated and still in use.
go_memstats_heap_idle_bytes | Gauge | Batch peer, Stream head, Stream peer | Number of heap bytes waiting to be used.
go_memstats_heap_inuse_bytes | Gauge | Batch peer, Stream head, Stream peer | Number of heap bytes that are in use.
go_memstats_heap_objects | Gauge | Batch peer, Stream head, Stream peer | Number of allocated objects.
go_memstats_heap_released_bytes | Gauge | Batch peer, Stream head, Stream peer | Number of heap bytes released to OS.
go_memstats_heap_sys_bytes | Gauge | Batch peer, Stream head, Stream peer | Number of heap bytes obtained from system.
go_memstats_last_gc_time_seconds | Gauge | Batch peer, Stream head, Stream peer | Number of seconds since 1970 of last garbage collection.
go_memstats_lookups_total | Counter | Batch peer, Stream head, Stream peer | Total number of pointer lookups.
go_memstats_mallocs_total | Counter | Batch peer, Stream head, Stream peer | Total number of mallocs.
go_memstats_mcache_inuse_bytes | Gauge | Batch peer, Stream head, Stream peer | Number of bytes in use by mcache structures.
go_memstats_mcache_sys_bytes | Gauge | Batch peer, Stream head, Stream peer | Number of bytes used for mcache structures obtained from system.
go_memstats_mspan_inuse_bytes | Gauge | Batch peer, Stream head, Stream peer | Number of bytes in use by mspan structures.
go_memstats_mspan_sys_bytes | Gauge | Batch peer, Stream head, Stream peer | Number of bytes used for mspan structures obtained from system.
go_memstats_next_gc_bytes | Gauge | Batch peer, Stream head, Stream peer | Number of heap bytes when next garbage collection will take place.
go_memstats_other_sys_bytes | Gauge | Batch peer, Stream head, Stream peer | Number of bytes used for other system allocations.
go_memstats_stack_inuse_bytes | Gauge | Batch peer, Stream head, Stream peer | Number of bytes in use by the stack allocator.
go_memstats_stack_sys_bytes | Gauge | Batch peer, Stream head, Stream peer | Number of bytes obtained from system for stack allocator.
go_memstats_sys_bytes | Gauge | Batch peer, Stream head, Stream peer | Number of bytes obtained from system.
go_threads | Gauge | Batch peer, Stream head, Stream peer | Number of OS threads created.
