Batch Metrics

The Hydrolix stack includes Prometheus, an open-source metrics database. While the stack runs, Hydrolix continuously updates its Prometheus instance with metrics.

Using Prometheus directly

Prometheus has its own web-based UI, available by visiting https://<YourHostname>/prometheus in your web browser.

This is a basic metric view, suitable for quickly entering queries and seeing simple, graphed results. Hydrolix makes this feature available immediately, without any additional setup.
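For scripted access, Prometheus also provides an HTTP query API at /api/v1/query. The following is a minimal sketch of building an instant-query URL. It assumes the API is served under the same /prometheus prefix as the web UI, which the text above does not confirm, and the hostname is a placeholder.

```python
from urllib.parse import urlencode


def instant_query_url(hostname: str, promql: str) -> str:
    """Build an instant-query URL for the Prometheus HTTP API.

    Assumption: the API shares the /prometheus path prefix used by the web UI.
    """
    base = f"https://{hostname}/prometheus/api/v1/query"
    # urlencode percent-escapes braces, quotes, and other special characters.
    return f"{base}?{urlencode({'query': promql})}"


# "hydrolix.example.com" is a placeholder hostname, not a real deployment.
print(instant_query_url("hydrolix.example.com", "process_open_fds"))
# https://hydrolix.example.com/prometheus/api/v1/query?query=process_open_fds
```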

For more information about metric types, refer to Prometheus's documentation.

Hydrolix's metrics

If more than one component uses a given metric, then querying it returns results from all relevant components. You can restrict results to a specific component by adding a service label to your query, e.g. process_open_fds{service="stream-peer"}.
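When queries are generated by a script, the service restriction can be added programmatically. This is a small sketch; the helper name with_service is hypothetical, and the only label it assumes is the service label shown above.

```python
def with_service(metric: str, service: str) -> str:
    """Return a PromQL selector restricting `metric` to one service."""
    # Escape backslashes and double quotes so the label value stays valid PromQL.
    escaped = service.replace("\\", "\\\\").replace('"', '\\"')
    return f'{metric}{{service="{escaped}"}}'


print(with_service("process_open_fds", "stream-peer"))
# process_open_fds{service="stream-peer"}
```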

Each of the ingest method peers has multiple containers. One container handles message acquisition; the other is the indexer, which indexes the data and completes enrichment jobs.

Batch metrics

These metrics track activity specific to batch ingestions.

| Metric | Type | Purpose |
| --- | --- | --- |
| processed_count | Counter | Count of items processed. |
| processed_failure | Counter | Count of processing failures. |
| processing_duration_histo_count/bucket/sum | Histogram | Histogram of Batch processing durations in milliseconds. |
| processing_duration_summary | Summary | Summary of Batch processing durations in milliseconds. |
| processing_duration_summary_count/sum | Summary | Summary of Batch processing durations in milliseconds. |
| query_count | Counter | Count of calls to the Catalog. |
| query_failure | Counter | Count of failed Catalog calls. |
| query_latency_summary | Counter | Latency in calls to catalog. |
| query_latency_summary_count/sum | Counter | Latency in calls to catalog. |
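Counters only ever increase, so they are most useful when compared across two scrapes. The sketch below derives a failure ratio from processed_failure and processed_count, and a mean duration from processing_duration_summary_sum and processing_duration_summary_count. The sample values are invented, and whether processed_count includes failed items is not stated here, so treat the ratio as approximate.

```python
def counter_increase(earlier: float, later: float) -> float:
    """Increase of a counter between two samples; a drop is treated as a reset."""
    return later if later < earlier else later - earlier


def failure_ratio(count_then: float, count_now: float,
                  fail_then: float, fail_now: float) -> float:
    """Failures per processed item over the interval between two scrapes."""
    processed = counter_increase(count_then, count_now)
    failed = counter_increase(fail_then, fail_now)
    return failed / processed if processed else 0.0


def mean_duration_ms(sum_then: float, sum_now: float,
                     n_then: float, n_now: float) -> float:
    """Mean processing duration (ms) from the _sum and _count series."""
    observations = counter_increase(n_then, n_now)
    total = counter_increase(sum_then, sum_now)
    return total / observations if observations else 0.0


# Invented sample values for two scrapes of the same peer.
print(failure_ratio(1000, 1200, 10, 14))               # 0.02
print(mean_duration_ms(50000.0, 62000.0, 1000, 1200))  # 60.0
```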

In addition to the metrics above, also review the Indexer metrics below.

Indexer Metrics

Indexer metrics cover the indexing and enrichment of data being ingested. They are available across all of the peer components above.

| Metric | Type | Components | Purpose |
| --- | --- | --- | --- |
| indexer_rows_written_count/bucket/sum | Histogram | Batch (inc. Autoingest), Kafka, Kinesis, Stream HTTP | Total rows indexed (written to partitions). |
| indexer_bytes_written_count/bucket/sum | Histogram | Batch (inc. Autoingest), Kafka, Kinesis, Stream HTTP | Total bytes indexed (written to partitions). |
| indexer_partitions_rejected_count/bucket/sum | Histogram | Batch (inc. Autoingest), Kafka, Kinesis, Stream HTTP | Histogram of partitions that could not be written. Values: 0 = raw data parsing failed, 1 = raw data / transform schema mismatch, 3 = error writing partition file, 4 = other error during indexing. |
| indexer_partitions_written_count/bucket/sum | Histogram | Batch (inc. Autoingest), Kafka, Kinesis, Stream HTTP | Total partitions created. |
| indexer_partition_write_seconds_count/bucket/sum | Histogram | Batch (inc. Autoingest), Kafka, Kinesis, Stream HTTP | Time from receiving indexing query to writing partition file (seconds). |
| hdx_sink_row_count | Counter | Batch (inc. Autoingest), Kafka, Kinesis, Stream HTTP | Count of rows processed by the indexer and uploaded to storage. Includes Hot and Cold reporting. |
| hdx_sink_byte_count | Counter | Batch (inc. Autoingest), Kafka, Kinesis, Stream HTTP | Count of bytes processed by the indexer and uploaded to storage. Includes Hot and Cold reporting. |
| hdx_sink_value_count | Counter | Batch (inc. Autoingest), Kafka, Kinesis, Stream HTTP | Count of values processed by the indexer and uploaded to storage. Includes Hot and Cold reporting. |
| hdx_sink_error_count | Counter | Batch (inc. Autoingest), Kafka, Kinesis, Stream HTTP | Count of errors in indexing and uploading to storage. |
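The value codes listed for indexer_partitions_rejected can be turned into readable labels when building alerts or reports. This is an illustrative lookup only: the mapping copies the codes documented in the table, and any other value (including 2, which the table does not list) is reported as undocumented rather than guessed.

```python
# Codes as documented for indexer_partitions_rejected; 2 is not listed.
REJECTION_REASONS = {
    0: "raw data parsing failed",
    1: "raw data / transform schema mismatch",
    3: "error writing partition file",
    4: "other error during indexing",
}


def rejection_reason(value: int) -> str:
    """Translate a rejection code into its documented meaning."""
    return REJECTION_REASONS.get(value, f"undocumented value {value}")


for code in (0, 1, 2, 3, 4):
    print(code, "->", rejection_reason(code))
```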

RabbitMQ

The following metrics are suggested for monitoring your RabbitMQ service:

| Metric | Type | Purpose |
| --- | --- | --- |
| rabbitmq_queue_messages | Counter | Sum of ready and unacknowledged messages (queue depth). |
| rabbitmq_queues | Counter | RabbitMQ Queues count. |
| erlang_vm_statistics_bytes_received_total | Counter | The total number of bytes received through RabbitMQ ports. |
| erlang_vm_statistics_bytes_output_total | Counter | The total number of bytes output through RabbitMQ ports. |
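Queue depth is often the first signal of a backlog. As a rough sketch, the function below scans the JSON returned by a Prometheus instant query for rabbitmq_queue_messages and reports queues above a threshold. The queue label name, the queue names, and the threshold are assumptions for illustration; check the labels your deployment actually exposes.

```python
import json


def deep_queues(response_text: str, threshold: float) -> dict:
    """Return {queue: depth} for series whose value exceeds `threshold`."""
    payload = json.loads(response_text)
    deep = {}
    for series in payload["data"]["result"]:
        # An instant-vector sample is a [timestamp, "value-as-string"] pair.
        _, value = series["value"]
        depth = float(value)
        if depth > threshold:
            # "queue" is an assumed label name.
            name = series["metric"].get("queue", "<unlabelled>")
            deep[name] = depth
    return deep


# Hand-written sample in the shape of a Prometheus instant-query response.
sample = json.dumps({
    "status": "success",
    "data": {
        "resultType": "vector",
        "result": [
            {"metric": {"queue": "ingest"}, "value": [1700000000, "1500"]},
            {"metric": {"queue": "merge"}, "value": [1700000000, "12"]},
        ],
    },
})

print(deep_queues(sample, threshold=1000))  # {'ingest': 1500.0}
```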

Additional metrics can be found on the RabbitMQ site.

Storage

These metrics cover cloud storage.

| Metric | Type | Components | Purpose |
| --- | --- | --- | --- |
| net_http_status_code_bucket | Counter | Batch (inc. Autoingest), Kafka, Kinesis, Stream HTTP | HTTP Status Code histogram count from Storage. |
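Prometheus histogram buckets are cumulative: each bucket counts every observation at or below its upper bound. Assuming net_http_status_code_bucket follows that convention, per-range counts can be recovered by subtracting neighbouring buckets. The bucket boundaries and counts below are invented for illustration and may not match what Hydrolix exposes.

```python
def per_bucket_counts(cumulative: dict) -> dict:
    """Convert cumulative bucket counts (keyed by upper bound) to per-bucket counts."""
    counts = {}
    previous = 0.0
    for upper in sorted(cumulative):
        counts[upper] = cumulative[upper] - previous
        previous = cumulative[upper]
    return counts


# Hypothetical upper bounds grouping status codes into 2xx/3xx/4xx/5xx ranges.
sample = {299: 950, 399: 960, 499: 990, 599: 1000}
print(per_bucket_counts(sample))
# {299: 950.0, 399: 10, 499: 30, 599: 10}
```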

Additional Metrics

Additional metrics are provided for the management and control of different components.

General metrics

These metrics track various counters and statistics regarding data ingestion.

| Metric | Type | Purpose |
| --- | --- | --- |
| process_cpu_seconds_total | Counter | Total user and system CPU time spent in seconds. |
| process_max_fds | Gauge | Maximum number of open file descriptors. |
| process_open_fds | Gauge | Number of open file descriptors. |
| process_resident_memory_bytes | Gauge | Resident memory size in bytes. |
| process_start_time_seconds | Gauge | Start time of the process since unix epoch in seconds. |
| process_virtual_memory_bytes | Gauge | Virtual memory size in bytes. |
| process_virtual_memory_max_bytes | Gauge | Maximum amount of virtual memory available in bytes. |
| promhttp_metric_handler_requests_in_flight | Gauge | Current number of scrapes being served. |
| promhttp_metric_handler_requests_total | Counter | Total number of scrapes by HTTP status code. |
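Two of these metrics are easier to act on as derived values: file-descriptor headroom (process_open_fds relative to process_max_fds) and average CPU usage (the change in process_cpu_seconds_total over a time window). The sketch below shows the arithmetic with invented sample values.

```python
def fd_utilization(open_fds: float, max_fds: float) -> float:
    """Fraction of the file-descriptor limit currently in use."""
    return open_fds / max_fds if max_fds > 0 else 0.0


def cpu_cores_used(cpu_then: float, cpu_now: float, seconds_elapsed: float) -> float:
    """Average number of CPU cores used between two samples of the CPU counter."""
    if seconds_elapsed <= 0:
        raise ValueError("seconds_elapsed must be positive")
    return (cpu_now - cpu_then) / seconds_elapsed


# Invented samples: 512 of 1024 descriptors open; 30 CPU-seconds used in 60 s.
print(fd_utilization(512, 1024))          # 0.5
print(cpu_cores_used(100.0, 130.0, 60.0)) # 0.5
```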

Go environment metrics

These metrics track resources used by Hydrolix's Go environments.

| Metric | Type | Purpose |
| --- | --- | --- |
| go_gc_duration_seconds | Summary | A summary of the pause duration of garbage collection cycles. |
| go_goroutines | Gauge | Number of goroutines that currently exist. |
| go_info | Gauge | Information about the Go environment. |
| go_memstats_alloc_bytes | Gauge | Number of bytes allocated and still in use. |
| go_memstats_alloc_bytes_total | Counter | Total number of bytes allocated, even if freed. |
| go_memstats_buck_hash_sys_bytes | Gauge | Number of bytes used by the profiling bucket hash table. |
| go_memstats_frees_total | Counter | Total number of frees. |
| go_memstats_gc_cpu_fraction | Gauge | The fraction of this program's available CPU time used by the GC since the program started. |
| go_memstats_gc_sys_bytes | Gauge | Number of bytes used for garbage collection system metadata. |
| go_memstats_heap_alloc_bytes | Gauge | Number of heap bytes allocated and still in use. |
| go_memstats_heap_idle_bytes | Gauge | Number of heap bytes waiting to be used. |
| go_memstats_heap_inuse_bytes | Gauge | Number of heap bytes that are in use. |
| go_memstats_heap_objects | Gauge | Number of allocated objects. |
| go_memstats_heap_released_bytes | Gauge | Number of heap bytes released to OS. |
| go_memstats_heap_sys_bytes | Gauge | Number of heap bytes obtained from system. |
| go_memstats_last_gc_time_seconds | Gauge | Number of seconds since 1970 of last garbage collection. |
| go_memstats_lookups_total | Counter | Total number of pointer lookups. |
| go_memstats_mallocs_total | Counter | Total number of mallocs. |
| go_memstats_mcache_inuse_bytes | Gauge | Number of bytes in use by mcache structures. |
| go_memstats_mcache_sys_bytes | Gauge | Number of bytes used for mcache structures obtained from system. |
| go_memstats_mspan_inuse_bytes | Gauge | Number of bytes in use by mspan structures. |
| go_memstats_mspan_sys_bytes | Gauge | Number of bytes used for mspan structures obtained from system. |
| go_memstats_next_gc_bytes | Gauge | Number of heap bytes when next garbage collection will take place. |
| go_memstats_other_sys_bytes | Gauge | Number of bytes used for other system allocations. |
| go_memstats_stack_inuse_bytes | Gauge | Number of bytes in use by the stack allocator. |
| go_memstats_stack_sys_bytes | Gauge | Number of bytes obtained from system for stack allocator. |
| go_memstats_sys_bytes | Gauge | Number of bytes obtained from system. |
| go_threads | Gauge | Number of OS threads created. |
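A couple of derived figures can make the Go runtime metrics easier to read at a glance. The sketch below computes the mean garbage-collection pause from the go_gc_duration_seconds_sum and go_gc_duration_seconds_count series that a Prometheus summary exposes, and the share of heap memory in use from go_memstats_heap_inuse_bytes and go_memstats_heap_sys_bytes. The sample values are invented.

```python
def mean_gc_pause_seconds(pause_sum_seconds: float, gc_count: float) -> float:
    """Mean GC pause, from the summary's _sum and _count series."""
    return pause_sum_seconds / gc_count if gc_count else 0.0


def heap_inuse_fraction(heap_inuse_bytes: float, heap_sys_bytes: float) -> float:
    """Share of heap memory obtained from the system that is currently in use."""
    return heap_inuse_bytes / heap_sys_bytes if heap_sys_bytes else 0.0


MIB = 2 ** 20

# Invented samples: 0.25 s of total pause over 500 cycles; 64 MiB of 128 MiB in use.
print(mean_gc_pause_seconds(0.25, 500))           # 0.0005
print(heap_inuse_fraction(64 * MIB, 128 * MIB))   # 0.5
```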