Prometheus Integration

The Hydrolix stack includes Prometheus, an open-source metrics database. While the stack runs, Hydrolix continuously updates its Prometheus instance with metrics. You can query, view, and monitor these metrics through the stack's Grafana instance, or access them from your own monitoring platform.

Using Prometheus directly

Prometheus has its own web-based UI, available by visiting https://YOUR-HYDROLIX-HOSTNAME.hydrolix.live/prometheus in your web browser.

This view is far more basic than Grafana's, and is suited to quickly entering queries and seeing simple, graphed results. Hydrolix makes this feature available immediately, without any additional setup.
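Prometheus also provides an HTTP API for running the same queries from a script. The following is a minimal sketch, not a confirmed Hydrolix interface: it assumes the API is served under the same /prometheus prefix as the web UI and that no extra authentication is required, neither of which is stated here. The helper names query_url and instant_query are hypothetical.

```python
import json
import urllib.parse
import urllib.request


def query_url(base_url: str, promql: str) -> str:
    """Return the instant-query URL for a PromQL expression."""
    # /api/v1/query is Prometheus's standard instant-query endpoint; serving it
    # under the same prefix as the web UI is an assumption about the deployment.
    params = urllib.parse.urlencode({"query": promql})
    return base_url.rstrip("/") + "/api/v1/query?" + params


def instant_query(base_url: str, promql: str, timeout: float = 10.0) -> list:
    """Run the query and return the list of result series."""
    with urllib.request.urlopen(query_url(base_url, promql), timeout=timeout) as resp:
        body = json.load(resp)
    # Prometheus wraps results as {"status": ..., "data": {"result": [...]}}.
    if body.get("status") != "success":
        raise RuntimeError(f"Prometheus query failed: {body}")
    return body["data"]["result"]
```

Calling instant_query("https://YOUR-HYDROLIX-HOSTNAME.hydrolix.live/prometheus", "process_open_fds") would then return one entry per reporting series, if the endpoint is reachable as assumed.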

Hydrolix's metrics

The tables below list the available metrics and the components that update them.

If more than one component reports a given metric, querying it returns results from all relevant components. You can restrict results to a specific component by adding a service label to your query, e.g. process_open_fds{service="stream-peer"}.
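As an illustration of the filter described above, this sketch builds a selector string from a metric name and label values. The selector helper is hypothetical and only formats text; the label name service and the value stream-peer come from the example above.

```python
def selector(metric: str, **labels: str) -> str:
    """Build a PromQL selector such as metric{label="value"}."""
    if not labels:
        return metric

    def escape(value: str) -> str:
        # Escape backslashes and double quotes so the value stays a valid
        # PromQL string literal.
        return value.replace("\\", "\\\\").replace('"', '\\"')

    matchers = ",".join(
        f'{name}="{escape(value)}"' for name, value in sorted(labels.items())
    )
    return f"{metric}{{{matchers}}}"


print(selector("process_open_fds", service="stream-peer"))
# prints: process_open_fds{service="stream-peer"}
```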

For more information about metric types, refer to Prometheus's documentation.

General metrics

These metrics track various counters and statistics regarding data ingestion.

| Metric | Type | Components | Purpose |
| --- | --- | --- | --- |
| bytes_written | Counter | Batch peer, Stream peer | Bytes written to the indexer. |
| partitions_created | Counter | Batch peer, Stream peer | Count of partitions created. |
| process_cpu_seconds_total | Counter | Batch peer, Stream head, Stream peer | Total user and system CPU time spent in seconds. |
| process_max_fds | Gauge | Batch peer, Stream head, Stream peer | Maximum number of open file descriptors. |
| process_open_fds | Gauge | Batch peer, Stream head, Stream peer | Number of open file descriptors. |
| process_resident_memory_bytes | Gauge | Batch peer, Stream head, Stream peer | Resident memory size in bytes. |
| process_start_time_seconds | Gauge | Batch peer, Stream head, Stream peer | Start time of the process since Unix epoch in seconds. |
| process_virtual_memory_bytes | Gauge | Batch peer, Stream head, Stream peer | Virtual memory size in bytes. |
| process_virtual_memory_max_bytes | Gauge | Batch peer, Stream head, Stream peer | Maximum amount of virtual memory available in bytes. |
| promhttp_metric_handler_requests_in_flight | Gauge | Batch peer, Stream head, Stream peer | Current number of scrapes being served. |
| promhttp_metric_handler_requests_total | Counter | Batch peer, Stream head, Stream peer | Total number of scrapes by HTTP status code. |
| upload_duration | Summary | Any intake peer | Time spent uploading a file, in milliseconds. |

Query metrics

These metrics track query activity on the head and query peers.

| Metric | Type | Components | Purpose |
| --- | --- | --- | --- |
| net_connect_attempts_total | Histogram | Head/Query peer | Histogram of TCP connection attempts to the storage service. |
| net_connect_seconds | Histogram | Head/Query peer | Histogram of time to connect over TCP to the storage service, in seconds. |
| net_dns_resolve_seconds | Histogram | Head/Query peer | Histogram of DNS resolution time for the storage service, in seconds. |
| net_http_response_time | Histogram | Head/Query peer | Histogram of HTTP response time from the storage service, in seconds. |
| net_http_response_bytes | Histogram | Head/Query peer | Histogram of HTTP bytes downloaded from the storage service. |
| net_http_attempts_total | Histogram | Head/Query peer | Histogram of HTTP connection attempts to the storage service. |
| net_http_status_code | Histogram | Head/Query peer | Histogram of HTTP status codes returned by the storage service. |
| vfs_cache_hitmiss_total | Histogram | Head/Query peer | Histogram of cache status: bucket 0 = cache miss, 1 = cache hit. |
| vfs_cache_read_bytes | Histogram | Head/Query peer | Histogram of bytes read from cache. |
| vfs_net_read_bytes | Histogram | Head/Query peer | Histogram of bytes read from the network. |
| vfs_cache_lru_file_eviction_total | Histogram | Head/Query peer | Histogram of files evicted from the cache. |
| epoll_cpu_seconds | Histogram | Head/Query peer | Histogram of CPU time used, in seconds. |
| epoll_io_seconds | Histogram | Head/Query peer | Histogram of I/O time, in seconds. |
| epoll_poll_seconds | Histogram | Head/Query peer | Histogram of time spent waiting on file descriptors, in seconds. |
| hdx_storage_r_catalog_partitions_total | Histogram | Head/Query peer | Histogram of per-query catalog partition count. |
| hdx_storage_r_partitions_read_total | Histogram | Head/Query peer | Histogram of per-query partition read count. |
| hdx_storage_r_partitions_per_core_total | Histogram | Head/Query peer | Histogram of partitions used per core. |
| hdx_storage_r_peers_used_total | Histogram | Query peer | Histogram of total storage used. |
| hdx_storage_r_cores_used_total | Histogram | Query peer | Histogram of total cores used. |
| hdx_storage_r_catalog_timerange | Histogram | Head/Query peer | Histogram of query time range distribution. |
| hdx_partition_columns_read_total | Histogram | Head/Query peer | Histogram of columns read. |
| hdx_partition_block_decode_seconds | Histogram | Head/Query peer | Histogram of time spent decoding HDX blocks, in seconds. |
| hdx_partition_open_seconds | Histogram | Head/Query peer | Histogram of time spent opening HDX partitions, in seconds. |
| hdx_partition_read_seconds | Histogram | Head/Query peer | Histogram of time spent reading HDX partitions, in seconds. |
| hdx_partition_skipped_total | Histogram | Head/Query peer | Histogram of partitions skipped because no columns matched. |
| hdx_partition_blocks_read_total | Histogram | Head/Query peer | Histogram of partition blocks read. |
| hdx_partition_blocks_avail_total | Histogram | Head/Query peer | Histogram of partition blocks available. |
| hdx_partition_index_decision | Histogram | Head/Query peer | Histogram of partition index decisions: bucket 0 = full scan, 1 = partial scan, 2 = no match. |
| hdx_partition_index_lookup_seconds | Histogram | Head/Query peer | Histogram of index lookup time, in seconds. |
| hdx_partition_index_blocks_skipped_percent | Histogram | Head/Query peer | Histogram of index blocks skipped, as a percentage. |
| hdx_partition_index_blocks_skipped_total | Histogram | Head/Query peer | Histogram of index blocks skipped, as a total count. |
| hdx_partition_rd_w_err_total | Histogram | Head/Query peer | Histogram of errors: bucket 0 = read error, 1 = write error, 3 = error. |
| query_iowait_seconds | Histogram | Head/Query peer | Histogram of query I/O wait, in seconds. |
| query_cpuwait_seconds | Histogram | Head/Query peer | Histogram of query CPU wait, in seconds. |
| query_hdx_ch_conv_seconds | Histogram | Head/Query peer | Histogram of time spent converting HDX blocks to ClickHouse, in seconds. |
| query_health | Histogram | Head/Query peer | Histogram of query health: bucket 0 = initiated error, 1 = succeeded, 2 = error. |
| query_peer_availability | Histogram | Head/Query peer | Histogram of query peer availability: bucket 0 = primary_peer_available, 1 = secondary_peer_available, 2 = no_reachable_peers. |
| query_attempts_total | Histogram | Head/Query peer | Histogram of total query attempts. |
| query_response_seconds | Histogram | Head/Query peer | Histogram of total query response time, in seconds. |
| query_rows_read_total | Histogram | Head/Query peer | Histogram of total rows read by queries. |
| query_read_bytes | Histogram | Head/Query peer | Histogram of total bytes read by queries. |
| query_rows_written_total | Histogram | Head/Query peer | Histogram of total rows written by queries. |

Batch metrics

These metrics track activity specific to batch ingestions.

| Metric | Type | Components | Purpose |
| --- | --- | --- | --- |
| processed_count | Counter | Batch peer | Count of items processed. |
| processed_failure | Counter | Batch peer | Count of processing failures. |
| processing_duration_histo | Histogram | Batch peer | Histogram of batch processing durations in milliseconds. |
| processing_duration_summary | Summary | Batch peer | Summary of batch processing durations in milliseconds. |
| rows_read | Counter | Batch peer | Count of rows read. |
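Counters and histograms are rarely read as raw values: a counter is usually viewed as a per-second rate, and a histogram as a quantile computed from its bucket series. The sketch below builds both kinds of PromQL expression for two batch metrics. It assumes the histogram is exported with the conventional _bucket suffix and le label, which is not confirmed here; the helper names and the 5m window are illustrative choices.

```python
def rate(series: str, window: str = "5m") -> str:
    """PromQL for the per-second rate of a counter over a time window."""
    return f"rate({series}[{window}])"


def histogram_quantile(q: float, metric: str, window: str = "5m") -> str:
    """PromQL for the q-quantile of a histogram, aggregated across instances."""
    # Assumption: bucket series are named <metric>_bucket and carry an le label.
    buckets = rate(f"{metric}_bucket", window)
    return f"histogram_quantile({q}, sum by (le) ({buckets}))"


print(rate("rows_read"))
# prints: rate(rows_read[5m])
print(histogram_quantile(0.95, "processing_duration_histo"))
# prints: histogram_quantile(0.95, sum by (le) (rate(processing_duration_histo_bucket[5m])))
```

Either string can be pasted into the Prometheus web UI or a Grafana panel.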

Merge metrics

These metrics correspond to Hydrolix's merge service.

| Metric | Type | Components | Purpose |
| --- | --- | --- | --- |
| merge_duration_summary | Summary | Merge peer | Merge processing duration, in milliseconds. |
| merge_duration_histo | Histogram | Merge peer | Merge processing duration, in milliseconds. |
| merge_sdk_duration_summary | Summary | Merge peer | Merge SDK processing duration, in milliseconds. |
| merge_sdk_duration_histo | Histogram | Merge peer | Merge SDK processing duration, in milliseconds. |
| merge_candidate_histo | Histogram | Merge peer | Partitions per merge candidate. |
| merge_candidate_inactive | Counter | Merge peer | Merge candidates skipped due to an inactive partition within the candidate. |
| merge_candidate_construction_summary | Summary | Merge head | Time spent building merge candidates, in milliseconds. |
| merge_queue_full | Counter | Merge head | Times candidate generation was skipped due to a full queue. |
| merge_success | Counter | Merge peer | Count of merge successes. |
| merge_failure | Counter | Merge peer | Count of merge failures. |
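One way to combine the merge_success and merge_failure counters is as a failure ratio over a time window. The sketch below shows a possible PromQL expression alongside the same arithmetic in plain Python. The one-hour window and the failure_ratio helper are illustrative assumptions, not part of Hydrolix.

```python
from typing import Optional

# A possible PromQL form of the ratio, using only the two counters above.
MERGE_FAILURE_RATIO = (
    "sum(increase(merge_failure[1h])) / "
    "(sum(increase(merge_success[1h])) + sum(increase(merge_failure[1h])))"
)


def failure_ratio(successes: float, failures: float) -> Optional[float]:
    """Return failures as a fraction of all merges, or None if there were none."""
    total = successes + failures
    if total == 0:
        # No merges ran in the window, so the ratio is undefined.
        return None
    return failures / total


print(failure_ratio(3, 1))
# prints: 0.25
```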

RabbitMQ

For RabbitMQ metrics, refer to RabbitMQ's Prometheus documentation: https://www.rabbitmq.com/prometheus.html

Streaming metrics

HTTP Stream Ingest

These metrics are specific to the use of streaming data sources.

| Metric | Type | Components | Purpose |
| --- | --- | --- | --- |
| http_source_byte_count | Counter | Stream head | Count of bytes processed. |
| http_source_request_count | Counter | Stream head | Count of HTTP requests. |
| http_source_request_duration_ns | Histogram | Stream head | A histogram of HTTP request durations in nanoseconds. |
| http_source_request_error_count | Counter | Stream head | Count of HTTP request failures. |
| http_source_row_count | Counter | Stream head | Count of rows processed. |
| http_source_value_count | Counter | Stream head | Count of values processed. |
| kinesis_source_byte_count | Counter | Stream peer | Count of bytes read from Kinesis. |
| kinesis_source_checkpoint_count | Counter | Stream peer | Count of Kinesis checkpoint operations. |
| kinesis_source_checkpoint_duration_ns | Histogram | Stream peer | Duration of Kinesis checkpoint operations. |
| kinesis_source_checkpoint_error_count | Counter | Stream peer | Count of errors in Kinesis checkpoint operations. |
| kinesis_source_error_count | Counter | Stream peer | Count of errors in Kinesis source reads. |
| kinesis_source_lag_ms | Gauge | Stream peer | Measure of lag in Kinesis source. |
| kinesis_source_operation_count | Counter | Stream peer | Count of operations on Kinesis. |
| kinesis_source_operation_duration_ns | Histogram | Stream peer | Histogram of duration of operations on Kinesis. |
| kinesis_source_record_count | Counter | Stream peer | Count of records read from Kinesis. |
| kinesis_source_row_count | Counter | Stream peer | Count of rows read from Kinesis. |
| kinesis_source_value_count | Counter | Stream peer | Count of values read from Kinesis. |

Redpanda

For Redpanda metrics, refer to Redpanda's monitoring documentation: https://docs.redpanda.com/docs/cluster-administration/monitoring/

Kafka Ingest

These metrics are specific to the use of Kafka data sources.

| Metric | Type | Components | Purpose |
| --- | --- | --- | --- |
| kafka_source_byte_count | Counter | Stream peer | Count of bytes read from Kafka. |
| kafka_source_commit_duration_ns | Histogram | Stream peer | Kafka commit duration. |
| kafka_source_read_count | Counter | Stream peer | Count of Kafka reads. |
| kafka_source_read_duration_ns | Histogram | Stream peer | Kafka read duration. |
| kafka_source_read_error_count | Counter | Stream peer | Count of Kafka errors. |
| kafka_source_row_count | Counter | Stream peer | Count of rows processed. |
| kafka_source_value_count | Counter | Stream peer | Count of values processed. |

Go environment metrics

These metrics track resources used by Hydrolix's Go environments.

| Metric | Type | Components | Purpose |
| --- | --- | --- | --- |
| go_gc_duration_seconds | Summary | Batch peer, Stream head, Stream peer | A summary of the pause duration of garbage collection cycles. |
| go_goroutines | Gauge | Batch peer, Stream head, Stream peer | Number of goroutines that currently exist. |
| go_info | Gauge | Batch peer, Stream head, Stream peer | Information about the Go environment. |
| go_memstats_alloc_bytes | Gauge | Batch peer, Stream head, Stream peer | Number of bytes allocated and still in use. |
| go_memstats_alloc_bytes_total | Counter | Batch peer, Stream head, Stream peer | Total number of bytes allocated, even if freed. |
| go_memstats_buck_hash_sys_bytes | Gauge | Batch peer, Stream head, Stream peer | Number of bytes used by the profiling bucket hash table. |
| go_memstats_frees_total | Counter | Batch peer, Stream head, Stream peer | Total number of frees. |
| go_memstats_gc_cpu_fraction | Gauge | Batch peer, Stream head, Stream peer | The fraction of this program's available CPU time used by the GC since the program started. |
| go_memstats_gc_sys_bytes | Gauge | Batch peer, Stream head, Stream peer | Number of bytes used for garbage collection system metadata. |
| go_memstats_heap_alloc_bytes | Gauge | Batch peer, Stream head, Stream peer | Number of heap bytes allocated and still in use. |
| go_memstats_heap_idle_bytes | Gauge | Batch peer, Stream head, Stream peer | Number of heap bytes waiting to be used. |
| go_memstats_heap_inuse_bytes | Gauge | Batch peer, Stream head, Stream peer | Number of heap bytes that are in use. |
| go_memstats_heap_objects | Gauge | Batch peer, Stream head, Stream peer | Number of allocated objects. |
| go_memstats_heap_released_bytes | Gauge | Batch peer, Stream head, Stream peer | Number of heap bytes released to OS. |
| go_memstats_heap_sys_bytes | Gauge | Batch peer, Stream head, Stream peer | Number of heap bytes obtained from system. |
| go_memstats_last_gc_time_seconds | Gauge | Batch peer, Stream head, Stream peer | Number of seconds since 1970 of last garbage collection. |
| go_memstats_lookups_total | Counter | Batch peer, Stream head, Stream peer | Total number of pointer lookups. |
| go_memstats_mallocs_total | Counter | Batch peer, Stream head, Stream peer | Total number of mallocs. |
| go_memstats_mcache_inuse_bytes | Gauge | Batch peer, Stream head, Stream peer | Number of bytes in use by mcache structures. |
| go_memstats_mcache_sys_bytes | Gauge | Batch peer, Stream head, Stream peer | Number of bytes used for mcache structures obtained from system. |
| go_memstats_mspan_inuse_bytes | Gauge | Batch peer, Stream head, Stream peer | Number of bytes in use by mspan structures. |
| go_memstats_mspan_sys_bytes | Gauge | Batch peer, Stream head, Stream peer | Number of bytes used for mspan structures obtained from system. |
| go_memstats_next_gc_bytes | Gauge | Batch peer, Stream head, Stream peer | Number of heap bytes when next garbage collection will take place. |
| go_memstats_other_sys_bytes | Gauge | Batch peer, Stream head, Stream peer | Number of bytes used for other system allocations. |
| go_memstats_stack_inuse_bytes | Gauge | Batch peer, Stream head, Stream peer | Number of bytes in use by the stack allocator. |
| go_memstats_stack_sys_bytes | Gauge | Batch peer, Stream head, Stream peer | Number of bytes obtained from system for stack allocator. |
| go_memstats_sys_bytes | Gauge | Batch peer, Stream head, Stream peer | Number of bytes obtained from system. |
| go_threads | Gauge | Batch peer, Stream head, Stream peer | Number of OS threads created. |