Stream Metrics
The Hydrolix stack includes Prometheus, an open-source metrics database. Hydrolix outputs metrics to the included Prometheus instance.
UI
Prometheus provides a web-based UI that displays graphs of metrics. You can find the UI at https://<hostname>/prometheus.
For more information about metric types, refer to the Prometheus documentation.
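Metrics can also be read programmatically through the standard Prometheus HTTP API. The following is a minimal sketch, not a confirmed Hydrolix endpoint: it assumes the API is served under the same /prometheus prefix as the UI, and the function name and example hostname are hypothetical.

```python
from urllib.parse import urlencode


def instant_query_url(hostname: str, promql: str) -> str:
    """Build an instant-query URL for the Prometheus HTTP API.

    Assumption: the API lives under the same /prometheus prefix as the UI.
    """
    base = f"https://{hostname}/prometheus/api/v1/query"
    # urlencode percent-escapes braces, quotes, and other PromQL punctuation.
    return f"{base}?{urlencode({'query': promql})}"


# Hypothetical hostname, used only for illustration.
print(instant_query_url("hydrolix.example.com", "up"))
```

Fetching the resulting URL (with whatever authentication your deployment requires) should return a JSON document if the assumed path is correct.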
Metrics
If more than one component uses a given metric, querying it returns results from all relevant components. You can restrict results to a specific component by adding a service label matcher to your query, for example process_open_fds{service="stream-peer"}.
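When queries are generated by a script, the label matcher can be built with a small helper. This is an illustrative sketch only; the helper name `select` is hypothetical and not part of any Hydrolix or Prometheus library.

```python
def select(metric: str, **labels: str) -> str:
    """Return a PromQL selector such as metric{label="value"}."""
    if not labels:
        return metric

    def escape(value: str) -> str:
        # Backslashes and double quotes must be escaped inside PromQL strings.
        return value.replace("\\", "\\\\").replace('"', '\\"')

    matchers = ",".join(f'{k}="{escape(v)}"' for k, v in sorted(labels.items()))
    return f"{metric}{{{matchers}}}"


print(select("process_open_fds", service="stream-peer"))
```

The resulting string can be pasted into the Prometheus UI or sent to its query API.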
Each of the Ingest method peers has multiple containers. One container performs message acquisition. The other container, known as the indexer, completes indexing and enrichment jobs.
HTTP Stream Ingest
These metrics are specific to the use of streaming data sources.
Traefik
The Traefik HTTP routing service produces the following metrics:
Metric | Type | Components | Purpose |
---|---|---|---|
traefik_service_requests_total | Counter | Traefik | HTTP Traefik request information. |
traefik_service_request_duration_seconds_count/sum/bucket | Histogram | Traefik | Response time of Traefik to client. |
http_source_request_duration_ns_count/sum/bucket | Histogram | Traefik | Response time from Stream-Head. |
Intake-Head
The Intake-Heads are an all-in-one replacement for the older Stream-Head/Stream-Peer architecture.
Metric | Type | Components | Purpose |
---|---|---|---|
hdx_sink_backlog_bytes_count | Gauge | Intake head | Total bytes of all partition buckets in sink backlog waiting to be indexed. Only produced when intake_head_index_backlog_enabled is true. |
hdx_sink_backlog_items_count | Gauge | Intake head | Total count of partition buckets in sink backlog waiting to be indexed. Only produced when intake_head_index_backlog_enabled is true. |
hdx_sink_backlog_dropped_bytes_count | Counter | Intake head | Total bytes of partition buckets dropped due to backlog growing too big. Only produced when intake_head_index_backlog_enabled is true. |
hdx_sink_backlog_dropped_items_count | Counter | Intake head | Count of partition buckets dropped due to backlog growing too big. Only produced when intake_head_index_backlog_enabled is true. |
hdx_sink_backlog_delivery_count | Counter | Intake head | Count of backlog buckets successfully handed off to indexing. Only produced when intake_head_index_backlog_enabled is true. |
hdx_sink_backlog_trim_duration_ns | Histogram | Intake head | Time to trim the backlog in nanoseconds. Only produced when intake_head_index_backlog_enabled is true. |
http_source_outstanding_reqs | Gauge | Intake head | Number of outstanding ingest event requests. |
Stream-Head
Stream-Heads, which coordinate streaming jobs, produce the following metrics:
Metric | Type | Components | Purpose |
---|---|---|---|
http_source_byte_count | Counter | Stream head | Count of bytes processed. |
http_source_request_count | Counter | Stream head | Count of HTTP requests. |
http_source_request_duration_ns_count/bucket/sum | Histogram | Stream head | A histogram of HTTP request processing duration in nanoseconds. Time is measured between receipt of the last byte of the message and placement of the message (or the last part of the message) onto the queue. |
http_source_request_error_count | Counter | Stream head | Count of HTTP request failures. |
http_source_row_count | Counter | Stream head | Count of rows processed. |
http_source_value_count | Counter | Stream head | Count of values processed. |
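A histogram's _sum and _count series can be combined to estimate the mean request duration over an interval. The sketch below assumes you have already sampled both series at the start and end of a window; the function name and the sample values are hypothetical.

```python
from typing import Optional


def mean_duration_ms(sum_start_ns: float, sum_end_ns: float,
                     count_start: float, count_end: float) -> Optional[float]:
    """Mean request duration in milliseconds between two samples.

    Inputs are values of http_source_request_duration_ns_sum and
    http_source_request_duration_ns_count at the window boundaries.
    """
    requests = count_end - count_start
    if requests <= 0:
        # No requests in the window, or the counter was reset.
        return None
    return (sum_end_ns - sum_start_ns) / requests / 1_000_000


# Made-up sample values: 5 requests taking 10 ms in total.
print(mean_duration_ms(0, 10_000_000, 0, 5))
```

In PromQL the same estimate is conventionally written as rate(http_source_request_duration_ns_sum[5m]) / rate(http_source_request_duration_ns_count[5m]), which yields nanoseconds.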
RedPanda
Internal RedPanda queues produce the following metrics:
Metric | Type | Components | Purpose |
---|---|---|---|
internal_event_queue_byte_count{mode="sink"} | Counter | Stream Head | Byte count sent to RedPanda |
internal_event_queue_row_count{mode="sink"} | Counter | Stream Head | Row count sent to RedPanda |
internal_event_queue_value_count{mode="sink"} | Counter | Stream Head | Value count sent to RedPanda |
internal_event_queue_row_count{mode="source"} | Counter | Stream Peer | Row count received from RedPanda |
internal_event_queue_value_count{mode="source"} | Counter | Stream Peer | Value count received from RedPanda |
Stream-Peer
Stream-Peers, which carry out streaming jobs, produce the following metrics:
Metric | Type | Components | Purpose |
---|---|---|---|
query_count | Counter | Stream Peer | Count of calls to the Catalog. |
query_failure | Counter | Stream Peer | Count of failed Catalog calls. |
query_latency_summary | Summary | Stream Peer | Latency in calls to the Catalog. |
query_latency_summary_count/sum | Count/Sum | Stream Peer | Latency in calls to the Catalog. |
Additionally, Stream ingest produces Stream-Peer Indexer metrics.
Kafka Ingest
Kafka data sources produce the following metrics:
Metric | Type | Components | Purpose |
---|---|---|---|
kafka_source_byte_count | Counter | Kafka peer | Count of bytes read from Kafka. |
kafka_source_commit_duration_ns_count/bucket/sum | Histogram | Kafka peer | Kafka commit duration. |
kafka_source_read_count | Counter | Kafka peer | Count of Kafka reads. |
kafka_source_read_duration_ns_count/bucket/sum | Histogram | Kafka peer | Kafka read duration. |
kafka_source_read_error_count | Counter | Kafka peer | Count of Kafka errors. |
kafka_source_row_count | Counter | Kafka peer | Count of rows processed. |
kafka_source_value_count | Counter | Kafka peer | Count of values processed. |
query_count | Counter | Kafka Peer | Count of calls to the Catalog. |
query_failure | Counter | Kafka Peer | Count of failed Catalog calls. |
query_latency_summary | Summary | Kafka Peer | Latency in calls to the Catalog. |
query_latency_summary_count/sum | Count/Sum | Kafka Peer | Latency in calls to the Catalog. |
Additionally, Kafka ingest produces Kafka-Peer Indexer metrics.
Kinesis Ingest
Kinesis data sources produce the following metrics:
Metric | Type | Components | Purpose |
---|---|---|---|
kinesis_source_byte_count | Counter | Kinesis peer | Count of bytes read from Kinesis. |
kinesis_source_checkpoint_count | Counter | Kinesis peer | Count of Kinesis checkpoint operations. |
kinesis_source_checkpoint_duration_ns_count/bucket/sum | Histogram | Kinesis peer | Duration of Kinesis checkpoint operations. |
kinesis_source_lag_ms | Gauge | Kinesis peer | Measure of lag in Kinesis source. |
kinesis_source_operation_count | Counter | Kinesis peer | Count of operations on Kinesis. |
kinesis_source_operation_duration_ns_count/bucket/sum | Histogram | Kinesis peer | Histogram of duration of operations on Kinesis. |
kinesis_source_record_count | Counter | Kinesis peer | Count of records read from Kinesis. |
kinesis_source_row_count | Counter | Kinesis peer | Count of rows read from Kinesis. |
kinesis_source_value_count | Counter | Kinesis peer | Count of values read from Kinesis. |
query_count | Counter | Kinesis Peer | Count of calls to the Catalog. |
query_failure | Counter | Kinesis Peer | Count of failed Catalog calls. |
query_latency_summary | Summary | Kinesis Peer | Latency in calls to the Catalog. |
query_latency_summary_count/sum | Count/Sum | Kinesis Peer | Latency in calls to the Catalog. |
Additionally, Kinesis ingest produces Kinesis-Peer Indexer metrics.
Indexer Metrics
Indexer metrics cover the indexing and enrichment of data being ingested. All of the above peer components produce these metrics. The indexer produces the following metrics:
Metric | Type | Components | Purpose |
---|---|---|---|
indexer_rows_written_count/bucket/sum | Histogram | Batch (inc. Autoingest), Kafka, Kinesis, Stream HTTP | Total rows indexed (written to partitions) |
indexer_bytes_written_count/bucket/sum | Histogram | Batch (inc. Autoingest), Kafka, Kinesis, Stream HTTP | Total bytes indexed (written to partitions) |
indexer_partitions_rejected_count/bucket/sum | Histogram | Batch (inc. Autoingest), Kafka, Kinesis, Stream HTTP | Histogram of partitions that could not be written. The value identifies the reason: 0 - query_parse_error, 1 - network_error, 2 - internal_system_error, 3 - schema_mismatch, 4 - partition_files_write_failed, 5 - block_conversion_or_insertion_failed, 6 - other_internal_error |
indexer_partitions_written_count/bucket/sum | Histogram | Batch (inc. Autoingest), Kafka, Kinesis, Stream HTTP | Total partitions created |
indexer_partition_write_seconds_count/bucket/sum | Histogram | Batch (inc. Autoingest), Kafka, Kinesis, Stream HTTP | Histogram of partition write durations in seconds. |
hdx_indexer_req_duration_ns | Histogram | Intake Head | A histogram of durations of requests to indexer |
hdx_indexer_req_errors | Counter | Intake Head | Count of errors for requests to indexer service |
hdx_sink_bucket_maint_duration_ns | Summary | Intake Head | Summary of the bucket maintenance loop execution time. |
hdx_sink_bucket_seal_files | Histogram | Intake Head | A histogram of the number of files in buckets when sealed. |
hdx_sink_byte_count | Counter | Batch (inc. Autoingest), Kafka, Kinesis, Stream HTTP | Count of bytes processed by the indexer and uploaded to storage. Includes Hot and Cold reporting. |
hdx_sink_row_count | Counter | Batch (inc. Autoingest), Kafka, Kinesis, Stream HTTP | Count of rows processed by the indexer and uploaded to storage. Includes Hot and Cold reporting. |
hdx_sink_value_count | Counter | Batch (inc. Autoingest), Kafka, Kinesis, Stream HTTP | Count of values processed by the indexer and uploaded to storage. Includes Hot and Cold reporting. |
hdx_sink_error_count | Counter | Batch (inc. Autoingest), Kafka, Kinesis, Stream HTTP | Count of errors in indexing and uploading to storage. |
hdx_sink_open_bucket_slots | Counter | Intake Head | Measure of open bucket slots in the sink. |
hdx_sink_partition_rows_summary | Summary | Intake Head | Summary of number of rows in created partitions. |
hdx_upload_obj_store_duration_ns | Summary | Intake Head | Summary of durations for the uploading partition index files to object storage in nanoseconds. |
hdx_upload_obj_store_errors | Counter | Intake Head | Count of errors uploading partition index files to storage. |
hdx_upload_process_write_result_duration_ns | Summary | Intake Head | Summary of durations for processing of results in nanoseconds. |
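The numeric reason codes listed for indexer_partitions_rejected can be translated into readable names when building dashboards or alerts. This is a small sketch that only restates the table's mapping; how the code reaches your script (for example as an observed value) is an assumption, and the function name is hypothetical.

```python
# Reason codes as documented for indexer_partitions_rejected.
REJECTION_REASONS = {
    0: "query_parse_error",
    1: "network_error",
    2: "internal_system_error",
    3: "schema_mismatch",
    4: "partition_files_write_failed",
    5: "block_conversion_or_insertion_failed",
    6: "other_internal_error",
}


def rejection_reason(code: float) -> str:
    """Map a numeric rejection code to its name, or "unknown"."""
    return REJECTION_REASONS.get(int(code), "unknown")


print(rejection_reason(3))
```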
Storage
Cloud/Object Storage metrics.
Each of the object_store* metrics below has these labels:
- provider - Object storage provider (AWS, Azure, GCS)
- code - HTTP Response code
- method - HTTP Method used (POST, GET, etc)
- host - HTTP Host used to target object storage.
Metric | Type | Components | Purpose |
---|---|---|---|
net_http_status_code_bucket | Counter | Batch (inc. Autoingest), Kafka, Kinesis, Stream HTTP, Intake Head | HTTP Status Code histogram count from Storage. |
object_store_http_histo | Histogram | Batch (inc. Autoingest), Kafka, Kinesis, Stream HTTP, Intake Head | A histogram of object storage interaction latencies |
object_store_http_summary | Summary | Batch (inc. Autoingest), Kafka, Kinesis, Stream HTTP, Intake Head | A summary of object storage interaction latencies |
object_store_http_status_code_count | Count | Batch (inc. Autoingest), Kafka, Kinesis, Stream HTTP, Intake Head | A count of successful HTTP requests against object storage (replaces net_http_status_code_count). Requests resulting in 500 are still considered successful. |
object_store_http_error_count | Count | Batch (inc. Autoingest), Kafka, Kinesis, Stream HTTP, Intake Head | A count of HTTP request errors (timeouts, connection errors, etc.) |
object_store_http_bytes_tx | Count | Batch (inc. Autoingest), Kafka, Kinesis, Stream HTTP, Intake Head | A count of bytes transmitted to object storage (request body only) |
object_store_http_bytes_rx | Count | Batch (inc. Autoingest), Kafka, Kinesis, Stream HTTP, Intake Head | A count of bytes received from object storage (response body only) |
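Because object_store_http_status_code_count counts 5xx responses as completed requests, the code label is the place to look for server-side failures. The sketch below computes the share of 5xx responses from per-code totals; the input dictionary and function name are hypothetical, and the totals would come from your own query results.

```python
def server_error_ratio(counts_by_code: dict) -> float:
    """Fraction of responses whose HTTP status code is 5xx.

    counts_by_code maps the value of the code label (a string)
    to a request count for that code.
    """
    total = sum(counts_by_code.values())
    if total == 0:
        return 0.0
    errors = sum(n for code, n in counts_by_code.items()
                 if str(code).startswith("5"))
    return errors / total


# Made-up totals for illustration.
print(server_error_ratio({"200": 90, "404": 5, "500": 5}))
```

Per-code totals of this kind can be obtained with a PromQL aggregation such as sum by (code) (rate(object_store_http_status_code_count[5m])).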
Additional Metrics
The following metrics track the management and control of different components.
General Metrics
The following metrics track various counters and statistics during data ingestion:
Metric | Type | Components | Purpose |
---|---|---|---|
up | Gauge | All components | How many pods are up for each component. |
process_cpu_seconds_total | Counter | Batch (inc. Autoingest), Kafka, Kinesis, Stream HTTP, Traefik | Total user and system CPU time spent in seconds. |
process_max_fds | Gauge | Batch (inc. Autoingest), Kafka, Kinesis, Stream HTTP, Traefik | Maximum number of open file descriptors. |
process_open_fds | Gauge | Batch (inc. Autoingest), Kafka, Kinesis, Stream HTTP, Traefik | Number of open file descriptors. |
process_resident_memory_bytes | Gauge | Batch (inc. Autoingest), Kafka, Kinesis, Stream HTTP, Traefik | Resident memory size in bytes. |
process_start_time_seconds | Gauge | Batch (inc. Autoingest), Kafka, Kinesis, Stream HTTP, Traefik | Start time of the process since unix epoch in seconds. |
process_virtual_memory_bytes | Gauge | Batch (inc. Autoingest), Kafka, Kinesis, Stream HTTP, Traefik | Virtual memory size in bytes. |
process_virtual_memory_max_bytes | Gauge | Batch (inc. Autoingest), Kafka, Kinesis, Stream HTTP, Traefik | Maximum amount of virtual memory available in bytes. |
promhttp_metric_handler_requests_in_flight | Gauge | Batch (inc. Autoingest), Kafka, Kinesis, Stream HTTP, Traefik | Current number of scrapes being served. |
promhttp_metric_handler_requests_total | Counter | Batch (inc. Autoingest), Kafka, Kinesis, Stream HTTP, Traefik | Total number of scrapes by HTTP status code. |
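The process_open_fds and process_max_fds gauges are often combined into a single utilization figure for alerting. The following sketch shows the arithmetic; the function name and the 0.8 threshold are illustrative assumptions, not Hydrolix recommendations.

```python
def fd_utilization(open_fds: float, max_fds: float) -> float:
    """Fraction of the file descriptor limit currently in use."""
    if max_fds <= 0:
        raise ValueError("max_fds must be positive")
    return open_fds / max_fds


# Made-up sample values for process_open_fds and process_max_fds.
usage = fd_utilization(256, 1024)
# 0.8 is an arbitrary example threshold.
print(usage, "alert" if usage > 0.8 else "ok")
```

The equivalent PromQL expression is process_open_fds / process_max_fds, optionally restricted with a service matcher.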
Go Environment Metrics
The following metrics track resources used by Hydrolix's Go environment:
Metric | Type | Components | Purpose |
---|---|---|---|
go_gc_duration_seconds | Summary | Batch (inc. Autoingest), Kafka, Kinesis, Stream HTTP, Traefik | A summary of the pause duration of garbage collection cycles. |
go_goroutines | Gauge | Batch (inc. Autoingest), Kafka, Kinesis, Stream HTTP, Traefik | Number of goroutines that currently exist. |
go_info | Gauge | Batch (inc. Autoingest), Kafka, Kinesis, Stream HTTP, Traefik | Information about the Go environment. |
go_memstats_alloc_bytes | Gauge | Batch (inc. Autoingest), Kafka, Kinesis, Stream HTTP, Traefik | Number of bytes allocated and still in use. |
go_memstats_alloc_bytes_total | Counter | Batch (inc. Autoingest), Kafka, Kinesis, Stream HTTP, Traefik | Total number of bytes allocated, even if freed. |
go_memstats_buck_hash_sys_bytes | Gauge | Batch (inc. Autoingest), Kafka, Kinesis, Stream HTTP, Traefik | Number of bytes used by the profiling bucket hash table. |
go_memstats_frees_total | Counter | Batch (inc. Autoingest), Kafka, Kinesis, Stream HTTP, Traefik | Total number of frees. |
go_memstats_gc_cpu_fraction | Gauge | Batch (inc. Autoingest), Kafka, Kinesis, Stream HTTP, Traefik | The fraction of this program's available CPU time used by the GC since the program started. |
go_memstats_gc_sys_bytes | Gauge | Batch (inc. Autoingest), Kafka, Kinesis, Stream HTTP, Traefik | Number of bytes used for garbage collection system metadata. |
go_memstats_heap_alloc_bytes | Gauge | Batch (inc. Autoingest), Kafka, Kinesis, Stream HTTP, Traefik | Number of heap bytes allocated and still in use. |
go_memstats_heap_idle_bytes | Gauge | Batch (inc. Autoingest), Kafka, Kinesis, Stream HTTP, Traefik | Number of heap bytes waiting to be used. |
go_memstats_heap_inuse_bytes | Gauge | Batch (inc. Autoingest), Kafka, Kinesis, Stream HTTP, Traefik | Number of heap bytes that are in use. |
go_memstats_heap_objects | Gauge | Batch (inc. Autoingest), Kafka, Kinesis, Stream HTTP, Traefik | Number of allocated objects. |
go_memstats_heap_released_bytes | Gauge | Batch (inc. Autoingest), Kafka, Kinesis, Stream HTTP, Traefik | Number of heap bytes released to OS. |
go_memstats_heap_sys_bytes | Gauge | Batch (inc. Autoingest), Kafka, Kinesis, Stream HTTP, Traefik | Number of heap bytes obtained from system. |
go_memstats_last_gc_time_seconds | Gauge | Batch (inc. Autoingest), Kafka, Kinesis, Stream HTTP, Traefik | Number of seconds since 1970 of last garbage collection. |
go_memstats_lookups_total | Counter | Batch (inc. Autoingest), Kafka, Kinesis, Stream HTTP, Traefik | Total number of pointer lookups. |
go_memstats_mallocs_total | Counter | Batch (inc. Autoingest), Kafka, Kinesis, Stream HTTP, Traefik | Total number of mallocs. |
go_memstats_mcache_inuse_bytes | Gauge | Batch (inc. Autoingest), Kafka, Kinesis, Stream HTTP, Traefik | Number of bytes in use by mcache structures. |
go_memstats_mcache_sys_bytes | Gauge | Batch (inc. Autoingest), Kafka, Kinesis, Stream HTTP, Traefik | Number of bytes used for mcache structures obtained from system. |
go_memstats_mspan_inuse_bytes | Gauge | Batch (inc. Autoingest), Kafka, Kinesis, Stream HTTP, Traefik | Number of bytes in use by mspan structures. |
go_memstats_mspan_sys_bytes | Gauge | Batch (inc. Autoingest), Kafka, Kinesis, Stream HTTP, Traefik | Number of bytes used for mspan structures obtained from system. |
go_memstats_next_gc_bytes | Gauge | Batch (inc. Autoingest), Kafka, Kinesis, Stream HTTP, Traefik | Number of heap bytes when next garbage collection will take place. |
go_memstats_other_sys_bytes | Gauge | Batch (inc. Autoingest), Kafka, Kinesis, Stream HTTP, Traefik | Number of bytes used for other system allocations. |
go_memstats_stack_inuse_bytes | Gauge | Batch (inc. Autoingest), Kafka, Kinesis, Stream HTTP, Traefik | Number of bytes in use by the stack allocator. |
go_memstats_stack_sys_bytes | Gauge | Batch (inc. Autoingest), Kafka, Kinesis, Stream HTTP, Traefik | Number of bytes obtained from system for stack allocator. |
go_memstats_sys_bytes | Gauge | Batch (inc. Autoingest), Kafka, Kinesis, Stream HTTP, Traefik | Number of bytes obtained from system. |
go_threads | Gauge | Batch (inc. Autoingest), Kafka, Kinesis, Stream HTTP, Traefik | Number of OS threads created. |