Batch Metrics
The Hydrolix stack includes Prometheus, an open-source metrics database. While the Stack runs, Hydrolix continuously updates its Prometheus instance with metrics information.
Using Prometheus directly
Prometheus has its own web-based UI, available by visiting https://<YourHostname>/prometheus in your web browser.
This is a basic metric view, suitable for quickly entering queries and seeing simple graphed results. Hydrolix makes this feature available immediately, without any additional setup.
For more information about metric types, refer to Prometheus's documentation.
Hydrolix's metrics
If more than one component uses a given metric, then querying it returns results from all relevant components. You can restrict results to a specific component by adding a service label matcher to your query, e.g. process_open_fds{service="stream-peer"}.
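As a rough illustration of the service filter described above, the following Python sketch builds a selector with label matchers and the URL for an instant query against the Prometheus HTTP API (`/api/v1/query`). The helper names and the hostname are hypothetical, and the sketch assumes the API is reachable under the same /prometheus path as the web UI, which may not hold for every deployment.

```python
from urllib.parse import urlencode


def build_selector(metric: str, **labels: str) -> str:
    """Return a PromQL selector such as metric{label="value"} (hypothetical helper)."""
    if not labels:
        return metric
    # Escape backslashes and double quotes so label values stay valid PromQL strings.
    matchers = ",".join(
        '{}="{}"'.format(name, value.replace("\\", "\\\\").replace('"', '\\"'))
        for name, value in sorted(labels.items())
    )
    return f"{metric}{{{matchers}}}"


def build_query_url(base_url: str, query: str) -> str:
    """Return an instant-query URL for the Prometheus HTTP API."""
    # Assumption: the API lives under the same base path as the web UI.
    return f"{base_url.rstrip('/')}/api/v1/query?{urlencode({'query': query})}"


selector = build_selector("process_open_fds", service="stream-peer")
print(selector)
# Hypothetical hostname; substitute your own.
print(build_query_url("https://hydrolix.example.com/prometheus", selector))
```

The same selector can be pasted directly into the Prometheus web UI; the URL form is only useful when querying from a script.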
Each of the ingest method peers has multiple containers. One container handles message acquisition; another, the indexer, indexes data and completes enrichment jobs.
Batch metrics
These metrics track activity specific to batch ingestions.
Metric | Type | Purpose |
---|---|---|
processed_count | Counter | Count of items processed. |
processed_failure | Counter | Count of processing failures. |
processing_duration_histo_count/bucket/sum | Histogram | Histogram of Batch processing durations in milliseconds. |
processing_duration_summary | Summary | Summary of Batch processing durations in milliseconds. |
processing_duration_summary_count/sum | Summary | Summary of Batch processing durations in milliseconds. |
query_count | Counter | Count of calls to the Catalog. |
query_failure | Counter | Count of failed Catalog calls. |
query_latency_summary | Summary | Latency of calls to the Catalog. |
query_latency_summary_count/sum | Summary | Latency of calls to the Catalog. |
In addition to these batch metrics, review the Indexer metrics below.
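Counters and summary `_sum`/`_count` series only ever increase, so they are usually read as differences between two points in time. The Python sketch below derives a failure ratio and a mean processing duration from two readings of the batch metrics. The `BatchSnapshot` container and the sample values are hypothetical, and the ratio assumes that processed_count includes items that later failed, which the table above does not state.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class BatchSnapshot:
    """Batch metric values read at one point in time (hypothetical container)."""
    processed_count: float
    processed_failure: float
    duration_sum_ms: float   # processing_duration_summary_sum
    duration_count: float    # processing_duration_summary_count


def failure_ratio(earlier: BatchSnapshot, later: BatchSnapshot) -> Optional[float]:
    """Failures per processed item between two snapshots, or None if undefined."""
    processed = later.processed_count - earlier.processed_count
    failed = later.processed_failure - earlier.processed_failure
    # A non-positive delta means no activity or a counter reset (process restart).
    if processed <= 0 or failed < 0:
        return None
    return failed / processed


def mean_duration_ms(earlier: BatchSnapshot, later: BatchSnapshot) -> Optional[float]:
    """Mean batch processing duration in milliseconds between two snapshots."""
    observations = later.duration_count - earlier.duration_count
    if observations <= 0:
        return None
    return (later.duration_sum_ms - earlier.duration_sum_ms) / observations


# Made-up sample readings for illustration only.
earlier = BatchSnapshot(100, 2, 50_000, 100)
later = BatchSnapshot(200, 7, 110_000, 200)
print(failure_ratio(earlier, later), mean_duration_ms(earlier, later))
```

In PromQL the same idea is usually expressed with `rate()`, for example `rate(processing_duration_summary_sum[5m]) / rate(processing_duration_summary_count[5m])`.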
Indexer Metrics
Indexer metrics cover the indexing and enrichment of ingested data. They are available across all of the peer components above.
Metric | Type | Components | Purpose |
---|---|---|---|
indexer_rows_written_count/bucket/sum | Histogram | Batch (inc. Autoingest), Kafka, Kinesis, Stream HTTP | Total rows indexed (written to partitions) |
indexer_bytes_written_count/bucket/sum | Histogram | Batch (inc. Autoingest), Kafka, Kinesis, Stream HTTP | Total bytes indexed (written to partitions) |
indexer_partitions_rejected_count/bucket/sum | Histogram | Batch (inc. Autoingest), Kafka, Kinesis, Stream HTTP | Histogram of partitions that could not be written. The value identifies the reason: 0 - query_parse_error, 1 - network_error, 2 - internal_system_error, 3 - schema_mismatch, 4 - partition_files_write_failed, 5 - block_conversion_or_insertion_failed, 6 - other_internal_error |
indexer_partitions_written_count/bucket/sum | Histogram | Batch (inc. Autoingest), Kafka, Kinesis, Stream HTTP | Total partitions created |
hdx_sink_row_count | Counter | Batch (inc. Autoingest), Kafka, Kinesis, Stream HTTP | Count of rows processed by the indexer and uploaded to storage. Includes Hot and Cold reporting. |
hdx_sink_byte_count | Counter | Batch (inc. Autoingest), Kafka, Kinesis, Stream HTTP | Count of bytes processed by the indexer and uploaded to storage. Includes Hot and Cold reporting. |
hdx_sink_value_count | Counter | Batch (inc. Autoingest), Kafka, Kinesis, Stream HTTP | Count of values processed by the indexer and uploaded to storage. Includes Hot and Cold reporting. |
hdx_sink_error_count | Counter | Batch (inc. Autoingest), Kafka, Kinesis, Stream HTTP | Count of errors in indexing and uploading to storage. |
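The numeric reason codes listed for indexer_partitions_rejected are easier to work with once they are mapped back to names. The sketch below does that in Python; the enum and function names are hypothetical, and how the code reaches your script (for example via the histogram's bucket boundaries) depends on how you query the metric.

```python
from enum import IntEnum


class PartitionRejection(IntEnum):
    """Reason codes documented for indexer_partitions_rejected."""
    QUERY_PARSE_ERROR = 0
    NETWORK_ERROR = 1
    INTERNAL_SYSTEM_ERROR = 2
    SCHEMA_MISMATCH = 3
    PARTITION_FILES_WRITE_FAILED = 4
    BLOCK_CONVERSION_OR_INSERTION_FAILED = 5
    OTHER_INTERNAL_ERROR = 6


def describe_rejection(value: float) -> str:
    """Translate an observed reason code into its documented name."""
    try:
        return PartitionRejection(int(value)).name.lower()
    except ValueError:
        # Codes outside the documented range are reported as unknown.
        return "unknown"


for code in (0, 3, 6):
    print(code, describe_rejection(code))
```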
RabbitMQ
The following are suggested metrics for monitoring your RabbitMQ service:
Metric | Type | Purpose |
---|---|---|
rabbitmq_queue_messages | Gauge | Sum of ready and unacknowledged messages (queue depth). |
rabbitmq_queues | Gauge | RabbitMQ queue count. |
erlang_vm_statistics_bytes_received_total | Counter | The total number of bytes received through RabbitMQ ports. |
erlang_vm_statistics_bytes_output_total | Counter | The total number of bytes output through RabbitMQ ports. |
Additional metrics can be found on the RabbitMQ site.
Storage
Cloud/Object Storage metrics.
Each of the object_store* metrics below has these labels:
- provider - Object storage provider (AWS, Azure, GCS)
- code - HTTP Response code
- method - HTTP Method used (POST, GET, etc)
- host - HTTP Host used to target object storage.
Metric | Type | Components | Purpose |
---|---|---|---|
net_http_status_code_bucket | Counter | Batch (inc. Autoingest), Kafka, Kinesis, Stream HTTP, Intake Head | HTTP Status Code histogram count from Storage. |
object_store_http_histo | Histogram | Batch (inc. Autoingest), Kafka, Kinesis, Stream HTTP, Intake Head | A histogram of object storage interaction latencies |
object_store_http_summary | Summary | Batch (inc. Autoingest), Kafka, Kinesis, Stream HTTP, Intake Head | A summary of object storage interaction latencies |
object_store_http_status_code_count | Counter | Batch (inc. Autoingest), Kafka, Kinesis, Stream HTTP, Intake Head | A count of successful HTTP requests against object storage (replaces net_http_status_code_count). A request is successful if it received an HTTP response, so requests resulting in 500 are still counted. |
object_store_http_error_count | Counter | Batch (inc. Autoingest), Kafka, Kinesis, Stream HTTP, Intake Head | A count of HTTP request errors (timeouts, connection errors, etc.) |
object_store_http_bytes_tx | Counter | Batch (inc. Autoingest), Kafka, Kinesis, Stream HTTP, Intake Head | A count of bytes transmitted to object storage (request body only) |
object_store_http_bytes_rx | Counter | Batch (inc. Autoingest), Kafka, Kinesis, Stream HTTP, Intake Head | A count of bytes received from object storage (response body only) |
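Because object_store_http_status_code_count counts requests that returned a 500 as successful, the code label is what separates healthy responses from server errors. The Python sketch below groups samples by status class and computes the share of 5xx responses. The function names, the `(labels, value)` sample shape, and the sample values are assumptions made for illustration.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

Sample = Tuple[Dict[str, str], float]  # (labels, counter value)


def requests_by_status_class(samples: Iterable[Sample]) -> Dict[str, float]:
    """Sum counter values per HTTP status class (2xx, 4xx, 5xx, ...)."""
    totals: Dict[str, float] = defaultdict(float)
    for labels, value in samples:
        code = str(labels.get("code", ""))
        # Anything that is not a three-digit code is grouped under "other".
        status_class = f"{code[0]}xx" if len(code) == 3 and code.isdigit() else "other"
        totals[status_class] += value
    return dict(totals)


def server_error_share(samples: Iterable[Sample]) -> float:
    """Fraction of counted requests that returned a 5xx status."""
    totals = requests_by_status_class(samples)
    total = sum(totals.values())
    return totals.get("5xx", 0.0) / total if total else 0.0


# Made-up samples; the host value is hypothetical.
samples = [
    ({"provider": "AWS", "code": "200", "method": "GET", "host": "bucket.example.com"}, 90.0),
    ({"provider": "AWS", "code": "500", "method": "GET", "host": "bucket.example.com"}, 10.0),
]
print(requests_by_status_class(samples))
print(server_error_share(samples))
```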
Additional Metrics
Additional Metrics are provided for the management and control of different components.
General metrics
These metrics track various counters and statistics regarding data ingestion.
Metric | Type | Purpose |
---|---|---|
process_cpu_seconds_total | Counter | Total user and system CPU time spent in seconds. |
process_max_fds | Gauge | Maximum number of open file descriptors. |
process_open_fds | Gauge | Number of open file descriptors. |
process_resident_memory_bytes | Gauge | Resident memory size in bytes. |
process_start_time_seconds | Gauge | Start time of the process since unix epoch in seconds. |
process_virtual_memory_bytes | Gauge | Virtual memory size in bytes. |
process_virtual_memory_max_bytes | Gauge | Maximum amount of virtual memory available in bytes. |
promhttp_metric_handler_requests_in_flight | Gauge | Current number of scrapes being served. |
promhttp_metric_handler_requests_total | Counter | Total number of scrapes by HTTP status code. |
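One common use of these process metrics is to compare process_open_fds with process_max_fds to see how close a component is to its file descriptor limit. The Python sketch below shows the arithmetic; the function names and the 0.8 warning threshold are hypothetical choices, not Hydrolix recommendations.

```python
def fd_utilization(open_fds: float, max_fds: float) -> float:
    """Return the fraction of the file descriptor limit currently in use."""
    if max_fds <= 0:
        raise ValueError("max_fds must be positive")
    return open_fds / max_fds


def is_near_fd_limit(open_fds: float, max_fds: float, threshold: float = 0.8) -> bool:
    """Report whether usage has reached the (hypothetical) warning threshold."""
    return fd_utilization(open_fds, max_fds) >= threshold


# Made-up values standing in for process_open_fds and process_max_fds.
print(fd_utilization(256, 1024))
print(is_near_fd_limit(900, 1024))
```

The equivalent PromQL expression is `process_open_fds / process_max_fds`, optionally restricted with a service label matcher.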
Go environment metrics
These metrics track resources used by Hydrolix's Go environments.
Metric | Type | Purpose |
---|---|---|
go_gc_duration_seconds | Summary | A summary of the pause duration of garbage collection cycles. |
go_goroutines | Gauge | Number of goroutines that currently exist. |
go_info | Gauge | Information about the Go environment. |
go_memstats_alloc_bytes | Gauge | Number of bytes allocated and still in use. |
go_memstats_alloc_bytes_total | Counter | Total number of bytes allocated, even if freed. |
go_memstats_buck_hash_sys_bytes | Gauge | Number of bytes used by the profiling bucket hash table. |
go_memstats_frees_total | Counter | Total number of frees. |
go_memstats_gc_cpu_fraction | Gauge | The fraction of this program's available CPU time used by the GC since the program started. |
go_memstats_gc_sys_bytes | Gauge | Number of bytes used for garbage collection system metadata. |
go_memstats_heap_alloc_bytes | Gauge | Number of heap bytes allocated and still in use. |
go_memstats_heap_idle_bytes | Gauge | Number of heap bytes waiting to be used. |
go_memstats_heap_inuse_bytes | Gauge | Number of heap bytes that are in use. |
go_memstats_heap_objects | Gauge | Number of allocated objects. |
go_memstats_heap_released_bytes | Gauge | Number of heap bytes released to OS. |
go_memstats_heap_sys_bytes | Gauge | Number of heap bytes obtained from system. |
go_memstats_last_gc_time_seconds | Gauge | Number of seconds since 1970 of last garbage collection. |
go_memstats_lookups_total | Counter | Total number of pointer lookups. |
go_memstats_mallocs_total | Counter | Total number of mallocs. |
go_memstats_mcache_inuse_bytes | Gauge | Number of bytes in use by mcache structures. |
go_memstats_mcache_sys_bytes | Gauge | Number of bytes used for mcache structures obtained from system. |
go_memstats_mspan_inuse_bytes | Gauge | Number of bytes in use by mspan structures. |
go_memstats_mspan_sys_bytes | Gauge | Number of bytes used for mspan structures obtained from system. |
go_memstats_next_gc_bytes | Gauge | Number of heap bytes when next garbage collection will take place. |
go_memstats_other_sys_bytes | Gauge | Number of bytes used for other system allocations. |
go_memstats_stack_inuse_bytes | Gauge | Number of bytes in use by the stack allocator. |
go_memstats_stack_sys_bytes | Gauge | Number of bytes obtained from system for stack allocator. |
go_memstats_sys_bytes | Gauge | Number of bytes obtained from system. |
go_threads | Gauge | Number of OS threads created. |