Enable an Independent Prometheus Operator

Use your own Prometheus operator instead of the Prometheus built into Hydrolix

Overview

Support for using a Prometheus operator other than the one included with Hydrolix was introduced in version 5.3.

Hydrolix provides a version of Prometheus in its default installation. To disable the built-in Prometheus or use a different Prometheus operator, set the relevant Hydrolix tunables in the hydrolixcluster.yaml file.

Note: Hydrolix supports only one Prometheus instance running in a cluster. Enable either the built-in Hydrolix Prometheus or the external Prometheus, not both.

Configure an external Prometheus operator in a cluster

Disable the Hydrolix-provided Prometheus service and provide your own Prometheus operator in the Kubernetes cluster.

The external Prometheus instance relies on ServiceMonitor labels to find metrics in the Hydrolix cluster. The ServiceMonitor resource lives in the Hydrolix namespace.

Prerequisites

  • A Prometheus operator, configured for your needs
  • Permissions to edit the hydrolixcluster.yaml file

Disable the built-in Hydrolix Prometheus

  1. Edit the hydrolixcluster.yaml file.
  2. In the spec: section, add this line:
prometheus_enabled: false

Configure the external operator

  1. Edit the hydrolixcluster.yaml file.
  2. Configure ServiceMonitor labels.
    1. By default, the Prometheus operator looks for ServiceMonitors with the release: kube-prometheus label.
    2. If your Prometheus stack uses a different label, specify it in the prometheus_servicemonitor_selector tunable.
  3. Specify the namespace, service name, and port for your external operator.
  4. Use this example to set the configuration:
spec:
  tunables:
    prometheus_operator_installed: true
    prometheus_namespace: <YOUR_PROMETHEUS_NAMESPACE>
    prometheus_service_name: <YOUR_PROMETHEUS_SERVICE_NAME>
    prometheus_service_port: 9090
    prometheus_servicemonitor_selector:
      - release: <YOUR_PROMETHEUS_RELEASE_LABEL>
  5. Edit the configuration file for the external Prometheus resource. Hydrolix forwards incoming requests to URLs ending in /prometheus, so set the external URL to match:
prometheus:
  prometheusSpec:
    externalUrl: /prometheus
  6. (Optional) If your scrape targets don't send a correct Content-Type header, add these lines to the Prometheus configuration to enable a fallback scrape protocol:
scrapeClasses:
  - fallbackScrapeProtocol: PrometheusText0.0.4
    name: legacy-exporters
  7. Apply the changes to hydrolixcluster.yaml and restart the Hydrolix deployment. Then verify that the external Prometheus operator discovers the metrics through the ServiceMonitors.
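
Before applying the file, it can help to confirm that no <YOUR_...> placeholder from the example was left in place. The following is a minimal sketch, not part of Hydrolix: the find_placeholders helper is hypothetical, and it assumes the tunables have already been loaded into a Python dictionary shaped like the example above.

```python
import re

# Matches the placeholder style used in the example, such as <YOUR_PROMETHEUS_NAMESPACE>.
PLACEHOLDER = re.compile(r"<YOUR_[A-Z_]+>")


def find_placeholders(value, path="spec.tunables"):
    """Return the paths of all string values that still contain a placeholder.

    Hypothetical helper for illustration only. `value` is the tunables
    mapping (or a nested dict, list, or scalar inside it).
    """
    found = []
    if isinstance(value, dict):
        for key, item in value.items():
            found.extend(find_placeholders(item, f"{path}.{key}"))
    elif isinstance(value, list):
        for index, item in enumerate(value):
            found.extend(find_placeholders(item, f"{path}[{index}]"))
    elif isinstance(value, str) and PLACEHOLDER.search(value):
        found.append(path)
    return found


if __name__ == "__main__":
    tunables = {
        "prometheus_operator_installed": True,
        "prometheus_namespace": "<YOUR_PROMETHEUS_NAMESPACE>",
        "prometheus_service_name": "prometheus-operated",  # example value only
        "prometheus_service_port": 9090,
        "prometheus_servicemonitor_selector": [{"release": "kube-prometheus"}],
    }
    for location in find_placeholders(tunables):
        print(f"placeholder not replaced: {location}")
```

An empty result means every placeholder was replaced. It doesn't confirm that the values themselves are correct for your cluster.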

Test the external operator

To verify that the external operator is working, check the following metrics charts for data:

  • Events per Second
  • Queries per Second

Use the /prometheus/query endpoint for the external Prometheus resource to confirm data availability.

Watch for gaps in historical data. These can occur because each Prometheus instance keeps its own time-series database (TSDB).
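
One way to spot such gaps is to compare the spacing of sample timestamps against the expected scrape interval. The sketch below is an illustration under stated assumptions: find_gaps is a hypothetical helper, the 30-second interval and 1.5 tolerance factor are example values rather than Hydrolix defaults, and the timestamps are assumed to have been exported from Prometheus as Unix seconds.

```python
def find_gaps(timestamps, scrape_interval=30.0, tolerance=1.5):
    """Return (start, end) pairs where consecutive samples are too far apart.

    Hypothetical helper for illustration only.
    timestamps: sample times in Unix seconds, in any order.
    scrape_interval: expected seconds between samples (assumed value).
    tolerance: spacing above scrape_interval * tolerance counts as a gap.
    """
    ordered = sorted(timestamps)
    threshold = scrape_interval * tolerance
    gaps = []
    # Compare each sample with the one that follows it.
    for previous, current in zip(ordered, ordered[1:]):
        if current - previous > threshold:
            gaps.append((previous, current))
    return gaps


if __name__ == "__main__":
    samples = [0, 30, 60, 180, 210]
    for start, end in find_gaps(samples):
        print(f"gap of {end - start:.0f}s between {start} and {end}")
```

With the sample data, only the jump from 60 to 180 exceeds the 45-second threshold, so it is the single gap reported.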

Revert to built-in Prometheus in Hydrolix

  1. Edit hydrolixcluster.yaml.
  2. Remove the external Prometheus tunable, or set it to false:
prometheus_operator_installed: false
  3. Re-enable built-in Prometheus:
prometheus_enabled: true
  4. (Optional) Remove any additional lines used to configure the external Prometheus resource, such as the following:
prometheus_namespace: ""
prometheus_service_name: ""
prometheus_service_port: 9090
prometheus_servicemonitor_selector: []
  5. Apply the hydrolixcluster.yaml changes and restart the deployment if needed.

Verify that the Hydrolix operator redeploys the Prometheus pods and that they're in the Running state.

Considerations when reverting to Hydrolix Prometheus

  • After reverting, Hydrolix uses its own built-in Prometheus to scrape data.
    • If you previously changed any resources to configure the external Prometheus, make sure they're re-created or restored.
  • Check the Events per Second and Queries per Second charts to verify that new data is flowing in.
  • Historical data from the external Prometheus instance won't be migrated, as Prometheus TSDBs are separate.
  • Make sure the hydrolixcluster.yaml file doesn't set both prometheus_operator_installed: true and prometheus_enabled: true at the same time. Enabling both can cause conflicts, including duplicated metrics.
  • There may be gaps in the metrics if there was downtime between using the external Prometheus resource and reverting to the built-in version.
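
The rule that the built-in and external Prometheus must not both be enabled can be checked mechanically before a configuration is applied. The following is a minimal sketch: prometheus_mode is a hypothetical helper, and it assumes that an unset prometheus_enabled means the built-in Prometheus is on (as in the default installation) and that an unset prometheus_operator_installed means false.

```python
def prometheus_mode(tunables):
    """Return 'built-in', 'external', or 'none', or raise on a conflict.

    Hypothetical helper for illustration only. `tunables` is a mapping
    of tunable names to values taken from hydrolixcluster.yaml.
    """
    # Assumption: the built-in Prometheus is enabled unless explicitly disabled.
    builtin = bool(tunables.get("prometheus_enabled", True))
    # Assumption: the external operator is off unless explicitly enabled.
    external = bool(tunables.get("prometheus_operator_installed", False))

    if builtin and external:
        raise ValueError(
            "prometheus_enabled and prometheus_operator_installed are both "
            "true; enable only one Prometheus per cluster"
        )
    if external:
        return "external"
    if builtin:
        return "built-in"
    return "none"


if __name__ == "__main__":
    reverted = {"prometheus_operator_installed": False, "prometheus_enabled": True}
    print(prometheus_mode(reverted))
```

Under these assumptions, setting only prometheus_operator_installed: true is reported as a conflict, which matches the requirement to set prometheus_enabled: false when using an external operator.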