Enable an Independent Prometheus Operator
Use Prometheus operators instead of Hydrolix Prometheus
Overview
Support for using a Prometheus operator other than the one included with Hydrolix was introduced in version 5.3.
Hydrolix provides a version of Prometheus in its default installation. To bypass or disable it, or to use a different Prometheus operator, set Hydrolix tunables in the hydrolixcluster.yaml file.
Hydrolix supports only one Prometheus instance per cluster. Enable either the built-in Hydrolix Prometheus or the external Prometheus, not both.
Configure an external Prometheus operator in a cluster
Disable the Hydrolix-provided Prometheus service and provide your own Prometheus operator in the Kubernetes cluster.
The external Prometheus instance relies on ServiceMonitor labels to find metrics in the Hydrolix cluster. The ServiceMonitor resource runs in the Hydrolix namespace.
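To illustrate label-based discovery, here is a minimal sketch of what a ServiceMonitor carrying the default label might look like. The resource name, namespace, Service label, and port name are hypothetical placeholders and are not taken from Hydrolix. The resources that Hydrolix actually creates may differ.

```yaml
# Illustrative only: a generic ServiceMonitor, not the one Hydrolix ships.
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: example-hydrolix-metrics   # hypothetical name
  namespace: hydrolix              # assumed name of the Hydrolix namespace
  labels:
    release: kube-prometheus       # label the external Prometheus selects on
spec:
  selector:
    matchLabels:
      app: example-service         # hypothetical label on the target Service
  endpoints:
    - port: metrics                # hypothetical named port on the Service
      path: /metrics
      interval: 30s
```

The external Prometheus picks up this resource only if its ServiceMonitor selector matches the labels under metadata.labels.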
Prerequisites
- A Prometheus operator, configured for your needs
- Permissions to edit the hydrolixcluster.yaml file
Disable the built-in Hydrolix Prometheus
- Edit the hydrolixcluster.yaml file.
- In the spec: section, add this line:
prometheus_enabled: false
Configure the external operator
- Edit the hydrolixcluster.yaml file.
- Configure ServiceMonitor labels.
  - By default, the Prometheus operator looks for ServiceMonitors with the release: kube-prometheus label.
  - If your Prometheus stack uses a different label, specify it in the prometheus_servicemonitor_selector tunable.
- Specify the namespace, service name, and port for your external operator.
- Use this example to set the configuration:
spec:
  tunables:
    prometheus_operator_installed: true
    prometheus_namespace: <YOUR_PROMETHEUS_NAMESPACE>
    prometheus_service_name: <YOUR_PROMETHEUS_SERVICE_NAME>
    prometheus_service_port: 9090
    prometheus_servicemonitor_selector:
      - release: <YOUR_PROMETHEUS_RELEASE_LABEL>
- Edit the configuration file for the external Prometheus resource. Hydrolix forwards incoming requests to URLs ending in /prometheus, so set the external URL accordingly:
prometheus:
  prometheusSpec:
    externalUrl: /prometheus
- (Optional) If your scrape targets don't send the Content-Type header correctly, add these lines to the Prometheus configuration to add fallback support:
scrapeClasses:
  - fallbackScrapeProtocol: PrometheusText0.0.4
    name: legacy-exporters
- Apply the changes to hydrolixcluster.yaml and restart the Hydrolix deployment. Then verify that the external Prometheus operator discovers the metrics using ServiceMonitors.
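The Prometheus-side settings from the steps above could be combined into a single values file, roughly as sketched here. This assumes the external Prometheus is deployed with the kube-prometheus-stack Helm chart, which the prometheus: prometheusSpec: layout suggests but this page doesn't state. The serviceMonitorSelector and serviceMonitorNamespaceSelector entries are additions for illustration and don't appear in the steps above.

```yaml
# Sketch of Helm values for the external Prometheus, assuming the
# kube-prometheus-stack chart. Adjust to match your own deployment.
prometheus:
  prometheusSpec:
    # Serve Prometheus under the path that Hydrolix forwards.
    externalUrl: /prometheus
    # Assumption: select ServiceMonitors by the same label that is set in
    # prometheus_servicemonitor_selector in hydrolixcluster.yaml.
    serviceMonitorSelector:
      matchLabels:
        release: kube-prometheus
    # Assumption: an empty selector lets Prometheus look for ServiceMonitors
    # in all namespaces, including the Hydrolix namespace.
    serviceMonitorNamespaceSelector: {}
    # Optional fallback for targets that send a bad Content-Type header.
    scrapeClasses:
      - name: legacy-exporters
        fallbackScrapeProtocol: PrometheusText0.0.4
```

If your stack uses a label other than release: kube-prometheus, use the same value here and in the prometheus_servicemonitor_selector tunable so that both sides agree.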
Test the external operator
To verify that the external operator is working, check the following metrics charts for data:
- Events per Second
- Queries per Second
Use the /prometheus/query endpoint for the external Prometheus resource to confirm data availability.
Watch for gaps in historical data, which can occur because each Prometheus instance keeps its own time-series database (TSDB).
Revert to built-in Prometheus in Hydrolix
- Edit hydrolixcluster.yaml.
- Remove the prometheus_operator_installed line, or set the tunable to false:
prometheus_operator_installed: false
- Re-enable built-in Prometheus:
prometheus_enabled: true
- (Optional) Remove any additional lines used to configure an external Prometheus resource, such as the following:
prometheus_namespace: ""
prometheus_service_name: ""
prometheus_service_port: 9090
prometheus_servicemonitor_selector: []
- Apply the hydrolixcluster.yaml changes and restart if needed.
Verify that the Hydrolix operator redeploys the Prometheus pods and that they're in the Running state.
Considerations when reverting to Hydrolix Prometheus
- After reverting, Hydrolix uses its own built-in Prometheus to scrape data.
- If any resources were changed previously to configure external Prometheus, ensure they're re-created or restored.
- Check the Events per Second and Queries per Second charts to verify that new data is flowing in.
- Historical data from the external Prometheus instance won't be migrated, as Prometheus TSDBs are separate.
- Be sure that the hydrolixcluster.yaml file doesn't contain both the prometheus_operator_installed: true and prometheus_enabled: true tunables at the same time. Having both can cause conflicts, including metrics duplication.
- There may be gaps in the metrics if there was downtime between using the external Prometheus resource and reverting to the built-in version.
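As a summary of the either-or rule, the two valid combinations can be laid out side by side. This is a sketch only. The key placement mirrors the examples earlier on this page (prometheus_enabled in the spec: section, prometheus_operator_installed under tunables:) and should be checked against your own hydrolixcluster.yaml. The two documents are alternatives and aren't meant to be applied together.

```yaml
# Option 1: external Prometheus operator, built-in Prometheus disabled.
spec:
  prometheus_enabled: false
  tunables:
    prometheus_operator_installed: true
---
# Option 2: built-in Hydrolix Prometheus, external operator disabled.
spec:
  prometheus_enabled: true
  tunables:
    prometheus_operator_installed: false
```

Any configuration where both values are true falls outside these two options and risks the conflicts described in the list above.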