Fluent Bit Integration

Fluent Bit is an open source telemetry agent which can collect, process, and forward metric, log, and trace telemetry data from a wide range of environments into Hydrolix. Fluent Bit integrates with existing ecosystems such as Prometheus and OpenTelemetry and is designed with minimal performance impact in mind. You can use it as a simple edge agent for a single deployment or within a more complex environment as a central collector and decorator for telemetry data from varying sources and environments.

For more details, see the Fluent Bit documentation.

Before you begin

You will need a running Hydrolix deployment. Follow the instructions for your preferred cloud vendor if you have yet to deploy Hydrolix. From your running Hydrolix cluster, you will need the following information:

Org ID
The ID of your Hydrolix organization.
Example: bc3f041b-ea52-45e1-b9d5-41819c465854
How to obtain: You can list the orgs that exist within your running Hydrolix cluster using the Hydrolix cluster API. The org ID you use should correspond to the table in which you want to store Fluent Bit data.

Project name and ID
A logical namespace for the table below. You will need the name of the project corresponding to the table in which you want to store Fluent Bit data.
Example: Name: fluentbit_project, ID: c2445da3-ec63-42be-9f12-f104f9656f4c
How to obtain: Follow these instructions to create a project.

Table name and ID
The destination to which you will route data from your Fluent Bit instance. You will need the name of the table in which you want to store Fluent Bit data.
Example: Name: fluentbit_table, ID: 798dfcf4-7560-4b24-1337-29017619baa6
How to obtain: Follow these instructions to create a table.

OAuth bearer token
An OAuth bearer token Fluent Bit uses to authenticate with the Hydrolix Streaming Ingest API.
Example: eyXrxkzoN2fRiiKpnV...
How to obtain: Follow these instructions to generate an OAuth bearer token.
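If you prefer the API over the UI for looking up IDs, a minimal Python sketch follows. The hostname and token are placeholders for your own values; the /config/v1/orgs/ path follows the config API URL pattern shown in the transform example later in this document. The request is built but not sent here; uncomment the last line to send it.

```python
# Sketch: build an authenticated request against the Hydrolix config API
# to list orgs. Hostname and token below are placeholder example values.
import json
import urllib.request

HDX_HOST = "{hdx-host}.hydrolix.live"  # replace with your cluster hostname
HDX_TOKEN = "eyXrxkzoN2fRiiKpnV..."    # replace with your OAuth bearer token

def config_request(path: str) -> urllib.request.Request:
    """Build an authenticated GET request for a config API collection."""
    return urllib.request.Request(
        f"https://{HDX_HOST}{path}",
        headers={"Authorization": f"Bearer {HDX_TOKEN}"},
    )

# Each entry in the response includes a "uuid" field -- the ID you need.
req = config_request("/config/v1/orgs/")
# orgs = json.load(urllib.request.urlopen(req))
```

The same helper works for the projects and tables collections nested under an org.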

Getting started

There are four major steps covered within this document. Following these steps will result in an integrated setup between Fluent Bit and a running Hydrolix cluster:

  1. Deploy Fluent Bit
  2. Create a Hydrolix Transform for your Fluent Bit data
  3. Configure Fluent Bit to send data to Hydrolix
  4. Verify that data is flowing through Fluent Bit into your Hydrolix cluster

Example files

Throughout this document, there are example files for Fluent Bit configuration and a Hydrolix transform. If you are standing up a proof-of-concept with Fluent Bit and Hydrolix, you can use these files to generate and send data from Fluent Bit to Hydrolix. The example files use the following Fluent Bit inputs, filters, and outputs:

Inputs

  - cpu (CPU usage metrics)
  - mem (memory usage metrics)
  - netif (network interface metrics)
  - disk (disk I/O metrics)

Filters

  - nest

Note that the Nest filter is used to nest the input records within a map key. This ensures Hydrolix will store each of the four input record types within four distinct columns in a single table.
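To illustrate, here is a minimal Python sketch of the record reshaping the Nest filter performs. The field names and values are hypothetical examples, and this is not Fluent Bit's actual implementation:

```python
# Simulate the Nest filter: move keys matching a wildcard under a new map
# key, optionally stripping a prefix from the nested key names.
def nest(record: dict, wildcard: str, nest_under: str, remove_prefix: str = "") -> dict:
    prefix = wildcard.rstrip("*")
    nested, rest = {}, {}
    for key, value in record.items():
        if key.startswith(prefix):
            nested[key[len(remove_prefix):]] = value
        else:
            rest[key] = value
    rest[nest_under] = nested
    return rest

# A flat `mem` input record before filtering (hypothetical values):
flat = {"Mem.total": 32011, "Mem.used": 2873, "Swap.total": 0}
nested = nest(flat, "Mem.*", "mem_stats", "Mem.")
# nested == {"Swap.total": 0, "mem_stats": {"total": 32011, "used": 2873}}
```

Applying a second nest pass for `Swap.*` would similarly produce a `swap_stats` map key, matching the two map columns in the example transform.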

Outputs

  - stdout
  - http

Fluent Bit sends data to your Hydrolix cluster over HTTP using the Hydrolix Streaming Ingest API endpoint (/ingest/event).
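For illustration, the request that the http output builds against this endpoint can be sketched as follows. The hostname, token, and sample event are placeholder values; the header names come from the output configuration later in this document. The request is built but not sent:

```python
# Sketch of the streaming ingest request: a POST to /ingest/event carrying
# the table, transform, and auth headers. All values here are placeholders.
import json
import urllib.request

HDX_HOST = "{hdx-host}.hydrolix.live"

# A hypothetical batch of events shaped like the example transform's columns.
events = [{"account_id": "example", "cpu_stats": {"cpu_p": 1.5}}]

req = urllib.request.Request(
    f"https://{HDX_HOST}/ingest/event",
    data=json.dumps(events).encode(),
    headers={
        "content-type": "application/json",
        "x-hdx-table": "fluentbit_project.fluentbit_table",
        "x-hdx-transform": "fluentbit_transform",
        "Authorization": "Bearer eyXrxkzoN2fRiiKpnV...",
    },
    method="POST",
)
# urllib.request.urlopen(req)  # would send the batch to the cluster
```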

Deploy Fluent Bit

There are multiple deployment methods for Fluent Bit described in Getting Started with Fluent Bit.

If you are standing up a first-time or proof-of-concept deployment of Fluent Bit and Hydrolix, the Docker deployment is the quickest way to get started. It's also recommended that you use the latest debug Docker image: the standard Fluent Bit images are distroless and don't include a shell or package manager, while the debug images include both. You can deploy the latest debug image with:

docker pull cr.fluentbit.io/fluent/fluent-bit:latest-debug

followed by

docker run -ti cr.fluentbit.io/fluent/fluent-bit:latest-debug

Use the Fluent Bit release images for production deployments.

Create a Hydrolix transform for the Fluent Bit data

You will need to create a Hydrolix transform which determines how your Fluent Bit data will be mapped onto your Hydrolix table. After creating it, you will specify the transform name within the Fluent Bit configuration file.

Reference these instructions for creating and publishing a transform. The following example transform will map incoming Fluent Bit data from the example inputs. The transform also uses the Hydrolix auto values feature to generate an ingest timestamp (hdx_ingest_timestamp). This timestamp acts as the primary column for the table in which Fluent Bit data will be ingested.

A more useful primary timestamp might be an event timestamp added to the telemetry data by Fluent Bit before it's forwarded to your Hydrolix cluster. You could also decorate the outgoing Fluent Bit data with the originating hostname. These two use cases aren't covered in this document.

📘

Creating transforms can be easier in the UI rather than via the API

If you create your transform in your Hydrolix cluster UI, you don't need to know the org, project, or table IDs. However, you will still need to supply the transform output columns.

[
  {
    "name": "fluentbit_transform",
    "description": "",
    "settings": {
      "is_default": true,
      "rate_limit": null,
      "sql_transform": null,
      "null_values": [],
      "sample_data": null,
      "output_columns": [
        {
          "name": "hdx_ingest_timestamp",
          "datatype": {
            "type": "datetime",
            "index": false,
            "primary": true,
            "format": "2006-01-02T15:04:05.999999Z",
            "resolution": "seconds",
            "default": null,
            "script": null,
            "source": {
              "from_automatic_value": "current_time"
            },
            "suppress": false
          }
        },
        {
          "name": "account_id",
          "datatype": {
            "type": "string",
            "index": true,
            "format": null,
            "resolution": "seconds",
            "default": null,
            "script": null,
            "source": null,
            "suppress": false
          }
        },
        {
          "name": "mem_stats",
          "datatype": {
            "type": "map",
            "index": true,
            "format": null,
            "resolution": "seconds",
            "default": null,
            "script": null,
            "elements": [
              {
                "type": "string",
                "index": true
              },
              {
                "type": "uint32",
                "index": true
              }
            ],
            "source": null,
            "suppress": false
          }
        },
        {
          "name": "swap_stats",
          "datatype": {
            "type": "map",
            "index": true,
            "format": null,
            "resolution": "seconds",
            "default": null,
            "script": null,
            "elements": [
              {
                "type": "string",
                "index": true
              },
              {
                "type": "uint32",
                "index": true
              }
            ],
            "source": null,
            "suppress": false
          }
        },
        {
          "name": "cpu_stats",
          "datatype": {
            "type": "map",
            "index": false,
            "format": null,
            "resolution": "seconds",
            "default": null,
            "script": null,
            "elements": [
              {
                "type": "string",
                "index": true
              },
              {
                "type": "double",
                "index": false
              }
            ],
            "source": null,
            "suppress": false
          }
        },
        {
          "name": "net_stats",
          "datatype": {
            "type": "map",
            "index": false,
            "format": null,
            "resolution": "seconds",
            "default": null,
            "script": null,
            "elements": [
              {
                "type": "string",
                "index": true
              },
              {
                "type": "double",
                "index": false
              }
            ],
            "source": null,
            "suppress": false
          }
        },
        {
          "name": "disk_stats",
          "datatype": {
            "type": "map",
            "index": false,
            "format": null,
            "resolution": "seconds",
            "default": null,
            "script": null,
            "elements": [
              {
                "type": "string",
                "index": true
              },
              {
                "type": "double",
                "index": false
              }
            ],
            "source": null,
            "suppress": false
          }
        }
      ],
      "compression": "",
      "wurfl": null,
      "format_details": {
        "flattening": {
          "depth": null,
          "active": false,
          "map_flattening_strategy": {
            "left": "",
            "right": ""
          },
          "slice_flattening_strategy": {
            "left": "",
            "right": ""
          }
        }
      }
    },
    "url": "https://{hdx-host}.hydrolix.live/config/v1/orgs/bc3f041b-ea52-45e1-b9d5-41819c465854/projects/c2445da3-ec63-42be-9f12-f104f9656f4c/tables/798dfcf4-7560-4b24-1337-29017619baa6/transforms/",
    "type": "json"
  }
]
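As a sketch, you could publish this transform by POSTing it to the transforms endpoint shown in its url field. The IDs below are the example values from "Before you begin" and the filename is hypothetical; replace them, the hostname, and the token with your own. Only the URL construction runs here; the commented lines show the request itself.

```python
# Sketch: publish the transform above via the Hydrolix config API.
# All IDs, the hostname, the token, and the filename are placeholders.
import json
import urllib.request

HDX_HOST = "{hdx-host}.hydrolix.live"
ORG_ID = "bc3f041b-ea52-45e1-b9d5-41819c465854"
PROJECT_ID = "c2445da3-ec63-42be-9f12-f104f9656f4c"
TABLE_ID = "798dfcf4-7560-4b24-1337-29017619baa6"

url = (
    f"https://{HDX_HOST}/config/v1/orgs/{ORG_ID}"
    f"/projects/{PROJECT_ID}/tables/{TABLE_ID}/transforms/"
)

# The example document is a one-element array; post the object inside it.
# with open("fluentbit_transform.json") as f:
#     transform = json.load(f)[0]
# req = urllib.request.Request(
#     url,
#     data=json.dumps(transform).encode(),
#     headers={
#         "Authorization": "Bearer eyXrxkzoN2fRiiKpnV...",
#         "content-type": "application/json",
#     },
#     method="POST",
# )
# urllib.request.urlopen(req)
```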

Configure Fluent Bit to send data to Hydrolix

Fluent Bit supports two formats for configuration files: YAML (fluent-bit.yaml) and classic mode (fluent-bit.conf). You can read more about the two formats in Configuring Fluent Bit. The default configuration file format and location in the Docker images is /fluent-bit/etc/fluent-bit.conf. Example configuration is provided below in both formats. It uses the aforementioned Fluent Bit inputs, filters, and outputs and has been tested with Fluent Bit version 3.2.7.

📘

Fluent Bit Configuration Changes Require a Restart

Don't forget to restart Fluent Bit after making changes to the configuration file. Fluent Bit doesn't dynamically read in configuration changes.

Classic format (fluent-bit.conf):

[INPUT]
    name cpu
    tag  cpu
    interval_sec 1

[INPUT]
    name mem
    tag  mem
    interval_sec 1

[INPUT]
    name netif
    tag netif
    interval_sec  1
    interface ens5

[INPUT]
    name disk
    tag disk
    interval_sec  1

[FILTER]
    Name nest
    Match mem
    Operation nest
    Wildcard Mem.*
    Nest_under mem_stats
    Remove_prefix Mem.

[FILTER]
    Name nest
    Match mem
    Operation nest
    Wildcard Swap.*
    Nest_under swap_stats
    Remove_prefix Swap.

[FILTER]
    Name nest
    Match cpu
    Operation nest
    Wildcard *
    Nest_under cpu_stats

[FILTER]
    Name nest
    Match netif
    Operation nest
    Wildcard *
    Nest_under net_stats

[FILTER]
    Name nest
    Match disk
    Operation nest
    Wildcard *
    Nest_under disk_stats

[OUTPUT]
    name stdout
    match *

[OUTPUT]
    Name  http
    Match *
    Host  {hdx-host}.hydrolix.live
    uri  /ingest/event
    Port  443
    tls  on
    header x-hdx-table fluentbit_project.fluentbit_table
    header x-hdx-transform fluentbit_transform
    header Authorization Bearer eyJhbGciOiJSUzI1N...
YAML format (fluent-bit.yaml):

service:
    http_server: "on"
    Health_Check: "on"
    log_level: info

pipeline:
    inputs:
        - name: cpu
          tag: cpu
          interval_sec: 1
        - name: mem
          tag: mem
          interval_sec: 1
        - name: disk
          tag: disk
          interval_sec: 1
        - name: netif
          tag: netif
          interval_sec: 1
          interface: ens5
    filters:
        - name: nest
          match: mem
          operation: nest
          wildcard: 'Mem.*'
          nest_under: mem_stats
          remove_prefix: 'Mem.'
        - name: nest
          match: mem
          operation: nest
          wildcard: 'Swap.*'
          nest_under: swap_stats
          remove_prefix: 'Swap.'
        - name: nest
          match: cpu
          operation: nest
          wildcard: '*'
          nest_under: cpu_stats
        - name: nest
          match: netif
          operation: nest
          wildcard: '*'
          nest_under: net_stats
        - name: nest
          match: disk
          operation: nest
          wildcard: '*'
          nest_under: disk_stats
    outputs:
        - name: stdout
          match: '*'
        - name: http
          match: '*'
          host: '{hdx-host}.hydrolix.live'
          port: 443
          uri: /ingest/event
          tls: on
          header:
              - x-hdx-table fluentbit_project.fluentbit_table
              - x-hdx-transform fluentbit_transform
              - Authorization Bearer eyJhbGciOiJSUzI1NiIsIn...

Verify data from Fluent Bit is in your Hydrolix cluster

If your Fluent Bit deployment is successfully sending data to your Hydrolix cluster, you should see a line similar to the following in your Fluent Bit logs:

2025-02-25 11:53:25 [2025/02/25 19:53:25] [ info] [output:http:http.1] {hdx-host}.hydrolix.live:443, HTTP status=200
2025-02-25 11:53:25 {"code":200,"message":"success"}

Once Fluent Bit data is being successfully stored in your Hydrolix cluster, you can query the data using the UI, the API, or other query interfaces specified in Query Data. You can start with a query like the following to view your Fluent Bit data:

select *
from fluentbit_project.fluentbit_table
limit 10

Troubleshooting

Debug logging: Fluent Bit

For more log detail from Fluent Bit, you can enable debug logging with the following configuration:

[SERVICE]
    log_level    debug

Or in YAML format:

service:
    log_level: debug

Debug logging: Hydrolix

Within your hydrolixcluster.yaml Kubernetes configuration, change the following value to enable debug logging for all Hydrolix service components:

spec:
  log_level:
    '*': debug

You can read more about Hydrolix logging configuration here.

Querying Hydrolix cluster returns zero results

If querying your Hydrolix cluster for Fluent Bit data returns zero results, try the following investigative and troubleshooting steps:

High ingest latency

Within your Hydrolix cluster UI, navigate to the Data tab in the left-hand navigation bar, then select your project and table name. A high Ingest Latency together with a Total Size of 0 rows indicates that no data has made it into the table: data isn't making it from Fluent Bit into your Hydrolix cluster. The following troubleshooting steps will help identify the cause.

Authentication error

If you observe the following in your Fluent Bit logs:

2025-02-25 11:41:12 [2025/02/25 19:41:12] [error] [output:http:http.1] {hdx-host}.hydrolix.live:443, HTTP status=401
2025-02-25 11:41:12 401: Unauthorized
2025-02-25 11:41:12 [2025/02/25 19:41:12] [ warn] [output:http:http.1] could not flush records to {hdx-host}.hydrolix.live:443 (http_do=0), chunk will not be retried

this indicates an authentication error with your Hydrolix cluster. The Hydrolix HTTP streaming ingest API uses an OAuth Bearer token for authentication. You can retrieve a bearer token using these instructions.

Missing project, table, or transform

If you observe any of the following in your Fluent Bit logs:

2025-02-25 11:48:03 [2025/02/25 19:48:03] [error] [output:http:http.1] {hdx-host}.hydrolix.live:443, HTTP status=400
2025-02-25 11:48:03 {"code":400,"message":"no project 'nonexistent_project' found"}
2025-02-25 11:48:03 [2025/02/25 19:48:03] [ warn] [output:http:http.1] could not flush records to {hdx-host}.hydrolix.live:443 (http_do=0), chunk will not be retried
2025-02-25 11:47:23 [2025/02/25 19:47:23] [error] [output:http:http.1] {hdx-host}.hydrolix.live:443, HTTP status=400
2025-02-25 11:47:23 {"code":400,"message":"no table 'nonexistent_table' found"}
2025-02-25 11:47:23 [2025/02/25 19:47:23] [ warn] [output:http:http.1] could not flush records to {hdx-host}.hydrolix.live:443 (http_do=0), chunk will not be retried
2025-02-25 11:45:15 [2025/02/25 19:45:15] [error] [output:http:http.1] {hdx-host}.hydrolix.live:443, HTTP status=400
2025-02-25 11:45:15 {"code":400,"message":"unknown transform 'nonexistent_transform'"}
2025-02-25 11:45:15 [2025/02/25 19:45:15] [ warn] [output:http:http.1] could not flush records to {hdx-host}.hydrolix.live:443 (http_do=0), chunk will not be retried

this indicates that the project, table, or transform specified in your Fluent Bit configuration doesn't exist. Make sure your Fluent Bit configuration references an existing project and table in your Hydrolix cluster. Additionally, verify that you have published a transform to your Hydrolix cluster that's compatible with your Fluent Bit data.

Invalid transform SQL

If you use K9s or kubectl to access your running Hydrolix cluster, navigate to one of your intake-head pods (e.g. intake-head-5cf9dk247-idkok) and view the logs for the turbine container. If you observe a log line similar to the following:

2025-02-25T20:22:40.573208471Z {"timestamp": "2025-02-25T20:22:40.572+00:00", "component": "query_executor", "level":"error", "message":"{\"bytes_read\":0,\"dict_used\":[],\"exception\":\"Code: 47. DB::Exception: Missing columns: 'some_int32' 'some_other_int32' 'primary' while processing query: 'SELECT primary, 10 * some_int32 AS some_int32, some_other_int32 FROM file('7aaaaec2-ee43-43db-9541-950e5195961c/3a1bc171-ac68-47f2-bfb2-4ce193c654a7/42bc986dc5eec4d3/input.4291266653.json', 'JSONCompactEachRow', '`hdx_ingest_timestamp` DateTime,`account_id` Nullable(String),`mem_stats` Map(String,Nullable(UInt32)),`swap_stats` Map(String,Nullable(UInt32)),`cpu_stats` Map(String,Nullable(Float64)),`net_stats` Map(String,Nullable(Float64)),`disk_stats` Map(String,Nullable(Float64))')', required columns: 'primary' 'some_other_int32' 'some_int32'. (UNKNOWN_IDENTIFIER) (version 23.8.10.1)\",\"exception_code\":47...

Or a line like the following in the intake-head container within the same pod:

2025-02-25T20:22:20.574400897Z {"error":"sink error: Code: 47. DB::Exception: Missing columns: 'some_int32' 'some_other_int32' 'primary' while processing query: 'SELECT primary, 10 * some_int32 AS some_int32, some_other_int32 FROM file('7aaaaec2-ee43-43db-9541-950e5195961c/3a1bc171-ac68-47f2-bfb2-4ce193c654a7/42bc986dc5eec4d3/input.650434658.json', 'JSONCompactEachRow', '`hdx_ingest_timestamp` DateTime,`account_id` Nullable(String),`mem_stats` Map(String,Nullable(UInt32)),`swap_stats` Map(String,Nullable(UInt32)),`cpu_stats` Map(String,Nullable(Float64)),`net_stats` Map(String,Nullable(Float64)),`disk_stats` Map(String,Nullable(Float64))')', required columns: 'primary' 'some_other_int32' 'some_int32'. (UNKNOWN_IDENTIFIER)","file":"hdx_sink.go:447","level":"error","message":"Got error","timestamp":"2025-02-25T20:22:20.574+00:00"}

Both of these indicate a problem with the SQL transform (the sql_transform setting) configured in your Hydrolix transform. Verify that the SQL transform references existing columns, or try removing it. Once the SQL transform is either valid or removed, these error messages will stop.
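As a sketch, clearing the setting amounts to nulling sql_transform in the transform document and republishing it. The excerpt below uses the example transform's structure with a hypothetical invalid SQL string:

```python
# Sketch: clear an invalid sql_transform before republishing the transform.
# The SQL string here is a hypothetical example of an invalid transform.
import json

transform = {
    "name": "fluentbit_transform",
    "settings": {
        "is_default": True,
        "sql_transform": "SELECT primary, 10 * some_int32 AS some_int32 ...",
    },
}

transform["settings"]["sql_transform"] = None  # remove the invalid SQL
body = json.dumps(transform)  # republish this document via the config API
```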