Vector Integration

Vector can be used to collect, optionally transform, and route log data into Hydrolix. For a longer explanation, see Vector's documentation.

Before You Begin

You will need a running Hydrolix deployment. Follow the instructions for your preferred cloud vendor if you have yet to deploy Hydrolix. From your running Hydrolix cluster, you will need the following information:

Item | Description | Example value | How to obtain this information
Org ID | This is the ID of your Hydrolix organization. | bc3f041b-ea52-45e1-b9d5-41819c465854 | You can determine what orgs exist within your running Hydrolix cluster using our API. The org ID you use should correspond to the table in which you want to store Vector data.
Project name and ID | This is a logical namespace for your table below. You will need the name of the project corresponding to the table in which you want to store Vector data. | Name: hdx_project_for_vector; ID: c2445da3-ec63-42be-9f12-f104f9656f4c | Instructions for creating a project can be followed here.
Table name and ID | This is the destination to which you will route data from your Vector instance. You will need the name of the table you want to store Vector data in. | Name: hdx_table_for_vector; ID: 798dfcf4-7560-4b24-1337-29017619baa6 | Instructions for creating a table can be followed here.
OAuth bearer token | Use an OAuth Bearer token in order to authenticate with the Hydrolix Streaming Ingest API. | eyXrxkzoN2fRiiKpnV... | Instructions for generating a bearer token can be followed here.

Keep these values on hand for later steps within this guide.

Getting Started

There are three required components to successfully integrate Vector with a running Hydrolix cluster.

Component | Description
Data source | The source for log data that will be routed to your Vector instance. See Vector's Demo logs page for more details on the source component used in this getting started guide.
Vector instance | The observability data pipeline. Responsible for collecting and routing the sample data to a "sink" which in this guide will be your running Hydrolix cluster.
Hydrolix cluster | The data "sink." You should have this deployed prior to continuing.

A Vector configuration contains three types of components:

  • A Source to collect or receive data from observability data sources into Vector.
  • A Transform to change or restructure that observability data as it passes through your Vector topology. Specifying a transform is optional and not covered within this guide. Note that this is different from the Hydrolix Transform, which you'll specify below.
  • A Sink which is an external service or destination to which to route your data. You will use a running Hydrolix cluster as a sink within the following instructions.

For more information on Vector topologies, see this section of their quickstart guide.

Install Vector

Follow Vector's installation instructions and make sure you verify your installation by running:

vector --version

...which should return a result similar to:

vector 0.40.1 <architecture etc...>

Configure Vector to Use the Hydrolix HTTP Streaming API

Create a configuration file called vector-hdx.yaml containing the following:

sources:
  generate_demo_logs:
    type:   "demo_logs"
    format: "syslog"
    count:  100

sinks:
  hydrolix:
    type: http
    inputs:
      - generate_demo_logs
    uri: https://<your-hdx-host-url>/ingest/event
    encoding:
      codec: json
    compression: gzip
    headers:
      X-HDX-Table: <hdx_project_for_vector>.<hdx_table_for_vector>
      X-HDX-Transform: <hydrolix_transform_for_vector_data_id>
      Authorization: Bearer <token>

The source we are using is Vector’s Demo Logs component which generates fake log events to use for testing.

Insert your Hydrolix host URL, project name, table name, and your bearer token where indicated. You will generate a transform and insert the value for <hydrolix_transform_for_vector_data_id> below.

This configuration inserts data into your running Hydrolix cluster using the HTTP Streaming API. The headers correspond to those expected by the Streaming API endpoint (https://<your-hdx-host-url>/ingest/event): X-HDX-Table names the project and table into which the log data routed through Vector is inserted, X-HDX-Transform identifies the Hydrolix transform to apply, and Authorization carries your bearer token.
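To make the header mapping concrete, the following Python sketch builds the same request that the sink configuration describes, without sending it. It is an illustration only and not part of Vector or Hydrolix: the function name build_ingest_request and the example host are hypothetical, and sending the events as a single gzip-compressed JSON array is an assumption about the request body.

```python
import gzip
import json


def build_ingest_request(host, project, table, transform_id, token, events):
    """Return (url, headers, body) for one POST to the streaming ingest endpoint."""
    url = f"https://{host}/ingest/event"
    headers = {
        "Content-Type": "application/json",
        "Content-Encoding": "gzip",
        # Same "<project>.<table>" form used in vector-hdx.yaml.
        "X-HDX-Table": f"{project}.{table}",
        "X-HDX-Transform": transform_id,
        "Authorization": f"Bearer {token}",
    }
    # Assumption: events are sent as one JSON array, gzip-compressed.
    body = gzip.compress(json.dumps(events).encode("utf-8"))
    return url, headers, body


if __name__ == "__main__":
    # Hypothetical values; substitute your own host, names, ID, and token.
    url, headers, body = build_ingest_request(
        host="hdx.example.com",
        project="hdx_project_for_vector",
        table="hdx_table_for_vector",
        transform_id="<hydrolix_transform_for_vector_data_id>",
        token="<token>",
        events=[{"host": "example-host", "message": "hello"}],
    )
    print(url)
    print(headers["X-HDX-Table"])
    print(len(body), "compressed bytes")
```

The returned values can be handed to any HTTP client, for example urllib.request.Request(url, data=body, headers=headers, method="POST"), if you want to send a hand-built test event before starting Vector.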

Create a Hydrolix Transform

Now you can create a transform that maps the incoming sample data generated and routed by Vector onto columns within your Hydrolix table. This is distinct from a Vector transform, which is a similar concept but is configured within your Vector instance. For more in-depth information on Hydrolix transforms, see this page.

Using the Hydrolix API, create a transform for your table using the following request body:

{
    "name": "hydrolix_transform_for_vector_data",
    "description": "Transform sample log data from Vector",
    "type": "json",
    "settings": {
        "is_default": true,
        "compression": "none",
        "output_columns": [
            {
                "name": "host",
                "datatype": {
                    "type": "string",
                    "index": true
                }
            },
            {
                "name": "message",
                "datatype": {
                    "type": "string",
                    "index": true
                }
            },
            {
                "name": "source_type",
                "datatype": {
                    "type": "string",
                    "index": true
                }
            },
            {
                "name": "timestamp",
                "datatype": {
                    "type": "datetime",
                    "format": "2006-01-02T15:04:05.999999Z",
                    "resolution": "ms",
                    "primary": true
                }
            }
        ]
    }
}
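Before creating the transform, it can help to check that an event carries every column the transform names and that its timestamp parses. The Python sketch below does this for a hypothetical event; the field values are invented for illustration, and the helper names are not part of any Hydrolix or Vector API. It also assumes that the format value is a Go-style reference layout, in which .999999 would mean an optional fraction of up to six digits.

```python
from datetime import datetime, timezone

# Column names copied from the transform's output_columns.
TRANSFORM_COLUMNS = ["host", "message", "source_type", "timestamp"]


def missing_columns(event, columns=TRANSFORM_COLUMNS):
    """Return the transform columns that the event does not provide."""
    return [name for name in columns if name not in event]


def parse_event_timestamp(value):
    """Parse a UTC timestamp with an optional fraction of up to six digits."""
    fmt = "%Y-%m-%dT%H:%M:%S.%fZ" if "." in value else "%Y-%m-%dT%H:%M:%SZ"
    return datetime.strptime(value, fmt).replace(tzinfo=timezone.utc)


# Hypothetical event shaped like the columns above (values are made up).
sample_event = {
    "host": "example-host",
    "message": "<13>1 2024-09-16T22:51:45.304Z example-host app 1234 ID1 - hello",
    "source_type": "demo_logs",
    "timestamp": "2024-09-16T22:51:45.304123Z",
}

if __name__ == "__main__":
    print("missing columns:", missing_columns(sample_event))
    print("parsed timestamp:", parse_event_timestamp(sample_event["timestamp"]))
```

Fields present in the event but absent from output_columns are not checked here; how Hydrolix treats them depends on your transform settings.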

If you want to create a transform by interacting with the API via cURL, you can insert the above request body into a cURL request like the following, with a similar expected response:

#!/bin/bash
# Replace each placeholder with the values you gathered in "Before You Begin".
HDX_HOST="<your-hdx-host-url>"
ORG_ID="<hdx_org_id>"
PROJECT_ID="<hdx_project_for_vector_id>"
TABLE_ID="<hdx_table_for_vector_id>"
TOKEN="<bearer_token>"

curl --request POST \
     --url "https://$HDX_HOST/config/v1/orgs/$ORG_ID/projects/$PROJECT_ID/tables/$TABLE_ID/transforms/" \
     --header 'accept: application/json' \
     --header "authorization: Bearer $TOKEN" \
     --header 'content-type: application/json' \
     --data '
{
    "name": "hydrolix_transform_for_vector_data",
    "description": "Transform sample logs from Vector",
    "type": "json",
    "settings": {
        "is_default": true,
        "compression": "none",
        "output_columns": [
            {
                "name": "host",
                "datatype": {
                    "type": "string",
                    "index": true
                }
            },
            {
                "name": "message",
                "datatype": {
                    "type": "string",
                    "index": true
                }
            },
            {
                "name": "source_type",
                "datatype": {
                    "type": "string",
                    "index": true
                }
            },
            {
                "name": "timestamp",
                "datatype": {
                    "type": "datetime",
                    "format": "2006-01-02T15:04:05.999999Z",
                    "resolution": "ms",
                    "primary": true
                }
            }
        ]
    }
}'

The API should return a response similar to the following:

{
  "name": "vector_transform",
  "description": "Transform sample logs from Vector",
  "uuid": "84ceef18-c1cd-449f-86c3-be48e37fec60",
  "created": "2024-09-16T22:44:42.899750Z",
  "modified": "2024-09-16T22:44:42.899759Z",
  "settings": {
    "is_default": true,
    "sql_transform": null,
    "sample_data": null,
    "output_columns": [
      {
        "name": "host",
        "datatype": {
          "type": "string",
          "index": true,
          "format": null,
          "resolution": "seconds",
          "default": null,
          "script": null,
          "source": null,
          "suppress": false
        }
      },
      {
        "name": "message",
        "datatype": {
          "type": "string",
          "index": true,
          "format": null,
          "resolution": "seconds",
          "default": null,
          "script": null,
          "source": null,
          "suppress": false
        }
      },
      {
        "name": "source_type",
        "datatype": {
          "type": "string",
          "index": true,
          "format": null,
          "resolution": "seconds",
          "default": null,
          "script": null,
          "source": null,
          "suppress": false
        }
      },
      {
        "name": "timestamp",
        "datatype": {
          "type": "datetime",
          "index": false,
          "primary": true,
          "format": "2006-01-02T15:04:05.999999Z",
          "resolution": "ms",
          "default": null,
          "script": null,
          "source": null,
          "suppress": false
        }
      }
    ],
    "compression": "none",
    "wurfl": null,
    "format_details": {
      "flattening": {
        "active": false,
        "map_flattening_strategy": null,
        "slice_flattening_strategy": null,
        "depth": null
      }
    }
  },
  "url": "https://<your-hdx-host-url>/config/v1/orgs/bc3f041b-ea52-45e1-b9d5-41819c465854/projects/c2445da3-ec63-42be-9f12-f104f9656f4c/tables/798dfcf4-7560-4b24-1337-29017619baa6/transforms/84ceef18-c1cd-449f-86c3-be48e37fec60",
  "type": "json",
  "table": "798dfcf4-7560-4b24-1337-29017619baa6"
}

Insert the top-level uuid (in this case, 84ceef18-c1cd-449f-86c3-be48e37fec60) into your vector-hdx.yaml as the value for the X-HDX-Transform header.
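If you saved the API response to a file, this step can be scripted. The Python sketch below reads the top-level uuid from the response JSON and substitutes it for the placeholder used in vector-hdx.yaml. The helper names are hypothetical, and the sketch assumes the placeholder text in your file matches the one shown in this guide exactly.

```python
import json

# Placeholder text as it appears in vector-hdx.yaml in this guide.
PLACEHOLDER = "<hydrolix_transform_for_vector_data_id>"


def transform_uuid(response_text):
    """Return the top-level uuid from a create-transform response body."""
    return json.loads(response_text)["uuid"]


def fill_transform_id(config_text, uuid):
    """Replace the transform placeholder in the Vector config text."""
    if PLACEHOLDER not in config_text:
        raise ValueError("placeholder not found in config text")
    return config_text.replace(PLACEHOLDER, uuid)


if __name__ == "__main__":
    # Abbreviated response containing only the field this step needs.
    response = '{"name": "vector_transform", "uuid": "84ceef18-c1cd-449f-86c3-be48e37fec60"}'
    config_line = "      X-HDX-Transform: <hydrolix_transform_for_vector_data_id>\n"
    print(fill_transform_id(config_line, transform_uuid(response)), end="")
```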

Now that everything is wired up, run the following command within the directory containing your vector-hdx.yaml to run your Vector instance:

vector --config vector-hdx.yaml

Dummy data generated by the Vector demo logs source should be flowing through your Vector instance and into your Hydrolix cluster. If you're unsure whether your Vector instance is successfully receiving data and forwarding it to your Hydrolix cluster, try running Vector with debug logging enabled:

vector -v --config vector-hdx.yaml

Check Your Work

You should see log output similar to the following:

2024-09-16T22:51:45.037525Z  INFO vector::app: Log level is enabled. level="debug"
...
2024-09-16T22:51:45.042267Z DEBUG vector::topology::builder: Building new source. component=generate_demo_logs
2024-09-16T22:51:45.042882Z DEBUG vector::topology::builder: Building new sink. component=hydrolix
...
2024-09-16T22:51:45.303018Z DEBUG vector::topology::running: Configuring outputs for source. component=generate_demo_logs
2024-09-16T22:51:45.303099Z  INFO vector::topology::builder: Healthcheck passed.
2024-09-16T22:51:45.303952Z DEBUG vector::topology::running: Configuring output for component. component=generate_demo_logs output_id=None
2024-09-16T22:51:45.303985Z DEBUG vector::topology::running: Connecting inputs for sink. component=hydrolix
2024-09-16T22:51:45.304012Z DEBUG vector::topology::running: Adding component input to fanout. component=hydrolix fanout_id=generate_demo_logs
2024-09-16T22:51:45.304043Z DEBUG vector::topology::running: Spawning new source. key=generate_demo_logs
2024-09-16T22:51:45.304123Z  INFO vector: Vector has started. debug="false" version="0.40.1" arch="x86_64" revision="a9392b0 2024-08-26 14:35:19.223750502"
...
2024-09-16T22:51:46.308260Z DEBUG sink{component_kind="sink" component_id=hydrolix component_type=http}:request{request_id=1}:http: vector::internal_events::http_client: Sending HTTP request. uri=https://<your-hdx-host-url>/ingest/event method=POST version=HTTP/1.1 headers={"content-type": "application/json", "content-encoding": "gzip", "authorization": Sensitive, "x-hdx-table": "vector_project.vector_table", "x-hdx-transform": "vector_transform", "accept-encoding": "zstd,gzip,deflate,br", "user-agent": "Vector/0.40.1 (<arch etc.>)"} body=[191 bytes]
2024-09-16T22:51:46.311437Z DEBUG hyper::client::connect::dns: resolving host="<your-hdx-host-url>"
2024-09-16T22:51:46.318351Z DEBUG sink{component_kind="sink" component_id=hydrolix component_type=http}:request{request_id=1}:http: hyper::client::connect::http: connecting to <host-ip>:443
2024-09-16T22:51:46.342478Z DEBUG sink{component_kind="sink" component_id=hydrolix component_type=http}:request{request_id=1}:http: hyper::client::connect::http: connected to <host-ip>:443
...
2024-09-16T22:51:46.412380Z DEBUG sink{component_kind="sink" component_id=hydrolix component_type=http}:request{request_id=1}:http: hyper::client::pool: pooling idle connection for ("https", <your-hdx-host-url>)
2024-09-16T22:51:46.412429Z DEBUG sink{component_kind="sink" component_id=hydrolix component_type=http}:request{request_id=1}:http: vector::internal_events::http_client: HTTP response. status=200 OK version=HTTP/1.1 headers={"content-length": "32", "content-type": "application/json; charset=utf-8", "date": "Mon, 16 Sep 2024 22:51:46 GMT", "server": "Hydrolix"} body=[32 bytes]
...
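When the debug output is long, the lines that matter most are the "HTTP response." entries for the hydrolix sink. As a rough aid, the Python sketch below pulls the status codes out of saved log lines. It assumes the log format shown above, and the function name response_statuses is hypothetical.

```python
import re

# Matches the status code in lines such as "HTTP response. status=200 OK".
STATUS_PATTERN = re.compile(r"HTTP response\. status=(\d{3})")


def response_statuses(log_lines, sink_id="hydrolix"):
    """Return the HTTP status codes logged for the given sink component."""
    statuses = []
    for line in log_lines:
        if f"component_id={sink_id}" not in line:
            continue
        match = STATUS_PATTERN.search(line)
        if match:
            statuses.append(int(match.group(1)))
    return statuses


if __name__ == "__main__":
    import sys

    # Usage: python check_statuses.py < saved-vector-debug.log
    codes = response_statuses(sys.stdin)
    print("responses:", codes)
    print("all OK" if codes and all(code == 200 for code in codes) else "check the log")
```

Any status other than 200 is worth investigating; the response headers and the surrounding log lines usually indicate whether the table name, the transform, or the token was rejected.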

Customize Your Sources

If you already have other log sources configured in Vector, you can ingest them into Hydrolix as well by adding them as inputs in the sinks section of your vector-hdx.yaml:

sources:
  generate_demo_logs:
    type:   "demo_logs"
    format: "syslog"
    count:  100
  custom-source-1:
    type:   "<type>"
    format: "<format>"
  custom-source-2:
    type:   "<type>"
    format: "<format>"

sinks:
  hydrolix:
    type: http
    inputs:
      - generate_demo_logs
      - custom-source-1
      - custom-source-2
    uri: https://<your-hdx-host-url>/ingest/event
    encoding:
      codec: json
    compression: gzip
    headers:
      X-HDX-Table: <hdx_project_for_vector>.<hdx_table_for_vector>
      X-HDX-Transform: <hydrolix_transform_for_vector_data_id>
      Authorization: Bearer <token>

If these sources are generating data with a different format from the demo logs, you will need to create additional transforms for these sources and configure the transforms within your vector-hdx.yaml. If you’re unsure about the format of the data from any particular source, consider setting up a Hydrolix catch-all transform.
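One way to read "additional transforms" is to give each source its own sink, so that each sink can send a different X-HDX-Transform header. The Python sketch below generates such a sinks section using only the sink options already shown in this guide. It is a suggestion under that assumption rather than a documented Hydrolix or Vector pattern, and the function name, sink names, and transform IDs are hypothetical.

```python
def render_sinks(host, table_header, token, transform_by_source):
    """Render a sinks section with one http sink per source."""
    lines = ["sinks:"]
    for source, transform_id in transform_by_source.items():
        # Derive a sink name from the source name (hypothetical convention).
        sink_name = f"hydrolix_{source}".replace("-", "_")
        lines += [
            f"  {sink_name}:",
            "    type: http",
            "    inputs:",
            f"      - {source}",
            f"    uri: https://{host}/ingest/event",
            "    encoding:",
            "      codec: json",
            "    compression: gzip",
            "    headers:",
            f"      X-HDX-Table: {table_header}",
            f"      X-HDX-Transform: {transform_id}",
            f"      Authorization: Bearer {token}",
        ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(
        render_sinks(
            host="<your-hdx-host-url>",
            table_header="<hdx_project_for_vector>.<hdx_table_for_vector>",
            token="<token>",
            transform_by_source={
                "custom-source-1": "<transform_id_for_source_1>",
                "custom-source-2": "<transform_id_for_source_2>",
            },
        ),
        end="",
    )
```

Paste the generated section over the sinks section of vector-hdx.yaml, keeping your existing sources section unchanged.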