Vector Integration
Vector can be used to collect, optionally transform, and route log data into Hydrolix. For more about Vector itself, see the Vector documentation.
Before You Begin
You will need a running Hydrolix deployment. If you haven't deployed Hydrolix yet, follow the instructions for your preferred cloud vendor. From your running Hydrolix cluster, you will need the following information:
Item | Description | Example value | How to obtain this information |
---|---|---|---|
Org ID | The ID of your Hydrolix organization. | bc3f041b-ea52-45e1-b9d5-41819c465854 | List the orgs in your running Hydrolix cluster using the Hydrolix API. Use the org that contains the table in which you want to store Vector data. |
Project name and ID | The logical namespace that contains your table. You need the project name for the Vector configuration and the project ID for the transform API request. | Name: hdx_project_for_vector ID: c2445da3-ec63-42be-9f12-f104f9656f4c | Follow the Hydrolix instructions for creating a project. |
Table name and ID | The destination for data routed from your Vector instance. You need the table name for the Vector configuration and the table ID for the transform API request. | Name: hdx_table_for_vector ID: 798dfcf4-7560-4b24-1337-29017619baa6 | Follow the Hydrolix instructions for creating a table. |
OAuth bearer token | Authenticates your requests to the Hydrolix Streaming Ingest API and to the configuration API used to create the transform. | eyXrxkzoN2fRiiKpnV... | Follow the Hydrolix instructions for generating a bearer token. |
Keep these values on hand for later steps within this guide.
Getting Started
There are three required components to successfully integrate Vector with a running Hydrolix cluster.
Component | Description |
---|---|
Data source | The source for log data that will be routed to your Vector instance. See Vector's Demo logs page for more details on the source component used in this getting started guide. |
Vector instance | The observability data pipeline. Responsible for collecting and routing the sample data to a "sink" which in this guide will be your running Hydrolix cluster. |
Hydrolix cluster | The data "sink." You should have this deployed prior to continuing. |
A Vector configuration contains three types of components:
- A Source to collect or receive data from observability data sources into Vector.
- A Transform to change or restructure that observability data as it passes through your Vector topology. Transforms are optional and aren't covered in this guide. Note that a Vector transform is different from the Hydrolix transform, which you'll create below.
- A Sink, the external service or destination to which Vector routes your data. In the following instructions, your running Hydrolix cluster is the sink.
For more information on Vector topologies, see Vector's quickstart guide.
Install Vector
Follow Vector's installation instructions, then verify your installation by running:
vector --version
This should return output similar to:
vector 0.40.1 <architecture etc...>
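If you script your setup, a check like the following sketch confirms that vector is on your PATH and pulls out just the version number. It assumes the first line of `vector --version` starts with the word vector followed by the version, as in the sample output above; vector_version is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: print only the version number from `vector --version`
# output, assuming the first line looks like "vector 0.40.1 (...)".
vector_version() {
  printf '%s\n' "$1" | awk 'NR == 1 { print $2 }'
}

# Report whether Vector is installed, and which version.
if command -v vector >/dev/null 2>&1; then
  echo "Vector $(vector_version "$(vector --version)") is installed"
else
  echo "vector is not on your PATH; revisit the installation steps" >&2
fi
```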
Configure Vector to Use the Hydrolix HTTP Streaming API
Create a configuration file called vector-hdx.yaml containing the following:
sources:
  generate_demo_logs:
    type: "demo_logs"
    format: "syslog"
    count: 100
sinks:
  hydrolix:
    type: http
    inputs:
      - generate_demo_logs
    uri: https://<your-hdx-host-url>/ingest/event
    encoding:
      codec: json
    compression: gzip
    headers:
      X-HDX-Table: <hdx_project_for_vector>.<hdx_table_for_vector>
      X-HDX-Transform: <hydrolix_transform_for_vector_data_id>
      Authorization: Bearer <token>
This configuration uses Vector's Demo Logs source, which generates fake log events for testing.
Insert your Hydrolix host URL, project name, table name, and bearer token where indicated. You will create a transform and fill in the value for <hydrolix_transform_for_vector_data_id> in a later step.
This configuration sends data to your running Hydrolix cluster through the HTTP Streaming API endpoint (https://<your-hdx-host-url>/ingest/event). The headers correspond to those the endpoint expects: X-HDX-Table names the project and table that receive the log data routed through Vector, X-HDX-Transform identifies the Hydrolix transform applied to that data, and Authorization carries your bearer token.
Create a Hydrolix Transform
Next, create a Hydrolix transform, which maps the sample data generated and routed by Vector onto columns in your Hydrolix table. Hydrolix transforms are distinct from Vector transforms: the concept is similar, but Vector transforms are configured in your Vector instance. For more in-depth information, see the Hydrolix documentation on transforms.
Using the Hydrolix API, create a transform for your table with the following request body:
{
  "name": "hydrolix_transform_for_vector_data",
  "description": "Transform sample log data from Vector",
  "type": "json",
  "settings": {
    "is_default": true,
    "compression": "none",
    "output_columns": [
      {
        "name": "host",
        "datatype": {
          "type": "string",
          "index": true
        }
      },
      {
        "name": "message",
        "datatype": {
          "type": "string",
          "index": true
        }
      },
      {
        "name": "source_type",
        "datatype": {
          "type": "string",
          "index": true
        }
      },
      {
        "name": "timestamp",
        "datatype": {
          "type": "datetime",
          "format": "2006-01-02T15:04:05.999999Z",
          "resolution": "ms",
          "primary": true
        }
      }
    ]
  }
}
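The timestamp column's format string appears to follow Go's reference-time layout, in which 2006-01-02 stands for year-month-day, 15:04:05 for hours, minutes, and seconds, .999999 for up to six fractional-second digits, and the trailing Z is a literal character. As a rough pre-check before sending your own data, the hypothetical matches_hdx_layout function below tests whether a string has that shape. It approximates the layout with a regular expression and is not how Hydrolix itself parses timestamps.

```shell
#!/bin/sh
# Hypothetical helper: rough shape check for the layout
# "2006-01-02T15:04:05.999999Z": date, literal T, time,
# an optional fraction of 1-6 digits, and a literal Z.
# This approximates the layout; it is not Hydrolix's actual parser.
matches_hdx_layout() {
  printf '%s\n' "$1" |
    grep -Eq '^[0-9]{4}-[0-9]{2}-[0-9]{2}T[0-9]{2}:[0-9]{2}:[0-9]{2}(\.[0-9]{1,6})?Z$'
}

# Usage: matches_hdx_layout "2024-09-16T22:51:46.308260Z" && echo "looks OK"
```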
To create the transform with cURL, insert the request body above into a request like the following. The response should be similar to the example that follows the script.
#!/bin/bash
# Replace these placeholders with the values gathered in "Before You Begin".
HDX_HOST="<your-hdx-host-url>"
ORG_ID="<hdx_org_id>"
PROJECT_ID="<hdx_project_for_vector_id>"
TABLE_ID="<hdx_table_for_vector_id>"
TOKEN="<bearer_token>"

# Create the transform on the target table. The URL is quoted so the shell
# does not interpret any of its characters as redirections.
curl --request POST \
  --url "https://$HDX_HOST/config/v1/orgs/$ORG_ID/projects/$PROJECT_ID/tables/$TABLE_ID/transforms/" \
  --header 'accept: application/json' \
  --header "authorization: Bearer $TOKEN" \
  --header 'content-type: application/json' \
  --data '
{
  "name": "hydrolix_transform_for_vector_data",
  "description": "Transform sample logs from Vector",
  "type": "json",
  "settings": {
    "is_default": true,
    "compression": "none",
    "output_columns": [
      {
        "name": "host",
        "datatype": {
          "type": "string",
          "index": true
        }
      },
      {
        "name": "message",
        "datatype": {
          "type": "string",
          "index": true
        }
      },
      {
        "name": "source_type",
        "datatype": {
          "type": "string",
          "index": true
        }
      },
      {
        "name": "timestamp",
        "datatype": {
          "type": "datetime",
          "format": "2006-01-02T15:04:05.999999Z",
          "resolution": "ms",
          "primary": true
        }
      }
    ]
  }
}'
Example response:
{
  "name": "vector_transform",
  "description": "Transform sample logs from Vector",
  "uuid": "84ceef18-c1cd-449f-86c3-be48e37fec60",
  "created": "2024-09-16T22:44:42.899750Z",
  "modified": "2024-09-16T22:44:42.899759Z",
  "settings": {
    "is_default": true,
    "sql_transform": null,
    "sample_data": null,
    "output_columns": [
      {
        "name": "host",
        "datatype": {
          "type": "string",
          "index": true,
          "format": null,
          "resolution": "seconds",
          "default": null,
          "script": null,
          "source": null,
          "suppress": false
        }
      },
      {
        "name": "message",
        "datatype": {
          "type": "string",
          "index": true,
          "format": null,
          "resolution": "seconds",
          "default": null,
          "script": null,
          "source": null,
          "suppress": false
        }
      },
      {
        "name": "source_type",
        "datatype": {
          "type": "string",
          "index": true,
          "format": null,
          "resolution": "seconds",
          "default": null,
          "script": null,
          "source": null,
          "suppress": false
        }
      },
      {
        "name": "timestamp",
        "datatype": {
          "type": "datetime",
          "index": false,
          "primary": true,
          "format": "2006-01-02T15:04:05.999999Z",
          "resolution": "ms",
          "default": null,
          "script": null,
          "source": null,
          "suppress": false
        }
      }
    ],
    "compression": "none",
    "wurfl": null,
    "format_details": {
      "flattening": {
        "active": false,
        "map_flattening_strategy": null,
        "slice_flattening_strategy": null,
        "depth": null
      }
    }
  },
  "url": "https://<your-hdx-host-url>/config/v1/orgs/bc3f041b-ea52-45e1-b9d5-41819c465854/projects/c2445da3-ec63-42be-9f12-f104f9656f4c/tables/798dfcf4-7560-4b24-1337-29017619baa6/transforms/84ceef18-c1cd-449f-86c3-be48e37fec60",
  "type": "json",
  "table": "798dfcf4-7560-4b24-1337-29017619baa6"
}
Copy the top-level uuid from the response (in this example, 84ceef18-c1cd-449f-86c3-be48e37fec60) into vector-hdx.yaml as the value of the X-HDX-Transform header.
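If you saved the response when creating the transform (for example, by adding curl's `--output transform_response.json` option to the request), the sketch below copies the uuid into the config for you. The file name transform_response.json and both function names are hypothetical, and the JSON parsing assumes python3 is available.

```shell
#!/bin/sh
# Hypothetical helper: print the top-level "uuid" field of a saved
# transform-creation response. Assumes python3 is installed.
transform_uuid() {
  python3 -c 'import json, sys; print(json.load(sys.stdin)["uuid"])' < "$1"
}

# Hypothetical helper: replace the transform placeholder in a Vector config
# file with the given UUID. Writing to a temporary file avoids the GNU/BSD
# difference in `sed -i` syntax.
set_transform_header() {
  sed "s/<hydrolix_transform_for_vector_data_id>/$2/" "$1" > "$1.tmp" &&
    mv "$1.tmp" "$1"
}

# Usage:
# set_transform_header vector-hdx.yaml "$(transform_uuid transform_response.json)"
```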
Now that everything is wired up, start Vector by running the following command from the directory containing vector-hdx.yaml:
vector --config vector-hdx.yaml
Dummy data generated by the demo logs source should now flow through your Vector instance and into your Hydrolix cluster. If you're unsure whether Vector is receiving data and forwarding it to your Hydrolix cluster, run Vector with debug logging enabled:
vector -v --config vector-hdx.yaml
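The debug output is verbose. To watch only the outcome of the sink's requests, you can pipe it through a small filter such as the hypothetical response_statuses function below, which keeps the "HTTP response" events and prints their status codes. It assumes the debug line format in the sample output for Vector 0.40.1, and that Vector writes its logs to stderr, hence the 2>&1 redirection in the usage line.

```shell
#!/bin/sh
# Hypothetical helper: keep only the sink's "HTTP response" debug events and
# print each status code, e.g. "... HTTP response. status=200 OK ..." -> "200".
response_statuses() {
  grep 'HTTP response\.' |
    sed -n 's/.* status=\([0-9][0-9][0-9]\) .*/\1/p'
}

# Usage (merge stderr into the pipe so the filter sees Vector's log output):
# vector -v --config vector-hdx.yaml 2>&1 | response_statuses
```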
Check Your Work
You should see log output similar to the following:
2024-09-16T22:51:45.037525Z INFO vector::app: Log level is enabled. level="debug"
...
2024-09-16T22:51:45.042267Z DEBUG vector::topology::builder: Building new source. component=generate_demo_logs
2024-09-16T22:51:45.042882Z DEBUG vector::topology::builder: Building new sink. component=hydrolix
...
2024-09-16T22:51:45.303018Z DEBUG vector::topology::running: Configuring outputs for source. component=generate_demo_logs
2024-09-16T22:51:45.303099Z INFO vector::topology::builder: Healthcheck passed.
2024-09-16T22:51:45.303952Z DEBUG vector::topology::running: Configuring output for component. component=generate_demo_logs output_id=None
2024-09-16T22:51:45.303985Z DEBUG vector::topology::running: Connecting inputs for sink. component=hydrolix
2024-09-16T22:51:45.304012Z DEBUG vector::topology::running: Adding component input to fanout. component=hydrolix fanout_id=generate_demo_logs
2024-09-16T22:51:45.304043Z DEBUG vector::topology::running: Spawning new source. key=generate_demo_logs
2024-09-16T22:51:45.304123Z INFO vector: Vector has started. debug="false" version="0.40.1" arch="x86_64" revision="a9392b0 2024-08-26 14:35:19.223750502"
...
2024-09-16T22:51:46.308260Z DEBUG sink{component_kind="sink" component_id=hydrolix component_type=http}:request{request_id=1}:http: vector::internal_events::http_client: Sending HTTP request. uri=https://<your-hdx-host-url>/ingest/event method=POST version=HTTP/1.1 headers={"content-type": "application/json", "content-encoding": "gzip", "authorization": Sensitive, "x-hdx-table": "vector_project.vector_table", "x-hdx-transform": "vector_transform", "accept-encoding": "zstd,gzip,deflate,br", "user-agent": "Vector/0.40.1 (<arch etc.>)"} body=[191 bytes]
2024-09-16T22:51:46.311437Z DEBUG hyper::client::connect::dns: resolving host="<your-hdx-host-url>"
2024-09-16T22:51:46.318351Z DEBUG sink{component_kind="sink" component_id=hydrolix component_type=http}:request{request_id=1}:http: hyper::client::connect::http: connecting to <host-ip>:443
2024-09-16T22:51:46.342478Z DEBUG sink{component_kind="sink" component_id=hydrolix component_type=http}:request{request_id=1}:http: hyper::client::connect::http: connected to <host-ip>:443
...
2024-09-16T22:51:46.412380Z DEBUG sink{component_kind="sink" component_id=hydrolix component_type=http}:request{request_id=1}:http: hyper::client::pool: pooling idle connection for ("https", <your-hdx-host-url>)
2024-09-16T22:51:46.412429Z DEBUG sink{component_kind="sink" component_id=hydrolix component_type=http}:request{request_id=1}:http: vector::internal_events::http_client: HTTP response. status=200 OK version=HTTP/1.1 headers={"content-length": "32", "content-type": "application/json; charset=utf-8", "date": "Mon, 16 Sep 2024 22:51:46 GMT", "server": "Hydrolix"} body=[32 bytes]
...
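If the sink's requests fail, it can help to take Vector out of the loop and send a single event yourself. The sketch below mirrors the sink configuration: a gzip-compressed JSON body posted to /ingest/event with the X-HDX-Table, X-HDX-Transform, and Authorization headers. The function names, the manual_test value for source_type, and the one-element JSON array body (our assumption about what Vector's http sink sends with the json codec) are all hypothetical.

```shell
#!/bin/sh
# Placeholders: use the same values as in vector-hdx.yaml.
HDX_HOST="<your-hdx-host-url>"
HDX_TABLE="<hdx_project_for_vector>.<hdx_table_for_vector>"
HDX_TRANSFORM="<hydrolix_transform_for_vector_data_id>"
TOKEN="<bearer_token>"

# Hypothetical helper: build a one-element JSON array whose keys match the
# transform's output columns. The message must not contain double quotes,
# since no JSON escaping is done here. %6N (microseconds) needs GNU date.
build_test_event() {
  ts=$(date -u +%Y-%m-%dT%H:%M:%S.%6NZ)
  printf '[{"host":"%s","message":"%s","source_type":"manual_test","timestamp":"%s"}]\n' \
    "$(uname -n)" "$1" "$ts"
}

# Hypothetical helper: gzip the body and declare it with Content-Encoding,
# mirroring `compression: gzip` in the Vector sink, then POST it.
send_test_event() {
  build_test_event "$1" | gzip -c | curl --silent --show-error \
    --request POST "https://$HDX_HOST/ingest/event" \
    --header "Content-Type: application/json" \
    --header "Content-Encoding: gzip" \
    --header "X-HDX-Table: $HDX_TABLE" \
    --header "X-HDX-Transform: $HDX_TRANSFORM" \
    --header "Authorization: Bearer $TOKEN" \
    --data-binary @-
}

# Usage: send_test_event "hello from curl"
```

If this request succeeds while Vector's requests fail, the problem is more likely in the Vector configuration than in your credentials or table settings.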
Customize Your Sources
If you already have other Vector sources producing log data, you can send them to Hydrolix as well by adding them to the inputs of the hydrolix sink in vector-hdx.yaml:
sources:
  generate_demo_logs:
    type: "demo_logs"
    format: "syslog"
    count: 100
  custom-source-1:
    type: "<type>"
    # <options for this source type>
  custom-source-2:
    type: "<type>"
    # <options for this source type>
sinks:
  hydrolix:
    type: http
    inputs:
      - generate_demo_logs
      - custom-source-1
      - custom-source-2
    uri: https://<your-hdx-host-url>/ingest/event
    encoding:
      codec: json
    compression: gzip
    headers:
      X-HDX-Table: <hdx_project_for_vector>.<hdx_table_for_vector>
      X-HDX-Transform: <hydrolix_transform_for_vector_data_id>
      Authorization: Bearer <token>
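If these sources produce data in a different format from the demo logs, you will need to create additional Hydrolix transforms for them and reference those transforms in vector-hdx.yaml. Because the hydrolix sink above sends the same X-HDX-Transform header with every request, a source that needs a different transform would need its own http sink with its own X-HDX-Transform header. If you're unsure about the format of the data from a particular source, consider setting up a Hydrolix catch-all transform.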
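After adding sources or sinks, you can check the edited file before restarting Vector. The sketch below wraps Vector's validate subcommand; its --no-environment flag skips environment checks such as sink healthchecks, so the check does not need to reach your Hydrolix cluster. It should still catch configuration mistakes like an input name that doesn't match any defined source. The wrapper name validate_vector_config is hypothetical.

```shell
#!/bin/sh
# Hypothetical wrapper: validate a Vector config file without starting it.
# --no-environment skips environment checks (component checks and sink
# healthchecks), so no network access to Hydrolix is required.
validate_vector_config() {
  if ! command -v vector >/dev/null 2>&1; then
    echo "vector is not on your PATH" >&2
    return 127
  fi
  vector validate --no-environment "$1"
}

# Usage: validate_vector_config vector-hdx.yaml
```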