HTTP Stream API
Getting Started
The HTTP Streaming API continuously ingests events into the Hydrolix platform over HTTP. Event messages are sent via HTTP POST to the /ingest/event API endpoint.
An event's structure can be of two types:
- Predefined: the event has a predefined transform loaded into Hydrolix prior to loading data.
- Self-Described: the event contains the transform within the message itself. This can be useful if you need to define your message structure at time of creation.
HTTP headers are used to provide information to the system about which table the data should be loaded into, the content-type of the incoming data, authentication information, and the transform used to prepare and store the data.
Basic Steps
- Create a Project/Table
- Create a Transform
- Scale the Stream Architecture
- HTTP POST messages to the /ingest/event API endpoint
It is assumed that the project, table, and transform are all already configured. More information on how to set these up can be found on the Projects & Tables and Write Transforms pages.
The API Endpoint
The HTTP endpoint uses the URI path /ingest/event
and is made available by default on port 443
(HTTPS) to receive data. IP allowlists and authentication information can be used to grant/restrict access to the endpoint (see Configure IP Access, User Authentication).
The POST method is used to send data, with HTTP headers defining the characteristics of the message.
For example:
$ curl -s \
-H 'content-type: application/json' \
-H 'x-hdx-table: <project.table>' \
-H 'x-hdx-transform: <pre-existing transform>' \
-H 'Authorization: Bearer <token>' \
https://<myhost>/ingest/event -X POST -d @<my_data_file>
HTTP Headers
Most HTTP headers required for streaming ingest are prefixed with x-hdx-. Firewall rules should be adjusted where necessary to allow the x-hdx- headers and the Authorization header. If required x-hdx- headers are omitted, the endpoint responds with a 4xx-class status code.
Header key | Description | Values |
---|---|---|
content-type | The format of the data payload. | application/json , text/csv |
x-hdx-table | An existing Hydrolix project and table where the data should land. Can also be used to determine which intake-head pool processes the incoming data. See Dynamic routing with default headers for more information. | Format: {project_name}.{table_name} |
Authorization | The OAuth bearer token. | See below |
The following headers are optional; each defaults to NONE if not present.
Header key | Description |
---|---|
x-hdx-transform | A transform schema for ingest. If this value is not present, a transform (write) schema must be included in the request. Can also be used to determine which intake-head pool processes the incoming data. See Dynamic routing with default headers for more information. |
content-encoding | If the message is compressed, this value should be set to the compression type; see the Compression section below for more detail. If this value is not declared, Hydrolix will attempt to infer any applied compression. |
x-hdx-token | An authentication token for tables configured to use stream authentication. |
x-hdx-project | An existing Hydrolix project name where the data should land. Can also be used to determine which intake-head pool processes the incoming data. See Dynamic routing with default headers for more information. |
The OAuth Bearer Token
Get the bearer token, which is valid for the next 24 hours, to authenticate subsequent API calls. This command assumes you've set the $HDX_HOSTNAME, $HDX_USER, and $HDX_PASSWORD environment variables:
export HDX_TOKEN=$(
curl -v -X POST -H "Content-Type: application/json" \
https://$HDX_HOSTNAME/config/v1/login/ \
-d "{
\"username\":\"$HDX_USER\",
\"password\":\"$HDX_PASSWORD\"
}" | jq -r ".auth_token.access_token"
)
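As a local sanity check, the token extraction step can be exercised against a canned response. The response shape with auth_token.access_token matches the jq filter above; the literal values here are made up:

```shell
# Parse a sample /config/v1/login/ response body (values are illustrative)
resp='{"auth_token":{"access_token":"abc123"}}'
token=$(printf '%s' "$resp" | jq -r '.auth_token.access_token')
echo "$token"   # prints the extracted token: abc123
```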
Dynamic routing with default headers
You can configure dynamic routing using the following headers:
x-hdx-project
x-hdx-table
x-hdx-transform
This allows clients to specify the destination project, table, transform, and select an intake-head
pool to ingest their data without updating the endpoint to which the client is sending data.
To enable this feature, update your Kubernetes config for an intake-head
pool with the following routing
block:
pools:
  routing_demo:
    routing:
      headers:
        x-hdx-table: my_project.my_table
        x-hdx-transform: my_transform
    name: routing_demo
    service: intake-head
With the above configuration, any traffic which contains the headers x-hdx-table: my_project.my_table
and x-hdx-transform: my_transform
will be routed through the intake-head
pool called routing_demo
.
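Assuming the routing_demo pool above, a matching raw request might look like the following sketch (host placeholder assumed; the body is elided):

```http
POST /ingest/event HTTP/1.1
Host: {myhost}.hydrolix.live
Content-Type: application/json
Authorization: Bearer <token>
x-hdx-table: my_project.my_table
x-hdx-transform: my_transform
```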
For more information on creating and updating service pools, see the Resource Pools page.
Dynamic routing with custom headers
You can configure custom headers to supplement the default headers that can be used for intake-head
routing by setting them using the traefik_service_allowed_headers
tunable. Custom header keys should match the entries passed in the pool annotations. For example, consider the following Hydrolix spec:
spec:
  pools:
    intake-head-private-pool:
      routing:
        headers:
          x-hdx-intake-pool: intake-head-private-pool
      name: intake-head-private-pool
      service: intake-head
      cpu: "8"
      memory: 8Gi
      replicas: "5"
  traefik_service_allowed_headers: ['x-hdx-intake-pool']
Incoming traffic sent to the default ingest endpoint:
https://{myhost}.hydrolix.live/ingest/event
containing the header key/value pair x-hdx-intake-pool: intake-head-private-pool
will be routed through the intake-head-private-pool
ingest pool.
Using Query Strings
As an alternative to using HTTP headers to specify the table, transform, and ingest token, the following query string parameters can be used within the URI string:
Query parameter key | Description |
---|---|
table | The project and table name, separated by a . |
transform | The name of the transform to use for ingest |
token | The token to use to access the table. See stream authentication for more information. |
For example:
curl 'https://{myhost}.hydrolix.live/ingest/event?table=myproject.mytable&transform=transform' -H 'content-type: application/json' -H 'Authorization: Bearer <token>'
Dynamic routing with default query parameters
You can use the table
and transform
query parameters to route traffic to a particular intake-head
pool without updating the endpoint to which the client is sending data. For example, the following Hydrolix configuration spec:
pools:
  intake-head-private-pool:
    routing:
      query_params:
        table: my_project.my_table
        transform: my_transform
    name: intake-head-private-pool
    service: intake-head
allows a user to send data to intake-head-private-pool
with a request using the specified query parameters:
POST /ingest/event?table=my_project.my_table&transform=my_transform HTTP/1.1
Host: {myhost}.hydrolix.live
Content-Type: application/json
Dynamic routing with custom query parameters
You can configure custom query parameters to supplement the default strings that can be used for intake-head
routing, table
and transform
, by setting them using the traefik_service_allowed_query_params
tunable. Custom parameter keys should match the entries passed in the pool annotations. For example, consider the following Hydrolix spec:
spec:
  pools:
    intake-head-private-pool:
      routing:
        query_params:
          intake-pool: intake-head-private-pool
      name: intake-head-private-pool
      service: intake-head
      cpu: "8"
      memory: 8Gi
      replicas: "5"
  traefik_service_allowed_query_params: ['intake-pool']
Then incoming traffic sent to the default ingest endpoint with the specified query parameter:
https://{myhost}.hydrolix.live/ingest/event?intake-pool=intake-head-private-pool
will be routed through the intake-head-private-pool
ingest pool.
Dynamic routing: Verify Traefik config updates
Dynamic routing configuration in the Hydrolix cluster spec results in updates to Traefik's configuration, which picks up any new dynamic routing paths based on headers or query parameters.
If you have not already, install K9s.
If an existing pool called intake-head-private-pool
is updated with the following routing configuration, the spec changes from:
spec:
  pools:
    intake-head-private-pool:
      name: intake-head-private-pool
      service: intake-head
to:
spec:
  pools:
    intake-head-private-pool:
      name: intake-head-private-pool
      routing:
        headers:
          x-hdx-header: my_project.my_table
        query_params:
          intake-pool: private
      service: intake-head
You can confirm the Traefik configuration has been updated using the following steps.
- Start k9s:
k9s
- Open the pods selector:
:pods
- Select the Traefik pod and shell into the traefik container (the s key opens a shell).
- Run the following command:
cat /etc/traefik/dynamic_conf.yaml | grep 'PathPrefix(`/pool/intake-head-private-pool`)' -B 8 -A 2
After a few minutes of latency at most, the router rule changes from:
http:
  routers:
    slash-pool/intake-head-private-pool-router:
      rule: PathPrefix(`/pool/intake-head-private-pool`)
      service: intake-head-private-pool
to:
http:
  routers:
    slash-pool/intake-head-private-pool-router:
      rule: PathPrefix(`/pool/intake-head-private-pool`) || (PathPrefix(`/ingest`) && Header(`x-hdx-table`, `my_project.my_table`)) && Query(`intake-pool`, `private`)
      service: intake-head-private-pool
Dynamic routing: Precedence
Dynamic routing uses Traefik rules and priority to determine which ingest pool should handle an incoming request. Rules within a Hydrolix cluster are evaluated in the following descending order of precedence:
- PathPrefix(/pool/pool-name): Which endpoint the payload arrives at, if the endpoint is a non-default ingest endpoint. For example, https://{myhost}.hydrolix.live/pool/{pool_name}/ingest/event will be handled by the ingest pool called {pool_name} regardless of the headers or query parameters included.
- PathPrefix(/ingest) and query parameters (Query(key, value)): For example, a payload sent to https://{myhost}.hydrolix.live/ingest/event?table=my_table&transform=my_transform with the header x-hdx-myheader: secondary_pool and the following cluster configuration:
spec:
  pools:
    custom-ingest-pool:
      routing:
        query_params:
          table: my_table
          transform: my_transform
      name: custom-ingest-pool
      service: intake-head
    secondary-pool:
      routing:
        headers:
          x-hdx-myheader: secondary_pool
      name: secondary-pool
      service: intake-head
would be handled by custom-ingest-pool
rather than secondary-pool
.
- PathPrefix(/ingest) and HTTP headers (Header(key, value)): For example, a payload sent to https://{myhost}.hydrolix.live/ingest/event with the headers x-hdx-table: my_table and x-hdx-transform: my_transform and the following cluster configuration:
spec:
  pools:
    custom-ingest-pool:
      routing:
        headers:
          x-hdx-table: my_table
          x-hdx-transform: my_transform
      name: custom-ingest-pool
      service: intake-head
would be handled by the custom-ingest-pool
ingest pool.
Dynamic routing: Conflicting rules
Which ingest pool handles a payload under conflicting rules is determined by Traefik's priority calculation: conflicting rules are sorted by rule length in descending order, so the longest matching rule wins.
For example, given the following configuration:
spec:
  pools:
    long-rule-pool:
      routing:
        query_params:
          table: my_table
          transform: my_transform
      name: long-rule-pool
      service: intake-head
    short-rule-pool:
      routing:
        query_params:
          table: my_table
      name: short-rule-pool
      service: intake-head
This generates the Traefik rules:
rule: PathPrefix(/pool/long-rule-pool) || PathPrefix(/ingest) && Query(table, my_table) && Query(transform, my_transform)
rule: PathPrefix(/pool/short-rule-pool) || PathPrefix(/ingest) && Query(table, my_table)
A payload sent to https://{myhost}.hydrolix.live/ingest/event?table=my_table&transform=my_transform matches both rules. However, the longer rule takes priority, so the payload is processed by the long-rule-pool.
Supported Payloads
The payload for the POST can be:
- A single JSON object.
- An array of JSON objects.
- Newline-delimited JSON (ND_JSON) format.
- A CSV (character-separated values) file.
Native HTTP compression is supported, with payload-specific compression specified within the transform (write schema). If another format is needed, please contact Hydrolix Support.
JSON Payloads
JSON can be sent as a single JSON object, an array of JSON objects, or in ND_JSON format.
Single JSON object
{
"timestamp": "2020-03-01 00:00:00 PST",
"a_metric": 45.2,
"a_dimension": "My First Dimension"
}
Array of JSON objects
[
{
"timestamp": "2020-03-01 00:00:00 PST",
"a_metric": 45.2,
"a_dimension": "My First Dimension"
},
{
"timestamp": "2020-03-01 00:00:15 PST",
"a_metric": 12.6,
"a_dimension": "My Second Dimension"
}
]
ND_JSON
{"timestamp": "2020-03-01 00:00:00 PST", "a_metric": 45.2,"a_tag": "My First Dimension"}
{"timestamp": "2020-03-01 00:00:15 PST", "a_metric": 12.6,"a_tag": "My Second Dimension"}
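If your producer emits a JSON array, jq's compact output can convert it to ND_JSON (a sketch; jq is assumed to be available):

```shell
# Flatten an array of JSON events into newline-delimited JSON with jq -c
printf '[{"a_metric":45.2},{"a_metric":12.6}]' | jq -c '.[]' > events.ndjson
cat events.ndjson   # one JSON object per line
```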
CSV Payloads
CSV payloads are sent as data structured as character-separated values. The transform schema must have:
- The same delimiter defined in its properties as the character used as the delimiter in the data.
- A "source": { "from_input_index": 0 } entry for each column that matches the structure of the data and gives the position of the column.
The content-type header should be set as follows: content-type: text/csv.
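A minimal sketch of a CSV transform illustrating both requirements; the format_details wrapper and exact property names here are assumptions, so check the Write Transforms page for the authoritative schema:

```json
{
  "type": "csv",
  "settings": {
    "format_details": { "delimiter": "," },
    "output_columns": [
      {
        "name": "timestamp",
        "datatype": {
          "type": "datetime",
          "primary": true,
          "format": "2006-01-02 15:04:05 MST",
          "source": { "from_input_index": 0 }
        }
      },
      {
        "name": "a_metric",
        "datatype": { "type": "double", "source": { "from_input_index": 1 } }
      }
    ]
  }
}
```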
% curl -s \
-H "x-hdx-table: my_test.testTable" \
-H "x-hdx-transform: myTransform" \
-H "content-type: text/csv" \
-H "Authorization: Bearer $HDX_TOKEN" \
$HDX_HYDROLIX_URL/ingest/event -X POST -d '
2020-03-01 00:00:00 PST,45.2,My First Dimension
2020-03-01 00:00:15 PST,12.6,My Second Dimension
'
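To produce a CSV payload from JSON events, jq's @csv filter is one option (a sketch; jq assumed available, with the column order chosen to match the transform):

```shell
# Render each JSON event as a comma-delimited row with jq's @csv filter
printf '[{"timestamp":"2020-03-01 00:00:00 PST","a_metric":45.2,"a_dimension":"My First Dimension"}]' \
  | jq -r '.[] | [.timestamp, .a_metric, .a_dimension] | @csv'
# prints: "2020-03-01 00:00:00 PST",45.2,"My First Dimension"
```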
Compression
If the message is compressed, the content-encoding header should be set to the compression type. If this value is not declared, Hydrolix will attempt to infer any applied compression.
See Compression Algorithms for a complete list of supported compression types. If no compression is used on the request document, the content-encoding
header is not required.
Multiple Compression Options
The HTTP request document can be compressed using multiple methods, if required, before being ingested into Hydrolix.
If data was encoded with compression A, then B, then C:
# Original encoding
encoded_data = C(B(A(data)))
# Header
content-encoding: A, B, C
# Decoded data
decoded_data = decodeA(decodeB(decodeC(encoded_data)))
For example:
$ gzip mydata.json
$ curl -s \
-H "x-hdx-table: my_table" \
-H "x-hdx-transform: my_transform_schema" \
-H "content-type: application/json" \
-H "content-encoding: gzip, bzip2, zip" \
-H "Authorization: Bearer <token>" \
https://demo.hydrolix.io/ingest/event -X POST --data-binary @mydata.json.gz.bzip2.zip
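The decode-in-reverse rule can be demonstrated locally; this sketch uses two gzip layers only, since gzip is broadly available:

```shell
# Encode a payload with two compression layers, then decode in reverse order.
printf 'hello hydrolix' > payload.json
gzip -c payload.json | gzip -c > payload.json.gz.gz   # encoded = gzip(gzip(data))
# the matching header would be: content-encoding: gzip, gzip
gunzip -c payload.json.gz.gz | gunzip -c > decoded.json
cmp -s payload.json decoded.json && echo "round-trip OK"
```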
Event Structure and Transforms
As discussed earlier, there are two types of event structure that can be sent to Hydrolix for load.
- Predefined: events that have a transform predefined and configured in the platform prior to loading data.
- Self-Described: events that contain the transform in the message itself.
Predefined
Predefined events require the Transform to be defined already within the platform. Transforms are directly linked to tables and define how data should be ingested. When using predefined events, the transform name is included in the x-hdx-transform
header sent to the streaming endpoint.
Default Transform
The user has the ability to mark a transform as the default transform for a table. This means that all streaming traffic for that table will be processed with that default transform, unless a different transform is selected by the HTTP header value.
Example :
x-hdx-transform: <pre-existing transform>
In the example below we are assuming:
- A project and table already exist called the_project.the_table
- An existing transform schema attached to that table named my_transform_schema that has:
  - timestamp defined as "type":"datetime","primary":true
  - a_metric defined as "type":"double"
  - a_dimension defined as "type":"string"
$ curl -s \
-H "x-hdx-table: the_project.the_table" \
-H "x-hdx-transform: my_transform_schema" \
-H "content-type: application/json" \
-H "Authorization: Bearer <token>" \
https://demo.hydrolix.io/ingest/event -X POST -d '
[
{
"timestamp": "2020-03-01 00:00:00 PST",
"a_metric": 45.2,
"a_dimension": "My first Dimension"
},
{
"timestamp": "2020-03-01 00:00:15 PST",
"a_metric": 12.6,
"a_dimension": "My Second Dimension"
}
]'
Self-Described
In some cases, an ingest transform may not yet exist in the system. In that case, it can be included in the HTTP request document, which then has two parts:
$ curl -s \
-H 'content-type: <data type being passed>' \
-H 'x-hdx-table: <project.table>' \
-H 'Authorization: Bearer <token>' \
https://demo.hydrolix.io/ingest/event -X POST -d \
'{
"transform": <ingest Transform doc>,
"data": <either JSON or CSV data payload>
}'
- transform - contains the Hydrolix transform telling the system how to interpret and manage the incoming data.
- data - the data that is to be inserted into the table.
For example, below we have defined the transform inline; the data is sent as JSON, with the values provided as positional arrays.
$ curl -s \
-H 'content-type: application/json' \
-H 'x-hdx-table: <project.table>' \
-H 'Authorization: Bearer <token>' \
https://demo.hydrolix.io/ingest/event -X POST -d \
'{
"transform": {
"type": "json",
"settings":{
"output_columns": [
{
"name": "timestamp",
"datatype": {
"type": "datetime",
"primary": true,
"format": "2006-01-02 15:04:05 MST"
}
},
{
"name": "clientId",
"datatype": {
"type": "uint64"
}
},
{
"name": "clientIp",
"datatype": {
"type": "string",
"index": true,
"default": "0.0.0.0"
}
},
{
"name": "clientCityCode",
"datatype": {
"type": "uint32"
}
},
{
"name": "resolverIp",
"datatype": {
"type": "string",
"index": true,
"default": "0.0.0.0"
}
},
{
"name": "resolveDuration",
"datatype": {
"type": "double",
"default": -1.0
}
}
]
},
},
"data": [
["2020-02-26 16:01:27 PST", 29991, "1.2.3.4/24", 1223, "1.4.5.7", 1.234],
["2020-02-26 16:01:28 PST", 29989, "1.2.3.5/24", 9190, "1.4.5.7", 1.324],
["2020-02-26 16:01:28 PST", 29990, "1.2.3.5/24", null, "1.4.5.7", 12.34]
]
}'
This example is an array containing individual messages. Each message is represented as an array, containing values in the order they were described in the schema. If a message in this particular batch contains a null or empty value, it is replaced with the default value if one is declared.
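Since values are positional, a quick client-side check that each row has one value per declared column can catch malformed batches before sending (a sketch using jq; the names are illustrative):

```shell
# Verify every data row length equals the number of declared output columns
doc='{"transform":{"settings":{"output_columns":[{"name":"timestamp"},{"name":"a_metric"}]}},"data":[["2020-02-26 16:01:27 PST",1.2],["2020-02-26 16:01:28 PST",3.4]]}'
cols=$(printf '%s' "$doc" | jq '.transform.settings.output_columns | length')
printf '%s' "$doc" | jq --argjson n "$cols" '.data | all(length == $n)'   # prints true when all rows match
```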