Get started⚓︎

The HTTP Streaming API is used to continuously consume events via HTTP to be loaded into the Hydrolix platform. Event messages are sent via HTTP POST to the /ingest/event API endpoint.

An event's structure can be of two types:

  • Predefined: the event has a predefined transform loaded into Hydrolix prior to loading data.
  • Self-Described: the event contains the transform within the message itself. This can be useful if you need to define your message structure at time of creation.

HTTP headers are used to provide information to the system about which table the data should be loaded into, the content-type of the incoming data, authentication information, and the transform used to prepare and store the data.

Basic Steps

  1. Create a Project/Table.
  2. Create a Transform.
  3. Scale the Stream Architecture.
  4. HTTP POST messages to the /ingest/event API endpoint.

Ensure that the project, table, and transform are all already configured. See Projects & Tables and Write Transforms.


The API Endpoint⚓︎

The HTTP endpoint uses the URI path /ingest/event and is made available by default on port 443 (HTTPS) to receive data. IP allowlists and authentication information can be used to grant/restrict access to the endpoint. See also Configure IP Access and User Authentication.

The POST method is used to send data, with HTTP headers defining the characteristics of the message.

For example:

$ curl -s \
     -H 'content-type: application/json' \
     -H 'x-hdx-table: <project.table>' \
     -H 'x-hdx-transform: <pre-existing transform>' \
     -H 'Authorization: Bearer <token>' \
     https://<myhost>/ingest/event -X POST -d @<my_data_file>

HTTP headers⚓︎

Most HTTP headers required for streaming ingest are prefixed with x-hdx-. Adjust firewall rules where necessary to allow the x-hdx- headers and the Authorization header. If required x-hdx- headers are omitted, the ingest endpoint responds with an HTTP 4xx error.

  • content-type (required): The format of the data payload. Example values: application/json, text/csv.
  • x-hdx-table (required): An existing Hydrolix project and table to receive the data, in the format {project_name}.{table_name}. Optionally also influences intake-head pool selection; see Dynamic routing with headers.
  • Authorization (required): The OAuth bearer token, in the form Bearer <token>. See below.
  • x-hdx-transform (optional): The transform schema to use when handling payloads. See transform precedence for the implications of specifying a transform in multiple ways. Optionally also influences intake-head pool selection; see Dynamic routing with default headers.
  • content-encoding (optional): Compression algorithms used on the payload, for example gzip. See content encoding.
  • x-hdx-token (optional): A table access token, for tables configured to use stream authentication.
  • x-hdx-project (optional): Not used by the intake system; optionally influences intake-head pool selection. See Dynamic routing with default headers.

Optional headers default to NONE if not present.

The OAuth bearer token⚓︎

Get the bearer token, which is good for the next 24 hours, to authenticate future API calls. This command assumes you've set the $HDX_HOSTNAME, $HDX_USER, and $HDX_PASSWORD environment variables:

export HDX_TOKEN=$(
  curl -v -X POST -H "Content-Type: application/json" \
  https://$HDX_HOSTNAME/config/v1/login/ \
  -d "{
    \"username\":\"$HDX_USER\",
    \"password\":\"$HDX_PASSWORD\"  
  }" | jq -r ".auth_token.access_token"
)
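The jq filter above can be checked against a mock response before wiring it into the login call. A sketch: the mock body below shows only the auth_token.access_token field used by the filter, while the real login response contains additional fields.

```shell
# Mock login response; only the field used by the jq filter above is shown.
RESPONSE='{"auth_token":{"access_token":"abc123"}}'

# Same extraction as above: pull the bearer token out of the response body.
HDX_TOKEN=$(echo "$RESPONSE" | jq -r ".auth_token.access_token")
echo "$HDX_TOKEN"
```

If the path is missing from the response (for example, after a failed login), jq -r prints the literal string null, which makes a bad token easy to spot.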

Use query strings⚓︎

As an alternative to using HTTP headers to specify the table, transform, and ingest token, the following query string parameters can be used within the URI string:

  • table: The project and table name in the format project_name.table_name. Can also determine which intake-head pool processes the incoming data; see Dynamic routing with headers for more information.
  • transform: The name of the transform to use for ingest. Can also determine which intake-head pool processes the incoming data; see Dynamic routing with headers for more information.
  • token: The token to use to access the table. See stream authentication for more information.

For example:

curl 'https://hostname.hydrolix.live/ingest/event?table=myproject.mytable&transform=transform' \
     -H 'content-type: application/json' \
     -H 'Authorization: Bearer <token>' \
     -X POST -d @<my_data_file>

Supported payloads⚓︎

The payload for the POST can be:

  • A single JSON object
  • An array of JSON objects
  • Concatenated JSON objects, including newline-delimited JSON
  • A CSV (character-separated values) file
  • A Parquet file

Native HTTP compression is supported, with payload compression specified within the transform (write schema). If another format is needed, please contact Hydrolix Support.

JSON payloads⚓︎

JSON can be sent as a single JSON object, an array of JSON objects, or in NDJSON (newline-delimited JSON) format.

Single JSON object
{
   "timestamp": "2020-03-01 00:00:00 PST",
   "a_metric": 45.2,
   "a_dimension": "My First Dimension"
}
Array of JSON objects
[
  {
    "timestamp": "2020-03-01 00:00:00 PST",
    "a_metric": 45.2,
    "a_dimension": "My First Dimension"
  },
  {
    "timestamp": "2020-03-01 00:00:15 PST",
    "a_metric": 12.6,
    "a_dimension": "My Second Dimension"
  }
]
NDJSON (newline-delimited JSON)
{"timestamp": "2020-03-01 00:00:00 PST", "a_metric": 45.2,"a_tag": "My First Dimension"}
{"timestamp": "2020-03-01 00:00:15 PST", "a_metric": 12.6,"a_tag": "My Second Dimension"}
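If existing data is a single JSON array but you would rather send newline-delimited JSON, jq can split it into one object per line. A minimal sketch, assuming jq is installed:

```shell
# Split a JSON array into NDJSON: one compact JSON object per output line.
echo '[{"a_metric": 45.2}, {"a_metric": 12.6}]' | jq -c '.[]'
# {"a_metric":45.2}
# {"a_metric":12.6}
```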

CSV payloads⚓︎

CSV (character-separated values) data is sent in the request body exactly as structured. The transform schema must have:

  • The same delimiter defined in its properties as the character used as the delimiter in the data.
  • A "source": { "from_input_index": N } property on each column, where N is the zero-based position of that column in the data.

The header for content-type should be as follows: content-type: text/csv.

curl -s \
     -H "x-hdx-table: my_test.testTable" \
     -H "x-hdx-transform: myTransform" \
     -H "content-type: text/csv" \
     -H "Authorization: Bearer $HDX_TOKEN" \
     $HDX_HYDROLIX_URL/ingest/event -X POST -d '
2020-03-01 00:00:00 PST,45.2,My First Dimension
2020-03-01 00:00:15 PST,12.6,My Second Dimension
'
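The CSV requirements above might look like the following transform fragment for the three-column data in the example. This is a sketch: type, settings.output_columns, datatype, and source.from_input_index appear elsewhere in this document, but the exact placement of the delimiter setting (shown here as format_details.delimiter) is an assumption to verify against your cluster's transform schema reference.

```json
{
  "type": "csv",
  "settings": {
    "format_details": { "delimiter": "," },
    "output_columns": [
      {
        "name": "timestamp",
        "datatype": { "type": "datetime", "primary": true, "format": "2006-01-02 15:04:05 MST" },
        "source": { "from_input_index": 0 }
      },
      {
        "name": "a_metric",
        "datatype": { "type": "double" },
        "source": { "from_input_index": 1 }
      },
      {
        "name": "a_dimension",
        "datatype": { "type": "string" },
        "source": { "from_input_index": 2 }
      }
    ]
  }
}
```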

Parquet payloads⚓︎

Parquet files are sent as binary data. The transform schema must have:

  • The type property set to parquet.
  • Output column names that match the Parquet file's column names, or explicit source mappings for columns with different names.

The header for content-type should be content-type: application/vnd.apache.parquet.

Because Parquet is a binary format, use --data-binary (not -d) with curl to avoid data corruption.

6
curl -s \
     -H "x-hdx-table: my_test.testTable" \
     -H "x-hdx-transform: myParquetTransform" \
     -H "content-type: application/vnd.apache.parquet" \
     -H "Authorization: Bearer $HDX_TOKEN" \
     $HDX_HYDROLIX_URL/ingest/event -X POST --data-binary @mydata.parquet

See Format Options for Parquet data type mapping, column mapping, and limitations.

Streaming decompression⚓︎

Streaming decompression is useful for receiving data compressed on-the-fly by HTTP clients before applying any compression settings from the transform.

Specify decompression algorithms using the Content-Encoding header.

Refer to compression algorithms for the list of supported algorithms.

The compression algorithm shouldn't be specified in both places

If the incoming data is compressed with gzip, set compression either in your transform or in the Content-Encoding header, not both.

Specifying it in both places causes the network transport receiver to decompress the data, so the subsequent transform-level decompression fails. For more complex scenarios, see compression layering.

Streaming decompression allows independent configuration of compression algorithms to support network transport compression. This allows a single transform to be used in the batch or autoingest systems while supporting network transport compression in HTTP Stream API.
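For the common single-algorithm case, the payload is compressed once on the client and declared with a single Content-Encoding value. A sketch, with the request itself left commented out because it requires a reachable cluster and a valid token:

```shell
# Build a small JSON payload and compress it with gzip before sending.
printf '{"timestamp": "2020-03-01 00:00:00 PST", "a_metric": 45.2}\n' > data.json
gzip -c data.json > data.json.gz

# The request itself (requires a reachable cluster and a valid $HDX_TOKEN):
# curl --fail --silent \
#   --header "x-hdx-table: <project.table>" \
#   --header "Content-Type: application/json" \
#   --header "Authorization: Bearer ${HDX_TOKEN}" \
#   --header "Content-Encoding: gzip" \
#   --data-binary @data.json.gz \
#   -- "${HDX_HYDROLIX_URL}/ingest/event"
```

As with Parquet, use --data-binary rather than -d so curl does not strip bytes from the compressed stream.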

Uncompressed data⚓︎

When sending uncompressed data to the HTTP Stream API, either:

  • omit the Content-Encoding header, or
  • include Content-Encoding: none to explicitly indicate no compression.

Layered compression⚓︎

List compression algorithms in the order they were applied to the original data. In this example, the corresponding transform uses no compression and receives uncompressed JSON data.

Specifying Multiple Stream Decompression Algorithms
$ curl --fail --silent \
  --header "x-hdx-table: news.requests" \
  --header  "Content-Type: application/json" \
  --header  "Authorization: Bearer ${HDX_TOKEN}" \
  --header "Content-Encoding: gzip, bzip2, zip" \
  --data-binary @data.json.gz.bz2.zip \
  -- "${HDX_HYDROLIX_URL}/ingest/event"
{"code":200,"message":"success"}

See also compression layering.

Event structure and transforms⚓︎

There are two types of event structure that can be sent to Hydrolix for load.

  • Predefined: events that have a transform predefined and configured in the platform prior to loading data.
  • Self-described: events that contain the transform in the message itself.

Predefined⚓︎

Predefined events require the Transform to be defined already within the platform. Transforms are directly linked to tables and define how data should be ingested. When using predefined events, the transform name is included in the x-hdx-transform header sent to the streaming end-point.

Default transform⚓︎

You can mark a transform as the default transform for a table. All streaming traffic for that table is then processed with the default transform unless a different transform is selected by the HTTP header value.

Example: x-hdx-transform: <pre-existing transform>

In the example below, we assume

  • A project (the_project) and table (the_table) already exist
  • An existing transform attached to that table named my_transform that has:
    • timestamp defined as "type":"datetime","primary":true
    • a_metric defined as "type":"double"
    • a_dimension defined as "type":"string".

Specify an existing transform using HTTP header x-hdx-transform

$ curl -s \
     -H "x-hdx-table: the_project.the_table" \
     -H "x-hdx-transform: my_transform" \
     -H "content-type: application/json" \
     -H "Authorization: Bearer <token>" \
     https://demo.hydrolix.io/ingest/event -X POST -d '
[
  {
     "timestamp": "2020-03-01 00:00:00 PST",
     "a_metric": 45.2,
     "a_dimension": "My First Dimension"
  },
  {
     "timestamp": "2020-03-01 00:00:15 PST",
     "a_metric": 12.6,
     "a_dimension": "My Second Dimension"
  }
]'

Self-described⚓︎

In some cases, an ingest transform may not yet exist in the system. It can instead be included in the HTTP request document, which then has two parts:

$ curl -s \
    -H 'content-type: <data type being passed>' \
    -H 'x-hdx-table: <project.table>' \
    -H 'Authorization: Bearer <token>' \
    https://demo.hydrolix.io/ingest/event -X POST -d \
    '{
      "transform": <ingest Transform doc>,
      "data": <either JSON or CSV data payload>
    }'
  • transform - contains the Hydrolix transform telling the system how to interpret and manage the incoming data.

  • data - the data to be inserted into the table.

For example, here we define the transform to expect JSON data, and provide the values in arrays.

$ curl -s \
     -H 'content-type: application/json' \
     -H 'x-hdx-table: <project.table>' \
     -H 'Authorization: Bearer <token>' \
     https://demo.hydrolix.io/ingest/event -X POST -d \
'{
  "transform": {
    "type": "json",
    "settings": {
      "output_columns": [
        {
          "name": "timestamp",
          "datatype": {
            "type": "datetime",
            "primary": true,
            "format": "2006-01-02 15:04:05 MST"
          }
        },
        {
          "name": "clientId",
          "datatype": {
            "type": "uint64"
          }
        },
        {
          "name": "clientIp",
          "datatype": {
            "type": "string",
            "index": true,
            "default": "0.0.0.0"
          }
        },
        {
          "name": "clientCityCode",
          "datatype": {
            "type": "uint32"
          }
        },
        {
          "name": "resolverIp",
          "datatype": {
            "type": "string",
            "index": true,
            "default": "0.0.0.0"
          }
        },
        {
          "name": "resolveDuration",
          "datatype": {
            "type": "double",
            "default": -1.0
          }
        }
      ]
    }
  },
  "data": [
    ["2020-02-26 16:01:27 PST", 29991, "1.2.3.4/24", 1223, "1.4.5.7", 1.234],
    ["2020-02-26 16:01:28 PST", 29989, "1.2.3.5/24", 9190, "1.4.5.7", 1.324],
    ["2020-02-26 16:01:28 PST", 29990, "1.2.3.5/24", null, "1.4.5.7", 12.34]
  ]
}'

The example data is an array containing individual messages. Each message is represented as an array, containing values in the order they were described in the schema. For columns where a default is declared, NULL or empty values are replaced with default values.

Transform precedence⚓︎

Since a transform can be specified in multiple ways, a Hydrolix cluster uses the following descending order of precedence to determine which transform to use with an incoming payload:

  1. (Highest priority) Self-described
  2. Query parameter (transform)
  3. HTTP header (x-hdx-transform)
  4. (Lowest priority) Default transform on the destination table
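The order above can be illustrated with a small helper that returns the first transform source that is set. pick_transform is purely illustrative and not part of any Hydrolix tooling:

```shell
# Hypothetical helper mirroring the documented precedence: self-described
# transform, then the transform query parameter, then the x-hdx-transform
# header, then the table's default transform.
pick_transform() {
  for candidate in "$1" "$2" "$3" "$4"; do
    if [ -n "$candidate" ]; then
      printf '%s\n' "$candidate"
      return 0
    fi
  done
  return 1
}

# No self-described transform or query parameter present: the header wins
# over the table's default transform.
pick_transform "" "" "header_transform" "default_transform"
# header_transform
```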