HTTP Stream API

Getting started

The HTTP Streaming API is used to continuously consume events over HTTP for loading into the Hydrolix platform. Event messages are sent via HTTP POST to the /ingest/event API endpoint.

An event structure can be one of two types:

  • Predefined - the event relies on a transform already loaded into the platform prior to data load.
  • Self-Described - the event contains the transform within the message itself. This can be useful if you need to define your message structure at the time of creation.

HTTP headers are used to tell the system which table the data should be loaded into (x-hdx-table), the content type of the incoming data (content-type), and the transform used to prepare and store the data (x-hdx-transform, unless using self-described streaming).

The basic steps are:

  1. Create a Project/Table
  2. Create a Transform
  3. Scale the stream architecture (AWS CloudFormation/Kubernetes).
  4. HTTP POST messages to the /ingest/event API endpoint.

It is assumed that the project, table, and transform are already configured. More information on how to set these up can be found in Projects & Tables and Write Transforms.


The API endpoint

The HTTP endpoint uses the URI path /ingest/event and is made available by default on port 443 (HTTPS) to receive data. IP allowlists can be used to grant or restrict access to the endpoint (IP Access List / Enabling Access & TLS).

The POST method is used to send data, with HTTP headers defining the characteristics of the message.

For example:

$ curl -s \
     -H 'content-type: application/json' \
     -H 'x-hdx-table: <project.table>' \
     -H 'x-hdx-transform: <pre-existing transform>' \
     https://<myhost>/ingest/event -X POST -d @<my_data_file>

HTTP Headers

All HTTP headers required for Streaming Ingest are prefixed with x-hdx-. Firewall rules should be adjusted where necessary to allow these additional x-hdx- headers. If required x-hdx- headers are omitted, the endpoint will issue a 4xx-class response.

Header key     Description                                                       Values
content-type   The format of the data payload.                                   application/json, text/csv
x-hdx-table    An existing Hydrolix <project.table> where the data should land.  Format: <project_name>.<table_name>
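
A quick way to verify this behavior is curl's -w flag, which prints the response status code. A minimal sketch (the host and payload are placeholders); posting without the required x-hdx-table header should yield a 4xx status:

$ curl -s -o /dev/null -w '%{http_code}\n' \
     -H 'content-type: application/json' \
     https://<myhost>/ingest/event -X POST -d '{"timestamp": "2020-03-01 00:00:00 PST"}'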

Optional headers all default to NONE if they are not present.

Header key         Description
x-hdx-transform    A transform schema for ingest. If this value is not present, a transform (write) schema must be included in the request.
content-encoding   If the message is compressed, this value should be set to the compression type (more detail in Compression below). If this value is not declared, Hydrolix will attempt to infer any applied compression.

Using Query Strings

In addition to using headers to define the table and transform, the following query string parameters can be used as an alternative within the URI.

Query parameter   Description
table             The project and table name, separated by a .
transform         The name of the transform to be used for ingest

For example:

curl 'https://my.hydrolix.net/ingest/event?table=myproject.mytable&transform=transform' -H 'content-type: application/json' -X POST -d @<my_data_file>

Supported Payloads

The payload for the POST can be:

  • a single JSON object
  • an array of JSON objects
  • newline-delimited JSON (ND_JSON)
  • a CSV (character-separated values) file

Native HTTP compression is supported out of the box, with payload-specific compression specified within the transform (write schema). If another format is needed, please contact Hydrolix Support.

JSON Payloads

JSON can be sent as a single JSON object, an array of JSON objects, or in ND_JSON format.

Single JSON object

{
   "timestamp": "2020-03-01 00:00:00 PST",
   "a_metric": 45.2,
   "a_dimension": "My First Dimension"
}

Array of JSON objects

[
  {
    "timestamp": "2020-03-01 00:00:00 PST",
    "a_metric": 45.2,
    "a_dimension": "My First Dimension"
  },
  {
    "timestamp": "2020-03-01 00:00:15 PST",
    "a_metric": 12.6,
    "a_dimension": "My Second Dimension"
  }
]

ND_JSON

{"timestamp": "2020-03-01 00:00:00 PST", "a_metric": 45.2,"a_tag": "My First Dimension"}
{"timestamp": "2020-03-01 00:00:15 PST", "a_metric": 12.6,"a_tag": "My Second Dimension"}

CSV Payloads

CSV, or character-separated values, is sent as data structured as delimited text. The transform schema (my_transform_schema below) must have:

  • the same delimiter defined in its properties as the character used as the delimiter in the data
  • a "source": { "from_input_index": 0 } for each column, matching the structure of the data and giving the position of the column.

The content-type header should be set as follows: content-type: text/csv.
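
As an illustration, a transform for the two rows in the request below might look roughly like the following sketch. The placement of the delimiter property and the exact field names here are assumptions; consult the Write Transforms documentation for the authoritative schema.

{
  "type": "csv",
  "settings": {
    "format_details": { "delimiter": "," },
    "output_columns": [
      { "name": "timestamp",   "datatype": { "type": "datetime", "primary": true, "format": "2006-01-02 15:04:05 MST", "source": { "from_input_index": 0 } } },
      { "name": "a_metric",    "datatype": { "type": "double", "source": { "from_input_index": 1 } } },
      { "name": "a_dimension", "datatype": { "type": "string", "source": { "from_input_index": 2 } } }
    ]
  }
}

The request itself: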

$ curl -s \
     -H "x-hdx-table: the_project.the_table" \
     -H "x-hdx-transform: my_transform_schema" \
     -H "content-type: text/csv" \
     https://demo.hydrolix.io/ingest/event -X POST -d '
    "2020-03-01 00:00:00 PST", 45.2, "My First Dimension"
    "2020-03-01 00:00:15 PST", 12.6, "My Second Dimension"
     '

Compression

If the message is compressed, the content-encoding header should be set to the compression type. More detail on the content-encoding header can be found here. If this value is not declared, Hydrolix will attempt to infer any applied compression.

See Compression for a complete list of supported compression types. If no compression is used on the request document, the content-encoding header is not required.

Note that the HTTP request document can be compressed using multiple methods before being sent to Hydrolix.

If data was encoded with compression A, then B, then C:

# Original Encoding
encoded_data = C(B(A(data)))

# Header
content-encoding: A, B, C

# Decoded Data
decoded_data = decodeA(decodeB(decodeC(encoded_data)))
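
A quick way to see this ordering with standard command-line tools (gzip and bzip2 here, purely as an illustration): the last encoding applied is the first one removed.

# Apply A = gzip, then B = bzip2; the header would be "content-encoding: gzip, bzip2"
$ printf '{"a_metric": 45.2}' | gzip -c | bzip2 -c > doubly_encoded

# Decode in reverse order: remove bzip2 first, then gzip
$ bzip2 -dc doubly_encoded | gzip -dc
{"a_metric": 45.2}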

For example:

$ gzip mydata.json
$ bzip2 mydata.json.gz
$ zip mydata.json.gz.bz2.zip mydata.json.gz.bz2
$ curl -s \
     -H "x-hdx-table: my_project.my_table" \
     -H "x-hdx-transform: my_transform_schema" \
     -H "content-type: application/json" \
     -H "content-encoding: gzip, bzip2, zip" \
     https://demo.hydrolix.io/ingest/event -X POST --data-binary @mydata.json.gz.bz2.zip

Event Structure

As discussed earlier, there are two types of event structure that can be sent to Hydrolix for loading.

  • Predefined - events that have a transform predefined and configured in the platform prior to data load.
  • Self-Described - events that contain the transform in the message itself.

Predefined

Predefined events require the transform schema to already be deployed within the platform. Transforms are directly linked to tables and define how data should be treated. When using predefined events, the transform is referenced in the x-hdx-transform header sent to the streaming endpoint.

Default Transform

The user has the ability to mark a transform as the default transform for a table. This means that all streaming traffic for that table will be processed with that default transform, unless a different transform is selected by the HTTP header value.

Example:

x-hdx-transform: <pre-existing transform>
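
For instance, assuming a default transform has been set on the_project.the_table (an assumption for this sketch), the x-hdx-transform header can simply be omitted:

$ curl -s \
     -H 'content-type: application/json' \
     -H 'x-hdx-table: the_project.the_table' \
     https://demo.hydrolix.io/ingest/event -X POST -d '{
       "timestamp": "2020-03-01 00:00:00 PST",
       "a_metric": 45.2,
       "a_dimension": "My First Dimension"
     }'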

In the example below we are assuming:

  • a project and table already exist, called the_project.the_table
  • an existing transform attached to that table named my_transform_schema that has:
    • timestamp defined as "type":"datetime","primary":true
    • a_metric defined as "type":"double"
    • a_dimension defined as "type":"string"

$ curl -s \
  -H "x-hdx-table: the_project.the_table" \
  -H "x-hdx-transform: my_transform_schema" \
  -H "content-type: application/json" \
  https://demo.hydrolix.io/ingest/event -X POST -d '
  [
    {
       "timestamp": "2020-03-01 00:00:00 PST",
       "a_metric": 45.2,
       "a_dimension": "My first Dimension"
    },
    {
       "timestamp": "2020-03-01 00:00:15 PST",
       "the_metric": 12.6,
       "a_dimension": "My Second Dimension"
    }
]'

Self-Described

In some cases, an ingest transform may not yet exist in the system. It can instead be included in the HTTP request document, which will then have two parts:

$ curl -s \
    -H 'content-type: <data type being passed>' \
    -H 'x-hdx-table: <project.table>' \
    https://demo.hydrolix.io/ingest/event -X POST -d '
    {
      "transform": <ingest Transform doc>,
      "data": <either JSON or CSV data payload>
    }'

  • transform - contains the Hydrolix write transform, telling the system how to interpret and manage the incoming data.
  • data - the data that is to be inserted into the table.

For example, in the request below we have defined the transform, with the data sent as JSON but the values provided as arrays.

$ curl -s \
     -H 'content-type: application/json' \
     -H 'x-hdx-table: <project.table>' \
     https://demo.hydrolix.io/ingest/event -X POST -d '
{
  "transform": {
    "type": "json",
    "settings":{
      "output_columns": [
        {
          "name": "timestamp",
          "datatype": {
            "type": "datetime",
            "primary": true,
            "format": "2006-01-02 15:04:05 MST"
          }
        },
        {
          "name": "clientId",
          "datatype": {
            "type": "uint64"
          }
        },
        {
          "name": "clientIp",
          "datatype": {
            "type": "string",
            "index": true,
            "default": "0.0.0.0"
          }
        },
        {
          "name": "clientCityCode",
          "datatype": {
            "type": "uint32"
          }
        },
        {
          "name": "resolverIp",
          "datatype": {
            "type": "string",
            "index": true,
            "default": "0.0.0.0"
          }
        },
        {
          "name": "resolveDuration",
          "datatype": {
            "type": "double",
            "default": -1.0
          }
        }
      ]
    }
  },
  "data": [
  ["2020-02-26 16:01:27 PST", 29991, "1.2.3.4/24", 1223, "1.4.5.7", 1.234],
  ["2020-02-26 16:01:28 PST", 29989, "1.2.3.5/24", 9190, "1.4.5.7", 1.324],
  ["2020-02-26 16:01:28 PST", 29990, "1.2.3.5/24", null, "1.4.5.7", 12.34]
  ]
}'

In the above instance, the data element is an array containing individual messages. Each message is represented as an array containing values in the order the columns were described in the schema. If a message in this particular batch contains a null or empty value, it will be replaced with the "default" value if one is declared.

"data": [
    [ "2020-02-26 16:01:27 PST", 29991, "1.2.3.4/24", 1223, "1.4.5.7", 1.234 ],
    [ "2020-02-26 16:01:28 PST", 29989, "1.2.3.5/24", 9190, "1.4.5.7", 1.324 ],
    [ "2020-02-26 16:01:28 PST", 29990, "1.2.3.5/24", null, "1.4.5.7", 12.34 ]
  ]