Streaming API (HTTP) Options

The Streaming API is used to continuously consume events via HTTP for ingestion into Hydrolix.

Events are sent via HTTP POST to the /ingest/event API endpoint. The HTTP headers x-hdx-table, x-hdx-transform (unless using self-described streaming), and content-type describe the table the data is going to, the transform that should be used, and the type of content being sent.

To ingest data using this method you will need to:

  1. Create a Table
  2. Create a Transform
  3. HTTP POST to the /ingest/event API endpoint (with the headers).

https://<myhost>/ingest/event

$ curl -s \
     -H 'content-type: application/json' \
     -H 'x-hdx-table: <project.table>' \
     -H 'x-hdx-transform: <pre-existing transform>' \
     https://<myhost>/ingest/event -X POST -d @<my_data_file>

HTTP Headers

  • All HTTP headers required for Streaming Ingest are prefixed with x-hdx-.
  • Firewall rules need to allow the additional x-hdx- headers. If the required headers are not supplied, a 4xx-level response is returned.
| Header key   | Description                                                      | Values                               |
| ------------ | ---------------------------------------------------------------- | ------------------------------------ |
| content-type | The format of the data payload.                                  | application/json, text/csv           |
| x-hdx-table  | An existing Hydrolix <project.table> where the data should land. | Format: <project_name>.<table_name>  |
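To see the 4xx behavior described above, send a request that omits the required x-hdx-table header. This is a sketch only; the exact status code depends on the deployment:

$ curl -s -o /dev/null -w '%{http_code}\n' \
     -H 'content-type: application/json' \
     https://<myhost>/ingest/event -X POST -d '{}'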

Optional headers all default to NONE if they are not present.

| Header key       | Description |
| ---------------- | ----------- |
| content-encoding | If the message is compressed, this value should be set to the compression type (more detail below). If this value is not declared, Hydrolix will try to infer any applied compression from the content. |
| x-hdx-transform  | A transform schema for ingest. If this value is not present, a transform (write) schema must be included in the request. |

Existing Schema Ingestion Scenarios

Transform Schemas are associated with tables and define how data should be treated on ingestion. When a transform schema already exists within the Hydrolix platform, it is referenced by name in the HDX HTTP headers.

Default Transform

A transform can be marked as the default transform for a table. All streaming traffic for that table will then be processed with that default transform, unless a different transform is selected by the HTTP header value.

Example:

x-hdx-transform: <pre-existing transform>
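When a default transform is set, the x-hdx-transform header can be omitted entirely and the default is applied. A minimal sketch, using the same placeholders as the first example above:

$ curl -s \
     -H 'content-type: application/json' \
     -H 'x-hdx-table: <project.table>' \
     https://<myhost>/ingest/event -X POST -d @<my_data_file>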

Take specific care when using self-describing JSON events (events which carry their transform with them): these will not work until an additional header is added indicating that the traffic is self-describing. The workaround for the moment is to unset the default setting within the transform.

Supported Payloads

The payload for the POST can be:

  • a single JSON object
  • an array of JSON objects
  • new-line delimited (ND_JSON) format
  • a CSV (character-separated values) file

Native HTTP compression is supported out of the box, with specific payload compression being specified within the transform (write schema). If another format is needed, please contact Hydrolix Support.

JSON

JSON can be sent as a single JSON object, an array of JSON objects, or in ND_JSON format.

Single JSON object

{
   "timestamp": "2020-03-01 00:00:00 PST",
   "the_metric": 45.2,
   "the_tag": "My First Silly Tag"
}

Array of JSON objects

[
  {
    "timestamp": "2020-03-01 00:00:00 PST",
    "the_metric": 45.2,
    "the_tag": "My First Silly Tag"
  },
  {
    "timestamp": "2020-03-01 00:00:15 PST",
    "the_metric": 12.6,
    "the_tag": "My Second Silly Tag"
  }
]

ND_JSON

{"timestamp": "2020-03-01 00:00:00 PST", "the_metric": 45.2,"the_tag": "My First Silly Tag"}
{"timestamp": "2020-03-01 00:00:15 PST","the_metric": 12.6,"the_tag": "My Second Silly Tag"}

Assumptions for this example:

  • You have already created a table.
  • You have an existing schema attached to that table named my_transform_schema that has:
    - timestamp defined as "type":"datetime","primary":true
    - the_metric defined as "type":"double"
    - the_tag defined as "type":"string".
$ curl -s \
  -H "x-hdx-table: the_project.the_table" \
  -H "x-hdx-transform: my_transform_schema" \
  -H "content-type: application/json" \
  https://demo.hydrolix.io/ingest/event -X POST -d '
  [
    {
       "timestamp": "2020-03-01 00:00:00 PST",
       "the_metric": 45.2,
       "the_tag": "My First Silly Tag"
    },
    {
       "timestamp": "2020-03-01 00:00:15 PST",
       "the_metric": 12.6,
       "the_tag": "My Second Silly Tag"
    }
]'

CSV

CSVs, or character-separated values, are sent as delimited text; the delimiter is not limited to a comma. The transform schema my_transform_schema must have:

  • the same delimiter defined in its properties as the character used as the delimiter in the data
  • a position defined for each column that matches the structure of the data (see the transform sketch after the example below)
$ curl -s \
     -H "x-hdx-table: the_project.the_table" \
     -H "x-hdx-transform: my_transform_schema" \
     -H "content-type: text/csv" \
     https://demo.hydrolix.io/ingest/event -X POST -d '
    "2020-03-01 00:00:00 PST", 45.2, "My First Silly Tag"
    "2020-03-01 00:00:15 PST", 12.6, "My Second Silly Tag"
        '
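For reference, a hedged sketch of what my_transform_schema might look like for this payload. The delimiter and position properties are the ones referred to above; their exact nesting can vary between Hydrolix versions, so treat this as illustrative rather than definitive:

{
  "type": "csv",
  "format_details": {
    "delimiter": ","
  },
  "output_columns": [
    { "name": "timestamp",  "position": 0, "datatype": { "type": "datetime", "primary": true, "format": "2006-01-02 15:04:05 MST" } },
    { "name": "the_metric", "position": 1, "datatype": { "type": "double" } },
    { "name": "the_tag",    "position": 2, "datatype": { "type": "string" } }
  ]
}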

Compressed HTTP Request Document

The HTTP request document can be compressed using multiple methods before being sent to Hydrolix.

If the data was encoded with A, then B, then C:

encoded_data = C(B(A(data)))

then the header should list the encodings in the order they were applied:

content-encoding: A, B, C

and Hydrolix will decode in the reverse order:

decoded_data = decodeA(decodeB(decodeC(encoded_data)))

For example:

$ gzip mydata.json
$ bzip2 mydata.json.gz
$ zip mydata.json.gz.bz2.zip mydata.json.gz.bz2
$ curl -s \
     -H "x-hdx-table: the_project.the_table" \
     -H "x-hdx-transform: my_transform_schema" \
     -H "content-type: application/json" \
     -H "content-encoding: gzip, bzip2, zip" \
     https://demo.hydrolix.io/ingest/event -X POST --data-binary @mydata.json.gz.bz2.zip

See Compression for a complete list of supported compression types.

If no compression is used on the request document, the content-encoding header is not required.

🚧

Handling Errors

If you receive an error response (a 400, for example), the data will not be indexed; handle the error by retrying the request or storing the data for manual review.
If the request fails more than 3 times, store the data for manual review instead of retrying further.
The error message contains the reason for the failure ("Unexpected EOF", for example).
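A minimal retry sketch in shell, assuming the payload lives in a local file events.json and a 200 response indicates success (the file name and the review/ directory are illustrative):

#!/bin/bash
# POST the payload, retrying up to 3 times on a non-200 response.
for attempt in 1 2 3; do
  status=$(curl -s -o /dev/null -w '%{http_code}' \
    -H 'content-type: application/json' \
    -H 'x-hdx-table: the_project.the_table' \
    -H 'x-hdx-transform: my_transform_schema' \
    https://demo.hydrolix.io/ingest/event -X POST --data-binary @events.json)
  [ "$status" = "200" ] && exit 0
  echo "attempt $attempt failed with HTTP $status" >&2
done
# After 3 failed attempts, keep the payload for manual review.
mv events.json review/events.json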

Ad Hoc Ingestion

In some cases, an ingest transform may not yet exist in the system. It can instead be included in the HTTP request document, which will then have two parts: the transform and the data.

$ curl -s \
    -H 'content-type: <data type being passed>' \
    -H 'x-hdx-table: <project.table>' \
    https://demo.hydrolix.io/ingest -X POST -d '
    {
      "transform": <ingest Transform doc>,
      "data": <either JSON or CSV data payload>
    }'

📘

Notes on data:

  • For ad hoc JSON ingestion, the data element can be:
    - a single document
    - an array of documents, each representing a row
    - an array of values in the same order as the transform
  • For ad hoc CSV ingestion, the data element must adhere to the data description. In this case, data is typically a single string:
    - "data": "the encoded data"

Example Ad Hoc API Call

In the following example, the data sent is JSON, but the values are provided as arrays. Note that position is not used; the order of the data elements is the same as the order of the columns in the transform.

$ curl -s \
     -H 'content-type: application/json' \
     -H 'x-hdx-table: <project.table>' \
     https://demo.hydrolix.io/ingest -X POST -d '
{
  "transform": {
    "type": "json",
    "output_columns": [
      {
        "name": "timestamp",
        "datatype": {
          "type": "datetime",
          "primary": true,
          "format": "2006-01-02 15:04:05 MST"
        }
      },
      {
        "name": "clientId",
        "datatype": {
          "type": "uint64"
        }
      },
      {
        "name": "clientIp",
        "datatype": {
          "type": "string",
          "index": true,
          "default": "0.0.0.0"
        }
      },
      {
        "name": "clientCityCode",
        "datatype": {
          "type": "uint32"
        }
      },
      {
        "name": "resolverIp",
        "datatype": {
          "type": "string",
          "index": true,
          "default": "0.0.0.0"
        }
      },
      {
        "name": "resolveDuration",
        "datatype": {
          "type": "double",
          "default": -1.0
        }
      }
    ]
  },
  "data": [
  ["2020-02-26 16:01:27 PST", 29991, "1.2.3.4/24", 1223, "1.4.5.7", 1.234],
  ["2020-02-26 16:01:28 PST", 29989, "1.2.3.5/24", 9190, "1.4.5.7", 1.324],
  ["2020-02-26 16:01:28 PST", 29990, "1.2.3.5/24", null, "1.4.5.7", 12.34]
    ]
}'
