Streaming Data with the Ingest API

API streaming is used to continuously consume events via HTTP for ingestion into Hydrolix.

Events are sent via HTTP POST to the /ingest/event API endpoint. The content-type and x-hdx-table HTTP headers are required; x-hdx-transform is required unless a transform document is included in the request body (see Ad Hoc Ingestion below).

To ingest data using this method you will need to:

  1. Create a Table
  2. Create a Transform
  3. POST your data to the /ingest/event API endpoint:

https://<myhost>/ingest/event

For example:

$ curl -s \
     -H 'content-type: <data type being passed>' \
     -H 'x-hdx-table: <project.table>' \
     -H 'x-hdx-transform: <pre-existing transform>' \
     https://<myhost>/ingest/event -X POST -d @<filename>

Supported Data Formats

Hydrolix currently supports streaming ingestion of JSON and CSV data, either raw or compressed. If you need another format, please contact Hydrolix Support.

HTTP Headers

  • All HTTP headers required for Streaming Ingest are prefixed with x-hdx-.
  • Firewall rules must allow x-hdx- headers to pass through.

Streaming ingest requires the following HTTP headers:

| Header key | Description | Values |
| --- | --- | --- |
| content-type | The format of the data payload. | application/json, text/csv |
| x-hdx-table | An existing Hydrolix table where the data should land. | <project_name>.<table_name> |

Optional headers default to NONE if they are not present.

| Header key | Description |
| --- | --- |
| content-encoding | If the message is compressed, this value should be set to the compression type (more detail below). If this value is not declared, Hydrolix will try to detect any applied compression automatically. |
| x-hdx-transform | A transform schema for ingest. If this value is not present, a transform document must be included in the request (see Ad Hoc Ingestion below). |

Existing Schema Ingestion Scenarios

Transform schemas are associated with tables and define how data is treated on ingestion. When a transform schema already exists, it is referenced by name via the x-hdx-transform HTTP header.

Default Transform

You can mark a transform as the default transform for a table. All streaming traffic for that table is then processed with that default transform, unless a different transform is selected via the x-hdx-transform header value.

Example: x-hdx-transform: <pre-existing transform>

Take particular care when using self-describing JSON events (events that carry their transform with them): these will not work until an additional header is added indicating that the traffic is self-describing. The workaround for the moment is to unset the default setting within the transform.
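As a purely hypothetical sketch, unsetting the default flag might look like the following update through the configuration API. The endpoint path and the is_default field name here are assumptions, not confirmed by this page; consult the transform API documentation for the actual schema:

$ curl -s \
     -H "content-type: application/json" \
     -X PATCH https://<myhost>/config/v1/orgs/<org_id>/projects/<project_id>/tables/<table_id>/transforms/<transform_id> -d '
{
    "settings": { "is_default": false }
}'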

JSON

JSON data is sent to Hydrolix as an array of one or more JSON documents.

Assumptions for this example:

  • You have already created a table.
  • You have an existing schema attached to that table named my_transform_schema that has:
    • timestamp as the primary index
    • a metric named the_metric
    • and a tag named the_tag.
$ curl -s \
     -H "x-hdx-table: the_project.the_table" \
     -H "x-hdx-transform: my_transform_schema" \
     -H "content-type: application/json" \
     https://demo.hydrolix.io/ingest/event -X POST -d '
[
    {
        "timestamp": "2020-03-01 00:00:00 PST",
        "the_metric": 45.2,
        "the_tag": "My First Silly Tag"
    },
    {
        "timestamp": "2020-03-01 00:00:15 PST",
        "the_metric": 12.6,
        "the_tag": "My Second Silly Tag"
    }
]'

CSV

CSVs, or character-separated values, are rows of delimited fields; the delimiter does not have to be a comma. The transform schema my_transform_schema must have (see the sketch after this list):

  • the same delimiter defined in its properties as the character used as the delimiter in the data
  • a position for each column that matches the structure of the data
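Such a transform might look roughly like the following. This is an illustrative sketch only: the delimiter and position property names and their placement are assumptions modeled on the ad hoc transform example later on this page, and the authoritative schema is defined in Ingest Transforms.

{
    "type": "csv",
    "delimiter": ",",
    "output_columns": [
        { "name": "timestamp", "type": "datetime", "format": "2006-01-02 15:04:05 MST", "treatment": "primary", "position": 0 },
        { "name": "the_metric", "type": "double", "treatment": "metric", "position": 1 },
        { "name": "the_tag", "type": "string", "treatment": "tag", "position": 2 }
    ]
}

The data itself is then posted against that transform: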
$ curl -s \
     -H "x-hdx-table: the_project.the_table" \
     -H "x-hdx-transform: my_transform_schema" \
     -H "content-type: text/csv" \
     https://demo.hydrolix.io/ingest/event -X POST -d '
"2020-03-01 00:00:00 PST",45.2,"My First Silly Tag"
"2020-03-01 00:00:15 PST",12.6,"My Second Silly Tag"'

Compressed HTTP Request Document

The HTTP request document can be compressed with one or more methods before being sent to Hydrolix.

If data was encoded with A, then B, then C:

encoded_data = C(B(A(data)))

then the content-encoding header lists the encodings in the order they were applied:

content-encoding: A, B, C

and Hydrolix decodes in the reverse order:

decoded_data = decodeA(decodeB(decodeC(encoded_data)))

For example:

$ gzip mydata.json
$ bzip2 mydata.json.gz
$ zip mydata.json.gz.bz2.zip mydata.json.gz.bz2
$ curl -s \
     -H "x-hdx-table: my_project.my_table" \
     -H "x-hdx-transform: my_transform_schema" \
     -H "content-type: application/json" \
     -H "content-encoding: gzip, bzip2, zip" \
     https://demo.hydrolix.io/ingest/event -X POST --data-binary @mydata.json.gz.bz2.zip
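In practice, a single compression pass is the more common case, and only one encoding is listed. A minimal sketch, assuming the same table, transform, and host as above:

$ gzip mydata.json
$ curl -s \
     -H "x-hdx-table: my_project.my_table" \
     -H "x-hdx-transform: my_transform_schema" \
     -H "content-type: application/json" \
     -H "content-encoding: gzip" \
     https://demo.hydrolix.io/ingest/event -X POST --data-binary @mydata.json.gz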

See Ingest Transforms for a complete list of supported compression types.

If no compression is used on the request document, the content-encoding header is not required.

Ad Hoc Ingestion

In some cases, an ingest transform may not yet exist in the system. It can instead be included in the HTTP request document, which then has two parts:

$ curl -s \
     -H 'content-type: <data type being passed>' \
     -H 'x-hdx-table: <project.table>' \
     https://demo.hydrolix.io/ingest -X POST -d @<filename>
{
	"transform": <a proper Ingest Transform document; see /use/concepts/transforms>,
	"data": <the data; type depends on input format>
}

Notes on data:

  • For ad hoc JSON ingestion, data can be:
    • a single document
    • an array of documents, each representing a row
    • an array of values in the same order as the transform
  • For ad hoc CSV ingestion, the data element must adhere to the data description. In this case, data is typically a string: "data": "the encoded data"
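For instance, a request carrying data as an array of documents might look like this sketch, which reuses the timestamp, the_metric, and the_tag fields from the streaming example above and elides the transform:

{
	"transform": <an Ingest Transform document, as above>,
	"data": [
		{ "timestamp": "2020-03-01 00:00:00 PST", "the_metric": 45.2, "the_tag": "My First Silly Tag" },
		{ "timestamp": "2020-03-01 00:00:15 PST", "the_metric": 12.6, "the_tag": "My Second Silly Tag" }
	]
}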

Example Ad Hoc API Call

In the following example, the data sent is JSON, but the values are provided as arrays. Note that position is not used; the order of the data elements matches the order of the output columns in the transform.

$ curl -s \
     -H 'content-type: application/json' \
     -H 'x-hdx-table: <project.table>' \
     https://demo.hydrolix.io/ingest -X POST -d '
{
	"transform": {
		"type": "json",
		"output_columns": [{
				"name": "timestamp",
				"type": "datetime",
				"format": "2006-01-02 15:04:05 MST",
				"treatment": "primary"
			},
			{
				"name": "clientId",
				"type": "uint64",
				"treatment": "tag",
				"default": 0
			},
			{
				"name": "clientIp",
				"type": "string",
				"treatment": "tag",
				"default": "0.0.0.0"
			},
			{
				"name": "clientCityCode",
				"type": "uint64",
				"treatment": "tag",
				"default": 0
			},
			{
				"name": "resolverIp",
				"type": "string",
				"treatment": "tag",
				"default": "0.0.0.0"
			},
			{
				"name": "resolveDuration",
				"type": "double",
				"treatment": "metric",
				"default": -1.0
			}
		]
	},
	"data": [
		["2020-02-26 16:01:27 PST", 29991, "1.2.3.4/24", 1223, "1.4.5.7", 1.234],
		["2020-02-26 16:01:28 PST", 29989, "1.2.3.5/24", 9190, "1.4.5.7", 1.324],
		["2020-02-26 16:01:28 PST", 29990, "1.2.3.5/24", null, "1.4.5.7", 12.34]
	]
}'
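Note the null in the third data row: because the transform declares a default for clientCityCode, that value should fall back to the column's default of 0.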