Kafka Streaming Ingestion

You can configure Hydrolix projects and tables to continuously ingest data from one or more Kafka-based streaming sources.

Preparing with a transform

As with any kind of Hydrolix data ingestion, you must prepare the target table with a transform schema that maps the incoming Kafka events onto the columns of the target Hydrolix table.

With that done, you can then use either your stack's Hydrolix API or its web UI to specify the Kafka servers and topics that your table can receive events from.

Setting up a Kafka source through the API

To create, update, or view a table's Kafka sources through the Hydrolix API, use the /sources/kafka API endpoint. Note that you will need the IDs of your target organization, project, and table in order to address that endpoint's full path.

To add Kafka sources to a table, send a POST request to the /sources/kafka API endpoint with a JSON document describing the connection between your Hydrolix table and the Kafka data streams.

Configuration properties

The JSON document describing a Kafka-based data source must have the following properties:

Property        Purpose
name            A name for this data source. Must be unique within the target table's organization.
type            The type of ingestion.
subtype         The literal value "kafka".
transform       The name of the transform to apply to this ingestion.
table           The Hydrolix project and table to ingest into, expressed in the format "PROJECT.TABLE".
settings        The settings to use for this particular Kafka source.
pool_name       The name that Hydrolix will assign to the ingest pool.
instance_type   The type of instance Hydrolix will use within the Kafka ingest pool. (We recommend using m5.large.)
instance_count  The number of instances Hydrolix will apply to the Kafka ingest pool.

The settings property contains a JSON object that defines the Kafka servers and topics this table should receive events from.

Element            Description
bootstrap_servers  An array of Kafka bootstrap server addresses, in "HOST:PORT" format.
topics             An array of Kafka topics to import from the given servers.
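
Before sending a configuration, it can help to check it locally against the properties listed above. The following is a minimal sketch, not part of Hydrolix: the helper name check_kafka_source is hypothetical, and the rules that both arrays must be non-empty and that the table name contains exactly one dot are assumptions made for illustration.

```python
import re

# Top-level properties the document says a Kafka source must have.
REQUIRED_PROPERTIES = (
    "name", "type", "subtype", "transform", "table",
    "settings", "pool_name", "instance_type", "instance_count",
)

# "HOST:PORT": a host with no whitespace or colon, then a numeric port.
HOST_PORT = re.compile(r"^[^\s:]+:\d{1,5}$")


def check_kafka_source(config: dict) -> list:
    """Return a list of problems found in a Kafka source document.

    Hypothetical local helper; an empty list means no problems were found.
    """
    problems = [f"missing property: {key}"
                for key in REQUIRED_PROPERTIES if key not in config]

    if "subtype" in config and config["subtype"] != "kafka":
        problems.append('subtype must be the literal value "kafka"')

    table = config.get("table")
    if isinstance(table, str):
        project, _, name = table.partition(".")
        # Assumption: exactly one dot separates project and table names.
        if not project or not name or "." in name:
            problems.append('table must use the format "PROJECT.TABLE"')

    settings = config.get("settings")
    if isinstance(settings, dict):
        servers = settings.get("bootstrap_servers")
        # Assumption: at least one bootstrap server is required.
        if not isinstance(servers, list) or not servers:
            problems.append("settings.bootstrap_servers must be a non-empty array")
        else:
            for server in servers:
                if not isinstance(server, str) or not HOST_PORT.match(server):
                    problems.append(
                        f"bootstrap server not in HOST:PORT format: {server!r}")
        topics = settings.get("topics")
        # Assumption: at least one topic is required.
        if not isinstance(topics, list) or not topics:
            problems.append("settings.topics must be a non-empty array")

    return problems
```

A check like this only catches structural mistakes; the Hydrolix API remains the authority on what it accepts.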

An example configuration

For example, this JSON document sets up a connection between a pair of Kafka bootstrap servers in the domain example.com and the Hydrolix table my_project.my_table.

{
    "name": "my-kafka-ingest",
    "type": "pull",
    "subtype": "kafka",
    "transform": "my_transform",
    "table": "my_project.my_table",
    "settings": {
        "bootstrap_servers": [
            "kafka-1.example.com:9092",
            "kafka-2.example.com:9092"
        ],
        "topics": [ "my_topic" ]
    },
    "pool_name": "my-kafka-pool",
    "instance_type": "m5.large",
    "instance_count": "2"
}

You would then install this streaming configuration by sending it as a POST request to:

https://YOUR-HYDROLIX-HOST.hydrolix.live/orgs/ORG-ID/projects/PROJECT-ID/tables/TABLE-ID/sources/kafka/
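
As a rough illustration, the POST request could be assembled in Python using only the standard library. This is a sketch under stated assumptions: the function name build_kafka_source_request is hypothetical, and the bearer-token Authorization header is an assumption about how your stack authenticates API calls, so adjust it to match your setup.

```python
import json
import urllib.request


def build_kafka_source_request(host, org_id, project_id, table_id,
                               config, token=None):
    """Build (but do not send) the POST request that installs a Kafka source."""
    # The path mirrors the URL shown above, with the IDs filled in.
    url = (
        f"https://{host}/orgs/{org_id}/projects/{project_id}"
        f"/tables/{table_id}/sources/kafka/"
    )
    headers = {"Content-Type": "application/json"}
    if token:
        # Assumption: the API accepts a bearer token for authentication.
        headers["Authorization"] = f"Bearer {token}"
    return urllib.request.Request(
        url,
        data=json.dumps(config).encode("utf-8"),
        headers=headers,
        method="POST",
    )
```

To send the request, pass the returned object to urllib.request.urlopen() and inspect the response status and body.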

Setting up a Kafka source through the UI

To create, view, and manage your stack's Kafka sources through its web UI, visit https://YOUR-HYDROLIX-HOST.hydrolix.live/data_sources in your web browser.

Getting more help

If you need more help using Kafka with Hydrolix, or you'd just like to learn more about this integration, please contact Hydrolix support.

