Kafka Streaming Ingestion

Kafka streaming is used to continuously consume events from one or more Kafka topics for ingestion into Hydrolix.

The sources/kafka API endpoint lets you define which Kafka brokers and topics to connect to, along with the scale and instance type required to meet your throughput needs.

To ingest data using this method you will need to:

  1. Create a Table (stream settings)
  2. Create a Transform ("is_default": true)
  3. Create a Kafka source via the UI or API

https://<myhost>/orgs/<org_id>/projects/<project_id>/tables/<table_id>/sources/kafka/

Example sources/kafka Request


https://<myhost>/orgs/<org_id>/projects/<project_id>/tables/<table_id>/sources/kafka/

{
    "name": "kafkaIngest",
    "type": "pull",
    "subtype": "kafka",
    "transform": "my_transform",
    "table": "my_project.my_table",
    "settings": {
        "bootstrap_servers": [ "111.222.333.444:9092" ],
        "topics": [ "my_topic" ]
    },
    "pool_name": "aKafkaPool",
    "instance_type": "m5.large",
    "instance_count": "2"
}
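
As a rough sketch, the example request above could be assembled in Python using only the standard library. The HTTP method (POST), the bearer-token Authorization header, the helper name build_kafka_source_request, and the host, ID, and token values are all assumptions made for illustration; this page does not state them.

```python
import json
import urllib.request


def build_kafka_source_request(host, org_id, project_id, table_id, body, token):
    """Build (but do not send) a request for the sources/kafka endpoint.

    Assumptions: POST creates the source, and the API accepts a bearer token.
    """
    url = (
        f"https://{host}/orgs/{org_id}/projects/{project_id}"
        f"/tables/{table_id}/sources/kafka/"
    )
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {token}",  # assumed auth scheme
        },
        method="POST",  # assumed HTTP method
    )


# Same body as the example request above.
body = {
    "name": "kafkaIngest",
    "type": "pull",
    "subtype": "kafka",
    "transform": "my_transform",
    "table": "my_project.my_table",
    "settings": {
        "bootstrap_servers": ["111.222.333.444:9092"],
        "topics": ["my_topic"],
    },
    "pool_name": "aKafkaPool",
    "instance_type": "m5.large",
    "instance_count": "2",
}

# Placeholder host, IDs, and token; substitute real values.
request = build_kafka_source_request(
    "myhost.example.com", "my_org_id", "my_project_id", "my_table_id",
    body, "my_token",
)
# To actually send it: urllib.request.urlopen(request)
```

Building the request separately from sending it makes it easy to inspect the URL and body before anything reaches the cluster.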

Create a Kafka Source

https://<myhost>/orgs/<org_id>/projects/<project_id>/tables/<table_id>/sources/kafka/

{
    "name": "<Source Name>",
    "type": "pull",
    "subtype": "kafka",
    "transform": "<transform_name>",
    "table": "<project_name>.<table_name>",
    "settings": {
        "bootstrap_servers": [ "<bootstrap_server>:<PORT>" ],
        "topics": [ "<topic_name>" ]
    },
    "pool_name": "<ingest pool name>",
    "instance_type": "<instance type>",
    "instance_count": "<number of instances>"
}

Job Attributes

A transform describes the shape of individual pieces of data. A job describes how to treat the data set as a whole as it is being ingested.

Element         Purpose
name            A unique name for this Kafka source within the organization
type            The type of ingestion; the examples on this page use pull
subtype         Only accepts the value kafka
transform       The name of the transform to use
table           The project name and the table name, separated by a period ('.')
settings        The settings to use for this particular Kafka source
pool_name       The name of the Kafka ingest pool
instance_type   The type of instance to use within the Kafka ingest pool; m5.large is recommended
instance_count  The number of instances to use within the Kafka ingest pool
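
The attributes above can be checked locally before a request is sent. The following is a minimal sketch, not part of the API: the helper name check_kafka_source is hypothetical, and treating every listed attribute as required, as well as assuming the table value contains exactly one period, are assumptions rather than documented rules.

```python
# Attribute names taken from the table above; treating all as required is an assumption.
EXPECTED_ATTRIBUTES = (
    "name", "type", "subtype", "transform", "table",
    "settings", "pool_name", "instance_type", "instance_count",
)


def check_kafka_source(source):
    """Return a list of human-readable problems; an empty list means none were found."""
    problems = [
        f"missing attribute: {key}"
        for key in EXPECTED_ATTRIBUTES
        if key not in source
    ]
    # subtype only accepts the value kafka (a missing subtype is reported above).
    if source.get("subtype", "kafka") != "kafka":
        problems.append("subtype must be 'kafka'")
    # table is the project name and table name separated by a period.
    table = source.get("table")
    if table is not None:
        parts = table.split(".")
        if len(parts) != 2 or not all(parts):
            problems.append("table must look like '<project_name>.<table_name>'")
    return problems


example = {
    "name": "kafkaIngest",
    "type": "pull",
    "subtype": "kafka",
    "transform": "my_transform",
    "table": "my_project.my_table",
    "settings": {"bootstrap_servers": ["111.222.333.444:9092"], "topics": ["my_topic"]},
    "pool_name": "aKafkaPool",
    "instance_type": "m5.large",
    "instance_count": "2",
}
print(check_kafka_source(example))  # prints []
```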

settings

Example Settings


{ ...
    "settings": {
        "bootstrap_servers": [ "111.222.333.444:9092" ],
        "topics": [ "my_topic" ]
    }
}

Element            Description
bootstrap_servers  A list of Kafka bootstrap servers and their ports
topics             The names of the Kafka topics to import
group_id           Unused
pool               Unused
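
A settings object of this shape could be built with a small helper that rejects malformed entries early. This is only a sketch: build_settings is a hypothetical name, and the checks (each server written as host:port with a numeric port, at least one topic) are assumptions inferred from the example, not validation the API is known to perform. The unused group_id and pool elements are left out.

```python
def build_settings(bootstrap_servers, topics):
    """Return a settings dict for a Kafka source, validating the host:port form."""
    if not bootstrap_servers:
        raise ValueError("at least one bootstrap server is needed")
    for server in bootstrap_servers:
        # Split on the last colon so the port is whatever follows it.
        host, sep, port = server.rpartition(":")
        if not sep or not host or not port.isdigit():
            raise ValueError(f"expected '<host>:<port>', got {server!r}")
    if not topics:
        raise ValueError("at least one topic is needed")
    return {
        "bootstrap_servers": list(bootstrap_servers),
        "topics": list(topics),
    }


settings = build_settings(["111.222.333.444:9092"], ["my_topic"])
```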