Kafka Streaming Ingestion

You can configure a Hydrolix table to continuously ingest data from one or more Kafka-based streaming sources.

Preparing with a transform

As with any kind of Hydrolix data ingestion, you must prepare the target table with a transform schema that maps the incoming Kafka events onto the columns of the target Hydrolix table.
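
The exact transform schema depends on your Hydrolix version, so the following is only a rough, hypothetical sketch (the column names, datatypes, and format string are illustrative assumptions; consult the transform documentation for the authoritative field list). A JSON-format transform might look roughly like this:

{
    "name": "my_transform",
    "type": "json",
    "settings": {
        "output_columns": [
            {
                "name": "timestamp",
                "datatype": {
                    "type": "datetime",
                    "primary": true,
                    "format": "2006-01-02T15:04:05Z"
                }
            },
            {
                "name": "message",
                "datatype": { "type": "string" }
            }
        ]
    }
}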

With that done, you can then use either your stack’s Hydrolix API or its web UI to specify the Kafka servers and topics that your table can receive events from.

Setting up a Kafka source through the API

To create, update, or view a table’s Kafka sources through the Hydrolix API, use the /sources/kafka API endpoint. Note that you will need the IDs of your target organization, project, and table in order to address that endpoint’s full path.

To add Kafka sources to a table, invoke POST on the /sources/kafka endpoint, sending it a JSON document that describes the connection between your Hydrolix table and the Kafka data streams.

Configuration properties

The JSON document describing a Kafka-based data source must have the following properties:

Property Purpose
name A name for this data source. Must be unique within the target table’s organization.
type The type of ingestion. For Kafka sources, this is "pull".
subtype The literal value "kafka".
transform The name of the transform to apply to this ingestion.
table The Hydrolix project and table to ingest into, expressed in the format "PROJECT.TABLE".
settings The settings to use for this particular Kafka source.
pool_name The name that Hydrolix will assign to the ingest pool.
instance_type The type of instance Hydrolix will use within the Kafka ingest pool. (We recommend using m5.large.)
instance_count The number of instances Hydrolix will apply to the Kafka ingest pool.

The settings property contains a JSON object that defines the Kafka servers and topics this table should receive events from.

Element Description
bootstrap_servers An array of Kafka bootstrap server addresses, in "HOST:PORT" format.
topics An array of Kafka topics to import from the given servers.

An example configuration

For example, this JSON document sets up a connection between a Kafka source with two bootstrap servers at the domain example.com and the Hydrolix table my_project.my_table.

{
    "name": "my-kafka-ingest",
    "type": "pull",
    "subtype": "kafka",
    "transform": "my_transform",
    "table": "my_project.my_table",
    "settings": {
        "bootstrap_servers": [
            "kafka-1.example.com:9092",
            "kafka-2.example.com:9092"
        ],
        "topics": [ "my_topic" ]
    },
    "pool_name": "my-kafka-pool",
    "instance_type": "m5.large",
    "instance_count": "2"
}

You would then install this streaming configuration by sending it as a POST request to:

https://YOUR-HYDROLIX-HOST.hydrolix.live/orgs/ORG-ID/projects/PROJECT-ID/tables/TABLE-ID/sources/kafka/
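
For instance, here's a minimal sketch using curl, assuming the JSON document above is saved as kafka-source.json and an API bearer token is available in $HDX_TOKEN (adjust the authentication header to match your deployment):

curl -X POST \
    -H "Authorization: Bearer $HDX_TOKEN" \
    -H "Content-Type: application/json" \
    --data @kafka-source.json \
    "https://YOUR-HYDROLIX-HOST.hydrolix.live/orgs/ORG-ID/projects/PROJECT-ID/tables/TABLE-ID/sources/kafka/"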

Setting up a Kafka source through the UI

To create, view, and manage your stack’s Kafka sources through its web UI, visit https://YOUR-HYDROLIX-HOST.hydrolix.live/data_sources in your web browser.

Using TLS with Client Cert Authentication

Hydrolix supports TLS with client certificate authentication: the Kafka peer connects to the Kafka server over TLS and presents a client certificate. For this to work, you need to deploy the certificates using hdxctl:

hdxctl update hdxcli-xxxx hdx-yyyy --kafka-tls-cert "kafka_cert.pem" --kafka-tls-key "kafka_key.pem" --kafka-tls-ca "kafka_ca.pem"

Hydrolix expects three files, which you can sanity-check as shown below:

  • Root CA in PEM format
  • Client Cert in PEM format
  • Private Key in PEM format
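
Before handing these files to hdxctl, you can sanity-check each one with openssl. For example, assuming the file names used in the hdxctl command above and an RSA private key:

# Confirm the CA and client certificates parse and show the expected subjects
openssl x509 -in kafka_ca.pem -noout -subject -issuer
openssl x509 -in kafka_cert.pem -noout -subject -issuer

# Confirm the private key is well formed
openssl rsa -in kafka_key.pem -noout -check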

Exporting the certificate from Java Keystore

By default, Kafka stores all certificates and keys in a JKS file, the Java keystore. Because Hydrolix expects certificate information in PEM format, we need to export the certificates from the Java keystore and encode them in PEM.

The first step is to list all the certificates present in your keystore:

keytool -list -keystore client.keystore.jks

Enter keystore password:  
Keystore type: PKCS12
Keystore provider: SUN

Your keystore contains 2 entries

caroot, May 5, 2021, trustedCertEntry, 
Certificate fingerprint (SHA-256): A5:87:D0:E4:F6:70:4F:8E:07:2E:EE:56:73:D4:AF:88:DA:D5:8C:9F:67:71:F2:C0:7D:A9:CA:64:2F:F7:04:18
clientcert, May 3, 2021, PrivateKeyEntry, 
Certificate fingerprint (SHA-256): 80:A2:28:7C:D9:1B:A8:48:AB:24:76:CC:5A:19:47:29:12:CF:22:A1:8C:92:6E:E4:C0:30:0A:A0:34:73:F7:55

Then select the certificate you want to export. Here, I'll export the caroot:

keytool -export -alias caroot -file caroot.crt -keystore client.keystore.jks

Enter keystore password:  
Certificate stored in file <caroot.crt>

Once I have caroot.crt, I'll use the openssl command to convert it into PEM format:

openssl x509 -inform DER -in caroot.crt -out kafka_ca.pem -outform PEM

The next step is to do the same for the clientcert stored in my keystore:

keytool -export -alias clientcert -file clientcert.crt -keystore client.keystore.jks

Enter keystore password:  
Certificate stored in file <clientcert.crt>

Extract the certificate using openssl and store it in PEM format:

openssl x509 -inform DER -in clientcert.crt -out kafka_cert.pem -outform PEM

Exporting the private key requires a few more steps. First, we need to create a new PKCS12 keystore:

keytool -v -importkeystore -srckeystore client.keystore.jks -srcalias clientcert -destkeystore keystore.p12 -deststoretype PKCS12

Importing keystore client.keystore.jks to keystore.p12...
Enter destination keystore password:  
Re-enter new password: 
Enter source keystore password:  
[Storing keystore.p12]

From keystore.p12, we'll use openssl to extract the private key in PEM format and sed to strip out the extra information:

openssl pkcs12 -in keystore.p12 -nodes -nocerts | sed -ne '/-BEGIN PRIVATE KEY-/,/-END PRIVATE KEY-/p' > kafka_key.pem

Enter Import Password:
MAC verified OK
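
As an optional final check, you can confirm that the extracted private key matches the client certificate by comparing their public-key moduli; the two digests should be identical (this assumes an RSA key):

openssl x509 -noout -modulus -in kafka_cert.pem | openssl md5
openssl rsa -noout -modulus -in kafka_key.pem | openssl md5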

We now have all required files to support TLS authentication with Hydrolix.

Getting more help

If you need more help using Kafka with Hydrolix, or you’d just like to learn more about this integration, please contact Hydrolix support.