via Kafka

Getting Started

Hydrolix Projects and Tables can continuously ingest data from one or more Kafka-based streaming sources.

You can configure a Kafka source with the Kafka Sources API or through the web UI.


The basic steps are:

  1. Create a Project/Table
  2. Create a Transform
  3. Configure the Kafka Source and Scale.

This guide assumes that the project, table, and transform are already configured. For more information on setting these up, see Projects & Tables and Write Transforms.


Setting up a Kafka source through the API

To create, update, or view a table's Kafka sources through the Hydrolix API, use the /sources/kafka API endpoint. You need the IDs of your target organization, project, and table to address that endpoint's full path.

To add a Kafka source to a table, send the /sources/kafka endpoint a JSON document describing the connection between your Hydrolix table and the Kafka data streams.

For example, the following JSON document sets up a connection between a pair of Kafka bootstrap servers at the domain example.com and the Hydrolix table my_project.my_table.

Example

{
    "name": "my-kafka-ingest",
    "type": "pull",
    "subtype": "kafka",
    "transform": "my_transform",
    "table": "my_project.my_table",
    "settings": {
        "bootstrap_servers": [
            "kafka-1.example.com:9092",
            "kafka-2.example.com:9092"
        ],
        "topics": [ "my_topic" ]
    },
    "pool_name": "my-kafka-pool",
    "k8s_deployment":{
        "cpu": 1,
        "replicas": 1,
        "service": "kafka-peer"
    }
}
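The request can be sketched with Python's standard library. This is a minimal illustration under stated assumptions, not an official client: the full endpoint path (/config/v1/orgs/{org_id}/projects/{project_id}/tables/{table_id}/sources/kafka/), the bearer-token Authorization header, and the helper name build_kafka_source_request are all assumptions to confirm against the Hydrolix API reference for your cluster. The helper returns an unsent request so you can inspect it before anything reaches the cluster.

```python
import json
import urllib.request

# ASSUMPTION: illustrative path template; confirm it against the Hydrolix API reference.
KAFKA_SOURCES_PATH = (
    "/config/v1/orgs/{org_id}/projects/{project_id}/tables/{table_id}/sources/kafka/"
)


def build_kafka_source_request(host, token, org_id, project_id, table_id, source):
    """Return an unsent POST request that would create a Kafka source (hypothetical helper)."""
    url = "https://" + host + KAFKA_SOURCES_PATH.format(
        org_id=org_id, project_id=project_id, table_id=table_id
    )
    return urllib.request.Request(
        url,
        data=json.dumps(source).encode("utf-8"),
        method="POST",
        headers={
            "Content-Type": "application/json",
            # ASSUMPTION: bearer-token authentication.
            "Authorization": f"Bearer {token}",
        },
    )


# The same document as the JSON example above.
source = {
    "name": "my-kafka-ingest",
    "type": "pull",
    "subtype": "kafka",
    "transform": "my_transform",
    "table": "my_project.my_table",
    "settings": {
        "bootstrap_servers": ["kafka-1.example.com:9092", "kafka-2.example.com:9092"],
        "topics": ["my_topic"],
    },
    "pool_name": "my-kafka-pool",
    "k8s_deployment": {"cpu": 1, "replicas": 1, "service": "kafka-peer"},
}

req = build_kafka_source_request(
    "YOUR-HYDROLIX-HOST.hydrolix.live", "MY_TOKEN", "ORG_ID", "PROJECT_ID", "TABLE_ID", source
)
# To actually send it: urllib.request.urlopen(req)
```

Keeping request construction separate from sending makes it easy to print or log the exact payload first.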

Configuration properties

The JSON document describing a Kafka-based data source requires the following properties:

| Property | Purpose |
| --- | --- |
| name | A name for this data source. Must be unique within the target table's organization. |
| type | The type of ingestion. Only "pull" is supported at this time. |
| subtype | The literal value "kafka". |
| transform | The name of the transform to apply to this ingestion. |
| table | The Hydrolix project and table to ingest into, in the format "PROJECT.TABLE". |
| settings | The settings to use for this Kafka source. |
| pool_name | The name that Hydrolix assigns to the ingest pool. |
| k8s_deployment | Used only for Kubernetes deployments. Describes the replicas, memory, CPU, and service to use for the Kafka ingest pool. For example: |

"k8s_deployment":{ "cpu": 1, "replicas": 1, "service": "kafka-peer" }

The settings object

The settings property contains a JSON object that defines the Kafka servers and topics this table should receive events from.

| Element | Description |
| --- | --- |
| bootstrap_servers | An array of Kafka bootstrap server addresses, in "HOST:PORT" format. |
| topics | An array of Kafka topics to import from the given servers. |
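Before sending a source document to the API, you might want to check it against the two tables above on the client side. The Python sketch below encodes only what those tables state: the required properties, type "pull", subtype "kafka", a "PROJECT.TABLE" table name, "HOST:PORT" bootstrap servers, and a topic list. The function name check_kafka_source, the 1 to 65535 port-range check, and treating k8s_deployment as optional are assumptions of this sketch; the API remains the final authority on what it accepts.

```python
REQUIRED_KEYS = ("name", "type", "subtype", "transform", "table", "settings", "pool_name")


def check_kafka_source(source):
    """Return a list of problems in a Kafka source document (hypothetical helper)."""
    problems = [f"missing property: {key}" for key in REQUIRED_KEYS if key not in source]
    if "type" in source and source["type"] != "pull":
        problems.append('type must be "pull", the only supported ingestion type')
    if "subtype" in source and source["subtype"] != "kafka":
        problems.append('subtype must be the literal value "kafka"')
    table = source.get("table")
    if table is not None:
        parts = str(table).split(".")
        if len(parts) != 2 or not all(parts):
            problems.append('table must use the "PROJECT.TABLE" format')
    # ASSUMPTION: k8s_deployment is not checked; it applies only to Kubernetes deployments.

    settings = source.get("settings", {})
    if not isinstance(settings, dict):
        settings = {}
    servers = settings.get("bootstrap_servers")
    if not isinstance(servers, list) or not servers:
        problems.append("settings.bootstrap_servers must be a non-empty array")
        servers = []
    for server in servers:
        # rpartition splits on the last colon, so "[::1]:9092" keeps its IPv6 host intact.
        host, sep, port = str(server).rpartition(":")
        if not (sep and host and port.isdecimal() and 1 <= int(port) <= 65535):
            problems.append(f'bootstrap server {server!r} is not in "HOST:PORT" format')
    topics = settings.get("topics")
    if not isinstance(topics, list) or not topics or not all(
        isinstance(t, str) and t for t in topics
    ):
        problems.append("settings.topics must be a non-empty array of topic names")
    return problems
```

An empty list means the document passed every local check; it does not confirm that the brokers or topics are reachable.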

Setting up a Kafka source through the UI

To create a Kafka source, click the Add New button in the upper right of the UI and select Table Source. You will be redirected to a New Ingest Source form at https://YOUR-HYDROLIX-HOST.hydrolix.live/data/new-ingest-source/.

To manage your stack's Kafka sources through its web UI, visit an individual table at https://YOUR-HYDROLIX-HOST.hydrolix.live/data/tables/<table_id> in your web browser and select a source from the Table Sources widget.


Authenticating Kafka connections with TLS

If your Kafka data source requires a TLS-authenticated connection, you can update your Hydrolix cluster with TLS certificate and key information.

To do this, set the following options in the Kubernetes secret named curated:

| Option | Expected value |
| --- | --- |
| KAFKA_TLS_CA | A TLS certificate authority file, in PEM format. |
| KAFKA_TLS_CERT | A TLS certificate file, in PEM format. |
| KAFKA_TLS_KEY | A TLS key file, in PEM format. |

For example:

---
apiVersion: v1
kind: Secret
metadata:
  name: curated
  namespace: my-namespace
stringData:
  KAFKA_TLS_CA:  |
    -----BEGIN CERTIFICATE-----
    MIIDXjCCAkagAwIBAgIUSIIjbGQAqEYJxyOsW1Q25VW8HWMwDQYJKoZIhvcNAQEL
    BQAwIzEhMB8GA1UEAxMYbm9tLXZrasdfqweq125ha2FtYWkuY29tMB4XDTIzMDMw
    thisisanexample
    -----END CERTIFICATE-----
  KAFKA_TLS_CERT: |
    -----BEGIN CERTIFICATE-----
    MIIDYzCCAkugAwIBAgIUfixZslukVX6PW/m6EuVST9SACJAwDQYJKoZIhvcNAQEL
    BQAwIjEgMB4GA1UEAxMXbm9tLXZrbXMtaW50LmFrYW1haS5jb20wHhcNMjMwMzAy
    thisisanexample
    -----END CERTIFICATE-----
  KAFKA_TLS_KEY: |
    -----BEGIN PRIVATE KEY-----
    MIIDYzCCAkugAwIBAgIUfixZslukVX6PW/m6EuVST9SACJAwDQYJKoZIhvcNAQEL
    BQAwIjEgMB4GA1UEAxMXbm9tLXZrbXMtaW50LmFrYW1haS5jb20wHhcNMjMwMzAy
    thisisanexample
    TRUNCATED
    -----END PRIVATE KEY-----
type: Opaque

To apply the secret, save the manifest to a file (kafka-tls.yaml in this example) and run:

kubectl apply -f kafka-tls.yaml
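If you have already exported the three PEM files (as described in the next section), you can generate this manifest instead of pasting certificates in by hand. The Python sketch below assumes the file names kafka_ca.pem, kafka_cert.pem, and kafka_key.pem used in the export steps on this page; the helper names load_pems and render_curated_secret and the my-namespace value are hypothetical. It mirrors the layout of the example manifest above, writing each file as an indented YAML literal block under stringData.

```python
from pathlib import Path

# Maps each curated-secret key to the PEM file produced by the export steps.
PEM_FILES = {
    "KAFKA_TLS_CA": "kafka_ca.pem",
    "KAFKA_TLS_CERT": "kafka_cert.pem",
    "KAFKA_TLS_KEY": "kafka_key.pem",
}


def load_pems(directory="."):
    """Read the exported PEM files into a {secret_key: pem_text} dict."""
    return {key: Path(directory, name).read_text() for key, name in PEM_FILES.items()}


def render_curated_secret(namespace, pems):
    """Render a Secret manifest laid out like the example above (hypothetical helper)."""
    lines = [
        "---",
        "apiVersion: v1",
        "kind: Secret",
        "metadata:",
        "  name: curated",
        f"  namespace: {namespace}",
        "stringData:",
    ]
    for key in PEM_FILES:
        # A YAML literal block scalar (|) preserves the PEM line breaks.
        lines.append(f"  {key}: |")
        lines.extend("    " + line for line in pems[key].strip().splitlines())
    lines.append("type: Opaque")
    return "\n".join(lines) + "\n"


# Example usage, assuming the PEM files are in the current directory:
# Path("kafka-tls.yaml").write_text(render_curated_secret("my-namespace", load_pems()))
```

The rendered file can then be applied with the same kubectl apply command.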

Exporting certificates and keys from Kafka's Java keystore

By default, Kafka stores its certificate and key information in a Java keystore (.jks) file. Because Hydrolix requires this information as PEM-format files, you must export it before updating your cluster.

To do this, you need the keytool and openssl command-line programs installed on your system. Then complete the following steps.

Exporting your CA and certificate files

  1. List all the certificates present in your keystore:

      $ keytool -list -keystore client.keystore.jks

      Enter keystore password:
      Keystore type: PKCS12
      Keystore provider: SUN

      Your keystore contains 2 entries

      caroot, May 5, 2021, trustedCertEntry,
      Certificate fingerprint (SHA-256): A5:87:D0:E4:F6:70:4F:8E:07:2E:EE:56:73:D4:AF:88:DA:D5:8C:9F:67:71:F2:C0:7D:A9:CA:64:2F:F7:04:18
      clientcert, May 3, 2021, PrivateKeyEntry,
      Certificate fingerprint (SHA-256): 80:A2:28:7C:D9:1B:A8:48:AB:24:76:CC:5A:19:47:29:12:CF:22:A1:8C:92:6E:E4:C0:30:0A:A0:34:73:F7:55

  2. Locate the CA certificate entry (caroot, in this example) and export it:

      $ keytool -export -alias caroot -file caroot.crt -keystore client.keystore.jks

      Enter keystore password:
      Certificate stored in file <caroot.crt>

  3. Use openssl to convert it to PEM format:

      $ openssl x509 -inform DER -in caroot.crt -out kafka_ca.pem -outform PEM

  4. Repeat these steps for your TLS certificate (clientcert, in this example):

      $ keytool -export -alias clientcert -file clientcert.crt -keystore client.keystore.jks

      Enter keystore password:
      Certificate stored in file <clientcert.crt>

      $ openssl x509 -inform DER -in clientcert.crt -out kafka_cert.pem -outform PEM
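If openssl is not installed, Python's standard library offers ssl.DER_cert_to_PEM_cert, which performs the same DER-to-PEM re-encoding. This is a substitute for the documented openssl step, not part of it: unlike openssl x509, it does not parse or validate the certificate; it only base64-encodes the bytes and adds the BEGIN/END CERTIFICATE lines. It relies on keytool -export having written binary DER, which is keytool's default when -rfc is not given. The helper name der_file_to_pem is hypothetical.

```python
import ssl
from pathlib import Path


def der_file_to_pem(der_path, pem_path):
    """Re-encode a DER certificate exported by keytool as PEM (hypothetical helper)."""
    der_bytes = Path(der_path).read_bytes()
    # DER_cert_to_PEM_cert base64-encodes the bytes, wraps them at 64 characters,
    # and adds the BEGIN/END CERTIFICATE lines; it does not validate the certificate.
    Path(pem_path).write_text(ssl.DER_cert_to_PEM_cert(der_bytes))


# Example usage, mirroring the openssl commands above:
# der_file_to_pem("caroot.crt", "kafka_ca.pem")
# der_file_to_pem("clientcert.crt", "kafka_cert.pem")
```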

Exporting your key file

Exporting your key from the Java keystore takes a couple of additional steps.

  1. Use keytool to create a new PKCS12 store:

      $ keytool -v -importkeystore -srckeystore client.keystore.jks \
            -srcalias clientcert -destkeystore keystore.p12 -deststoretype PKCS12

      Importing keystore client.keystore.jks to keystore.p12...
      Enter destination keystore password:
      Re-enter new password:
      Enter source keystore password:
      [Storing keystore.p12]

  2. Use openssl to extract the private key in PEM format, and use sed to remove the extra attribute lines that openssl prints around it:

      $ openssl pkcs12 -in keystore.p12 -nodes -nocerts \
            | sed -ne '/-BEGIN PRIVATE KEY-/,/-END PRIVATE KEY-/p' \
            > kafka_key.pem

      Enter Import Password:
      MAC verified OK
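For reference, the sed expression prints each range of lines that starts at a line containing -BEGIN PRIVATE KEY- and ends at the next line containing -END PRIVATE KEY-, and drops everything else. A Python equivalent is sketched below for cases where sed is unavailable or you already have the openssl output captured as text; the function name extract_private_key_blocks and the openssl_output variable are hypothetical.

```python
def extract_private_key_blocks(text):
    """Mimic sed -ne '/-BEGIN PRIVATE KEY-/,/-END PRIVATE KEY-/p' (hypothetical helper)."""
    kept = []
    inside = False
    for line in text.splitlines(keepends=True):
        if not inside:
            if "-BEGIN PRIVATE KEY-" in line:
                inside = True
                kept.append(line)
            # Like sed, the end pattern is only tested on lines after the start line.
            continue
        kept.append(line)
        if "-END PRIVATE KEY-" in line:
            inside = False
    return "".join(kept)


# Example usage with output captured from the openssl command above:
# with open("kafka_key.pem", "w") as f:
#     f.write(extract_private_key_blocks(openssl_output))
```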

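At this point, you should have the three PEM files you need to update your Hydrolix cluster with your Kafka TLS information: kafka_ca.pem (for KAFKA_TLS_CA), kafka_cert.pem (for KAFKA_TLS_CERT), and kafka_key.pem (for KAFKA_TLS_KEY).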
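Before building the secret, a quick sanity check can confirm that each file holds exactly one PEM block of the expected kind. In the Python sketch below, the function names are hypothetical, and the expected labels (CERTIFICATE for the two certificate files, PRIVATE KEY for the key) are assumptions based on the openssl and sed commands above. An encrypted key (BEGIN ENCRYPTED PRIVATE KEY) fails the check, which is intended because the key was exported with -nodes.

```python
from pathlib import Path

# ASSUMPTION: expected PEM label for each file produced by the steps above.
EXPECTED_LABELS = {
    "kafka_ca.pem": "CERTIFICATE",
    "kafka_cert.pem": "CERTIFICATE",
    "kafka_key.pem": "PRIVATE KEY",
}


def check_pem_text(text, label):
    """Return True if text holds exactly one PEM block with the given label."""
    lines = [line.strip() for line in text.strip().splitlines()]
    return (
        len(lines) >= 3
        and lines[0] == f"-----BEGIN {label}-----"
        and lines[-1] == f"-----END {label}-----"
        and sum(line.startswith("-----BEGIN ") for line in lines) == 1
    )


def check_exported_pems(directory="."):
    """Map each expected file name to True or False; missing files count as False."""
    results = {}
    for name, label in EXPECTED_LABELS.items():
        path = Path(directory, name)
        results[name] = path.is_file() and check_pem_text(path.read_text(), label)
    return results


# Example usage:
# for name, ok in check_exported_pems().items():
#     print(name, "OK" if ok else "needs attention")
```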


Getting more help

If you need more help using Kafka with Hydrolix, or you'd just like to learn more about this integration, please contact Hydrolix support.