
Kafka

Get started⚓︎

Hydrolix Projects and Tables can continuously ingest data from one or more Kafka-based streaming sources.

Kafka source configuration is completed using the Kafka Sources API or through the Web UI.

👍 The basic steps are:

  1. Create a Project/Table
  2. Create a Transform
  3. Configure the Kafka Source and Scale.

These instructions assume that the project, table, and transform are already configured. For more information on setting these up, see Projects & Tables and Write Transforms.


Set up a Kafka source through the API⚓︎

To create, update, or view a table's Kafka sources through the Hydrolix API, use the /sources/kafka API endpoint. Note that you will need the IDs of your target organization, project, and table in order to address that endpoint's full path.

To add a Kafka source to a table, send the /sources/kafka API endpoint a JSON document describing the connection between your Hydrolix table and the Kafka data streams.

For example, the following JSON document sets up a connection between a pair of Kafka sources running at the domain example.com and the Hydrolix table my_project.my_table.

Example⚓︎

{
    "name": "my-kafka-ingest",
    "type": "pull",
    "subtype": "kafka",
    "transform": "my_transform",
    "table": "my_project.my_table",
    "settings": {
        "bootstrap_servers": [
            "kafka-1.example.com:9092",
            "kafka-2.example.com:9092"
        ],
        "topics": [ "my_topic" ]
    },
    "pool_name": "my-kafka-pool",
    "k8s_deployment":{
        "cpu": 1,
        "replicas": 1,
        "service": "kafka-peer"
    }
}
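The document above can then be POSTed to the endpoint. The sketch below shows one way to build the request; the host, IDs, and token are placeholders, and the URL pattern is an assumption based on the org/project/table path described in this section:

```shell
# Sketch: POST the JSON document above (saved as kafka-source.json) to the
# Kafka sources endpoint. Host and IDs below are placeholders; substitute
# the values for your own cluster, organization, project, and table.
HDX_HOST="https://my-cluster.hydrolix.live"
ORG_ID="my-org-uuid"
PROJECT_ID="my-project-uuid"
TABLE_ID="my-table-uuid"

URL="$HDX_HOST/config/v1/orgs/$ORG_ID/projects/$PROJECT_ID/tables/$TABLE_ID/sources/kafka/"
echo "$URL"

# Uncomment to send the request (requires a valid bearer token in HDX_TOKEN):
# curl -s -X POST "$URL" \
#     -H "Authorization: Bearer $HDX_TOKEN" \
#     -H "Content-Type: application/json" \
#     --data @kafka-source.json
```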

Configuration properties⚓︎

The JSON document describing a Kafka-based data source requires the following properties:

| Property | Purpose |
| --- | --- |
| name | A name for this data source. Must be unique within the target table's organization. |
| type | The type of ingestion. Only "pull" is supported at this time. |
| subtype | The literal value "kafka". |
| transform | The name of the transform to apply to this ingestion. |
| table | The Hydrolix project and table to ingest into, expressed in the format "PROJECT.TABLE". |
| settings | The settings to use for this particular Kafka source. |
| pool_name | The name that Hydrolix will assign to the ingest pool. |
| k8s_deployment | Kubernetes deployments only. Describes the replicas, memory, CPU, and service to use for the Kafka ingest pool. |

"k8s_deployment":{<br> "cpu": 1,<br> "replicas": 1,<br> "service": "kafka-peer"<br> }

The settings object⚓︎

The settings property contains a JSON object that defines the Kafka servers and topics this table should receive events from.

| Element | Description |
| --- | --- |
| bootstrap_servers | Array of Kafka bootstrap server addresses, in "HOST:PORT" format |
| credential | Name of an authentication credential; mutually exclusive with credential_id |
| credential_id | UUID of an authentication credential; mutually exclusive with credential |
| topics | Array of Kafka topics to import from the given servers |

When authenticating with a Kafka server, client certificate authentication and SASL/PLAIN can be used together or independently. Match the configuration of the Hydrolix Kafka client to the server configuration.
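For example, a settings object that authenticates with SASL/PLAIN by referencing a stored credential might look like the following (the UUID shown is a placeholder for your credential's ID):

```json
{
    "bootstrap_servers": [
        "kafka-1.example.com:9092",
        "kafka-2.example.com:9092"
    ],
    "topics": [ "my_topic" ],
    "credential_id": "00000000-0000-0000-0000-000000000000"
}
```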


Set up a Kafka source through the UI⚓︎

To create a Kafka source, click the Add New button in the upper right of the UI and select Table Source. You will be redirected to a New Ingest Source form at https://YOUR-HYDROLIX-HOST.hydrolix.live/data/new-ingest-source/.

To manage your stack's Kafka sources through its web UI, visit an individual table at https://YOUR-HYDROLIX-HOST.hydrolix.live/data/tables/<table_id> in your web browser and select a source from the Table Sources widget.

Detailed view of Table sources in a Tables page in the Hydrolix UI


Authenticate using SASL/PLAIN⚓︎

Some Kafka servers require clients to authenticate at the application layer using SASL/PLAIN, a Simple Authentication and Security Layer (SASL) mechanism.

To authenticate to a Kafka server using SASL/PLAIN:

  1. Create a credential of type kafka_sasl_plain
  2. Specify that credential ID in the Kafka data source settings.credential_id attribute

Use the API to create the Kafka data source and specify the credential_id.

As of v5.9, prefer the API

The Hydrolix UI does not yet support setting the credential_id field for a kafka_sasl_plain credential. If the data source already exists, update it with a PATCH request to the Kafka data source. The API supports the credential_id field on all Kafka data source endpoints.
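A PATCH of only the credential field might look like the sketch below. The host, IDs, and source UUID are placeholders, and the URL pattern is an assumption based on the org/project/table path described earlier:

```shell
# Sketch: attach a kafka_sasl_plain credential to an existing Kafka source
# via PATCH. All IDs below are placeholders.
HDX_HOST="https://my-cluster.hydrolix.live"
SOURCE_URL="$HDX_HOST/config/v1/orgs/my-org-uuid/projects/my-project-uuid/tables/my-table-uuid/sources/kafka/my-source-uuid"
echo "$SOURCE_URL"

# Uncomment to send the request (requires a valid bearer token in HDX_TOKEN):
# curl -s -X PATCH "$SOURCE_URL" \
#     -H "Authorization: Bearer $HDX_TOKEN" \
#     -H "Content-Type: application/json" \
#     --data '{"settings": {"credential_id": "00000000-0000-0000-0000-000000000000"}}'
```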

Authenticate to a Kafka server over TLS⚓︎

Some Kafka servers support mutual TLS and require the client to present an identifying certificate for authentication at the network transport layer. Often, a private certification authority (CA) is in use, as well.

  • Mutual TLS authentication: Install the certificate and corresponding key in Kubernetes Secret keys named KAFKA_TLS_CERT and KAFKA_TLS_KEY. This certificate identifies the client.
  • Publicly-signed server certificate: The Kafka server in this case is using a publicly trusted certificate. The client uses public certificate trust stores to verify the server. No additional configuration is necessary.
  • Privately-signed server certificate: The Kafka server in this case is using a certificate issued from a private CA. Install the trusted, private root CA certificate in a Kubernetes Secret key named KAFKA_TLS_CA. The Kafka client checks that the server cert matches this private CA cert.
Private CAs and client certificates

The combination of the private key and the certificate identify the client to the server. These can be issued by a private Certification Authority (CA).

When connecting to servers using certificates signed by private CAs, you must configure the private, trusted root certificate for TLS connections to succeed.

Frequently, a Kafka installation will use a private CA for the server certificate and client certificates issued to known applications.

Use the Kubernetes curated Secret with the following keys:

| Key | Expected value |
| --- | --- |
| KAFKA_TLS_CA | A TLS certificate authority file, in PEM format. |
| KAFKA_TLS_CERT | A TLS certificate file, in PEM format. |
| KAFKA_TLS_KEY | A TLS key file, in PEM format. |

For example:

---
apiVersion: v1
kind: Secret
metadata:
  name: curated
  namespace: my-namespace
stringData:
  KAFKA_TLS_CA:  |
    -----BEGIN CERTIFICATE-----
    MIIDXjCCAkagAwIBAgIUSIIjbGQAqEYJxyOsW1Q25VW8HWMwDQYJKoZIhvcNAQEL
    BQAwIzEhMB8GA1UEAxMYbm9tLXZrasdfqweq125ha2FtYWkuY29tMB4XDTIzMDMw
    thisisanexample
    -----END CERTIFICATE-----
  KAFKA_TLS_CERT: |
    -----BEGIN CERTIFICATE-----
    MIIDYzCCAkugAwIBAgIUfixZslukVX6PW/m6EuVST9SACJAwDQYJKoZIhvcNAQEL
    BQAwIjEgMB4GA1UEAxMXbm9tLXZrbXMtaW50LmFrYW1haS5jb20wHhcNMjMwMzAy
    thisisanexample
    -----END CERTIFICATE-----
  KAFKA_TLS_KEY: |
    -----BEGIN PRIVATE KEY-----
    MIIDYzCCAkugAwIBAgIUfixZslukVX6PW/m6EuVST9SACJAwDQYJKoZIhvcNAQEL
    BQAwIjEgMB4GA1UEAxMXbm9tLXZrbXMtaW50LmFrYW1haS5jb20wHhcNMjMwMzAy
    thisisanexample
    TRUNCATED
    -----END PRIVATE KEY-----
type: Opaque

To set the secret:

kubectl apply -f kafka-tls.yaml

Convert Kafka Java keystore certificates to PEM format⚓︎

By default, Kafka stores its certificate and key information in a Java KeyStore (.jks) file. Hydrolix requires the files in Privacy Enhanced Mail (PEM) format.

Use keytool to export the key and certificate files.

Use openssl to convert to the PEM format.

How-to export certificate files⚓︎

  1. List certificates present in your keystore:

       $ keytool -list -keystore client.keystore.jks
    
           Enter keystore password:
           Keystore type: PKCS12
           Keystore provider: SUN
    
           Your keystore contains 2 entries
    
           caroot, May 5, 2021, trustedCertEntry, 
           Certificate fingerprint (SHA-256): A5:87:D0:E4:F6:70:4F:8E:07:2E:EE:56:73:D4:AF:88:DA:D5:8C:9F:67:71:F2:C0:7D:A9:CA:64:2F:F7:04:18
           clientcert, May 3, 2021, PrivateKeyEntry, 
           Certificate fingerprint (SHA-256): 80:A2:28:7C:D9:1B:A8:48:AB:24:76:CC:5A:19:47:29:12:CF:22:A1:8C:92:6E:E4:C0:30:0A:A0:34:73:F7:55
    
  2. Locate the CA certificate file, which is caroot in this example, and export it:

       $ keytool -export -alias caroot -file caroot.crt -keystore client.keystore.jks
    
           Enter keystore password:
           Certificate stored in file <caroot.crt>
    
  3. Use openssl to transform it into PEM format:

       openssl x509 -inform DER -in caroot.crt -out kafka_ca.pem -outform PEM
    
  4. Follow the same steps for your TLS certificate file clientcert:

       $ keytool -export -alias clientcert -file clientcert.crt -keystore client.keystore.jks
    
           Enter keystore password:
           Certificate stored in file <clientcert.crt>
    
       $ openssl x509 -inform DER -in clientcert.crt -out kafka_cert.pem -outform PEM
    

How-to export key files⚓︎

Exporting a key from the Java keystore requires a different set of commands.

  1. Use keytool to create a new PKCS12 store:

       $ keytool -v -importkeystore -srckeystore client.keystore.jks \
             -srcalias clientcert -destkeystore keystore.p12 -deststoretype PKCS12
    
           Importing keystore client.keystore.jks to keystore.p12...
           Enter destination keystore password:
           Re-enter new password: 
           Enter source keystore password:
           [Storing keystore.p12]
    
  2. Use openssl to extract the private key in PEM format, and use sed to remove extra information:

       $ openssl pkcs12 -in keystore.p12 -nodes -nocerts \
                   | sed -ne '/-BEGIN PRIVATE KEY-/,/-END PRIVATE KEY-/p' \
                   > kafka_key.pem
    
           Enter Import Password:
           MAC verified OK
    

At this point, you should have the three PEM files you need to update your Hydrolix cluster with your Kafka TLS information.
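Before loading the files into the Kubernetes Secret, it can be worth sanity-checking that each PEM file parses. The sketch below generates a throwaway self-signed pair so it is self-contained; with your real kafka_ca.pem, kafka_cert.pem, and kafka_key.pem files, skip the generation step and run only the check commands:

```shell
# Generate a throwaway self-signed certificate and key for demonstration.
# With real files, skip this step and check your kafka_*.pem files instead.
openssl req -x509 -newkey rsa:2048 -nodes -subj "/CN=demo" \
    -keyout kafka_key.pem -out kafka_cert.pem -days 1 2>/dev/null

# Each check fails with a parse error if the file is not valid PEM.
openssl x509 -in kafka_cert.pem -noout -subject
openssl pkey -in kafka_key.pem -noout && echo "key OK"
```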


Scaling⚓︎

To scale the Kubernetes deployment for the ingest pool used by this source, including the number of replicas, memory, and CPU, see the scaling resource pools page.