Skip to content

Kafka

Hydrolix Projects and Tables can continuously ingest data from one or more Kafka streaming sources.

Configure a Kafka table source using the Config API KafkaSource model with the Hydrolix UI.

Basic steps⚓︎

  1. Create a Project/Table.
  2. Create a Transform.
  3. Configure the Kafka table source and scale.

Before you begin⚓︎

These instructions assume that the project, table, and transform are already configured.

Collect the following information:

  • Kafka service information: hostname and port pairs and one or more topics
  • the project and table names
  • (if using the API) the organization, project, and table UUIDs

Decide on the following names:

  • the Kafka data source
  • the Kubernetes pool running the Kafka clients inside the Hydrolix cluster

Configure using the API⚓︎

Create a Kafka source associated with a table using Create Kafka sources endpoint.

Construct a JSON configuration that describes the Kafka servers and topics from which to ingest data.

Configuration properties⚓︎

A Kafka table source object provides configuration settings to a Kafka client running in a Hydrolix cluster.

Configuration requires the following properties:

Property Purpose
name A name for this table source. Must be unique within the target table's organization.
type The type of ingestion. Pull only supported at this time.
subtype The literal value "kafka".
transform The name of the transform to apply to this ingestion.
table The Hydrolix project and table to ingest into, expressed in the format "PROJECT.TABLE".
settings The settings to use for this particular Kafka source.
pool_name The name that Hydrolix will assign to the ingest pool.
k8s_deployment Kafka ingest pool details for Kubernetes replicas, memory, CPU, and service assignments.

"k8s_deployment":{"cpu": 1, "replicas": 1, "service": "kafka-peer" }

The Settingsobject⚓︎

The settings property contains a JSON object that defines the Kafka servers and topics this table should receive events from.

Element Description
bootstrap_servers Array of Kafka bootstrap server addresses, in "HOST:PORT" format
credential Name of an authentication credential, mutually exclusive with credential_id
credential_id UUID of an authentication credential, mutually exclusive with credential name
topics Array of Kafka topics to import from the given servers

When connecting to a Kafka server, client certificate authentication and SASL/PLAIN can be used together or independently. Match the configuration of the Hydrolix Kafka client to the server configuration.

Configuration example⚓︎

This examples demonstrates a connection between a pair of Kafka sources running in the domain example.com which provide the data source for the Hydrolix table my-project.my-table.

{
    "name": "my-kafka-ingest",
    "type": "pull",
    "subtype": "kafka",
    "transform": "my_transform",
    "table": "my_project.my_table",
    "settings": {
        "bootstrap_servers": [
            "kafka-1.example.com:9092",
            "kafka-2.example.com:9092"
        ],
        "topics": [ "my_topic" ]
    },
    "pool_name": "my-kafka-pool",
    "k8s_deployment":{
        "cpu": 1,
        "replicas": 1,
        "service": "kafka-peer"
    }
}

Configure using the UI⚓︎

Before creating a Kafka table source, create a transform for the table.

Create a Kafka source⚓︎

  1. Click the Add New button in the upper right of the UI.
  2. Select Table Source. You will be redirected to the New Ingest Source form.
  3. Choose a table to receive the data in the Select Table drop down.
  4. Select Kafka as the Source Type.
  5. Fill in the fields describing the Kafka server from which Hydrolix should source data.
    1. Enter a Name for this Kafka table source.
    2. Enter at least one name in Bootstrap servers.
    3. Enter at least one name in Topics.
    4. Choose any existing transform in the Select transform drop down.

Update a Kafka source⚓︎

  1. In the Hydrolix UI, navigate to Data from the left menu.
  2. Select the Tables tab.
  3. Find and select your table in the project and table listing.
  4. Underneath the Table sources configuration, click the line of the Kafka table source you'd like to modify.
  5. Update settings in the Edit ingest source page.

Scaling⚓︎

To adjust the Kubernetes replicas, memory, and CPU for the automatically created kafka-peer pool, see update resource pools.

Authenticating to Kafka servers⚓︎

Authenticate using SASL/PLAIN⚓︎

Some Kafka servers require clients to authenticate at application layer using SASL/PLAIN, a Simple Authentication and Security Layer (SASL) mechanism.

To authenticate to a Kafka server using SASL/PLAIN:

  1. Create a credential of type kafka_sasl_plain
  2. Specify that credential ID in the Kafka table source settings.credential_id attribute

Use the API to set credential_id

Use the create Kafka sources endpoint to specify a kafka_sasl_plain credential ID. If the table source exists already, use PATCH Kafka source. The UI doesn't support the credential_id field on Kafka table source endpoints.

Authenticate using a client certificate⚓︎

Some Kafka servers support mutual TLS and require the client to present an identifying certificate for authentication at the network transport layer. Often, a private certification authority (CA) is in use, as well.

  • Mutual TLS authentication: Install the certificate and corresponding key in Kubernetes Secret keys named KAFKA_TLS_CERT and KAFKA_TLS_KEY. This certificate identifies the client.
  • Publicly-signed server certificate: The Kafka server in this case is using a publicly trusted certificate. The client uses public certificate trust stores to verify the server. No additional configuration is necessary.
  • Privately-signed server certificate: The Kafka server in this case is using a certificate issued from a private CA. Install the trusted, private root CA certificate in a Kubernetes Secret key named KAFKA_TLS_CA. The Kafka client checks that the server cert matches this private CA cert.
Private CAs and client certificates

The combination of the private key and the certificate identify the client to the server. These can be issued by a private Certification Authority (CA).

When connecting to servers using certificates signed by private CAs, you must configure the private, trusted root certificate for TLS connections to succeed.

Frequently, a Kafka installation will use a private CA for the server certificate and client certificates issued to known applications.

Use a Kubernetes curated secret with the following options:

Option Expected Value
KAFKA_TLS_CA A TLS certificate authority file, in PEM format.
KAFKA_TLS_CERT A TLS certificate file, in PEM format.
KAFKA_TLS_KEY A TLS Key file, in PEM format.

For example:

---
apiVersion: v1
kind: Secret
metadata:
  name: curated
  namespace: my_namespace
stringData:
  KAFKA_TLS_CA:  |
    -----BEGIN CERTIFICATE-----
    MIIDXjCCAkagAwIBAgIUSIIjbGQAqEYJxyOsW1Q25VW8HWMwDQYJKoZIhvcNAQEL
    BQAwIzEhMB8GA1UEAxMYbm9tLXZrasdfqweq125ha2FtYWkuY29tMB4XDTIzMDMw
    thisisanexample
    -----END CERTIFICATE-----
  KAFKA_TLS_CERT:|
    -----BEGIN CERTIFICATE-----
    MIIDYzCCAkugAwIBAgIUfixZslukVX6PW/m6EuVST9SACJAwDQYJKoZIhvcNAQEL
    BQAwIjEgMB4GA1UEAxMXbm9tLXZrbXMtaW50LmFrYW1haS5jb20wHhcNMjMwMzAy
    thisisanexample
    -----END CERTIFICATE-----
  KAFKA_TLS_KEY: |
    -----BEGIN PRIVATE KEY-----
    MIIDYzCCAkugAwIBAgIUfixZslukVX6PW/m6EuVST9SACJAwDQYJKoZIhvcNAQEL
    BQAwIjEgMB4GA1UEAxMXbm9tLXZrbXMtaW50LmFrYW1haS5jb20wHhcNMjMwMzAy
    thisisanexample
    TRUNCATED
    -----END PRIVATE KEY-----
type: Opaque

To set the secret:

kubectl apply -f kafka-tls.yaml

Convert Kafka Java keystore certificates to PEM format⚓︎

By default, Kafka stores its certificate and key information in a Java KeyStore (.jks) file.

Hydrolix requires the files in Privacy Enhanced Mail (PEM) format.

Use keytool to export the key and certificate files.

Use openssl to convert to the PEM format.

How-to export certificate files⚓︎

  1. List certificates present in your keystore:

    $ keytool -list -keystore client.keystore.jks
    
        Enter keystore password:
        Keystore type: PKCS12
        Keystore provider: SUN
    
        Your keystore contains 2 entries
    
        caroot, May 5, 2021, trustedCertEntry, 
        Certificate fingerprint (SHA-256): A5:87:D0:E4:F6:70:4F:8E:07:2E:EE:56:73:D4:AF:88:DA:D5:8C:9F:67:71:F2:C0:7D:A9:CA:64:2F:F7:04:18
        clientcert, May 3, 2021, PrivateKeyEntry, 
        Certificate fingerprint (SHA-256): 80:A2:28:7C:D9:1B:A8:48:AB:24:76:CC:5A:19:47:29:12:CF:22:A1:8C:92:6E:E4:C0:30:0A:A0:34:73:F7:55
    
  2. Locate the CA certificate file, which is caroot in this example, and export it:

    1
    2
    3
    4
    $ keytool -export -alias caroot -file caroot.crt -keystore client.keystore.jks
    
        Enter keystore password:
        Certificate stored in file <caroot.crt>
    
  3. Use openssl to transform it into PEM format:

    openssl x509 -inform DER -in caroot.crt -out kafka_ca.pem -outform PEM
    
  4. Follow the same steps for your TLS certificate file clientcert:

    1
    2
    3
    4
    5
    6
    $ keytool -export -alias clientcert -file clientcert.crt -keystore client.keystore.jks
    
        Enter keystore password:
        Certificate stored in file <clientcert.crt>
    
    $ openssl x509 -inform DER -in clientcert.crt -out kafka_cert.pem -outform PEM
    

How-to export key files⚓︎

Exporting a key from the Java keystore requires a different set of commands.

  1. Use keytool to create a new PKCS12 store:

    1
    2
    3
    4
    5
    6
    7
    8
    $ keytool -v -importkeystore -srckeystore client.keystore.jks \
          -srcalias clientcert -destkeystore keystore.p12 -deststoretype PKCS12
    
        Importing keystore client.keystore.jks to keystore.p12...
        Enter destination keystore password:
        Re-enter new password: 
        Enter source keystore password:
        [Storing keystore.p12]
    
  2. Use openssl to extract the private key in PEM format, and use sed to remove extra information:

    1
    2
    3
    4
    5
    6
    $ openssl pkcs12 -in keystore.p12 -nodes -nocerts \
                | sed -ne '/-BEGIN PRIVATE KEY-/,/-END PRIVATE KEY-/p' \
                > kafka_key.pem
    
        Enter Import Password:
        MAC verified OK
    

At this point, you should have the three PEM files you need to update your Hydrolix cluster with your Kafka TLS information.