via AWS Kinesis

Getting Started

Hydrolix can load data from AWS Kinesis. The stream can be reached whether Hydrolix is deployed inside AWS or outside of it.


The basic steps are:

  1. Create a Project/Table
  2. Create a Transform
  3. Configure the Kinesis Source AWS account
  4. Configure the Hydrolix Kinesis Service and Scale

This guide assumes that the project, table, and transform are already configured. For details on setting these up, see Projects & Tables and Write Transforms.

AWS Pre-configuration Steps

In order to load data into Hydrolix from Kinesis you will need the following in your AWS Account:

  1. A DynamoDB table to store checkpoint information
  2. An AWS user/role with access to DynamoDB and Kinesis
  3. The ARN and region of your Kinesis stream

Create the DynamoDB table

The DynamoDB table can be created in the AWS Console. Create it with the following options:

Table Name: Hydrolix suggests using your client ID with the string _kinesis_check_point appended, e.g. hdxcli-123456_kinesis_check_point
Partition Key: StreamShard
Settings: Select 'Customized'
Table Class: DynamoDB Standard
Read/Write capacity settings: On-Demand


Make sure to record the ARN of your DynamoDB table. You will need it later when you configure the Kinesis source.
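
If you prefer to script the table creation rather than use the Console, the options above map onto the parameters of the DynamoDB CreateTable call. The following is a minimal sketch using boto3, not an official Hydrolix procedure. The function names are hypothetical, and the attribute type of StreamShard (string, "S") is an assumption that the source does not state.

```python
import json


def checkpoint_table_params(client_id: str) -> dict:
    """Build CreateTable parameters that mirror the Console options above."""
    return {
        # Suggested naming: client ID with _kinesis_check_point appended.
        "TableName": f"{client_id}_kinesis_check_point",
        # ASSUMPTION: the StreamShard partition key is a string attribute.
        "AttributeDefinitions": [
            {"AttributeName": "StreamShard", "AttributeType": "S"}
        ],
        "KeySchema": [{"AttributeName": "StreamShard", "KeyType": "HASH"}],
        "BillingMode": "PAY_PER_REQUEST",  # On-Demand capacity
        "TableClass": "STANDARD",          # DynamoDB Standard
    }


def create_checkpoint_table(client_id: str, region: str) -> str:
    """Create the table and return its ARN (needs boto3 and AWS credentials)."""
    import boto3  # imported lazily so the parameters can be inspected without it

    dynamodb = boto3.client("dynamodb", region_name=region)
    response = dynamodb.create_table(**checkpoint_table_params(client_id))
    return response["TableDescription"]["TableArn"]


if __name__ == "__main__":
    # Print the parameters only; nothing is created in AWS here.
    print(json.dumps(checkpoint_table_params("hdxcli-123456"), indent=2))
```

Calling create_checkpoint_table returns the table ARN, which is the value to record for the checkpointer setting.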

Create a User/Role for access

To read from the Kinesis stream and use the DynamoDB table for checkpointing, your Hydrolix cluster needs a user/role that can read from and write to these services in your account. You can create one in the AWS Console. More information is available in AWS's documentation.


Make sure to record the AWS access key ID and secret access key for this user.
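
As an illustration of what "read and write to these services" might look like in practice, the sketch below builds an IAM policy document scoped to one stream and one table. The exact set of actions Hydrolix requires is not stated in this guide, so the action lists here are an assumption based on what a typical Kinesis consumer with DynamoDB checkpointing uses. The function name is hypothetical.

```python
import json


def hydrolix_kinesis_policy(stream_arn: str, table_arn: str) -> dict:
    """Return an IAM policy document for one stream and one checkpoint table."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "ReadKinesisStream",
                "Effect": "Allow",
                # ASSUMPTION: typical read-side actions for a stream consumer.
                "Action": [
                    "kinesis:DescribeStream",
                    "kinesis:DescribeStreamSummary",
                    "kinesis:ListShards",
                    "kinesis:GetShardIterator",
                    "kinesis:GetRecords",
                ],
                "Resource": stream_arn,
            },
            {
                "Sid": "CheckpointInDynamoDB",
                "Effect": "Allow",
                # ASSUMPTION: typical actions for reading and writing checkpoints.
                "Action": [
                    "dynamodb:DescribeTable",
                    "dynamodb:GetItem",
                    "dynamodb:PutItem",
                    "dynamodb:UpdateItem",
                ],
                "Resource": table_arn,
            },
        ],
    }


if __name__ == "__main__":
    policy = hydrolix_kinesis_policy(
        "arn:aws:kinesis:us-east-2:1234567890:stream/test-kinesis",
        "arn:aws:dynamodb:us-east-2:1234567890:table/test-kinesis",
    )
    print(json.dumps(policy, indent=2))
```

The printed JSON can be pasted into the policy editor in the AWS Console and adjusted if ingestion reports a permissions error.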

Record your Kinesis ARN

Retrieve your Kinesis ARN from the AWS console. You will need this later when you configure the Hydrolix platform.
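
Because the source configuration needs both the stream ARN and its region, it can help to derive the region from the ARN rather than typing it twice. This is a small sketch that relies only on the standard ARN layout (arn:partition:service:region:account-id:resource). The names Arn and parse_arn are hypothetical helpers, not part of any Hydrolix or AWS tooling.

```python
from typing import NamedTuple


class Arn(NamedTuple):
    partition: str
    service: str
    region: str
    account_id: str
    resource: str


def parse_arn(arn: str) -> Arn:
    """Split an ARN into its five fields after the leading 'arn' prefix."""
    # maxsplit=5 keeps any colons inside the resource part intact.
    parts = arn.split(":", 5)
    if len(parts) != 6 or parts[0] != "arn":
        raise ValueError(f"not a valid ARN: {arn!r}")
    return Arn(*parts[1:])


if __name__ == "__main__":
    stream = parse_arn("arn:aws:kinesis:us-east-2:1234567890:stream/test-kinesis")
    print(stream.service, stream.region, stream.resource)
```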

Configure Access to your Kinesis Stream

Configuring the Hydrolix platform has two parts. First, add the access key ID and secret access key to the tunables (AWS CloudFormation) or environment variables (Kubernetes). Then configure the Hydrolix Kinesis acquisition service using the Kinesis Sources API.

Kubernetes - Add the Access Key

This can be done either with the hkt command or by editing hydrolixcluster.yaml directly.

hkt example


Direct example

The values are added within the env: object. For example:

  admin_email: .....
  env:
    EMAIL_PASSWORD: Hydrolix_Supplied_Password
  host: ..........
  - source: ................

AWS CloudFormation - Enable Access to your Kinesis Stream


Instructions for CloudFormation deployments are not yet available. We're working on them and will provide them shortly.

Configure the Hydrolix Kinesis Service

Create your Kinesis source in Hydrolix using the Create Kinesis Sources API endpoint. For example:

{
   "name": "kinesissource",
   "pool_name": "kinesispool",
   "k8s_deployment": {
      "cpu": 1,
      "memory": "10Gi",
      "service": "kinesis-peer"
   },
   "type": "pull",
   "subtype": "kinesis",
   "transform": "{{transform_name}}",
   "table": "{{project_name}}.{{table_name}}",
   "settings": {
      "stream_name": "arn:aws:kinesis:us-east-2:1234567890:stream/test-kinesis",
      "region": "us-east-2",
      "checkpointer": {
         "name": "arn:aws:dynamodb:us-east-2:1234567890:table/test-kinesis"
      }
   }
}

The settings for the source are as follows:

name: Name of your Kinesis source. Example: myKinesisSource
pool_name: Name for the pool that will service the source. Example: myKinesisPool
k8s_deployment: Object describing the cpu, memory, and service (kinesis-peer) for the pool. Example: "k8s_deployment": {"cpu": 1, "memory": "10Gi", "service": "kinesis-peer"}
type: The method used to retrieve stream data. This should be set to pull. Example: pull
subtype: The type of data source. This should be set to kinesis. Example: kinesis
transform: The name of the transform to use when ingesting data. Example: myTransform
table: The project and table to import the data into. Example: myproject.mytable
settings.stream_name: ARN for the Kinesis stream. Example: "settings": {"stream_name": "arn:aws:kinesis:us-east-2:1234567890:stream/test-kinesis"}
settings.region: Region for the Kinesis stream. Example: "settings": {"region": "us-east-2"}
settings.checkpointer.name: ARN for the DynamoDB table. Example: "settings": {"checkpointer": {"name": "arn:aws:dynamodb:us-east-2:1234567890:table/test-kinesis"}}

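The request body described by the settings above can also be assembled programmatically, which makes it harder to mix up the two ARNs. The sketch below only builds and validates the JSON body. It does not send it, because the endpoint URL and authentication details are not given in this guide. The function name and the ARN checks are illustrative assumptions, not part of the Hydrolix API.

```python
import json


def kinesis_source_payload(name, pool_name, transform, table,
                           stream_arn, region, checkpoint_table_arn,
                           cpu=1, memory="10Gi"):
    """Build the JSON body for creating a Kinesis source."""
    # Guard against swapping the stream ARN and the checkpoint table ARN.
    if ":kinesis:" not in stream_arn:
        raise ValueError("stream_arn must be a Kinesis stream ARN")
    if ":dynamodb:" not in checkpoint_table_arn:
        raise ValueError("checkpoint_table_arn must be a DynamoDB table ARN")
    return {
        "name": name,
        "pool_name": pool_name,
        "k8s_deployment": {"cpu": cpu, "memory": memory, "service": "kinesis-peer"},
        "type": "pull",
        "subtype": "kinesis",
        "transform": transform,
        "table": table,  # "<project>.<table>"
        "settings": {
            "stream_name": stream_arn,
            "region": region,
            "checkpointer": {"name": checkpoint_table_arn},
        },
    }


if __name__ == "__main__":
    body = kinesis_source_payload(
        name="kinesissource",
        pool_name="kinesispool",
        transform="myTransform",
        table="myproject.mytable",
        stream_arn="arn:aws:kinesis:us-east-2:1234567890:stream/test-kinesis",
        region="us-east-2",
        checkpoint_table_arn="arn:aws:dynamodb:us-east-2:1234567890:table/test-kinesis",
    )
    print(json.dumps(body, indent=3))
```

The printed document can be used as the body of the Create Kinesis Sources request.
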
Scale the Kinesis Service

To scale the service, edit the hydrolixcluster YAML. You can either edit the resource in place:

kubectl edit hydrolixcluster

or export it to a file, edit the file, and apply it:

kubectl get hydrolixclusters <NAMESPACE> -o yaml > hydrolix-cluster.yaml
vim hydrolix-cluster.yaml
kubectl apply -f hydrolix-cluster.yaml

Set the replicas value for the Kinesis pool to the number of nodes you want for Kinesis.

  owner: admin
  - cpu: "1"
    memory: 10Gi
    name: kinesispool-db9feb26-2a8c-4735-9b8a-6e6b7f207cbe
    replicas: "1"
    service: kinesis-peer
    storage: 10Gi
  region: us-central1
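
If you script this change instead of editing the file by hand, the edit amounts to updating replicas on every pool entry whose service is kinesis-peer. The sketch below shows that step on pool entries that have already been loaded into Python dictionaries. Loading and saving the YAML file is left out, and the function name is hypothetical. It keeps replicas as a string because that is how the value appears in the example above.

```python
from copy import deepcopy
from typing import Dict, List


def scale_kinesis_pools(pools: List[Dict], replicas: int) -> List[Dict]:
    """Return a copy of the pool entries with kinesis-peer pools rescaled."""
    if replicas < 0:
        raise ValueError("replicas must be zero or greater")
    scaled = deepcopy(pools)  # leave the caller's data untouched
    for pool in scaled:
        if pool.get("service") == "kinesis-peer":
            # The example above stores replicas as a quoted string.
            pool["replicas"] = str(replicas)
    return scaled


if __name__ == "__main__":
    pools = [
        {
            "cpu": "1",
            "memory": "10Gi",
            "name": "kinesispool-db9feb26-2a8c-4735-9b8a-6e6b7f207cbe",
            "replicas": "1",
            "service": "kinesis-peer",
            "storage": "10Gi",
        }
    ]
    print(scale_kinesis_pools(pools, 3))
```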