AWS Kinesis

Getting Started

Hydrolix can load data from AWS Kinesis. The stream can be read whether Hydrolix is deployed inside AWS or outside it.

The basic steps are:

  1. Create a Project/Table
  2. Create a Transform
  3. Configure the AWS account that hosts the Kinesis source
  4. Configure and scale the Hydrolix Kinesis service

This guide assumes that the project, table, and transform are already configured. For setup instructions, see Projects & Tables and Write Transforms.


AWS Pre-configuration steps.

In order to load data into Hydrolix from Kinesis you will need the following in your AWS Account:

  1. A DynamoDB table to store checkpoint information
  2. An AWS user/role with access to DynamoDB and Kinesis.
  3. The ARN and region of your Kinesis stream

Create the DynamoDB table

The DynamoDB table can be created in the AWS Console with the following options:

| Option | Value |
| --- | --- |
| Table Name | Hydrolix suggests using your client ID with the string _kinesis_check_point appended, e.g. hdxcli-123456_kinesis_check_point |
| Partition Key | StreamShard |
| Settings | Select 'Customized' |
| Table Class | DynamoDB Standard |
| Read/Write capacity settings | On-Demand |

Note: Make sure to record your DynamoDB table ARN.
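
The options above map fairly directly onto the parameters of the DynamoDB CreateTable API. The sketch below builds that parameter set in Python. It assumes the StreamShard partition key is a string attribute (type S), which this page does not state, and the function name is purely illustrative. The actual call is left as a comment because it needs boto3 and AWS credentials.

```python
def checkpoint_table_params(client_id: str) -> dict:
    """Build CreateTable parameters matching the options listed above."""
    return {
        "TableName": f"{client_id}_kinesis_check_point",
        # Assumption: the StreamShard partition key is a string attribute.
        "AttributeDefinitions": [
            {"AttributeName": "StreamShard", "AttributeType": "S"}
        ],
        "KeySchema": [{"AttributeName": "StreamShard", "KeyType": "HASH"}],
        "BillingMode": "PAY_PER_REQUEST",  # On-Demand capacity
        "TableClass": "STANDARD",          # DynamoDB Standard
    }


params = checkpoint_table_params("hdxcli-123456")
print(params["TableName"])

# With boto3 installed and credentials configured, the table could be created with:
#   import boto3
#   response = boto3.client("dynamodb", region_name="us-east-2").create_table(**params)
#   print(response["TableDescription"]["TableArn"])  # the ARN to record
```

Creating the table from a script rather than the console makes it easier to capture the table ARN in the same step.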

Create a User/Role for access

For your Hydrolix cluster to read the Kinesis stream and use the DynamoDB table for checkpointing, it needs a user or role that can read from and write to these services in your account. You can create one in the AWS Console; see the AWS documentation for details.

Note: Make sure to record the AWS Access Key ID and Secret Access Key.
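
As a rough illustration of what such a user or role might be allowed to do, the sketch below builds an IAM policy document scoped to one stream and one checkpoint table. The list of actions is an assumption (typical read actions for a stream consumer and read/write actions for a checkpoint table); this page does not list the exact permissions Hydrolix requires, so treat it as a starting point to adjust. The function name is illustrative.

```python
import json


def kinesis_access_policy(stream_arn: str, table_arn: str) -> dict:
    """Build an IAM policy document limited to one stream and one table."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                # Assumed set of read actions for consuming a stream.
                "Action": [
                    "kinesis:DescribeStream",
                    "kinesis:DescribeStreamSummary",
                    "kinesis:ListShards",
                    "kinesis:GetShardIterator",
                    "kinesis:GetRecords",
                ],
                "Resource": stream_arn,
            },
            {
                "Effect": "Allow",
                # Assumed set of read/write actions for checkpointing.
                "Action": [
                    "dynamodb:DescribeTable",
                    "dynamodb:GetItem",
                    "dynamodb:PutItem",
                    "dynamodb:UpdateItem",
                    "dynamodb:DeleteItem",
                    "dynamodb:Scan",
                ],
                "Resource": table_arn,
            },
        ],
    }


policy = kinesis_access_policy(
    "arn:aws:kinesis:us-east-2:1234567890:stream/test-kinesis",
    "arn:aws:dynamodb:us-east-2:1234567890:table/test-kinesis",
)
print(json.dumps(policy, indent=2))
```

The printed JSON can be pasted into the policy editor in the AWS Console when creating the user or role.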

Record your Kinesis ARN.

Retrieve your Kinesis ARN from the AWS console. You will need this later when you configure the Hydrolix platform.


Configure Access to your Kinesis

Configuring the Hydrolix platform has two parts: first, add the AWS Access Key ID and Secret Access Key to the tunables (AWS CloudFormation) or environment variables (Kubernetes); then, configure the Hydrolix Kinesis acquisition service using the Kinesis Sources API.

Kubernetes - add the Access Key.

This can be done either with the hkt command or by editing hydrolixcluster.yaml directly.

hkt example

./hkt hydrolix-cluster --env AWS_ACCESS_KEY_ID=$AWS_ACCESS_KEY_ID --env AWS_SECRET_ACCESS_KEY=$AWS_SECRET_ACCESS_KEY --env EMAIL_PASSWORD=$EMAIL_PASSWORD > hydrolixcluster.yaml

Direct example

The values are added within the env: object. For example:

spec:
  admin_email: .....
  ....
  env:
    EMAIL_PASSWORD: Hydrolix_Supplied_Password
    AWS_ACCESS_KEY_ID: AWS_ACCESS_KEY_ID_HERE
    AWS_SECRET_ACCESS_KEY: AWS_SECRET_ACCESS_KEY_HERE
  host: ..........
  ip_allowlist:
  - source: ................

AWS CloudFormation - Enable access to your Kinesis

Note: Instructions for CloudFormation deployments are still being written and will be provided shortly.


Configure the Hydrolix Kinesis Service

Create your Kinesis source in Hydrolix using the Create Kinesis Sources API endpoint. An example request body:

{
   "name": "kinesissource",
   "pool_name": "kinesispool",
   "k8s_deployment": {
      "cpu": 1,
      "memory": "10Gi",
      "service": "kinesis-peer"
   },
   "type": "pull",
   "subtype": "kinesis",
   "transform": "{{transform_name}}",
   "table": "{{project_name}}.{{table_name}}",
   "settings": {
      "stream_name": "arn:aws:kinesis:us-east-2:1234567890:stream/test-kinesis",
      "region": "us-east-2",
      "checkpointer": {
         "name": "arn:aws:dynamodb:us-east-2:1234567890:table/test-kinesis"
      }
   }
}
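
One way to avoid hand-editing the JSON above is to assemble it in code and wrap it in an HTTP request. The sketch below uses only the Python standard library. The function names, the URL, the token value, and the bearer-token header are all assumptions for illustration; take the real endpoint path and authentication scheme from the Create Kinesis Sources API reference.

```python
import json
import urllib.request


def build_kinesis_source(name, pool_name, transform, table,
                         stream_arn, region, checkpoint_table_arn):
    """Assemble a source definition with the same fields as the example above."""
    return {
        "name": name,
        "pool_name": pool_name,
        "k8s_deployment": {"cpu": 1, "memory": "10Gi", "service": "kinesis-peer"},
        "type": "pull",
        "subtype": "kinesis",
        "transform": transform,
        "table": table,
        "settings": {
            "stream_name": stream_arn,
            "region": region,
            "checkpointer": {"name": checkpoint_table_arn},
        },
    }


def create_source_request(url, token, source):
    """Wrap the source definition in a POST request (nothing is sent here)."""
    return urllib.request.Request(
        url,
        data=json.dumps(source).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            # Assumption: the API accepts a bearer token.
            "Authorization": f"Bearer {token}",
        },
        method="POST",
    )


source = build_kinesis_source(
    name="kinesissource",
    pool_name="kinesispool",
    transform="myTransform",
    table="myproject.mytable",
    stream_arn="arn:aws:kinesis:us-east-2:1234567890:stream/test-kinesis",
    region="us-east-2",
    checkpoint_table_arn="arn:aws:dynamodb:us-east-2:1234567890:table/test-kinesis",
)

# Hypothetical URL: replace it with the real Create Kinesis Sources endpoint.
request = create_source_request(
    "https://hydrolix.example.com/REPLACE_WITH_KINESIS_SOURCES_PATH",
    "MY_TOKEN",
    source,
)
print(request.get_method(), request.full_url)
# To actually send it: urllib.request.urlopen(request)
```

Serializing with json.dumps guarantees valid JSON, so values such as "10Gi" are always quoted correctly.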

The settings for the source are as follows:

| Setting | Description | Example |
| --- | --- | --- |
| name | Name of your Kinesis source | myKinesisSource |
| pool_name | Name of the pool that will service the source | myKinesisPool |
| k8s_deployment | Object describing the CPU, memory, and service (kinesis-peer) for the pool | {"cpu": 1, "memory": "10Gi", "service": "kinesis-peer"} |
| type | The method used to retrieve stream data. This should be set to pull | pull |
| subtype | The type of data source. This should be set to kinesis | kinesis |
| transform | The name of the transform to use when ingesting data | myTransform |
| table | The project and table to import the data into | myproject.mytable |
| settings.stream_name | ARN of the Kinesis stream | "settings": {"stream_name": "arn:aws:kinesis:us-east-2:1234567890:stream/test-kinesis"} |
| settings.region | Region of the Kinesis stream | "settings": {"region": "us-east-2"} |
| settings.checkpointer.name | ARN of the DynamoDB table | "settings": {"checkpointer": {"name": "arn:aws:dynamodb:us-east-2:1234567890:table/test-kinesis"}} |
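
Mixing up the two ARNs is an easy mistake, so it can help to check a source definition before submitting it. The sketch below is a minimal local check, not part of any Hydrolix tooling: it relies only on the standard ARN layout (arn:partition:service:region:account:resource), and the function names and messages are illustrative.

```python
def parse_arn(arn: str) -> dict:
    """Split an ARN into its six colon-separated fields."""
    parts = arn.split(":", 5)
    if len(parts) != 6 or parts[0] != "arn":
        raise ValueError(f"not an ARN: {arn!r}")
    keys = ("prefix", "partition", "service", "region", "account", "resource")
    return dict(zip(keys, parts))


def check_source(source: dict) -> list:
    """Return a list of problems found in a Kinesis source definition."""
    problems = []
    if source.get("type") != "pull":
        problems.append("type should be pull")
    if source.get("subtype") != "kinesis":
        problems.append("subtype should be kinesis")
    settings = source["settings"]
    stream = parse_arn(settings["stream_name"])
    table = parse_arn(settings["checkpointer"]["name"])
    if stream["service"] != "kinesis":
        problems.append("stream_name is not a Kinesis ARN")
    if table["service"] != "dynamodb":
        problems.append("checkpointer.name is not a DynamoDB ARN")
    if stream["region"] != settings.get("region"):
        problems.append("region does not match the stream ARN")
    return problems


good = {
    "type": "pull",
    "subtype": "kinesis",
    "settings": {
        "stream_name": "arn:aws:kinesis:us-east-2:1234567890:stream/test-kinesis",
        "region": "us-east-2",
        "checkpointer": {
            "name": "arn:aws:dynamodb:us-east-2:1234567890:table/test-kinesis"
        },
    },
}
# Same definition, but with the stream ARN pasted into the checkpointer by mistake.
bad = dict(good, settings=dict(
    good["settings"],
    checkpointer={"name": good["settings"]["stream_name"]},
))

print(check_source(good))
print(check_source(bad))
```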

Scale the Kinesis Service

To scale the service, edit the hydrolixcluster YAML.

# Either:
kubectl edit hydrolixcluster

# or:
kubectl get hydrolixclusters <NAMESPACE> -o yaml > hydrolix-cluster.yaml
vim hydrolix-cluster.yaml
kubectl apply -f hydrolix-cluster.yaml

Set replicas to the number of nodes you want for Kinesis.

......
  owner: admin
  pools:
  - cpu: "1"
    memory: 10Gi
    name: kinesispool-db9feb26-2a8c-4735-9b8a-6e6b7f207cbe
    replicas: "1"
    service: kinesis-peer
    storage: 10Gi
  region: us-central1
  scale:
......
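
If you prefer to script the change instead of editing by hand, the same update can be applied to the cluster definition as data. The sketch below is one possible approach and assumes that the pools list sits under spec, which the truncated snippet above does not confirm; the function name is illustrative. It works on a dictionary such as the one obtained by parsing the output of kubectl get with -o json.

```python
def set_pool_replicas(cluster: dict, service: str, replicas: int) -> int:
    """Set replicas on every pool running `service`; return how many pools changed."""
    changed = 0
    # Assumption: pools are listed under spec in the hydrolixcluster resource.
    for pool in cluster.get("spec", {}).get("pools", []):
        if pool.get("service") == service:
            # The YAML above stores replicas as a quoted string.
            pool["replicas"] = str(replicas)
            changed += 1
    return changed


cluster = {
    "spec": {
        "pools": [
            {
                "cpu": "1",
                "memory": "10Gi",
                "name": "kinesispool-db9feb26-2a8c-4735-9b8a-6e6b7f207cbe",
                "replicas": "1",
                "service": "kinesis-peer",
                "storage": "10Gi",
            }
        ]
    }
}

changed = set_pool_replicas(cluster, "kinesis-peer", 3)
print(changed, cluster["spec"]["pools"][0]["replicas"])
```

The modified dictionary can be written back out as JSON and applied with kubectl apply -f, which accepts JSON as well as YAML.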
