via AWS Kinesis
Getting Started
Hydrolix can load data from AWS Kinesis as a source. The source can be accessed both when Hydrolix is deployed within AWS and when Hydrolix is deployed outside of AWS.
The basic steps are:
- Create a Project/Table
- Create a Transform
- Configure the AWS account that hosts the Kinesis source.
- Configure and scale the Hydrolix Kinesis service.
This guide assumes that the project, table, and transform are already configured. For details on setting them up, see Projects & Tables and Write Transforms.
AWS Pre-configuration steps
In order to load data into Hydrolix from Kinesis you will need the following in your AWS Account:
- A DynamoDB table to store checkpoint information
- An AWS user/role with access to DynamoDB and Kinesis
- The ARN and region of your Kinesis stream
Create the DynamoDB table
The DynamoDB table can be created in the AWS Console with the following options:
Option | Value |
---|---|
Table Name | Hydrolix suggests your client ID with the string _kinesis_check_point appended, e.g. hdxcli-123456_kinesis_check_point |
Partition Key | StreamShard |
Settings | Select 'Customized' |
Table Class | DynamoDB Standard |
Read/Write capacity settings | On-Demand |
Make a note of the DynamoDB table ARN; you will need it when you configure the Hydrolix Kinesis source.
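If you prefer to script the table creation instead of using the console, the options above map onto the parameters of the DynamoDB CreateTable API. The sketch below only builds those parameters. The string type for `StreamShard` is an assumption, since the option table does not state the key's attribute type, and `checkpoint_table_params` is a hypothetical helper name.

```python
import json


def checkpoint_table_params(client_id: str) -> dict:
    """Build CreateTable parameters matching the console options above.

    Assumption: the StreamShard partition key is a string ("S") attribute;
    the option table does not state its type.
    """
    return {
        # Suggested naming: client ID + "_kinesis_check_point"
        "TableName": f"{client_id}_kinesis_check_point",
        # Partition key only; no sort key is listed in the options
        "KeySchema": [{"AttributeName": "StreamShard", "KeyType": "HASH"}],
        "AttributeDefinitions": [
            {"AttributeName": "StreamShard", "AttributeType": "S"}  # assumed type
        ],
        # "On-Demand" read/write capacity corresponds to PAY_PER_REQUEST billing
        "BillingMode": "PAY_PER_REQUEST",
        # "DynamoDB Standard" table class
        "TableClass": "STANDARD",
    }


if __name__ == "__main__":
    print(json.dumps(checkpoint_table_params("hdxcli-123456"), indent=2))
```

With boto3 installed and AWS credentials configured, the dictionary could be passed as `boto3.client("dynamodb", region_name="us-east-2").create_table(**params)`; the returned `TableDescription["TableArn"]` is the ARN to note down.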
Create a User/Role for access
For your Hydrolix cluster to read from the Kinesis stream and use the DynamoDB table for checkpointing, it needs a user/role that can read from and write to these services in your account. This can be set up in the AWS Console; see AWS's IAM documentation for details.
Make sure to record the AWS access key ID and secret access key for this user.
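As a starting point for the user/role, an IAM policy can grant read access to the one stream and read/write access to the one checkpoint table. The action lists below are assumptions about what a Kinesis consumer with DynamoDB checkpointing typically needs, not a list published by Hydrolix; trim or extend them per the AWS and Hydrolix documentation. The helper name `hydrolix_kinesis_policy` is hypothetical.

```python
import json


def hydrolix_kinesis_policy(stream_arn: str, table_arn: str) -> dict:
    """Sketch of an IAM policy scoped to one stream and one checkpoint table.

    The action lists are assumptions, not an official Hydrolix requirement.
    """
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Read records from the Kinesis stream
                "Sid": "ReadKinesisStream",
                "Effect": "Allow",
                "Action": [
                    "kinesis:DescribeStream",
                    "kinesis:DescribeStreamSummary",
                    "kinesis:ListShards",
                    "kinesis:GetShardIterator",
                    "kinesis:GetRecords",
                ],
                "Resource": stream_arn,
            },
            {
                # Read and write checkpoints in the DynamoDB table
                "Sid": "CheckpointInDynamoDB",
                "Effect": "Allow",
                "Action": [
                    "dynamodb:DescribeTable",
                    "dynamodb:GetItem",
                    "dynamodb:PutItem",
                    "dynamodb:UpdateItem",
                    "dynamodb:DeleteItem",
                    "dynamodb:Query",
                    "dynamodb:Scan",
                ],
                "Resource": table_arn,
            },
        ],
    }


if __name__ == "__main__":
    policy = hydrolix_kinesis_policy(
        "arn:aws:kinesis:us-east-2:1234567890:stream/test-kinesis",
        "arn:aws:dynamodb:us-east-2:1234567890:table/test-kinesis",
    )
    print(json.dumps(policy, indent=2))
```

The printed JSON can be pasted into the IAM console's JSON policy editor and attached to the user whose access key you record.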
Record your Kinesis ARN
Retrieve your Kinesis ARN from the AWS console. You will need this later when you configure the Hydrolix platform.
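The `region` setting used later should match the region embedded in the stream ARN, which is the fourth colon-separated field of `arn:partition:service:region:account:resource`. A small sketch, with hypothetical function names, that parses an ARN and extracts that region while rejecting ARNs that are not Kinesis streams:

```python
from typing import NamedTuple


class Arn(NamedTuple):
    partition: str
    service: str
    region: str
    account: str
    resource: str


def parse_arn(arn: str) -> Arn:
    """Split an ARN of the form arn:partition:service:region:account:resource."""
    parts = arn.split(":", 5)  # maxsplit keeps any ':' inside the resource
    if len(parts) != 6 or parts[0] != "arn":
        raise ValueError(f"not an ARN: {arn!r}")
    return Arn(*parts[1:])


def kinesis_stream_region(arn: str) -> str:
    """Return the region of a Kinesis stream ARN, checking it is a stream ARN."""
    parsed = parse_arn(arn)
    if parsed.service != "kinesis" or not parsed.resource.startswith("stream/"):
        raise ValueError(f"not a Kinesis stream ARN: {arn!r}")
    return parsed.region
```

For the example stream used later in this guide, `kinesis_stream_region("arn:aws:kinesis:us-east-2:1234567890:stream/test-kinesis")` returns `"us-east-2"`.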
Configure Access to your Kinesis Stream
Configuring the Hydrolix platform comes in two parts: first, add the AWS access key ID and secret access key to the tunables (AWS CloudFormation) or environment variables (Kubernetes); then configure the Hydrolix Kinesis acquisition service using the Kinesis Sources API.
Kubernetes - add the Access Key
This can be done either with the hkt command or by editing hydrolixcluster.yaml directly.
hkt example
./hkt hydrolix-cluster --env AWS_ACCESS_KEY_ID=$AWS_ACCESS_KEY_ID --env AWS_SECRET_ACCESS_KEY=$AWS_SECRET_ACCESS_KEY --env EMAIL_PASSWORD=$EMAIL_PASSWORD > hydrolixcluster.yaml
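The hkt command above reads the credentials from shell variables; if one is unset, the shell expands it to an empty value and the flag is still passed. A minimal Python sketch, with hypothetical function names, that builds the same argument list and fails early when a variable is missing or empty:

```python
import os
import shlex
from typing import List, Mapping, Optional

# Variables passed through to hkt in the example above.
REQUIRED_VARS = ("AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY", "EMAIL_PASSWORD")


def hkt_command(env: Optional[Mapping[str, str]] = None) -> List[str]:
    """Build the hkt argument list, raising if any variable is unset or empty."""
    env = os.environ if env is None else env
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    if missing:
        raise KeyError(f"missing environment variables: {', '.join(missing)}")
    argv = ["./hkt", "hydrolix-cluster"]
    for name in REQUIRED_VARS:
        argv += ["--env", f"{name}={env[name]}"]
    return argv


def hkt_command_line(env: Optional[Mapping[str, str]] = None) -> str:
    """Return the same command as a shell-quoted string, for review or logging."""
    return shlex.join(hkt_command(env))
```

The list can be handed to `subprocess.run(hkt_command(), stdout=out, check=True)` with `out` opened on hydrolixcluster.yaml, which mirrors the shell redirection. As in the shell version, the secret values appear on the hkt command line.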
Direct example
The values are added within the env: object. For example:
spec:
  admin_email: .....
  ....
  env:
    EMAIL_PASSWORD: Hydrolix_Supplied_Password
    AWS_ACCESS_KEY_ID: AWS_ACCESS_KEY_ID_HERE
    AWS_SECRET_ACCESS_KEY: AWS_SECRET_ACCESS_KEY_HERE
  host: ..........
  ip_allowlist:
  - source: ................
AWS CloudFormation - Enable access to your Kinesis Stream
Instructions for CloudFormation deployments are not yet available; they will be added shortly.
Configure the Hydrolix Kinesis Service
Create your Kinesis source in Hydrolix using the Create Kinesis Sources API endpoint. An example request body:
{
  "name": "kinesissource",
  "pool_name": "kinesispool",
  "k8s_deployment": {
    "cpu": 1,
    "memory": "10Gi",
    "service": "kinesis-peer"
  },
  "type": "pull",
  "subtype": "kinesis",
  "transform": "{{transform_name}}",
  "table": "{{project_name}}.{{table_name}}",
  "settings": {
    "stream_name": "arn:aws:kinesis:us-east-2:1234567890:stream/test-kinesis",
    "region": "us-east-2",
    "checkpointer": {
      "name": "arn:aws:dynamodb:us-east-2:1234567890:table/test-kinesis"
    }
  }
}
The settings for the source are as follows:
Setting | Description | Example |
---|---|---|
name | Name of your Kinesis Source | myKinesisSource |
pool_name | Name for the pool that will service the source | myKinesisPool |
k8s_deployment | Object describing the CPU, memory, and service for the pool (kinesis-peer) | { "cpu": 1, "memory": "10Gi", "service": "kinesis-peer" } |
type | The method to retrieve stream data. This should be set as pull | pull |
subtype | The type of data source. This should be set as kinesis | kinesis |
transform | The name of the transform to use in ingesting data | myTransform |
table | The project and table to import the data into. | myproject.mytable |
settings.stream_name | ARN for the Kinesis stream | "settings": { "stream_name": "arn:aws:kinesis:us-east-2:1234567890:stream/test-kinesis" } |
settings.region | Region for the Kinesis stream | "settings": { "region": "us-east-2" } |
settings.checkpointer.name | ARN for the DynamoDB checkpoint table | "settings": { "checkpointer": { "name": "arn:aws:dynamodb:us-east-2:1234567890:table/test-kinesis" } } |
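To keep the request body consistent with the table above, a small sketch can build it from the stream and checkpoint-table ARNs, deriving `region` from the stream ARN so the two cannot drift apart, and then prepare a JSON POST. The function names are hypothetical, and the endpoint URL and the bearer-token `Authorization` header are assumptions: use the Create Kinesis Sources URL and the authentication scheme given in your cluster's API documentation.

```python
import json
import urllib.request


def _split_arn(arn: str, service: str) -> list:
    """Split arn:partition:service:region:account:resource and check the service."""
    parts = arn.split(":", 5)
    if len(parts) != 6 or parts[0] != "arn" or parts[2] != service:
        raise ValueError(f"not a {service} ARN: {arn!r}")
    return parts


def kinesis_source_payload(name: str, pool_name: str, project: str, table: str,
                           transform: str, stream_arn: str,
                           checkpoint_table_arn: str,
                           cpu: int = 1, memory: str = "10Gi") -> dict:
    """Build a Kinesis source body shaped like the example request above."""
    region = _split_arn(stream_arn, "kinesis")[3]
    _split_arn(checkpoint_table_arn, "dynamodb")  # checkpointer must be the DynamoDB ARN
    return {
        "name": name,
        "pool_name": pool_name,
        "k8s_deployment": {"cpu": cpu, "memory": memory, "service": "kinesis-peer"},
        "type": "pull",
        "subtype": "kinesis",
        "transform": transform,
        "table": f"{project}.{table}",
        "settings": {
            "stream_name": stream_arn,
            "region": region,
            "checkpointer": {"name": checkpoint_table_arn},
        },
    }


def build_create_request(url: str, token: str, payload: dict) -> urllib.request.Request:
    """Prepare (but do not send) a JSON POST; url and bearer auth are assumptions."""
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json",
                 "Authorization": f"Bearer {token}"},
        method="POST",
    )
```

Sending is then `urllib.request.urlopen(request)`; the sketch stops short of that so the payload can be inspected before the source is created.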
Scale the Kinesis Service
To scale the service, edit the hydrolixcluster resource, either in place or by exporting, editing, and reapplying its YAML:
# Either:
kubectl edit hydrolixcluster -n <NAMESPACE>
# or
kubectl get hydrolixclusters -n <NAMESPACE> -o yaml > hydrolix-cluster.yaml
vim hydrolix-cluster.yaml
kubectl apply -f hydrolix-cluster.yaml
Set replicas on the pool whose service is kinesis-peer to the number of Kinesis peer instances you want to run.
......
  owner: admin
  pools:
  - cpu: "1"
    memory: 10Gi
    name: kinesispool-db9feb26-2a8c-4735-9b8a-6e6b7f207cbe
    replicas: "1"
    service: kinesis-peer
    storage: 10Gi
  region: us-central1
  scale:
......
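When scaling is scripted rather than done in an editor, the same change can be applied to a single cluster resource exported as JSON (for example `kubectl get hydrolixcluster <NAME> -n <NAMESPACE> -o json`). This sketch assumes the pools live under `spec.pools`, as in the excerpt above, and keeps `replicas` as a quoted string like the excerpt does; the function name is hypothetical.

```python
from typing import Any, Dict


def set_kinesis_replicas(cluster: Dict[str, Any], replicas: int,
                         service: str = "kinesis-peer") -> int:
    """Set replicas on every pool in spec.pools whose service matches.

    `cluster` is one hydrolixcluster resource already parsed into a dict.
    Returns the number of pools that were changed.
    """
    if replicas < 0:
        raise ValueError("replicas must be non-negative")
    changed = 0
    for pool in cluster.get("spec", {}).get("pools", []):
        if pool.get("service") == service:
            # The excerpt above stores replicas as a quoted string.
            pool["replicas"] = str(replicas)
            changed += 1
    return changed
```

After updating, write the dict back out with `json.dump` and apply it with `kubectl apply -f`. A return value of 0 means no kinesis-peer pool was found, which usually indicates the Kinesis source has not been created yet.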