Streaming
Configure Components
Traefik
Minimum instances: 2
To check the number of instances currently running, run the following command:
kubectl get deployment/traefik -o wide
To change the number of instances, adjust the following section of your Hydrolix cluster configuration:
...
traefik:
  replicas: <scale number>
...
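As a general workflow (a sketch reusing the export-and-apply commands shown later in this runbook; edit the file with whatever tooling you prefer), export the live cluster configuration:
kubectl get hydrolixcluster -o yaml > hydrolixcluster.yaml
Set the replicas value for the component you are scaling in hydrolixcluster.yaml, then apply the change:
kubectl apply -f hydrolixcluster.yaml
The same workflow applies to the other components below.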
Stream Head
Minimum instances: 2, unless you have migrated to Intake Head (below)
To check the number of instances currently running, run the following command:
kubectl get deployment/stream-head -o wide
To change the number of instances, adjust the following section of your Hydrolix cluster configuration:
...
stream-head:
  replicas: <scale number>
...
Intake Head
Minimum instances: 0, unless you have moved away from Stream Head (above)
To check the number of instances currently running, run the following command:
kubectl get deployment/intake-head -o wide
To change the number of instances, adjust the following section of your Hydrolix cluster configuration:
...
intake-head:
  replicas: <scale number>
...
RedPanda
Minimum instances: 3
To check the number of instances currently running, run the following command:
kubectl get deployment/redpanda -o wide
To change the number of instances, adjust the following section of your Hydrolix cluster configuration:
...
redpanda:
  replicas: <scale number>
...
Stream Peer
Minimum instances: 1
To check the number of instances currently running, run the following command:
kubectl get deployment/stream-peer -o wide
To change the number of instances, adjust the following section of your Hydrolix cluster configuration:
...
stream-peer:
  replicas: <scale number>
...
Alerts
Stream Head Ingest Rate Alert
Fires when the stream head ingest rate drops below a set threshold, meaning the Stream Head is receiving reduced traffic or no traffic at all.
Solution
This is usually the result of outside influences. Check the streaming dashboard for more information.
It is also recommended to check the log shipper's logs for any record of failures or a drop in traffic.
Stream Peer Ingest Rate Alert
This alert can be directly affected by the alert above.
However, if it has occurred in isolation from the Stream Head alert and the Kinesis queue depth is increasing, view the stream logs on the stream-peer.
Solution
Log locations can be found in the logs section of this document.
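For a quick look at those logs (a hedged example; adjust the deployment name and tail length for your cluster):
kubectl logs deployment/stream-peer --all-containers --tail=200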
Stream Head Error Count Alert
Fires when the Stream Head serves a number of failure HTTP status codes (4xx, 5xx) in response to receiving messages from the data shipper.
The codes and errors that have been sent can be seen within the Streaming dashboard.
Solution
Errors can be caused by messages with an unexpected or unknown structure, incorrect use of the stream API, or transform errors where an update or change to a transform means it no longer matches incoming data.
Error codes and failure states can be seen within the streaming dashboard.
Information on how to configure the streaming service is here - https://docs.hydrolix.io/docs/streaming-http
Transform information can be retrieved via the portal or the API - https://docs.hydrolix.io/reference/list-transforms
Stream Head Rows Received Differs from Rows Written by Indexer
This alert will fire when the number of rows received is significantly different from the rows written by the indexer on the Stream-peer.
Solution
This is often due to data being rejected or enrichment being applied incorrectly.
Inspect the Stream-Peer logs for errors.
End-Point Errors
Issue: Client Connection Timeout
Check
Check whether the Traefik replica count has dropped below 1 (no instances running):
kubectl get deployment/traefik -o wide
Fix
Scale Traefik via hydrolixcluster.yaml:
traefik:
  replicas: <Scale>
For more information about scale profiles, see the documentation: https://docs.hydrolix.io/docs/scale-profiles.
Check
Check that the IP allowlist contains the requesting IP address:
kubectl get hydrolixcluster -o yaml
Fix
Add the IP to the allowlist. First, export the current cluster configuration:
kubectl get hydrolixcluster -o yaml > hydrolixcluster.yaml
Add the IPs to the allowlist in the exported file, then apply it:
kubectl apply -f hydrolixcluster.yaml
For more information about IP Allowlists, see the documentation: https://docs.hydrolix.io/docs/gcp-k8s-setting-up-tls#enable-access-to-your-cluster.
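As an illustration only (the field name and structure below are assumptions, not confirmed by this runbook; verify them against the linked allowlist documentation and your exported hydrolixcluster.yaml), the allowlist entry might look something like:
spec:
  ip_allowlist:            # assumed field name; confirm in your cluster spec
    - 203.0.113.10/32      # requesting IP in CIDR form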
Issue: HTTP 429 Errors from Intake Head
Check
Intake Head count:
kubectl get deployment/intake-head -o wide
Fix
Scale intake-head via hydrolixcluster.yaml:
intake-head:
  replicas: <Scale>
For more information about scale profiles, see the Scale Profiles documentation.
Issue: 503 Service Temporarily Unavailable
Check
Check that the Stream Head count is at least 1:
kubectl get deployment/stream-head -o wide
Fix
Scale stream-head via hydrolixcluster.yaml:
stream-head:
  replicas: <Scale>
For more information about scale profiles, see the Scale Profiles documentation.
Issue: Stream Head 4XX
Check
Check the protocol and URL path.
Fix
The path and protocol should be in the format
http://<hostname>/ingest/event
or
https://<hostname>/ingest/event?table=project_name.table_name&transform=transform_name
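For example (a sketch; hostname, project, table, and transform names are placeholders, and any authentication your cluster requires is omitted):
curl -X POST 'https://<hostname>/ingest/event?table=project_name.table_name&transform=transform_name' \
  -H 'content-type: application/json' \
  -d '[{"timestamp": "2024-01-01T00:00:00Z", "message": "example"}]'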
Check
Confirm the headers or query string parameters being sent to the stream head are correct.
Fix
When headers are used to define the data, the following should be provided in the HTTP request.
Hydrolix headers:
x-hdx-table: project.table
x-hdx-transform: transformName
Content type (one of):
content-type: application/json
content-type: text/csv
Or, if you use a query string:
table=project.table
transform=transformName
Note that the content-type header should still be set for CSV or JSON as above.
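A hedged example of the header form (same placeholders as above; add any authentication your cluster requires):
curl -X POST 'https://<hostname>/ingest/event' \
  -H 'x-hdx-table: project.table' \
  -H 'x-hdx-transform: transformName' \
  -H 'content-type: application/json' \
  -d '[{"timestamp": "2024-01-01T00:00:00Z", "message": "example"}]'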
For more information about the API, see the documentation: http://docs.hydrolix.io/guide/ingest/api.
Check
Confirm the compression format matches between the transform and the headers.
Fix
Check that the transform specifies the correct compression format and that the headers sent by the ingesting system match it.
If unsure, set the compression type to none in the transform. The system will then infer the compression type from the headers in the request.
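As an illustrative fragment only (the exact field name and its location in the transform document are assumptions; confirm against your transform and the transform documentation):
"settings": {
  "compression": "none"
}
A compression value in the transform that does not match what the shipper actually sends is a common cause of rejected requests.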
Issue: Ingested Data Does not Appear in Table
Check
Review the Stream-Head logs, looking for messages of level Error (see the example command after this list).
Check:
- The datetime format of the primary timestamp. This is often the cause of rejected rows.
- Strings being written to a UINT column.
- The transform file type (CSV/JSON).
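To pull those messages (a hedged example; adjust the deployment name, tail length, and filter as needed):
kubectl logs deployment/stream-head --all-containers --tail=500 | grep -i error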
Fix
Review transform and edit accordingly.
- https://docs.hydrolix.io/docs/transforms-and-write-schema
- https://docs.hydrolix.io/docs/timestamp-data-types
Check
Review the Stream-Peer and Stream-Peer-Turbine logs.
Check:
- The datetime format of the primary timestamp. This is often the cause of rejected rows.
- Strings being written to a UINT column.
- The transform file type (CSV/JSON).
- The transform SQL.
Fix
Review transform and edit accordingly.
- https://docs.hydrolix.io/docs/transforms-and-write-schema
- https://docs.hydrolix.io/docs/timestamp-data-types
Streaming Service Components
HTTP streaming ingest follows one of two paths: the newer, simpler Intake-Head, or the older, proven Stream-Head and Stream-Peer pool. The Traefik load balancer directs traffic to either solution, or to a mix of the two.
Accessible via the path: https://<yourhost>.hydrolix.live/ingest/event
Component | Used to |
---|---|
Traefik - Application Load-balancer | Routes requests to appropriate end-points. Requests to the path /ingest/ are routed to stream head components. API and Portal requests are routed via their own paths. |
Intake-Head | Checks that messages conform to basic syntax and message structure rules. When a message passes these checks, it sends the message to a listing queue. When a message fails these checks, it returns an HTTP 400 response to the client. It applies the Hydrolix transform and outputs indexed database partitions to the Hydrolix Database bucket. It also reports created partition metadata to the Catalog. |
Stream-Head | Receives HTTP requests and sends them to RedPanda. Completes basic verification on incoming messages, including datetime checks, header information checks, basic transform checks, and incoming message format checks. Manages message size onto the queue; if a message is too big, it is split into suitably sized pieces. |
RedPanda | Queues messages received from the stream head. Uses persistent volumes for queues. |
Stream-Peer (Stream-Peer) | Consumes messages from the RedPanda queue. |
Stream-Peer (Indexer) | Compresses and indexes the data into the HDX format using the transform; transformation is applied to the data in real time. Sends completed files to cloud storage (GCS, S3 etc) and updates the Postgres Catalog with partition metadata. |
Catalog | Stores information on the basic storage structure and partitions of the data within Cloud Storage (GCS, S3 etc). Includes a persistent volume within Kubernetes. |
Cloud Storage Bucket | Storage bucket (GCS, S3 etc) containing the “stateful” data required to run the system, including configuration files (/config/), database (/db/) and a copy of the system logs (/logs/). |
UI | User interface (Portal). Built on the Turbine-API. |
Turbine-API | REST based API for configuration of the data system. Includes API end-points for creation, deletion, editing of tables and their transforms (schemas). |
Keycloak | Provides authorization and RBAC for access to the Turbine-API and the Portal. Stores metadata and user information in the Catalog DB instance. |