Summary
Endpoint Errors
Summary ingest uses the same components as Streaming ingest, so when troubleshooting Summary errors, also refer to the Streaming troubleshooting steps.
Issue: Summary Data not Created
Check Summary Sources API Values
API: https://docs.hydrolix.io/reference/list-summary-sources
Check:
- parent_table – The name of the parent table in Streaming ingest for the incoming data
- subtype – Must be “summary”
- table – The target “project.tablename”
- transform – The name of the transform to use to ingest the data
- type – Must be “pull”
- service – Must be “summary-peer”
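The checks above can be scripted against the JSON returned by the list-summary-sources endpoint. This is a minimal sketch; the sample payload is illustrative, not a real API response, and only the fields listed above are validated.

```python
def validate_summary_source(source: dict) -> list:
    """Return a list of problems found in a summary source config."""
    problems = []
    if not source.get("parent_table"):
        problems.append("parent_table is missing")
    if source.get("subtype") != "summary":
        problems.append('subtype must be "summary"')
    if source.get("table", "").count(".") != 1:
        problems.append('table must be "project.tablename"')
    if not source.get("transform"):
        problems.append("transform is missing")
    if source.get("type") != "pull":
        problems.append('type must be "pull"')
    if source.get("service") != "summary-peer":
        problems.append('service must be "summary-peer"')
    return problems

# Illustrative summary source (names are made up):
sample = {
    "parent_table": "myproject.raw_events",
    "subtype": "summary",
    "table": "myproject.events_1h",
    "transform": "events_1h_transform",
    "type": "pull",
    "service": "summary-peer",
}
print(validate_summary_source(sample))  # → []
```

An empty list means every checked field holds the required value; any string in the result names the field to correct via the API or Portal.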
Check Stream-Peer Logs and Stream-Peer-Turbine Logs
API: System Health
Check:
- Datetime format for the Primary timestamp. This is often the cause of rejected rows.
- Strings being written to a UINT column.
- Transform SQL
Review transform and edit accordingly.
https://docs.hydrolix.io/docs/transforms-and-write-schema
https://docs.hydrolix.io/docs/timestamp-data-types
https://docs.hydrolix.io/docs/summary-tables-aggregation
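The most common rejection cause above, a primary-timestamp format mismatch, can be reproduced locally before editing the transform. This sketch uses Python's strptime tokens purely for illustration; Hydrolix transforms declare their own datetime format syntax, so translate accordingly.

```python
from datetime import datetime

# Format the transform expects for the primary timestamp (illustrative).
declared_format = "%Y-%m-%d %H:%M:%S"

def check_timestamp(value: str) -> bool:
    """Return True if the value matches the declared primary format."""
    try:
        datetime.strptime(value, declared_format)
        return True
    except ValueError:
        # A row whose primary timestamp fails to parse is rejected;
        # a string routed to a UINT column fails the same way.
        return False

print(check_timestamp("2024-05-01 12:30:00"))   # matches declared format
print(check_timestamp("2024-05-01T12:30:00Z"))  # ISO 'T'/'Z' not declared
```

Running incoming sample values through a check like this quickly shows whether the transform's declared format or the source data needs to change.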
Check the Transform
API: https://docs.hydrolix.io/reference/list-transforms
- Ensure the output columns for the parent table match the incoming data.
- Check that the transform SQL outputs the expected values to the summary transform.
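A quick way to spot a column mismatch is to diff the output columns declared in the parent table's transform against the keys of a sample incoming event. The column and event names below are made up for illustration.

```python
# Output columns declared in the parent table's transform (illustrative).
transform_columns = {"timestamp", "status_code", "bytes_sent", "host"}

# One sample event from the incoming data feed (illustrative).
incoming_event = {
    "timestamp": "2024-05-01 12:30:00",
    "status_code": 200,
    "bytes": 512,          # note: name differs from the transform column
    "host": "edge-01",
}

# Columns the transform expects but the event never supplies:
missing_in_event = transform_columns - incoming_event.keys()
# Incoming fields the transform silently ignores:
unmapped_in_transform = incoming_event.keys() - transform_columns

print("columns with no incoming value:", sorted(missing_in_event))
print("incoming fields with no column:", sorted(unmapped_in_transform))
```

Here `bytes_sent` would always be empty and the incoming `bytes` field would be dropped, so the summary aggregations built on `bytes_sent` would produce no data.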
Stream Summary Service Components
Accessible via the path: https://<yourhost>.hydrolix.live/ingest/event
Component | Used to |
---|---|
Traefik - Application Load-balancer | Routes requests to appropriate end-points. Requests to the path /ingest/ are routed to stream head components. API and Portal requests are routed via their own paths. |
Stream-Head | Receives HTTP requests and sends them to Redpanda. Completes basic verification on incoming messages, including datetime checks, header information checks, basic transform checks, and incoming message format. Manages message size onto the queue; if a message is too big, it splits it into suitably sized messages. |
Redpanda | Queues messages received from the Stream-Head. Uses persistent volumes for queues. |
Summary-Peer (Summary-Peer) | Consumes messages off the Redpanda queue. |
Summary-Peer (Indexer) | Two-stage parsing of data. First, data is parsed using the parent table's transform, including any functions, dictionaries, or enrichments. The second stage applies the summary transform's settings, including any enrichments. Sends completed summary files to cloud storage (GCS, S3, etc.) and updates the Postgres Catalog with partition metadata. |
Catalog | Stores information on the basic storage structure and partitions of the data within Cloud Storage (GCS, S3 etc). Includes a persistent volume within Kubernetes. |
Cloud Storage Bucket | Storage bucket (GCS, S3 etc) containing the “stateful” data required to run the system, including configuration files (/config/), database (/db/) and a copy of the system logs (/logs/). |
UI | User interface / Portal. Is built upon the Turbine-API. |
Turbine-API | REST based API for configuration of the data system. Includes API end-points for creation, deletion, editing of tables and their transforms (schemas). |
Keycloak | Provides authorization and RBAC for access to the Turbine-API and the Portal. Stores metadata and user information in the Catalog DB instance. |
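The ingest path these components serve can be exercised directly. The sketch below builds (but does not send) an HTTP request to the /ingest/event path; the hostname, table, and transform names are placeholders, and the `x-hdx-table` / `x-hdx-transform` header names follow Hydrolix's streaming ingest convention, so verify them against your cluster's API reference.

```python
import json
import urllib.request

host = "yourhost.hydrolix.live"  # placeholder hostname
event = {"timestamp": "2024-05-01 12:30:00", "status_code": 200}

req = urllib.request.Request(
    url=f"https://{host}/ingest/event",
    data=json.dumps(event).encode(),
    headers={
        "content-type": "application/json",
        "x-hdx-table": "myproject.raw_events",      # parent table
        "x-hdx-transform": "raw_events_transform",  # transform name
    },
    method="POST",
)
print(req.get_method(), req.full_url)
# Sending it (urllib.request.urlopen(req)) would route through Traefik
# to a Stream-Head, onto the Redpanda queue, and into the Summary-Peer.
```

Requests to any other path (the API or Portal) are routed by Traefik to the Turbine-API or UI components instead of the stream heads.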