Rejects
During ingestion, Hydrolix sometimes receives incorrect or malformed data. You can specify a set of criteria for correct data and selectively ignore data that doesn't meet the criteria. We call these pieces of data rejects. Hydrolix stores rejected files and rows in a separate reject directory so you can review which data was rejected and, if necessary, adjust your transform to accommodate and re-ingest the rejected data.
Storage Location
Hydrolix stores reject files in cloud storage at the following location:
{storagebucket}/db/hdx/{project_uuid}/{table_uuid}/unknown/{service type}
The following example shows three reject files in a GKE deployment. They contain data rejected from the project with UUID 3470db40-bf27-44f9-bc0b-a0890ba2fea8 and the table with UUID 61d30656-dc13-49eb-ae44-234b37e2b2e4:
$ gsutil ls gs://hdxcli-gcpprodv/db/hdx/3470db40-bf27-44f9-bc0b-a0890ba2fea8/61d30656-dc13-49eb-ae44-234b37e2b2e4/unknown/stream/
gs://hdxcli-gcpprodv/db/hdx/3470db40-bf27-44f9-bc0b-a0890ba2fea8/61d30656-dc13-49eb-ae44-234b37e2b2e4/unknown/stream/20220912142005-rejects-format-YkLZTZVbHbow.json
gs://hdxcli-gcpprodv/db/hdx/3470db40-bf27-44f9-bc0b-a0890ba2fea8/61d30656-dc13-49eb-ae44-234b37e2b2e4/unknown/stream/20220913045633-rejects-format-KN6bVopaF56O.json
gs://hdxcli-gcpprodv/db/hdx/3470db40-bf27-44f9-bc0b-a0890ba2fea8/61d30656-dc13-49eb-ae44-234b37e2b2e4/unknown/stream/20220913210236-rejects-format-3PaNbHf55l0F.json
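The path layout above can be assembled programmatically when you script a review of rejects. The following is a minimal sketch, not part of any Hydrolix tooling: the function names reject_prefix and reject_file_time are hypothetical, and reading the leading 14 digits of the file name as a YYYYMMDDHHMMSS timestamp is an assumption based only on the example listing (the time zone is not stated, so the result is a naive datetime).

```python
import re
from datetime import datetime
from typing import Optional


def reject_prefix(bucket: str, project_uuid: str, table_uuid: str, service_type: str) -> str:
    """Build the reject directory prefix from the documented path layout."""
    return f"{bucket.rstrip('/')}/db/hdx/{project_uuid}/{table_uuid}/unknown/{service_type}/"


# Assumption: reject file names start with a 14-digit YYYYMMDDHHMMSS timestamp,
# as the names in the example listing suggest.
_TIMESTAMP = re.compile(r"^(\d{14})-rejects-")


def reject_file_time(path: str) -> Optional[datetime]:
    """Return the timestamp embedded in a reject file name, or None if absent."""
    name = path.rsplit("/", 1)[-1]
    match = _TIMESTAMP.match(name)
    if match is None:
        return None
    return datetime.strptime(match.group(1), "%Y%m%d%H%M%S")


if __name__ == "__main__":
    prefix = reject_prefix(
        "gs://hdxcli-gcpprodv",
        "3470db40-bf27-44f9-bc0b-a0890ba2fea8",
        "61d30656-dc13-49eb-ae44-234b37e2b2e4",
        "stream",
    )
    print(prefix)
    print(reject_file_time(prefix + "20220912142005-rejects-format-YkLZTZVbHbow.json"))
```

The resulting prefix can be passed to a listing command such as the gsutil ls call shown above.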
To find the UUIDs for your projects and tables, use the Organization Summary endpoint.
To see log messages with the complete pathnames of individual reject files, set the log_level of the stream-head service to info. These messages appear in the stream-head service's logs. This setting can produce a high volume of log output for your cluster, so use it only temporarily. For more information, see Log Levels.
File Format
Hydrolix writes reject files as JSON objects with the following format:
{
  "project_id": "<project_uuid>",
  "table_id": "<table_uuid>",
  "transform_id": "<transform_id>",
  "data": {
    "data1": "data1",
    "data2": "data2"
  },
  "reason": "The reason for the failure"
}
The project_id, table_id, and transform_id fields record the UUIDs of the project, table, and transform that the data was being ingested into when it was rejected. When you use the default transform to ingest a piece of data, the transform_id field doesn't contain a value.
The data object contains the malformed data that Hydrolix rejected.
Finally, the reason field contains the reason for the rejection. The following table lists some common failures, with details to help you understand what failed:
Failure | Explanation |
---|---|
"strconv.ParseInt: parsing "text": invalid syntax" | The field received a string value where a signed integer (Int) was expected. The error can contain the name of the column. Often the easiest fix is to search the rejected data to see what was supplied, then match it in the transform. |
"strconv.ParseUint: parsing "unknown": invalid syntax" | The field received the string value "unknown" where a UInt was expected. This can happen with custom null values. To enable custom nulls, use the nulls attribute within the output columns of your transform. |
"unexpected EOF" | Often occurs when a message body arrives compressed and the transform doesn't handle the compression. |
"event primary too old" | Occurs when the event primary being loaded is older than the maximum age specified by cold_data_max_age_days in the table's settings. |
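When many rejects accumulate, it can help to tally them by failure type before deciding how to adjust a transform. The following is a minimal sketch, assuming you have already downloaded the reject files and that each text you pass in holds one JSON object in the format shown earlier. The category labels and the categorize and summarize helpers are hypothetical names chosen for illustration; the matched substrings come from the failure messages listed in the table.

```python
import json
from collections import Counter
from typing import Iterable

# Lowercase substrings of the failure messages in the table, mapped to
# hypothetical category labels (the labels are not Hydrolix terms).
KNOWN_FAILURES = [
    ("strconv.parseint", "int_parse_error"),
    ("strconv.parseuint", "uint_parse_error"),
    ("unexpected eof", "unhandled_compression"),
    ("event primary too old", "event_too_old"),
]


def categorize(reason: str) -> str:
    """Map a reject reason string to a coarse category label."""
    lowered = reason.lower()
    for needle, label in KNOWN_FAILURES:
        if needle in lowered:
            return label
    return "other"


def summarize(reject_texts: Iterable[str]) -> Counter:
    """Count reject records per category from raw JSON texts."""
    counts: Counter = Counter()
    for text in reject_texts:
        record = json.loads(text)
        counts[categorize(record.get("reason", ""))] += 1
    return counts


if __name__ == "__main__":
    # Illustrative records only, shaped like the documented format.
    sample = [
        json.dumps({"project_id": "p", "table_id": "t", "transform_id": "x",
                    "data": {"count": "text"},
                    "reason": 'strconv.ParseInt: parsing "text": invalid syntax'}),
        json.dumps({"project_id": "p", "table_id": "t", "transform_id": "x",
                    "data": {}, "reason": "unexpected EOF"}),
    ]
    for label, count in summarize(sample).most_common():
        print(label, count)
```

Matching is case-insensitive so that small differences in how a message is capitalized don't push a known failure into the "other" bucket.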