
Catch All and Catch Rejects

Use the catch_all Feature

The catch_all feature lets you see the shape of your data before you've applied a transform to it. For example, the following validator request contains a transform with a catch-all map column named catchall, followed by three rows of sample data.

POST https://$host/validator
Content-Type: application/json
x-hdx-table: sample_project.sample_table
x-hdx-test: true

{
  "transform": {
    "name": "demo",
    "settings": {
      "is_default": true,
      "output_columns": [
        {
          "name": "date",
          "datatype": {
            "type": "datetime",
            "format": "2006-01-02",
            "resolution": "seconds",
            "primary": true
          }
        },
        {
          "name": "catchall",
          "datatype": {
            "type": "map",
            "elements": [
              {
                "type": "string"
              },
              {
                "type": "string"
              }
            ],
            "catch_all": true
          }
        }
      ],
      "compression": "none",
      "format_details": {
        "flattening": {
          "active": true,
          "depth": 1,
          "map_flattening_strategy": {
            "left": "."
          }
        }
      }
    },
    "type": "json"
  },
  "data": [
    {"timestamp": "2022-06-13 06:03:05.579 +00:00", "component": "query_executor", "level": "info", "user": "anonymous", "duration": 552, "query": "'select version()'", "message": "comment='' admin_comment=''"},
    {"timestamp": "2022-06-13 06:03:06.078 +00:00", "component": "query_executor", "level": "info", "user": "anonymous", "duration": 552, "query": "'select version()'", "message": "comment='' admin_comment=''"},
    {"timestamp": "2022-06-13 06:03:06.086 +00:00", "component": "query_executor", "level": "info", "message": "user=anonymous duration_ms=0 comment='' admin_comment='' query='select version()'"}
  ]
}
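If you'd rather script the validator call than send it by hand, the request above can be assembled with Python's standard library. This is a minimal sketch, not an official client: the host name is a hypothetical placeholder for $host, only the first sample row is included, and any authentication your cluster may require is not shown. The request is built but not sent.

```python
import json
import urllib.request

# Hypothetical placeholder for $host; replace with your cluster's host name.
HOST = "hydrolix.example.com"
TABLE = "sample_project.sample_table"

# The transform from the example above, as a Python dict.
transform = {
    "name": "demo",
    "settings": {
        "is_default": True,
        "output_columns": [
            {
                "name": "date",
                "datatype": {
                    "type": "datetime",
                    "format": "2006-01-02",  # Go reference layout for YYYY-MM-DD
                    "resolution": "seconds",
                    "primary": True,
                },
            },
            {
                "name": "catchall",
                "datatype": {
                    "type": "map",
                    "elements": [{"type": "string"}, {"type": "string"}],
                    "catch_all": True,
                },
            },
        ],
        "compression": "none",
        "format_details": {
            "flattening": {
                "active": True,
                "depth": 1,
                "map_flattening_strategy": {"left": "."},
            }
        },
    },
    "type": "json",
}

# First sample row only; the other rows can be appended the same way.
rows = [
    {
        "timestamp": "2022-06-13 06:03:05.579 +00:00",
        "component": "query_executor",
        "level": "info",
        "user": "anonymous",
        "duration": 552,
        "query": "'select version()'",
        "message": "comment='' admin_comment=''",
    }
]


def build_validator_request(host, table, transform, rows):
    """Build, but do not send, a POST to the validator endpoint."""
    body = json.dumps({"transform": transform, "data": rows}).encode("utf-8")
    return urllib.request.Request(
        f"https://{host}/validator",
        data=body,
        headers={
            "Content-Type": "application/json",
            "x-hdx-table": table,
            "x-hdx-test": "true",
        },
        method="POST",
    )


req = build_validator_request(HOST, TABLE, transform, rows)
# Passing req to urllib.request.urlopen() would submit it.
```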

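Because catch_all is enabled on the catchall column, any key/value pair in the sample data that isn't mapped to another output column is stored in that map. None of the sample fields is named date, so date is null and every field lands in catchall. The numeric duration value appears as the string "552" because the map's values are strings. The parsed result for the first row looks like this: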

"parsed": [
    [
      {
        "name": "date",
        "datatype": "datetime",
        "value": null
      },
      {
        "name": "catchall",
        "datatype": "map",
        "value": {
          "component": "query_executor",
          "duration": "552",
          "level": "info",
          "message": "comment='' admin_comment=''",
          "query": "'select version()'",
          "timestamp": "2022-06-13 06:03:05.579 +00:00",
          "user": "anonymous"
        }
      }
    ]
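The routing rule can be modeled in a few lines of Python. This is a simplified sketch of the behavior shown above, not Hydrolix's implementation: apply_catch_all is a hypothetical helper, flattening is ignored, and serializing non-string values with json.dumps is an assumption based on duration appearing as "552".

```python
import json
from typing import Any


def apply_catch_all(row: dict[str, Any], mapped_columns: set[str],
                    catch_all_column: str) -> dict[str, Any]:
    """Hypothetical model of catch_all routing (flattening ignored).

    Fields whose names match a defined output column fill that column;
    every other field is copied into a string>string map.
    """
    # Defined columns with no matching field stay null (None).
    result: dict[str, Any] = {name: None for name in mapped_columns}
    catch_all: dict[str, str] = {}
    for key, value in row.items():
        if key in mapped_columns:
            result[key] = value
        else:
            # Assumption: non-string values are serialized as JSON text,
            # so 552 becomes "552".
            catch_all[key] = value if isinstance(value, str) else json.dumps(value)
    result[catch_all_column] = catch_all
    return result


row = {
    "timestamp": "2022-06-13 06:03:05.579 +00:00",
    "component": "query_executor",
    "level": "info",
    "user": "anonymous",
    "duration": 552,
    "query": "'select version()'",
    "message": "comment='' admin_comment=''",
}
parsed = apply_catch_all(row, mapped_columns={"date"}, catch_all_column="catchall")
print(parsed["date"])                  # prints None
print(parsed["catchall"]["duration"])  # prints 552 (stored as a string)
```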

Use the catch_rejects Feature

The catch_rejects attribute adds an extra filter to a transform. Invalid values are stored in a separate map column instead of causing their whole row to be rejected, while the row's valid values are ingested as usual.

For example, assume you have a table with two columns.

Name            Type
primary_column  DateTime
uint_column     Uint32

An example transform for this would look like the following:

[
  {
    "name": "primary_column",
    "datatype": {
      "type": "epoch",
      "primary": true,
      "format": "s"
    }
  },
  {
    "name": "uint_column",
    "datatype": {
      "type": "uint32"
    }
  }
]

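If your ingested data contains fields with invalid values, the rows containing them are rejected entirely. In this example, two of four rows have a negative uint_column value, so the request is only a partial success: the response has code 207, a success_count of 2, and one error message per rejected row.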

{
  "code": 207,
  "message": {
    "success_count": 2,
    "errors": [
      "(column {\"name\":\"uint_column\",\"datatype\":{\"type\":\"uint32\",\"source\":{\"from_input_index\":1,\"from_input_fields\":[\"uint_column\"]}},\"position\":1} data -1 (json.Number)): value of unsigned column is negative -1",
      "(column {\"name\":\"uint_column\",\"datatype\":{\"type\":\"uint32\",\"source\":{\"from_input_index\":1,\"from_input_fields\":[\"uint_column\"]}},\"position\":1} data -2 (json.Number)): value of unsigned column is negative -2"
    ]
  }
}
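To see where the success_count of 2 comes from, here is a simplified Python model of the default, row-level rejection. It is a sketch under assumptions, not Hydrolix code: validate_uint32 and ingest_without_catch_rejects are hypothetical names, the error strings are shortened, and the input rows are hypothetical, chosen to match the four uint_column values in this example.

```python
UINT32_MAX = 2**32 - 1


def validate_uint32(value):
    """Return an error message if value can't be stored as uint32, else None."""
    # bool is a subclass of int in Python, so exclude it explicitly.
    if isinstance(value, bool) or not isinstance(value, int):
        return f"value {value!r} is not an integer"
    if value < 0:
        return f"value of unsigned column is negative {value}"
    if value > UINT32_MAX:
        return f"value {value} exceeds uint32 range"
    return None


def ingest_without_catch_rejects(rows):
    """Hypothetical model: any invalid field rejects its whole row."""
    accepted, errors = [], []
    for row in rows:
        problem = validate_uint32(row["uint_column"])
        if problem is None:
            accepted.append(row)
        else:
            errors.append(f"uint_column: {problem}")
    return accepted, {"success_count": len(accepted), "errors": errors}


# Hypothetical input: one epoch timestamp (2024-12-12 18:10:14 UTC) and the
# four uint_column values from this example.
rows = [{"primary_column": 1734027014, "uint_column": v} for v in (-2, 1, -1, 2)]
accepted, summary = ingest_without_catch_rejects(rows)
print(summary["success_count"])  # prints 2
```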

To keep those rows and isolate the invalid values, add a map column with catch_rejects set to true to the transform. The column must be a string>string map.

[
  {
    "name": "primary_column",
    "datatype": {
      "type": "epoch",
      "primary": true,
      "format": "s"
    }
  },
  {
    "name": "rejected_data",
    "datatype": {
      "type": "map",
      "catch_rejects": true,
      "elements": [
        {
          "type": "string"
        },
        {
          "type": "string"
        }
      ]
    }
  },
  {
    "name": "uint_column",
    "datatype": {
      "type": "uint32"
    }
  }
]
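Because the catch_rejects column must be a string>string map, a quick check on the transform's output columns can catch a wrongly typed column before you upload it. The helper below is hypothetical and checks only that one rule.

```python
def check_catch_rejects_columns(output_columns):
    """Hypothetical pre-flight check: every column with catch_rejects set
    must be a map whose key and value elements are both strings."""
    problems = []
    for column in output_columns:
        datatype = column.get("datatype", {})
        if not datatype.get("catch_rejects"):
            continue
        element_types = [e.get("type") for e in datatype.get("elements", [])]
        if datatype.get("type") != "map" or element_types != ["string", "string"]:
            problems.append(f"{column['name']}: catch_rejects requires a string>string map")
    return problems


# The output columns from the transform above.
output_columns = [
    {"name": "primary_column",
     "datatype": {"type": "epoch", "primary": True, "format": "s"}},
    {"name": "rejected_data",
     "datatype": {"type": "map", "catch_rejects": True,
                  "elements": [{"type": "string"}, {"type": "string"}]}},
    {"name": "uint_column", "datatype": {"type": "uint32"}},
]
print(check_catch_rejects_columns(output_columns))  # prints []
```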

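Ingest the same data again. All four rows are now stored, and the table has three columns: rejected_data holds the name and value of each field that failed validation, uint_column holds the value when it was valid and null when it was rejected, and primary_column is populated for every row.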

primary_column           rejected_data           uint_column
2024-12-12 18:10:14 UTC  {'uint_column': '-2'}   null
2024-12-12 18:10:14 UTC  {}                      1
2024-12-12 18:10:14 UTC  {'uint_column': '-1'}   null
2024-12-12 18:10:14 UTC  {}                      2
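The stored rows can be modeled the same way. In this sketch (hypothetical helper names, simplified behavior, and an assumed string conversion), a failing value is moved into the rejected_data map as a string and its own column is set to null, so every row is kept.

```python
import json

UINT32_MAX = 2**32 - 1


def is_valid_uint32(value):
    """True if value is an integer within the uint32 range."""
    return isinstance(value, int) and not isinstance(value, bool) and 0 <= value <= UINT32_MAX


def ingest_with_catch_rejects(rows, validators, reject_column="rejected_data"):
    """Hypothetical model of catch_rejects: every row is kept; each invalid
    value moves into the reject map (as a string) and its column becomes null."""
    stored = []
    for row in rows:
        out = dict(row)
        rejected = {}
        for field, is_valid in validators.items():
            if field in row and not is_valid(row[field]):
                value = row[field]
                # Assumption: non-string values are stored as JSON text.
                rejected[field] = value if isinstance(value, str) else json.dumps(value)
                out[field] = None
        out[reject_column] = rejected
        stored.append(out)
    return stored


# Hypothetical input matching the table above (1734027014 = 2024-12-12 18:10:14 UTC).
rows = [{"primary_column": 1734027014, "uint_column": v} for v in (-2, 1, -1, 2)]
stored = ingest_with_catch_rejects(rows, {"uint_column": is_valid_uint32})
for r in stored:
    print(r["rejected_data"], r["uint_column"])
```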