Run the batch ingest job API

A batch job is a one-off S3 ingest task. Its source can be a single file, a directory containing many files, or a regular expression that matches many directories or files. Batch notification (continuous S3 loading as new files arrive) is a separate mechanism and is configured on the table.
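Before submitting a job that selects files by pattern, it can help to try the pattern against a few sample paths locally. The following is a minimal sketch, not Hydrolix's matching logic: the sample file names are hypothetical, and it assumes the pattern is applied to the full S3 URL with Python's `re` semantics, which may differ from how Hydrolix evaluates the pattern.

```python
import re


def preview_matches(pattern, urls):
    """Return the URLs that a candidate pattern would select.

    Assumption: the pattern is searched against the full S3 URL.
    Hydrolix's own regex dialect and matching rules may differ.
    """
    compiled = re.compile(pattern)
    return [url for url in urls if compiled.search(url)]


# Hypothetical object URLs used only for illustration.
candidates = [
    "s3://root-bucket/folder1/2021/06/events.json.gz",
    "s3://root-bucket/folder1/2021/06/README.txt",
    "s3://root-bucket/folder2/2021/06/events.json.gz",
]

matched = preview_matches(r"^s3://root-bucket/folder1/.*\.json\.gz$", candidates)
for url in matched:
    print(url)
```

Anchoring the pattern with `^` and `$` keeps it from matching unrelated prefixes or suffixes by accident.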

To submit a simple batch job, send a POST request to the batch job API:

curl -X POST 'https://my-domain.hydrolix.live/config/v1/orgs/{{org_uuid}}/jobs/batch' \
-H 'Authorization: Bearer thebearertoken1234567890abcdefghijklmnopqrstuvwxyz' \
-H 'Content-Type: application/json' \
-d '{
    "name": "My Batch job",
    "description": "A Description of my job",
    "type": "batch_import",
    "settings": {
        "source": {
            "table": "website.events",
            "type": "batch",
            "subtype": "aws s3",
            "transform": "mytransformname",
            "settings": {
                "url": "s3://root-bucket/folder1/"
            }
        }
    }
}'

The API returns the new job's definition and current status, similar to the following example:

{
      "name": "myjob",
      "description": "my job",
      "uuid": "888ba890-3ece-403d-9753-4edd754bef61",
      "created": "2021-06-03T12:28:41.967285Z",
      "modified": "2021-06-03T12:28:42.260107Z",
      "settings": {
         "max_active_partitions": 576,
         "max_rows_per_partition": 33554432,
         "max_minutes_per_partition": 15,
         "input_concurrency": 1,
         "input_aggregation": 1536000000,
         "max_files": 0,
         "dry_run": false,
         "regex_filter": "",
         "source": {
            "type": "batch",
            "subtype": "aws s3",
            "transform": "mytransform",
            "table": "website.events",
            "settings": {
               "url": "job10"
            }
         }
      },
      "status": "ready",
      "type": "batch_import",
      "org": "d1234567-1234-1234-abcd-defgh123456",
      "details": {
         "errors": [],
         "job_id": "jobid-1234-abcdejhijklm",
         "duration_ms": 115,
         "status_detail": {
            "tasks": {
               "LIST": {
                  "READY": 1
               }
            },
            "estimated": true,
            "percent_complete": 0
         }
      }
}

The status field in the response should show that the job is ready.
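The submit request shown earlier can also be built from a script. The sketch below uses only the Python standard library and mirrors the curl example's URL, headers, and body. The helper names (`build_batch_job`, `build_submit_request`, `submit_batch_job`) are hypothetical, and the hostname, org UUID, and token are the placeholders used on this page, not real values.

```python
import json
import urllib.request


def build_batch_job(name, description, table, transform, url):
    """Return the request body for a one-off batch ingest job."""
    return {
        "name": name,
        "description": description,
        "type": "batch_import",
        "settings": {
            "source": {
                "table": table,
                "type": "batch",
                "subtype": "aws s3",
                "transform": transform,
                "settings": {"url": url},
            }
        },
    }


def build_submit_request(hostname, org_uuid, token, job):
    """Build (but do not send) the POST request for the batch job endpoint."""
    endpoint = f"https://{hostname}/config/v1/orgs/{org_uuid}/jobs/batch"
    return urllib.request.Request(
        endpoint,
        data=json.dumps(job).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def submit_batch_job(request):
    """Send the request and return the decoded JSON response."""
    with urllib.request.urlopen(request) as response:
        return json.load(response)


job = build_batch_job(
    name="My Batch job",
    description="A Description of my job",
    table="website.events",
    transform="mytransformname",
    url="s3://root-bucket/folder1/",
)
request = build_submit_request(
    hostname="my-domain.hydrolix.live",            # placeholder hostname
    org_uuid="d1234567-1234-1234-abcd-defgh123456",  # placeholder org UUID
    token="thebearertoken1234567890abcdefghijklmnopqrstuvwxyz",  # placeholder
    job=job,
)
print(request.get_method(), request.full_url)
```

Building the request separately from sending it makes it possible to inspect the URL and body first. With real credentials, `submit_batch_job(request)` sends it, and `result["status"]` can then be compared against ready.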

❗️

In case of failure

If the job fails, check that your Hydrolix deployment is scaled appropriately, in particular that you have scaled your batch-peers. If you have not, see our documentation on scaling your deployment.

Getting the status of your job

You can use the job-status API to check if your job is finished. Hydrolix regularly updates this endpoint with information about running jobs.

curl -X POST 'https://{{hostname}}.hydrolix.live/config/v1/orgs/{{org_uuid}}/jobs/batch/{{job_uuid}}/status' \
-H 'Authorization: Bearer thebearertoken1234567890abcdefghijklmnopqrstuvwxyz'

When the job is complete, you will get a response like the following example with a status of done.

[
{
      "name": "myjob",
      "description": "my job",
      "uuid": "888ba890-3ece-403d-9753-4edd754bef61",
      "created": "2021-07-28T15:16:10.663363Z",
      "modified": "2021-07-28T15:42:56.511741Z",
      "settings": {
         "max_active_partitions": 576,
         "max_rows_per_partition": 20000000,
         "max_minutes_per_partition": 15,
         "input_concurrency": 1,
         "input_aggregation": 1536000000,
         "max_files": 0,
         "dry_run": false,
         "regex_filter": "",
         "source": {
            "type": "batch",
            "subtype": "aws s3",
            "transform": "mytransform",
            "table": "myproject.mytable",
            "settings": {
               "url": "s3://mys3/path/goes/here/"
            }
         }
      },
      "status": "done",
      "type": "batch_import",
      "org": "d1234567-1234-1234-abcd-defgh123456",
      "details": {
         "errors": [],
         "job_id": "jobid-1234-abcdejhijklm",
         "duration_ms": 7194,
         "status_detail": {
            "tasks": {
               "LIST": {
                  "DONE": 1
               },
               "INDEX": {
                  "DONE": 30
               }
            },
            "estimated": false,
            "percent_complete": 1
         }
      }
   }
]
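Checking the status by hand works for a single job, but a script usually needs to poll until the job finishes. The sketch below is one possible approach, under stated assumptions: `fetch_status` is a hypothetical callable that you supply and that returns the decoded JSON from the job-status endpoint. Only the ready and done statuses appear on this page, so the loop stops on done or on a non-empty errors list; other status values are not handled.

```python
import time


def summarize_status(response):
    """Extract the fields of interest from a job or job-status response."""
    # The status example is wrapped in a list; the submit response is a bare object.
    job = response[0] if isinstance(response, list) else response
    details = job.get("details", {})
    status_detail = details.get("status_detail", {})
    return {
        "status": job.get("status"),
        "percent_complete": status_detail.get("percent_complete"),
        "estimated": status_detail.get("estimated"),
        "errors": details.get("errors", []),
    }


def wait_for_job(fetch_status, interval_s=10.0, timeout_s=3600.0, sleep=time.sleep):
    """Poll fetch_status() until the job is done, reports errors, or times out."""
    deadline = time.monotonic() + timeout_s
    while True:
        summary = summarize_status(fetch_status())
        if summary["status"] == "done" or summary["errors"]:
            return summary
        if time.monotonic() >= deadline:
            raise TimeoutError("batch job did not finish before the timeout")
        sleep(interval_s)


# Stand-in for real API calls: first a "ready" job, then a "done" one.
fake_responses = iter([
    {"status": "ready",
     "details": {"errors": [],
                 "status_detail": {"estimated": True, "percent_complete": 0}}},
    [{"status": "done",
      "details": {"errors": [],
                  "status_detail": {"estimated": False, "percent_complete": 1}}}],
])

result = wait_for_job(lambda: next(fake_responses), interval_s=0, sleep=lambda _: None)
print(result["status"], result["percent_complete"])
```

Passing the fetch function and the sleep function as arguments keeps the polling logic independent of any HTTP client and makes it easy to exercise without a live deployment.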

🚧

Canceling a batch job

To cancel a batch job, call the cancel job endpoint.

Now it's time to query the data!

📘

Need support?

If you're stuck, reach out to Hydrolix support by email or via your Slack channel.