Run the batch ingest job API
A batch job is a one-off S3 ingest task. Its source can be a single file, a directory of files, or a regular expression that matches multiple directories or files. Continuous loading from S3 as new files arrive (batch notification) is a separate mechanism that is configured on the table.
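To make the pattern-based source concrete, here is a minimal Python sketch of selecting object paths with a regular expression. The paths and the helper name select_paths are hypothetical, and the sketch uses Python's re.search; the exact matching semantics Hydrolix applies are not described on this page, so treat it as an illustration of the idea only.

```python
import re

# Hypothetical object paths; not taken from a real deployment.
paths = [
    "s3://root-bucket/folder1/2021/06/events-01.json.gz",
    "s3://root-bucket/folder1/2021/06/events-02.json.gz",
    "s3://root-bucket/folder1/2021/06/README.txt",
    "s3://root-bucket/folder2/2021/06/events-01.json.gz",
]


def select_paths(candidates, pattern):
    """Return the paths in which the pattern matches anywhere (re.search)."""
    compiled = re.compile(pattern)
    return [p for p in candidates if compiled.search(p)]


# Keep only gzipped JSON files under folder1.
selected = select_paths(paths, r"/folder1/.*\.json\.gz$")
```

Testing a pattern against a handful of known paths like this before submitting a job is a cheap way to catch a filter that is too broad or too narrow.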
You can submit a simple batch job with a POST request to the batch job API. The example request below is followed by an example response.
curl -X POST 'https://my-domain.hydrolix.live/config/v1/orgs/{{org_uuid}}/jobs/batch' \
  -H 'Authorization: Bearer thebearertoken1234567890abcdefghijklmnopqrstuvwxyz' \
  -H 'Content-Type: application/json' \
  -d '{
    "name": "My Batch job",
    "description": "A Description of my job",
    "type": "batch_import",
    "settings": {
      "source": {
        "table": "website.events",
        "type": "batch",
        "subtype": "aws s3",
        "transform": "mytransformname",
        "settings": {
          "url": "s3://root-bucket/folder1/"
        }
      }
    }
  }'
{
  "name": "myjob",
  "description": "my job",
  "uuid": "888ba890-3ece-403d-9753-4edd754bef61",
  "created": "2021-06-03T12:28:41.967285Z",
  "modified": "2021-06-03T12:28:42.260107Z",
  "settings": {
    "max_active_partitions": 576,
    "max_rows_per_partition": 33554432,
    "max_minutes_per_partition": 15,
    "input_concurrency": 1,
    "input_aggregation": 1536000000,
    "max_files": 0,
    "dry_run": false,
    "regex_filter": "",
    "source": {
      "type": "batch",
      "subtype": "aws s3",
      "transform": "mytransform",
      "table": "website.events",
      "settings": {
        "url": "job10"
      }
    }
  },
  "status": "ready",
  "type": "batch_import",
  "org": "d1234567-1234-1234-abcd-defgh123456",
  "details": {
    "errors": [],
    "job_id": "jobid-1234-abcdejhijklm",
    "duration_ms": 115,
    "status_detail": {
      "tasks": {
        "LIST": {
          "READY": 1
        }
      },
      "estimated": true,
      "percent_complete": 0
    }
  }
}
The status field in the response should show that the job is ready.
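For readers who prefer to script the submission, the following is a minimal Python sketch that builds the same POST request as the curl example using only the standard library. The function names build_batch_job_request and job_is_ready are hypothetical helpers, and the hostname, org UUID, and token are placeholders. The request is constructed but not sent, so the sketch can be inspected without touching a live deployment.

```python
import json
import urllib.request


def build_batch_job_request(hostname, org_uuid, token, table, transform, url,
                            name="My Batch job", description=""):
    """Build (but do not send) a POST request that mirrors the curl example."""
    payload = {
        "name": name,
        "description": description,
        "type": "batch_import",
        "settings": {
            "source": {
                "table": table,
                "type": "batch",
                "subtype": "aws s3",
                "transform": transform,
                "settings": {"url": url},
            }
        },
    }
    endpoint = f"https://{hostname}/config/v1/orgs/{org_uuid}/jobs/batch"
    return urllib.request.Request(
        endpoint,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def job_is_ready(job):
    """True when a decoded response object reports the status "ready"."""
    return job.get("status") == "ready"


request = build_batch_job_request(
    hostname="my-domain.hydrolix.live",
    org_uuid="d1234567-1234-1234-abcd-defgh123456",  # placeholder
    token="thebearertoken",                          # placeholder
    table="website.events",
    transform="mytransformname",
    url="s3://root-bucket/folder1/",
)
# To send it: job = json.load(urllib.request.urlopen(request))
```

After sending, passing the decoded response to job_is_ready gives a quick check that the job was accepted.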
In case of failure
Ensure your Hydrolix deployment is appropriately scaled: did you scale your batch-peers? If not, check out our documentation on scaling your deployment.
Getting the status of your job
You can use the job-status API to check if your job is finished. Hydrolix regularly updates this endpoint with information about running jobs.
curl -X POST 'https://{{hostname}}.hydrolix.live/config/v1/orgs/{{org_uuid}}/jobs/batch/{{job_uuid}}/status' \
  -H 'Authorization: Bearer thebearertoken1234567890abcdefghijklmnopqrstuvwxyz'
When the job is complete, you will get a response like the following example with a status of done.
[
  {
    "name": "myjob",
    "description": "my job",
    "uuid": "888ba890-3ece-403d-9753-4edd754bef61",
    "created": "2021-07-28T15:16:10.663363Z",
    "modified": "2021-07-28T15:42:56.511741Z",
    "settings": {
      "max_active_partitions": 576,
      "max_rows_per_partition": 20000000,
      "max_minutes_per_partition": 15,
      "input_concurrency": 1,
      "input_aggregation": 1536000000,
      "max_files": 0,
      "dry_run": false,
      "regex_filter": "",
      "source": {
        "type": "batch",
        "subtype": "aws s3",
        "transform": "mytransform",
        "table": "myproject.mytable",
        "settings": {
          "url": "s3://mys3/path/goes/here/"
        }
      }
    },
    "status": "done",
    "type": "batch_import",
    "org": "d1234567-1234-1234-abcd-defgh123456",
    "details": {
      "errors": [],
      "job_id": "jobid-1234-abcdejhijklm",
      "duration_ms": 7194,
      "status_detail": {
        "tasks": {
          "LIST": {
            "DONE": 1
          },
          "INDEX": {
            "DONE": 30
          }
        },
        "estimated": false,
        "percent_complete": 1
      }
    }
  }
]
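Checking the status by hand gets tedious for long jobs, so here is a minimal Python sketch of a polling loop built around a response shaped like the example above. The names summarize_status and wait_for_job are hypothetical, and fetch_status is a caller-supplied function assumed to perform the status request and return the decoded JSON. Only the status value done comes from this page; other terminal states are not covered, which is why the loop also stops on a timeout.

```python
import time


def summarize_status(response):
    """Summarize the first job in a status response shaped like the example."""
    # The example response is a one-element list; also accept a bare object.
    job = response[0] if isinstance(response, list) else response
    details = job.get("details", {})
    status_detail = details.get("status_detail", {})
    return {
        "status": job.get("status"),
        "errors": details.get("errors", []),
        "tasks": status_detail.get("tasks", {}),
        "percent_complete": status_detail.get("percent_complete"),
    }


def wait_for_job(fetch_status, interval_s=10.0, timeout_s=3600.0, sleep=time.sleep):
    """Call fetch_status() until the job reports "done" or the timeout elapses.

    fetch_status is a hypothetical, caller-supplied function that returns the
    decoded JSON of the status endpoint.
    """
    waited = 0.0
    while True:
        summary = summarize_status(fetch_status())
        if summary["status"] == "done":
            return summary
        if waited >= timeout_s:
            raise TimeoutError(
                f"job still {summary['status']!r} after {waited:.0f}s"
            )
        sleep(interval_s)
        waited += interval_s
```

The returned summary includes the errors list and per-task counts, so the caller can decide what to do with a job that finished with reported errors. Injecting sleep as a parameter keeps the loop easy to exercise without real delays.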
Canceling a batch job
To cancel a batch job, send a request to the cancel job endpoint.
Now it's time to query the data!
Need support?
If you're stuck, reach out to Hydrolix support by email or via your Slack channel.