Skip to content

Overview

Hydrolix is an append-only system, but data may need to be changed or removed for compliance or other reasons.

The Alter Service executes ALTER TABLE queries on a set time range of data. The query can contain either an UPDATE or DELETE statement.

Altering data is a long-running task performed as an asynchronous background task by the Alter Service. The response to an alter query request is an alter job job_id. The alter job API enables checking progress status, verifying changes, and committing the result. The altered data partitions aren't visible to end users until after commit.

Create an alter job⚓︎

  1. Create the alter job
    • API: Post an ALTER TABLE query using any query interface. A job_id is returned in response.
    • UI: Enter the alter job SQL at https://hostname.hydrolix.live/jobs/alter-jobs/create/ and select Start job.
  2. Check the status of the alter job using the job status API endpoint or in the UI at https://hostname.hydrolix.live/jobs/alter-jobs/. The status will be one of the following:

    • ready - the job is starting and the list of partitions to work on is being calculated
    • running - alter-peers are working on a partition
    • failed - see error message for more detail
    • pending - all partitions have been altered, awaiting either commit or cancel

    See Alter job status for more information.

  3. Commit the alter job once its status reaches pending. Once the status reaches done, the altered data is visible to end users.

ALTER TABLE syntax⚓︎

To update or delete specific columnar data or rows, send an ALTER TABLE query statement that includes the target project.table followed by either an UPDATE or DELETE command. A time range WHERE clause must be provided. Any number of additional predicates can also be included.

ALTER TABLE UPDATE
1
2
3
4
ALTER TABLE my_project.my_table
UPDATE column_name = 'value'
WHERE timestamp > '0000-00-00 00:00:00' AND timestamp < '0000-00-00 00:00:00'
AND ...
ALTER TABLE DELETE
1
2
3
4
ALTER TABLE my_project.my_table
DELETE
WHERE timestamp > '0000-00-00 00:00:00' AND timestamp < '0000-00-00 00:00:00'
AND ...

The query updates or deletes rows for which the WHERE clause returns true.

Using predicates that scope the alter statement to a smaller subset of data speeds up alter job time. An alter job only opens partitions that match all specified conditions, which can significantly reduce job duration when the query targets a subset of data within the time range.

Alter syntax

Unlike similar queries in OLTP databases, ALTER TABLE is a heavy operation not designed for frequent, ongoing use.

UPDATE statements⚓︎

UPDATE Example
1
2
3
4
ALTER TABLE my_project.my_table
UPDATE 'client_ip' = 'secret'
WHERE timestamp > '2021-01-01 00:00:00' AND timestamp < '2022-01-01 00:00:00'
AND isNotNull(client_ip)

This query finds all rows where the timestamp is between 2021-01-01 00:00:00 and 2022-01-01 00:00:00 and the client_ip isn't null, then updates the client_ip column to 'secret' in the table my_table. The following tables show how the alter statement affects example data:

timestamp client_ip method path status
2020-06-15 08:30:00 192.168.1.10 GET /index.html 200
2021-03-10 14:22:31 10.0.0.45 POST /api/login 200
2021-07-04 09:15:00 NULL GET /favicon.ico 304
2021-09-22 18:45:12 172.16.0.99 PUT /api/user/42 200
2021-12-31 23:59:59 203.0.113.7 DELETE /api/session 204
2022-02-14 11:00:00 198.51.100.3 GET /dashboard 200
2023-01-01 00:00:01 10.10.10.10 GET /health 200
timestamp client_ip method path status
2020-06-15 08:30:00 192.168.1.10 GET /index.html 200
2021-03-10 14:22:31 secret POST /api/login 200
2021-07-04 09:15:00 NULL GET /favicon.ico 304
2021-09-22 18:45:12 secret PUT /api/user/42 200
2021-12-31 23:59:59 secret DELETE /api/session 204
2022-02-14 11:00:00 198.51.100.3 GET /dashboard 200
2023-01-01 00:00:01 10.10.10.10 GET /health 200

DELETE statements⚓︎

DELETE Example
1
2
3
4
ALTER TABLE my_project.my_table
DELETE
WHERE timestamp > '2021-01-01 00:00:00' AND timestamp < '2022-01-01 00:00:00'
AND customer_id = 'f4957a96-2dac-11ed-a261-0242ac120002'

This query deletes all rows in the specified time range that match the customer_id predicate.

timestamp customer_id event url status
2020-11-03 16:42:00 f4957a96-2dac-11ed-a261-0242ac120002 page_view /pricing 200
2021-02-14 09:10:33 f4957a96-2dac-11ed-a261-0242ac120002 login /auth/login 200
2021-05-22 11:08:47 0467d8b8-99d6-4eea-92bb-bc45adb05dc4 page_view /docs 200
2021-07-04 14:55:01 f4957a96-2dac-11ed-a261-0242ac120002 api_call /api/v1/data 200
timestamp customer_id event url status
2020-11-03 16:42:00 f4957a96-2dac-11ed-a261-0242ac120002 page_view /pricing 200
2021-05-22 11:08:47 0467d8b8-99d6-4eea-92bb-bc45adb05dc4 page_view /docs 200

Verify alter before commit⚓︎

Once an alter job has processed all its partitions but hasn't been committed, verify the old and new data using these SQL functions.

Verify New Data
1
2
3
4
SELECT client_ip
FROM alter_new('project.my_table','65724ae0-fc9b-485c-be55-59fb62cb44d5')
WHERE timestamp > '2021-01-01 00:00:00' AND timestamp < '2022-01-01 00:00:00'
AND isNotNull(client_ip)
Verify Old Data
1
2
3
4
SELECT client_ip
FROM alter_old('project.my_table','65724ae0-fc9b-485c-be55-59fb62cb44d5')
WHERE timestamp > '2021-01-01 00:00:00' AND timestamp < '2022-01-01 00:00:00'
AND isNotNull(client_ip)

The alter_new function takes two parameters:

  • projectname.tablename
  • job_id from the alter API endpoint

The alter_new function returns all rows in the modified partitions. This may include non-modified rows.

To see only the rows that would be modified, use the same WHERE clause as the original ALTER TABLE query.

Use these functions to compare what the partitions will look like before and after committing the alter job. Any partition not selected by the alter job won't be returned by these functions.

Commit an alter job⚓︎

After an alter job finishes, commit the changes to make the modified partitions visible to end users.

Alter jobs run on dedicated alter-peer pods, separate from the query infrastructure. This separation allows queries to continue while data is altered. Alter-peers process partitions one by one. To speed up the alter process, scale up the number of peers.

For example, if a table has 1,000 partitions and each takes 30 seconds on average, a single peer takes about 8 hours. Scaling to 50 alter-peers reduces this to about 10 minutes.

Cancel an alter job⚓︎

Canceling failed jobs was introduced in Hydrolix version 5.11. Previously, only running or pending jobs could be canceled. As of v5.11, failed jobs can also be canceled without a retry first.

Cancel an ALTER job that's in running, pending, or failed state:

Cancel Alter Job Endpoint
POST /v1/orgs/{org_id}/jobs/alter/{job_id}/cancel

Alter job status⚓︎

Check the status of an alter job using the get alter job status endpoint.

Alter Job Status Endpoint
https://hostname.hydrolix.live/config/v1/orgs/{org_id}/jobs/alter/{job_id}/status
Alter Job Status Response
{
    "name": "job_1660812184",
    "description": null,
    "uuid": "b40576a2-8245-4964-895b-d2b249837f19",
    "created": "2022-08-18T08:43:04.952123Z",
    "modified": "2022-08-18T08:44:06.400328Z",
    "settings": {
        "max_timestamp": 1641202980,
        "min_timestamp": 1641116582,
        "table_id": "260bb873-5d72-422f-8e05-1b5acdd455ab",
        "project_id": "970a78d3-62a7-4a81-b515-2f421537b1c5",
        "reliable_update": true,
        "shard_key": null,
        "sql_stmt": "ALTER TABLE sample.cts\n \
        UPDATE \"data.leaf_cert.issuer.O\" = 'testamaz' \n \
        WHERE timestamp < toString(date_sub(DAY, 2, now())) \
        AND timestamp > toString(date_sub(DAY, 3, now())) \
        AND data.leaf_cert.issuer.O IN ('Amazon')",
        "reliable_update": true
    },
    "status": "running",                <-- alter-peers are processing the request
    "type": "alter_table",
    "org": "0ffa6312-61ba-4620-8d57-96514a7f3859",
    "details": {
        "errors": [],
        "job_id": "b40576a2-8245-4964-895b-d2b249837f19",
        "duration_ms": 53108,           <-- current elapsed processing time
        "status_detail": {
            "tasks": {
                "ALTER": {              <-- 105 partitions will be impacted
                    "DONE": 86,         <-- already completed
                    "READY": 18,        <-- not yet started
                    "RUNNING": 1        <-- with a scale of 1 alter-peer, runs 1 at a time
                },
                "ALTER_LIST": {
                    "DONE": 1           <-- initial step, list all partitions for alter-peers
                }
            },
            "percent_complete": 0.8207547,  <-- 82% complete
            "estimated": true
        }
    }
}

Alter job lifecycle⚓︎

An ALTER job transitions through the following states:

Alter Job State Transitions
1
2
3
4
5
6
7
ready → running → pending → (commit) → done
                    ↓           ↓
                  failed ←------┘
                    ↓  ↑
                    ↓  retry
                  cancel → (locks released)
State Description
ready Hydrolix is calculating the list of partitions to process.
running Alter-peers are actively processing partitions.
failed Processing encountered an error. See the error message for details.
pending All partitions have been altered. Awaiting commit or cancel.
done The job has been committed. Altered data is visible to end users.

Valid actions per state⚓︎

Action Valid from states Endpoint
Status Any GET /v1/orgs/{org_id}/jobs/alter/{job_id}/status
Commit pending POST /v1/orgs/{org_id}/jobs/alter/{job_id}/commit
Cancel running, pending, failed POST /v1/orgs/{org_id}/jobs/alter/{job_id}/cancel
Retry failed POST /v1/orgs/{org_id}/jobs/alter/{job_id}/retry

Retry a failed alter job⚓︎

If an ALTER job fails, retry it.

  • API: Use the retry an alter job endpoint POST /v1/orgs/{org_id}/jobs/alter/{job_id}/retry.
  • UI: Visit https://hostname.hydrolix.live/jobs/alter-jobs/. Select then Retry for the job.

Known limitations⚓︎

  • ALTER on summary tables isn't supported. Summary tables use pre-aggregated intermediate states (avgState, avgMerge, etc.) and ALTER can't correctly modify this data format.
  • Only one ALTER job can run per table at a time. Concurrent ALTER jobs on the same table aren't supported.
  • No ALTER on shard keys. You can't alter a column that's used as a shard key.
  • ALTER UPDATE can't add new columns. You can only modify existing column values.
  • ALTER DELETE operates on entire rows. It can't delete a subset of columns on a row.