Overview
Hydrolix is an append-only system, but data may need to be changed or removed for compliance or other reasons.
The Alter Service executes ALTER TABLE queries on a set time range of data. The query can contain either an UPDATE or DELETE statement.
Altering data is a long-running task performed as an asynchronous background task by the Alter Service. The response to an alter query request is an alter job job_id. The alter job API enables checking progress status, verifying changes, and committing the result. The altered data partitions aren't visible to end users until after commit.
Create an alter job⚓︎
- Create the alter job
- API: Post an
ALTER TABLEquery using any query interface. Ajob_idis returned in response. - UI: Enter the alter job SQL at
https://hostname.hydrolix.live/jobs/alter-jobs/create/and selectStart job.
- API: Post an
-
Check the status of the alter job using the job status API endpoint or in the UI at
https://hostname.hydrolix.live/jobs/alter-jobs/. The status will be one of the following:ready- the job is starting and the list of partitions to work on is being calculatedrunning- alter-peers are working on a partitionfailed- see error message for more detailpending- all partitions have been altered, awaiting either commit or cancel
See Alter job status for more information.
-
Commit the alter job once its status reaches
pending. Once the status reachesdone, the altered data is visible to end users.
ALTER TABLE syntax⚓︎
To update or delete specific columnar data or rows, send an ALTER TABLE query statement that includes the target project.table followed by either an UPDATE or DELETE command. A time range WHERE clause must be provided. Any number of additional predicates can also be included.
The query updates or deletes rows for which the WHERE clause returns true.
Using predicates that scope the alter statement to a smaller subset of data speeds up alter job time. An alter job only opens partitions that match all specified conditions, which can significantly reduce job duration when the query targets a subset of data within the time range.
Alter syntax
Unlike similar queries in OLTP databases, ALTER TABLE is a heavy operation not designed for frequent, ongoing use.
UPDATE statements⚓︎
This query finds all rows where the timestamp is between 2021-01-01 00:00:00 and 2022-01-01 00:00:00 and the client_ip isn't null, then updates the client_ip column to 'secret' in the table my_table. The following tables show how the alter statement affects example data:
| timestamp | client_ip | method | path | status |
|---|---|---|---|---|
| 2020-06-15 08:30:00 | 192.168.1.10 | GET | /index.html | 200 |
| 2021-03-10 14:22:31 | 10.0.0.45 | POST | /api/login | 200 |
| 2021-07-04 09:15:00 | NULL | GET | /favicon.ico | 304 |
| 2021-09-22 18:45:12 | 172.16.0.99 | PUT | /api/user/42 | 200 |
| 2021-12-31 23:59:59 | 203.0.113.7 | DELETE | /api/session | 204 |
| 2022-02-14 11:00:00 | 198.51.100.3 | GET | /dashboard | 200 |
| 2023-01-01 00:00:01 | 10.10.10.10 | GET | /health | 200 |
| timestamp | client_ip | method | path | status |
|---|---|---|---|---|
| 2020-06-15 08:30:00 | 192.168.1.10 | GET | /index.html | 200 |
| 2021-03-10 14:22:31 | secret | POST | /api/login | 200 |
| 2021-07-04 09:15:00 | NULL | GET | /favicon.ico | 304 |
| 2021-09-22 18:45:12 | secret | PUT | /api/user/42 | 200 |
| 2021-12-31 23:59:59 | secret | DELETE | /api/session | 204 |
| 2022-02-14 11:00:00 | 198.51.100.3 | GET | /dashboard | 200 |
| 2023-01-01 00:00:01 | 10.10.10.10 | GET | /health | 200 |
DELETE statements⚓︎
This query deletes all rows in the specified time range that match the customer_id predicate.
| timestamp | customer_id | event | url | status |
|---|---|---|---|---|
| 2020-11-03 16:42:00 | f4957a96-2dac-11ed-a261-0242ac120002 | page_view | /pricing | 200 |
| 2021-02-14 09:10:33 | f4957a96-2dac-11ed-a261-0242ac120002 | login | /auth/login | 200 |
| 2021-05-22 11:08:47 | 0467d8b8-99d6-4eea-92bb-bc45adb05dc4 | page_view | /docs | 200 |
| 2021-07-04 14:55:01 | f4957a96-2dac-11ed-a261-0242ac120002 | api_call | /api/v1/data | 200 |
| timestamp | customer_id | event | url | status |
|---|---|---|---|---|
| 2020-11-03 16:42:00 | f4957a96-2dac-11ed-a261-0242ac120002 | page_view | /pricing | 200 |
| 2021-05-22 11:08:47 | 0467d8b8-99d6-4eea-92bb-bc45adb05dc4 | page_view | /docs | 200 |
Verify alter before commit⚓︎
Once an alter job has processed all its partitions but hasn't been committed, verify the old and new data using these SQL functions.
The alter_new function takes two parameters:
projectname.tablenamejob_idfrom the alter API endpoint
The alter_new function returns all rows in the modified partitions. This may include non-modified rows.
To see only the rows that would be modified, use the same WHERE clause as the original ALTER TABLE query.
Use these functions to compare what the partitions will look like before and after committing the alter job. Any partition not selected by the alter job won't be returned by these functions.
Commit an alter job⚓︎
After an alter job finishes, commit the changes to make the modified partitions visible to end users.
- To commit, send a request to POST /v1/orgs/{org_id}/jobs/alter/{job_id}/commit (only valid from
pendingstate). - To roll back, cancel the alter job (valid from
running,pending, orfailedstates).
Alter jobs run on dedicated alter-peer pods, separate from the query infrastructure. This separation allows queries to continue while data is altered. Alter-peers process partitions one by one. To speed up the alter process, scale up the number of peers.
For example, if a table has 1,000 partitions and each takes 30 seconds on average, a single peer takes about 8 hours. Scaling to 50 alter-peers reduces this to about 10 minutes.
Cancel an alter job⚓︎
Canceling failed jobs was introduced in Hydrolix version 5.11. Previously, only running or pending jobs could be canceled. As of v5.11, failed jobs can also be canceled without a retry first.
Cancel an ALTER job that's in running, pending, or failed state:
| Cancel Alter Job Endpoint | |
|---|---|
Alter job status⚓︎
Check the status of an alter job using the get alter job status endpoint.
| Alter Job Status Endpoint | |
|---|---|
Alter job lifecycle⚓︎
An ALTER job transitions through the following states:
| Alter Job State Transitions | |
|---|---|
| State | Description |
|---|---|
ready |
Hydrolix is calculating the list of partitions to process. |
running |
Alter-peers are actively processing partitions. |
failed |
Processing encountered an error. See the error message for details. |
pending |
All partitions have been altered. Awaiting commit or cancel. |
done |
The job has been committed. Altered data is visible to end users. |
Valid actions per state⚓︎
| Action | Valid from states | Endpoint |
|---|---|---|
| Status | Any | GET /v1/orgs/{org_id}/jobs/alter/{job_id}/status |
| Commit | pending |
POST /v1/orgs/{org_id}/jobs/alter/{job_id}/commit |
| Cancel | running, pending, failed |
POST /v1/orgs/{org_id}/jobs/alter/{job_id}/cancel |
| Retry | failed |
POST /v1/orgs/{org_id}/jobs/alter/{job_id}/retry |
Retry a failed alter job⚓︎
If an ALTER job fails, retry it.
- API: Use the retry an alter job endpoint
POST /v1/orgs/{org_id}/jobs/alter/{job_id}/retry. - UI: Visit
https://hostname.hydrolix.live/jobs/alter-jobs/. Select then Retry for the job.
Known limitations⚓︎
- ALTER on summary tables isn't supported. Summary tables use pre-aggregated intermediate states (
avgState,avgMerge, etc.) and ALTER can't correctly modify this data format. - Only one ALTER job can run per table at a time. Concurrent ALTER jobs on the same table aren't supported.
- No ALTER on shard keys. You can't alter a column that's used as a shard key.
- ALTER UPDATE can't add new columns. You can only modify existing column values.
- ALTER DELETE operates on entire rows. It can't delete a subset of columns on a row.