
Manual Ingest

Use manual data insertion for one-time jobs, not for ongoing ingestion.

For standard data ingestion that scales with load, use either Stream Ingest or Batch Ingest. Use Summary Tables for aggregations.

There are two ways to insert data manually:

  • From a query result
  • From raw data

Before you begin⚓︎

Set up a table⚓︎

If you don't already have a table configured in Hydrolix, create one. See Projects & Tables for instructions.

Configure the default transform⚓︎

The table must have a default transform configured with "is_default": true. Hydrolix uses the default transform's output_columns to discover the column schema when writing data from INSERT INTO statements. The default transform must include:

  • The primary timestamp column for the table
  • Every column you intend to insert data into, with matching data types

If your table already has a default transform for streaming or batch ingest, you don't need to create a separate one. To check, use either the UI or API:

  • UI: Navigate to Data. Select your table and look for (default) in the list under Table Transforms.
  • API: List the table's transforms and look for "is_default": true in the transform settings.
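You can also check for a default transform programmatically by scanning the transform list returned by the API for `"is_default": true`. A minimal Python sketch, using illustrative sample data rather than a live API response:

```python
from typing import Optional

def find_default_transform(transforms: list[dict]) -> Optional[dict]:
    """Return the first transform whose settings mark it as the default."""
    for t in transforms:
        if t.get("settings", {}).get("is_default"):
            return t
    return None

# Sample data standing in for the API's transform list.
transforms = [
    {"name": "other", "settings": {"is_default": False}},
    {"name": "mytransform", "settings": {"is_default": True}},
]
default = find_default_transform(transforms)
print(default["name"] if default else "no default transform")  # mytransform
```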

If you need to create a default transform, see Write Transforms for details.

The following example transform defines three columns: timestamp (the primary datetime column), mycolumn1 (a string), and mycolumn2 (an unsigned 32-bit integer). The INSERT INTO examples on this page use this schema.

{
    "name": "mytransform",
    "type": "json",
    "description": "an example for manual data insertion",
    "settings": {
        "is_default": true,
        "output_columns": [
            {
                "name": "timestamp",
                "datatype": {
                    "primary": true,
                    "resolution": "seconds",
                    "format": "2006-01-02 15:04:05",
                    "type": "datetime"
                }
            },
            {
                "name": "mycolumn1",
                "datatype": {
                    "type": "string"
                }
            },
            {
                "name": "mycolumn2",
                "datatype": {
                    "type": "uint32"
                }
            }
        ],
        "compression": "none",
        "format_details": {}
    }
}
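Before inserting, you can sanity-check a transform locally against the columns you plan to write. A minimal Python sketch using the example schema above (the insert column list is an assumption for illustration):

```python
import json

# Verify that a transform declares a primary timestamp and that its
# output_columns cover every column you plan to INSERT INTO.
transform = json.loads("""
{
  "name": "mytransform",
  "settings": {
    "is_default": true,
    "output_columns": [
      {"name": "timestamp", "datatype": {"primary": true, "type": "datetime"}},
      {"name": "mycolumn1", "datatype": {"type": "string"}},
      {"name": "mycolumn2", "datatype": {"type": "uint32"}}
    ]
  }
}
""")

cols = transform["settings"]["output_columns"]
names = {c["name"] for c in cols}
has_primary = any(c["datatype"].get("primary") for c in cols)

insert_columns = ["timestamp", "mycolumn1", "mycolumn2"]
missing = [c for c in insert_columns if c not in names]

assert transform["settings"]["is_default"], "transform must be the default"
assert has_primary, "transform must declare a primary timestamp column"
assert not missing, f"columns missing from transform: {missing}"
print("transform covers all insert columns")
```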

Insert data⚓︎

You can query another Hydrolix table in the cluster and insert the query results directly. For data that isn't already in the cluster, you can insert literal values manually.

By default, INSERT INTO writes data to the table's primary storage bucket. To write data to a different bucket, add the SETTINGS hdx_storage_id clause to the statement. See Specify a storage location for details.

When inserting data, keep the following in mind:

  • Automatic data routing features don't apply to INSERT INTO statements.
  • Insertion fails if the primary timestamp field is missing.
  • The transform must be the default ("is_default": true).
  • No ingestion scripting features execute when inserting data manually.

Insert from a query result⚓︎

To insert from a query result, combine the INSERT INTO statement with a SELECT statement.

INSERT INTO myproject.mytable (timestamp, mycolumn1, mycolumn2)
SELECT toStartOfDay(timestamp) AS day, mycolumn1, sum(mycolumn2) AS mycolumn2
FROM myproject.anothertable
GROUP BY day, mycolumn1

Hydrolix runs the SELECT query and inserts the resulting rows into myproject.mytable.

Insert manually⚓︎

Insert data using literals in SQL statements.

INSERT INTO myproject.mytable (timestamp, mycolumn1, mycolumn2)
VALUES ('2021-11-30 15:04:05', 'aardvark', 1)

You can insert multiple rows in a single statement.

INSERT INTO myproject.mytable (timestamp, mycolumn1, mycolumn2)
VALUES ('2021-11-30 15:04:05', 'badger', 1),('2021-12-01 15:04:05', 'civet', 2)

These statements insert three rows into the table, which you can verify with a SELECT:

> SELECT mycolumn1, mycolumn2, timestamp FROM myproject.mytable
aardvark        1       2021-11-30 15:04:05
badger  1       2021-11-30 15:04:05
civet   2       2021-12-01 15:04:05
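If you generate multi-row statements like these from application code, escape single quotes inside string literals. A minimal Python sketch with a hypothetical build_insert helper:

```python
# Render a multi-row INSERT INTO ... VALUES statement from Python tuples,
# doubling single quotes inside string literals.
def render_value(v) -> str:
    if isinstance(v, str):
        return "'" + v.replace("'", "''") + "'"
    return str(v)

def build_insert(table: str, columns: list[str], rows: list[tuple]) -> str:
    cols = ", ".join(columns)
    vals = ",".join(
        "(" + ", ".join(render_value(v) for v in r) + ")" for r in rows
    )
    return f"INSERT INTO {table} ({cols}) VALUES {vals}"

sql = build_insert(
    "myproject.mytable",
    ["timestamp", "mycolumn1", "mycolumn2"],
    [("2021-11-30 15:04:05", "badger", 1), ("2021-12-01 15:04:05", "civet", 2)],
)
print(sql)
```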

The response to an INSERT INTO is a JSON object containing the bytes written, the number of partitions, and the rows written. The following example omits authorization credentials; see HTTP Query API for complete examples.

$ curl -X POST --url "$HDX_HYDROLIX_URL/query" --data-binary @insert-into.sql
{
    "bytes_written": 194315,
    "num_partitions": 1,
    "rows_written": 13872
}
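The response fields parse like any JSON object. A minimal Python sketch that reads the example payload above rather than making a live request:

```python
import json

# Parse the JSON object returned by a successful INSERT INTO request.
# The payload mirrors the example response, not a live call.
response_body = (
    '{"bytes_written": 194315, "num_partitions": 1, "rows_written": 13872}'
)
result = json.loads(response_body)

print(f"wrote {result['rows_written']} rows "
      f"({result['bytes_written']} bytes) "
      f"across {result['num_partitions']} partition(s)")
```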

Specify a storage location⚓︎

By default, INSERT INTO writes data to the table's primary storage bucket. To write data to a different bucket, add SETTINGS hdx_storage_id = '{storage_id}' to the statement, where {storage_id} is the UUID of the target storage location.

The SETTINGS clause works with both manual inserts and INSERT INTO ... SELECT statements.

INSERT INTO myproject.mytable (timestamp, mycolumn1, mycolumn2)
SETTINGS hdx_storage_id = 'f47ac10b-58cc-4372-a567-0e02b2c3d479'
VALUES ('2021-11-30 15:04:05', 'aardvark', 1)
INSERT INTO myproject.mytable (timestamp, mycolumn1, mycolumn2)
SETTINGS hdx_storage_id = 'f47ac10b-58cc-4372-a567-0e02b2c3d479'
SELECT toStartOfDay(timestamp) AS day, mycolumn1, sum(mycolumn2) AS mycolumn2
FROM myproject.anothertable
GROUP BY day, mycolumn1
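Because a malformed storage ID only fails at query time, it can help to validate the UUID before sending the statement. A minimal Python sketch with a hypothetical insert_with_storage_id helper:

```python
import uuid

# Build an INSERT INTO statement with a SETTINGS hdx_storage_id clause,
# validating the storage ID is a well-formed UUID before it reaches the query.
def insert_with_storage_id(table: str, columns: list[str],
                           storage_id: str, body: str) -> str:
    uuid.UUID(storage_id)  # raises ValueError for a malformed UUID
    cols = ", ".join(columns)
    return (f"INSERT INTO {table} ({cols})\n"
            f"SETTINGS hdx_storage_id = '{storage_id}'\n"
            f"{body}")

stmt = insert_with_storage_id(
    "myproject.mytable",
    ["timestamp", "mycolumn1", "mycolumn2"],
    "f47ac10b-58cc-4372-a567-0e02b2c3d479",
    "VALUES ('2021-11-30 15:04:05', 'aardvark', 1)",
)
print(stmt)
```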

Hydrolix records the target storage location in the storage_id metadata field of the catalog entry for each partition. Without the SETTINGS hdx_storage_id clause, the catalog records the table's default storage ID. With the clause, the catalog records the specified value.

To find the UUID of a storage location, use the storage API endpoints or check the storage settings in the Hydrolix UI.