Skip to content

Insert Data Manually

Use manual data insertion for ad-hoc jobs, not for ongoing insertion or ingestion.

For standard data ingestion that scales with load, either use Stream Ingest or Batch Ingest . Summary Tables should be used for aggregations.

There are two ways to insert data manually:

  • From a query result
  • From raw data

Set up a table⚓︎

See Projects & Tables to create a table.

Configure the default transform⚓︎

A default transform must be deployed. See Write Transforms for more detail.

{
    "name": "mytransform",
    "type": "json",
    "description": "an example for manual data insertion",
    "settings": {
        "is_default": true,
        "output_columns": [
            {
                "name": "timestamp",
                "datatype": {
                    "primary": true,
                    "resolution": "seconds",
                    "format": "2006-01-02 15:04:05",
                    "type": "datetime"
                }
            },
            {
                "name": "mycolumn1",
                "datatype": {
                    "type": "string"
                }
            },
            {
                "name": "mycolumn2",
                "datatype": {
                    "type": "uint32"
                }
            }
        ],
        "compression": "none",
        "format_details": {}
    }
}

Limitations

  • Insertion fails if the primary timestamp field is missing
  • No ingestion scripting features execute when inserting data manually
  • The transform must be the default: "is_default": true

Insert data⚓︎

You can query another Hydrolix table in the cluster and directly insert the query results. For data not already in the cluster, you can also manually insert data.

Insert from a query result⚓︎

To insert from a query result, combine the INSERT INTO statement with a SELECT statement.

1
2
3
4
INSERT INTO myproject.mytable (timestamp, mycolumn1, mycolumn2)
SELECT toStartOfDay(timestamp) as day, mycolumn1, mycolumn2
FROM myproject.anothertable
GROUP BY day, mycolumn1

This generates a query with the SELECT toStartOfDay(timestamp) as day ... and inserts the resulting rows into myproject.mytable.

Insert manually⚓︎

Insert data manually.

INSERT INTO myproject.mytable (timestamp, mycolumn1, mycolumn2)
VALUES ('2021-11-30 15:04:05', 'aardvark', 1)

You can insert multiple rows in a single statement.

INSERT INTO myproject.mytable (timestamp, mycolumn1, mycolumn2)
VALUES ('2021-11-30 15:04:05', 'badger', 1),('2021-12-01 15:04:05', 'civet', 2)

After the above two commands

1
2
3
4
> SELECT timestamp, mycolumn1, mycolumn2 FROM myproject.mytable
aardvark        1       2021-11-30 15:04:05
badger  1       2021-11-30 15:04:05
civet   2       2021-12-01 15:04:05

The response for the INSERT INTO is a JSON object containing the bytes written, number of partitions, and rows written. Authorization credentials are omitted from the below example, see HTTP Query API for examples.

1
2
3
4
5
6
$ curl -X POST --url "$HDX_HYDROLIX_URL/query" --data-binary @insert-into.sql
{
    "bytes_written": 194315,
    "num_partitions": 1,
    "rows_written": 13872
}