Insert / Save Data Manually

❗️

Note:

Manual insertion should be used for specific individual jobs and not be used as an ongoing insertion methodology.

For standard data ingest that will scale either use Stream Ingest or Batch Ingest. Summary Tables should be used for aggregations.

There are two ways to insert data manually.

  • Inserting from a query result
  • Inserting raw data.

Set-up your tables.

To insert data into a table the table must already exist, more information on creating projects and tables can be found in the Projects & Tables section.

In addition a default transform must be deployed using the "type": "hdx", - more information can be found in the Write Transforms section. An exmaple is provided below.

{
    "name": "mytransform",
    "type": "hdx",
    "description": null,
    "settings": {
        "is_default": true,
        "output_columns": [
            {
                "name": "day",
                "datatype": {
                    "primary": true,
                    "resolution": "seconds",
                    "format": "2006-01-02 15:04:05",
                    "type": "datetime"
                }
            },
            {
                "name": "mycolumn1",
                "datatype": {
                    "type": "string"
                }
            },
            {
                "name": "mycolumn2",
                "datatype": {
                    "type": "metric"
                }
            }
        ],
        "compression": "none",
        "format_details": {}
    }
}

🚧

Gotchas

  • Hydrolix table requires a timestamp or epoch as a primary field
  • The transform must be the default - "is_default": true
  • The type must be hdx - "type": "hdx",
  • No scripting is completed on the data, these features can not be utilised.

Inserting from a query result

To insert from a query result the INSERT statement is used combined with a SELECT statement. For example:

INSERT INTO myproject.mytable (timestamp, mycolumn1, mycolumn2)
SELECT toStartOfDay(timestamp) as day, mycolumn1, mycolumn2
FROM myproject.anothertable
GROUP BY day, mycolumn1

This will generate a query with the SELECT toStartOfDay(timestamp) as day..... and insert the result into the myproject.mytable.

📘

INSERT INTO

The INSERT INTO function allow to specify the project.table and then you can specify the different by separating those using a coma like that (column_name1, column_name2 etc...)
Then write the SELECT query you need to index

Insert data manually

On top of being able to insert data from a SQL query, insert into allows manual data insertion.
If we take our previous example we can use the following to insert data:

INSERT INTO myproject.mytable (timestamp, mycolumn1, mycolumn2)
VALUES ('2021-11-30 15:04:05', "its a column", 1)

You can also index multiple values at once like that:

INSERT INTO myproject.mytable (timestamp, mycolumn1, mycolumn2)
VALUES ('2021-11-30 15:04:05', "its a row", 1),('2021-12-01 15:04:05', "its a another row", 2)

If your insert contains more column than the SQL query, it'll automatically fill those columns with NULL.

The response for the INSERT into is a JSON object which contains the number of rows written, the number of partitions and finally the number of bytes written like the following:

{
    "bytes_written": 194315,
    "num_partitions": 1,
    "rows_written": 13872
}