Summary Tables

Hydrolix supports Summary Tables, this type of table can be used for multiple use cases:

  • Aggregation
  • Rollup
  • Data transformation (regex, changing column and value, casting to another format etc...)

The principle of summary table is to write the response of a SQL query into a table instead of the standard output.

How to use it

Hydrolix Summary Tables are similar to any other table you create, they will have the same characteristics and is required to have a timestamp as primary field.

The first step is to create the table which will receive the data, see the example here.

Once the table is created you have to specify the schema of the data using transforms, see example here.


Hydrolix table requires timestamp as primary field

Even for Summary Table we are using timestamp for primary field, so your transform must have a datetime or an epoch field to ingest the data.

Insert SQL Query result

Once the table is setup you can write SQL query and index the result into your new table. Let's assume we have created a new table sample.summary.

We can use the following query to insert into the new table:

INSERT INTO sample.summary (dropoff_boroname, day)
SELECT dropoff_boroname, toStartOfDay(pickup_datetime) as day
FROM sample.taxi_trips
GROUP BY day, dropoff_boroname

This will generate a query which will index all the different value possible for dropoff_boroname aggregated by day.

For this example the transform used is the following:

  "settings": {
    "is_default": true,
        "output_columns": [
                "name": "day",
                    "datatype": {
                        "primary": true,
                        "resolution": "seconds",
                        "format": "2006-01-02 15:04:05",
                        "type": "datetime"
                "name": "dropoff_boroname",
                "datatype": {
                    "type": "string"
                "name": "pickup_boroname",
                "datatype": {
                    "type": "string"
    "compression": "none",
    "format_details": {}
  "name": "taxi_transform_summary",
  "table": "{{tableid}}",
  "type": "hdx",
  "description": null


Transform settings for Summary table

The transform must be the default one otherwise the insert will fail, use the following in the settings:
"is_default": true


Transform type HDX

By default when you define transform you specify the input format as either JSON, CSV or Parquet, for summary table we have a new type called hdx
As INSERT INTO writes directly the partition and do not leverage the ingest component script is not supported in transform.

As we can see the field day is a datetime type and the functiontoStartOfDay in SQL is rounds down a date with time to the start of the day.



The INSERT INTO function allow to specify the project.table and then you can specify the different by separating those using a coma like that (column_name1, column_name2 etc...)
Then write the SELECT query you need to index

Insert data manually

On top of being able to insert data from a SQL query, insert into allows you to index values manually too.
If we take our previous example we can use the following to insert data:

INSERT INTO sample.summary (dropoff_boroname, day)
VALUES ('Manhattan', 2021-11-30 15:04:05)

You can also index multiple values at once like that:

INSERT INTO sample.summary (dropoff_boroname, day)
VALUES ('Manhattan', 2021-11-30 15:04:05), ('Bronx', 2021-11-30 15:10:05)

If your Summary Table contains more column than the SQL query, it'll automatically fill those columns with NULL.

The response for the INSERT into is a JSON object which contains the number of rows written, the number of partitions and finally the number of bytes written like the following:

    "bytes_written": 194315,
    "num_partitions": 1,
    "rows_written": 13872

Did this page help you?