Summary Tables (Aggregation)

Hydrolix supports Summary Tables, this type of table can be used for multiple use cases:

  • Aggregation
  • Rollups
  • Data transformation (regex, changing column and value, casting to another format etc...)

The principle of summary table is to write the response of a SQL query into a table instead of the standard output.

How to use it

Hydrolix Summary Tables are similar to any other table you create, they will have the same characteristics and are required to have a timestamp as primary field.

The first step is to create the table which will receive the data, more information here Create a Project & Table with API.

Once the table is created, the write schema (transform) needs to be created - Write Transforms.

🚧

Hydrolix table requires timestamp as primary field

Even for Summary Tables a primary timestamp (datetime or an epoch) field is required.

Insert an SQL Query result

Once the table is setup an SQL query is used to select the data, the result is then written into the new table. For example: Let's assume the following new table has been created sample.summary.

The following query is used to insert data into the new table:

INSERT INTO sample.summary (dropoff_boroname, day)
SELECT dropoff_boroname, toStartOfDay(pickup_datetime) as day
FROM sample.taxi_trips
GROUP BY day, dropoff_boroname
ORDER BY day

This will generate a query which will index all the different values possible for dropoff_boroname aggregated by day.

For this example the transform used is as follows:

{
  "settings": {
    "is_default": true,
        "output_columns": [
           {
                "name": "day",
                    "datatype": {
                        "primary": true,
                        "resolution": "seconds",
                        "format": "2006-01-02 15:04:05",
                        "type": "datetime"
                    }
            },
            {
                "name": "dropoff_boroname",
                "datatype": {
                    "type": "string"
                }
            },
            {
                "name": "pickup_boroname",
                "datatype": {
                    "type": "string"
                }
            }
        ],
    "compression": "none",
    "format_details": {}
  },
  "name": "taxi_transform_summary",
  "table": "{{tableid}}",
  "type": "hdx",
  "description": null
}

❗️

Transform settings for Summary table

The transform must be the default one otherwise the insert will fail, use the following in the settings:
"is_default": true

🚧

Transform type HDX

By default when defining the transform you should specify the input format as a type called hdx.

As INSERT INTO writes directly the partition and does not leverage the ingest component, script is not supported in transform.

As we can see the field day is a datetime type and the functiontoStartOfDay in SQL rounds down a date with time to the start of the day

πŸ“˜

INSERT INTO

The INSERT INTO function allows the specification of the project.table and then you can specify the different columns by separating them by using a coma - for example column_name1, column_name2....

Inserting data manually

On top of being able to insert data from a SQL query, insert into allows the user to index values manually too. Using the previous example the following can be used to insert data:

INSERT INTO sample.summary (dropoff_boroname, day)
VALUES ('Manhattan', 2021-11-30 15:04:05)

Multiple rows can be input as follows:

INSERT INTO sample.summary (dropoff_boroname, day)
VALUES ('Manhattan', 2021-11-30 15:04:05), ('Bronx', 2021-11-30 15:10:05)

If the Summary Table contains more columns than the SQL query, it'll automatically fill the additonal columns with NULL.

The response for the INSERT into is a JSON object showing the number of rows written, the number of partitions and the number of bytes :

{
    "bytes_written": 194315,
    "num_partitions": 1,
    "rows_written": 13872
}

Did this page help you?