Summary Table

Hydrolix supports Summary Table, those can be used for multiple use cases:

  • Aggregation
  • Rollup
  • Data transformation (regex, changing column and value, casting to another format etc...)

The principle of summary table is to write the response of a SQL query into a table instead of the standard output.

How to use it

Hydrolix Summary Table is similar to any other table you create, they will have the same characteristics and requires to have a timestamp as primary field.
The first step is to create the table which will receive the data, see the example here
Once the table is created you have to specify the schema of the data using transforms, see example here

🚧

Hydrolix table requires timestamp as primary field

Even for Summary Table we are using timestamp for primary field, so your transform must have a datetime or an epoch field to ingest the data.

Once the table is setup you can write SQL query and index the result into your new table.
Let's assume we have created a new table sample.summary.
We can use the following query to insert into the new table:

INSERT INTO sample.summary (dropoff_boroname, day)
SELECT dropoff_boroname, toStartOfDay(pickup_datetime) as day
FROM sample.taxi_trips
GROUP BY day, dropoff_boroname
ORDER BY day

This will generate a query which will index all the different value possible for dropoff_boroname aggregated by day.

For this example the transform used is the following:

{
  "settings": {
    "is_default": true,
        "output_columns": [
           {
                "name": "day",
                    "datatype": {
                        "primary": true,
                        "resolution": "seconds",
                        "format": "2006-01-02 15:04:05",
                        "type": "datetime"
                    }
            },
            {
                "name": "dropoff_boroname",
                "datatype": {
                    "type": "string"
                }
            },
            {
                "name": "pickup_boroname",
                "datatype": {
                    "type": "string"
                }
            }
        ],
    "compression": "none",
    "format_details": {}
  },
  "name": "taxi_transform_summary",
  "table": "{{tableid}}",
  "type": "hdx",
  "description": null
}

❗️

Transform settings for Summary table

The transform must be the default one otherwise the insert will fail, use the following in the settings:
"is_default": true

🚧

Transform type HDX

By default when you define transform you specify the input format as either JSON, CSV or Parquet, for summary table we have a new type called hdx

As we can see the field day is a datetime type and the functiontoStartOfDay in SQL is rounds down a date with time to the start of the day.

📘

INSERT INTO

The INSERT INTO function allow to specify the project.table and then you can specify the different by separating those using a coma like that (column_name1, column_name2 etc...)
Then write the SELECT query you need to index

If your Summary Table contains more column than the SQL query, it'll automatically fill those columns with NULL

The response for the INSERT into is a JSON object which contains the number of rows written, the number of partitions and finally the number of bytes written like the following:

{
    "bytes_written": 194315,
    "num_partitions": 1,
    "rows_written": 13872
}

Did this page help you?