Set Up a Summary Table for Data Ingested via Kinesis

To configure a summary table, complete the following steps:

  1. Define summary transform.
  2. Create summary table.

Define Summary Transform

The summary transform defines a SQL statement that aggregates your data over some period of time.

The following example shows a general form for summary transforms:

SELECT
   <time_field_aggregation> AS <alias>,
   <function_on_column> AS <alias>,
   .......
FROM
    <parent_project.parent_table>
GROUP BY 
        <time_field_alias>
SETTINGS 
        hdx_primary_key = '<time_field_alias>'

Summary transforms are subject to the following limitations:

  • In the SETTINGS clause, you must specify a hdx_primary_key. This primary key should correspond to a time field aggregation alias in your SELECT statement.
  • In the FROM clause, specify the project name and the table name of the parent table.

📘

Test Summary Transform

To test your summary transform, run it without the SETTINGS clause on the parent table.

Example

Consider a summary table that aggregates, on an hourly basis:

  • the sum of item cost
  • the average tax
  • the 95th percentile of my.journeys

The following SQL shows how you could translate these requirements into a summary transform:

select
    toStartOfHour(timestamp) as hour,
    sum(cost) as sumcost,
    avg(tax) as avgtax,
    quantile(.95)(distance['local']) as _95th
from
    my.journeys
group by
    hour 
SETTINGS hdx_primary_key = 'hour'

Create Summary Table

The summary table persists the aggregation results generated by your summary transform. You can create your summary table with the Hydrolix UI or the API.

To create a table with the Hydrolix API, see the API Reference.

Your API request should contain a kinesis field that specifyies a parent_source similar to the following JSON request body:

{
    "name": "<table_name>",
    "type": "summary",
    "settings": {
        "summary": {
            "sql": "<summary transform>",
            "kinesis": {
                "parent_source": "<parent_project.parent_table>",
                "checkpointer": { // optional. defaults to parent checkpointer
                    "name": "<custom checkpointer table>"
                }
            }
        }
    }
}

Add your summary transform here, as well as the project and table you'd like to use for your parent table. You can optionally specify a checkpointer table.