Set Up a Summary Table for Data Ingested via Kinesis

To configure a summary table, complete the following steps:

  1. Define summary transform.
  2. Create summary table.

Define Summary Transform

The summary transform defines a SQL statement that aggregates your data over some period of time.

The following example shows a general form for summary transforms:

SELECT
   <time_field_aggregation> AS <alias>,
   <function_on_column> AS <alias>,
   .......
FROM
    <parent_project.parent_table>
GROUP BY 
		<time_field_alias>
SETTINGS 
		hdx_primary_key = '<time_field_alias>'

Summary transforms are subject to the following limitations:

  • In the SETTINGS clause, you must specify a hdx_primary_key. This primary key should correspond to a time field aggregation alias in your SELECT statement.
  • In the FROM clause, specify the project name and the table name of the parent table.

📘

Test Summary Transform

To test your summary transform, run it without the SETTINGS clause on the parent table.

Example

Consider a summary table that aggregates, on an hourly basis:

  • the sum of item cost
  • the average tax
  • the 95th percentile of my.journeys

The following SQL shows how you could translate these requirements into a summary transform:

select
    toStartOfHour(timestamp) as hour,
    sum(cost) as sumcost,
    avg(tax) as avgtax,
    quantile(.95)(distance['local']) as _95th
from
    my.journeys
group by
    hour 
SETTINGS hdx_primary_key = 'hour'

Create Summary Table

The summary table persists the aggregation results generated by your summary transform. You can create your summary table with the Hydrolix UI or the API.

To create a table with the Hydrolix API, see the API Reference.

Your API request should contain a kinesis field that specifyies a parent_source similar to the following JSON request body:

{
	"name": "<table_name>",
	"type": "summary",
	"settings": {
		"summary": {
			"sql": "<summary transform>",
			"kinesis": {
				"parent_source": "<parent_project.parent_table>",
				"checkpointer": { // optional. defaults to parent checkpointer
					"name": "<custom checkpointer table>"
				}
			}
		}
	}
}

Add your summary transform here, as well as the project and table you'd like to use for your parent table. You can optionally specify a checkpointer table.