Kinesis
Set Up a Summary Table for Data Ingested via Kinesis
To configure a summary table, complete the following steps:
- Define summary transform.
- Create summary table.
Define Summary Transform
The summary transform defines a SQL statement that aggregates your data over some period of time.
The following example shows a general form for summary transforms:
SELECT
<time_field_aggregation> AS <alias>,
<function_on_column> AS <alias>,
.......
FROM
<parent_project.parent_table>
GROUP BY
<time_field_alias>
SETTINGS
hdx_primary_key = '<time_field_alias>'
Summary transforms are subject to the following limitations:
- In the
SETTINGS
clause, you must specify ahdx_primary_key
. This primary key should correspond to a time field aggregation alias in yourSELECT
statement. - In the
FROM
clause, specify the project name and the table name of the parent table.
Test Summary Transform
To test your summary transform, run it without the
SETTINGS
clause on the parent table.
Example
Consider a summary table that aggregates, on an hourly basis:
- the sum of item cost
- the average tax
- the 95th percentile of
my.journeys
The following SQL shows how you could translate these requirements into a summary transform:
select
toStartOfHour(timestamp) as hour,
sum(cost) as sumcost,
avg(tax) as avgtax,
quantile(.95)(distance['local']) as _95th
from
my.journeys
group by
hour
SETTINGS hdx_primary_key = 'hour'
Create Summary Table
The summary table persists the aggregation results generated by your summary transform. You can create your summary table with the Hydrolix UI or the API.
To create a table with the Hydrolix API, see the API Reference.
Your API request should contain a kinesis
field that specifyies a parent_source
similar to the following JSON request body:
{
"name": "<table_name>",
"type": "summary",
"settings": {
"summary": {
"sql": "<summary transform>",
"kinesis": {
"parent_source": "<parent_project.parent_table>",
"checkpointer": { // optional. defaults to parent checkpointer
"name": "<custom checkpointer table>"
}
}
}
}
}
Add your summary transform here, as well as the project and table you'd like to use for your parent table. You can optionally specify a checkpointer table.
Updated about 1 month ago