Working with Time Functions
Time-series data are measurements or events that are monitored, where time is a critical dimension to the data. This could be server metrics, application performance monitoring metrics, network data, logs, sensor data, events, clicks, IOT, and many other types of analytics data.
Time-series analytics is the measurement of change over time.
All fields in Hydrolix are indexed, however, time is the primary index. Time as the primary key is one of the reasons why Hydrolix can get query results from object storage quickly. Internally, a table consists of many time partitioned data files in a columnar format. Each file represents sequentially time ordered data and associated metadata. This is all handled behind the scenes and invisible to the user or administrator.
Working with time
Since Hydrolix uses time as a primary key, queries that reference tables must contain a time range in the WHERE clause. This is essential for time partition lookup.
SELECT count() FROM sample.metrics WHERE (timestamp BETWEEN '2016-01-01 00:00:00' AND '2016-01-02 12:00:00')
Grouping and subdivision of time ranges is achieved through the use of time functions (reference). These functions can be used in either
SELECT expression or
GROUP BY clause position.
There are general time functions for
toStartOfSecond(). Along with common patterns
toStartofFiveMinute(), Ten and Fifteen and many more. Each function accepts a Date Time data type and rounds values to their respective time unit.
SELECT toStartOfFiveMinute(timestamp) AS five, max(usage_guest) FROM sample.metrics WHERE (timestamp BETWEEN '2016-01-01 00:00:00' AND '2016-01-01 04:00:00') GROUP BY five ORDER BY five
This will aggregate the data into 5 minutes buckets and compute max values of the specified metric. If the time buckets are not required in the result, we could move the function into the
GROUP BY clause.
SELECT max(usage_guest) FROM sample.metrics WHERE (timestamp BETWEEN '2016-01-01 00:00:00' AND '2016-01-01 04:00:00') GROUP BY toStartOfMinute(timestamp) AS five ORDER BY five
Fine grained time grouping
Fine grained grouping can be achieved using
toStartOfInterval(t, INTERVAL unit) where unit can be
SELECT toStartOfInterval(timestamp, INTERVAL 3 hour) AS three, max(usage_guest) FROM sample.metrics WHERE (timestamp BETWEEN '2016-01-01 00:00:00' AND '2016-01-02 00:00:00') GROUP BY three ORDER bY three
Using time functions together
Time functions can be used together for more complex calculations.
SELECT toStartOfHour(timestamp) AS hour, count() FROM sample.metrics WHERE (timestamp BETWEEN '2016-01-01 00:00:00' AND '2016-01-01 02:00:00') GROUP BY hour, toStartOfInterval(timestamp, INTERVAL 30 minute) AS mins ORDER BY hour, mins
Equally sized time slots
timeSlots(t, duration, size) is useful for defining equally sizes slots. Where duration is specified in seconds as UInt32 (type conversion) and size controls the number of slots contained in duration.
SELECT timeSlots(timestamp, toUInt32(60*10), 6*10) AS slots, max(usage_guest) FROM sample.metrics WhERE (timestamp BETWEEN '2016-01-01 00:00:00' AND '2016-01-01 01:00:00') GROUP BY slots ORDER BY slots
This produces 600 second slots, each containing an array of time in 60 second intervals.
Finding the latest value of a metric is a common monitoring requirement.
argMax(metric,t) succinctly expresses finding the latest value over a specified time range.
SELECT argMax(usage_guest, timestamp) FROM sample.metrics WHERE (timestamp between '2016-01-01 00:00:00' AND '2016-01-01 12:00:00')
Rate of change over time
Calculating the rate of change over time is an important concept in time series analytics.
boundingRatio(t,metric) will calculate rate change based on metric argMax - argMin / max - min time. Adding a grouping time function will give you a breakdown per time bucket.
SELECT team, boundingRatio(timestamp, usage_iowait) AS rate FROM sample.metrics WHERE (timestamp BETWEEN '2016-01-01 00:00:00' AND '2016-01-01 01:00:00') GROUP BY team ORDER BY rate DESC
Range of quantiles
Calculating a range of quantiles for a single metric in a single pass is also easy using
quantiles(L1, L2, ..)(metric). Where levels are specified as a number between 0 and 1.
SELECT quantiles(0.25, 0.5, 0.75, 0.99)(usage_guest) FROM sample.metrics WHERE (timestamp BETWEEN '2016-01-01 00:00:00' AND '2016-01-01 12:00:00')
Aggregations on groups of data
<aggregation>Resample(start, end, step)(<aggFunction_params>, resampling_key) lets you divide data into groups, and then separately aggregates the data in those groups. Groups are created by splitting the values from one column into intervals.
SELECT countResample(80,90,2)(usage_guest, usage_guest) FROM sample.metrics WHERE (timestamp BETWEEN '2016-01-01 00:00:00' AND '2016-01-01 12:00:00')
This groups the metric values [80, 82, 84, 86, 88, 90] and provides counts for each bucket. Very useful!