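A downsampling query is an aggregate query along the time dimension. It is commonly used in time-series scenarios to reduce the sampling rate of the data.
Engine and version
Downsampling queries are supported only by the time-series engine, and the engine version must be 3.4.15 or later.
Syntax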
select_sample_by_statement ::= SELECT ( select_clause | '*' )
FROM table_identifier
[ WHERE where_clause ]
SAMPLE BY time_interval [ OFFSET offset_interval ] [ FILL fill_option ]
select_clause ::= selector [ AS identifier ] ( ',' selector [ AS identifier ] )*
selector ::= tag_identifier | time | function_identifier '(' field_identifier [ ',' function_args ] ')'
where_clause ::= relation ( AND relation )* (OR relation)*
relation ::= ( field_identifier | tag_identifier ) operator term
operator ::= '=' | '<' | '>' | '<=' | '>=' | '!=' | IN | CONTAINS | CONTAINS KEY
time_interval ::= interval units | 0
offset_interval ::= interval units
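Aggregate functions supported in downsampling
SAMPLE BY downsamples each individual time series separately (for the concept of a time series, see the data model).
SAMPLE BY supports the following functions: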
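Function | Description |
SUM | Sum of the values in each specified time window. For details, see the SUM function. |
AVG | Average of the values in each specified time window. For details, see the AVG function. |
COUNT | Number of values in each specified time window. For details, see the COUNT function. |
MIN | Minimum value in each specified time window. For details, see the MIN function. |
MAX | Maximum value in each specified time window. For details, see the MAX function. |
FIRST | First value in each specified time window. For details, see the FIRST function. |
LAST | Last value in each specified time window. For details, see the LAST function. |
PERCENTILE | Percentile of the values in each specified time window. For details, see the PERCENTILE function. |
LATEST | Latest value over the entire time range. For details, see the LATEST function. |
RATE | Rate of change relative to the corresponding value in the previous row. For details, see the RATE function. |
DELTA | Difference from the corresponding value in the previous row. For details, see the DELTA function. |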
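Examples
In the SELECT list, tag columns do not need a downsampling function, but every field column must specify one.
Assume that the sensor table contains the following data: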
SELECT * FROM sensor;
The following result is returned:
+-----------+----------+---------------------------+-------------+-----------+
| device_id | region   | time                      | temperature | humidity  |
+-----------+----------+---------------------------+-------------+-----------+
| F07A1260  | north-cn | 2021-01-01T09:00:00+08:00 | 0.000000    | 9.000000  |
| F07A1260  | north-cn | 2021-01-01T12:01:00+08:00 | 1.000000    | 45.000000 |
| F07A1260  | north-cn | 2021-01-01T14:03:00+08:00 | 2.000000    | 46.000000 |
| F07A1260  | north-cn | 2021-01-01T20:00:00+08:00 | 10.000000   | 47.000000 |
| F07A1261  | north-cn | 2021-02-10T12:00:30+08:00 | 3.000000    | 40.000000 |
| F07A1261  | north-cn | 2021-03-01T12:01:00+08:00 | 4.000000    | 41.000000 |
| F07A1261  | north-cn | 2021-03-08T12:08:00+08:00 | 5.000000    | 42.000000 |
| F07A1261  | north-cn | 2021-05-01T13:00:00+08:00 | 6.000000    | 43.000000 |
+-----------+----------+---------------------------+-------------+-----------+
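Downsampling and subquery examples
A downsampling query cannot contain a nested subquery, but it can itself be nested as a subquery inside another query.
Example 1: Downsample with the default UTC alignment. Each time series is aggregated into 8-hour windows, and count is computed for each window.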
SELECT device_id,region,time,count(humidity) AS count_humidity FROM sensor WHERE device_id='F07A1260' sample by 8h;
The following result is returned:
+-----------+----------+---------------------------+----------------+
| device_id | region   | time                      | count_humidity |
+-----------+----------+---------------------------+----------------+
| F07A1260  | north-cn | 2021-01-01T08:00:00+08:00 | 3              |
| F07A1260  | north-cn | 2021-01-01T16:00:00+08:00 | 1              |
+-----------+----------+---------------------------+----------------+
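The window assignment in Example 1 can be reproduced with a short Python sketch. It assumes that windows are aligned to the Unix epoch in UTC, which is consistent with the sample output but is an assumption about the engine's behavior, not a description of its implementation. `window_start` is a hypothetical helper and is not part of the product.

```python
from collections import Counter
from datetime import datetime, timedelta, timezone

UTC8 = timezone(timedelta(hours=8))  # display zone used in the sample output
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)

def window_start(ts, interval):
    """Start of the epoch-aligned (UTC) window that contains ts (assumed rule)."""
    return EPOCH + ((ts - EPOCH) // interval) * interval

# Timestamps of device F07A1260 from the sensor table above.
times = [datetime.fromisoformat(s) for s in (
    "2021-01-01T09:00:00+08:00",
    "2021-01-01T12:01:00+08:00",
    "2021-01-01T14:03:00+08:00",
    "2021-01-01T20:00:00+08:00",
)]

# Count rows per 8-hour window, keyed by the window start shown in UTC+8.
counts = Counter(
    window_start(t, timedelta(hours=8)).astimezone(UTC8).isoformat()
    for t in times
)
for start, n in sorted(counts.items()):
    print(start, n)
# 2021-01-01T08:00:00+08:00 3
# 2021-01-01T16:00:00+08:00 1
```

Because 00:00 UTC is 08:00 in UTC+8, the first window starts at 08:00 local time even though the earliest row is at 09:00.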
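Example 2: Downsample with the default UTC alignment and a window offset. Each time series is aggregated into 8-hour windows whose start is shifted by 3h, and count is computed for each window.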
SELECT device_id,region,time,count(humidity) AS count_humidity FROM sensor WHERE device_id='F07A1260' sample by 8h offset 3h;
The following result is returned:
+-----------+----------+---------------------------+----------------+
| device_id | region   | time                      | count_humidity |
+-----------+----------+---------------------------+----------------+
| F07A1260  | north-cn | 2021-01-01T03:00:00+08:00 | 1              |
| F07A1260  | north-cn | 2021-01-01T11:00:00+08:00 | 2              |
| F07A1260  | north-cn | 2021-01-01T19:00:00+08:00 | 1              |
+-----------+----------+---------------------------+----------------+
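Example 3: Downsampling is aligned to UTC by default. To align windows to midnight in local time (for example, UTC+8), aggregate each time series into 24-hour windows whose start is shifted by 16h, and compute count for each window.
SELECT device_id,region,time,count(humidity) AS count_humidity FROM sensor WHERE device_id='F07A1260' sample by 24h offset 16h;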
The following result is returned:
+-----------+----------+---------------------------+----------------+
| device_id | region   | time                      | count_humidity |
+-----------+----------+---------------------------+----------------+
| F07A1260  | north-cn | 2021-01-01T00:00:00+08:00 | 4              |
+-----------+----------+---------------------------+----------------+
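The effect of OFFSET in the two examples above can be sketched in Python as a shift of the UTC-aligned window grid. This is an illustration under the assumption that the grid is anchored at the Unix epoch, inferred from the sample outputs. `window_start` and `local_starts` are hypothetical helpers, not product APIs.

```python
from datetime import datetime, timedelta, timezone

UTC8 = timezone(timedelta(hours=8))  # display zone used in the sample output
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)

def window_start(ts, interval, offset=timedelta(0)):
    """Start of the window containing ts when the UTC grid is shifted by offset."""
    shifted = ts - EPOCH - offset
    return EPOCH + offset + (shifted // interval) * interval

# Timestamps of device F07A1260 from the sensor table above.
times = [datetime.fromisoformat(s) for s in (
    "2021-01-01T09:00:00+08:00",
    "2021-01-01T12:01:00+08:00",
    "2021-01-01T14:03:00+08:00",
    "2021-01-01T20:00:00+08:00",
)]

def local_starts(interval_h, offset_h):
    """Window start of every row, formatted in UTC+8."""
    return [
        window_start(t, timedelta(hours=interval_h), timedelta(hours=offset_h))
        .astimezone(UTC8)
        .strftime("%Y-%m-%dT%H:%M")
        for t in times
    ]

print(local_starts(8, 3))    # sample by 8h offset 3h
# ['2021-01-01T03:00', '2021-01-01T11:00', '2021-01-01T11:00', '2021-01-01T19:00']
print(local_starts(24, 16))  # sample by 24h offset 16h
# ['2021-01-01T00:00', '2021-01-01T00:00', '2021-01-01T00:00', '2021-01-01T00:00']
```

An offset of 16h works for UTC+8 because 16:00 UTC is 00:00 of the next day in UTC+8, so every 24-hour window then starts at local midnight.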
Example 4: SAMPLE BY currently cannot be combined directly with GROUP BY, LIMIT OFFSET, or ORDER BY, but these clauses can be applied to a downsampling subquery.
SELECT device_id, max(avg_humidity) AS max_humidity FROM (SELECT device_id,region,time,avg(humidity) AS avg_humidity FROM sensor sample by 8h) group by device_id;
The following result is returned:
+-----------+--------------+
| device_id | max_humidity |
+-----------+--------------+
| F07A1261  | 43.000000    |
| F07A1260  | 47.000000    |
+-----------+--------------+
Example 5: Apply LIMIT OFFSET to a downsampling subquery to limit the number of rows returned.
SELECT device_id,region, avg_humidity FROM (select device_id,region,time,avg(humidity) AS avg_humidity FROM sensor sample by 8h) limit 1 offset 1;
The following result is returned:
+-----------+----------+--------------+
| device_id | region   | avg_humidity |
+-----------+----------+--------------+
| F07A1261  | north-cn | 40.000000    |
+-----------+----------+--------------+
Downsampling window interpolation examples
Example 1: Fill empty windows with a fixed value.
SELECT * from (select device_id,region,time, avg(humidity) AS humidity FROM sensor WHERE device_id='F07A1260' sample by 2h fill 1) order by device_id;
The following result is returned:
+-----------+----------+---------------------------+-----------+
| device_id | region   | time                      | humidity  |
+-----------+----------+---------------------------+-----------+
| F07A1260  | north-cn | 2021-01-01T08:00:00+08:00 | 9.000000  |
| F07A1260  | north-cn | 2021-01-01T10:00:00+08:00 | 1.000000  |
| F07A1260  | north-cn | 2021-01-01T12:00:00+08:00 | 45.000000 |
| F07A1260  | north-cn | 2021-01-01T14:00:00+08:00 | 46.000000 |
| F07A1260  | north-cn | 2021-01-01T16:00:00+08:00 | 1.000000  |
| F07A1260  | north-cn | 2021-01-01T18:00:00+08:00 | 1.000000  |
| F07A1260  | north-cn | 2021-01-01T20:00:00+08:00 | 47.000000 |
+-----------+----------+---------------------------+-----------+
Example 2: Fill empty windows with the next value.
SELECT * from (select device_id,region,time,avg(humidity) AS humidity FROM sensor WHERE device_id='F07A1260' sample by 2h fill after) order by device_id;
The following result is returned:
+-----------+----------+---------------------------+-----------+
| device_id | region   | time                      | humidity  |
+-----------+----------+---------------------------+-----------+
| F07A1260  | north-cn | 2021-01-01T08:00:00+08:00 | 9.000000  |
| F07A1260  | north-cn | 2021-01-01T10:00:00+08:00 | 45.000000 |
| F07A1260  | north-cn | 2021-01-01T12:00:00+08:00 | 45.000000 |
| F07A1260  | north-cn | 2021-01-01T14:00:00+08:00 | 46.000000 |
| F07A1260  | north-cn | 2021-01-01T16:00:00+08:00 | 47.000000 |
| F07A1260  | north-cn | 2021-01-01T18:00:00+08:00 | 47.000000 |
| F07A1260  | north-cn | 2021-01-01T20:00:00+08:00 | 47.000000 |
+-----------+----------+---------------------------+-----------+
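The two fill results above can be reproduced with a minimal Python sketch. It assumes that only the empty windows between the first and the last non-empty window are filled, which matches the sample output. `fill_gaps` is a hypothetical helper, and the windows are keyed by the local hour of the window start to keep the example short.

```python
def fill_gaps(series, step, policy, fixed=None):
    """Fill empty windows between the first and last non-empty window.

    series maps window start -> aggregated value. Only the 'fixed' and
    'after' policies are sketched here.
    """
    starts = sorted(series)
    filled = {}
    for t in range(starts[0], starts[-1] + step, step):
        if t in series:
            filled[t] = series[t]  # window already has a value
        elif policy == "fixed":
            filled[t] = fixed      # constant fill value
        elif policy == "after":
            # value of the next non-empty window
            filled[t] = next(series[s] for s in starts if s > t)
        else:
            raise ValueError(f"unsupported policy: {policy}")
    return filled

# avg(humidity) per 2-hour window for F07A1260 (hour of window start, UTC+8).
avg_2h = {8: 9.0, 12: 45.0, 14: 46.0, 20: 47.0}

print(fill_gaps(avg_2h, 2, "fixed", fixed=1.0))
# {8: 9.0, 10: 1.0, 12: 45.0, 14: 46.0, 16: 1.0, 18: 1.0, 20: 47.0}
print(fill_gaps(avg_2h, 2, "after"))
# {8: 9.0, 10: 45.0, 12: 45.0, 14: 46.0, 16: 47.0, 18: 47.0, 20: 47.0}
```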
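Examples of transformation after downsampling
Example 1: Downsample with avg over 2-hour windows, then compute the rate of change (slope) of the downsampled data with rate.
SELECT device_id,region,time,rate(avg(humidity)) AS rate_humidity FROM sensor WHERE device_id='F07A1260' sample by 2h;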
The following result is returned:
+-----------+----------+---------------------------+---------------+
| device_id | region   | time                      | rate_humidity |
+-----------+----------+---------------------------+---------------+
| F07A1260  | north-cn | 2021-01-01T12:00:00+08:00 | 0.002500      |
| F07A1260  | north-cn | 2021-01-01T14:00:00+08:00 | 0.000139      |
| F07A1260  | north-cn | 2021-01-01T20:00:00+08:00 | 0.000046      |
+-----------+----------+---------------------------+---------------+
Example 2: Downsample with avg over 2-hour windows, then compute the difference between adjacent downsampled values with delta.
SELECT device_id,region,time,delta(avg(humidity)) AS humidity FROM sensor WHERE device_id='F07A1260' sample by 2h;
The following result is returned:
+-----------+----------+---------------------------+-----------+
| device_id | region   | time                      | humidity  |
+-----------+----------+---------------------------+-----------+
| F07A1260  | north-cn | 2021-01-01T12:00:00+08:00 | 36.000000 |
| F07A1260  | north-cn | 2021-01-01T14:00:00+08:00 | 1.000000  |
| F07A1260  | north-cn | 2021-01-01T20:00:00+08:00 | 1.000000  |
+-----------+----------+---------------------------+-----------+
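The rate and delta results above can be checked by hand with a small Python sketch. It assumes that rate is expressed per second, which is inferred from the sample output (36 / 14400 s = 0.0025) and is not stated in the text. The function names below are illustrative only.

```python
# (hour of window start in UTC+8, avg humidity) for each non-empty
# 2-hour window of device F07A1260.
points = [(8, 9.0), (12, 45.0), (14, 46.0), (20, 47.0)]

def delta(points):
    """Difference between each downsampled value and the previous one."""
    return [(t1, v1 - v0) for (t0, v0), (t1, v1) in zip(points, points[1:])]

def rate(points):
    """Change per second between adjacent downsampled values (assumed unit)."""
    return [
        (t1, (v1 - v0) / ((t1 - t0) * 3600))
        for (t0, v0), (t1, v1) in zip(points, points[1:])
    ]

print(delta(points))
# [(12, 36.0), (14, 1.0), (20, 1.0)]
print([(t, f"{r:.6f}") for t, r in rate(points)])
# [(12, '0.002500'), (14, '0.000139'), (20, '0.000046')]
```

The first window (08:00) has no predecessor, so it produces no row in either result, as in the tables above.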
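Interpolation
Downsampling first splits every time series into the specified time windows and performs one computation over the data in each window. If a window has no value after downsampling, interpolation specifies the value to fill in at that time point. For example, suppose that the timestamps of a time series after downsampling are t+0, t+20, and t+30. Without interpolation, the series has only 3 values. With a fill value of 1, the series has 4 values, and the value at t+10 is 1.
Fill policies: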
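Fill Policy | Fill value |
none | Default behavior. No value is filled. |
zero | Always fills 0. |
linear | Fills a linearly interpolated value. |
previous | Fills the previous value. |
near | Fills a neighboring value. |
after | Fills the next value. |
fixed | Fills a specified fixed value. |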
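As a rough illustration of the policies in the table, the following Python sketch computes the value filled at t+10 for a series that has values at t+0, t+20, and t+30. The values 10.0, 30.0, and 40.0 are made up for the example, `fill_value` is a hypothetical helper, and returning None at the edges of the series is an assumption. The near policy is omitted because the text does not say how ties between equally distant neighbors are resolved.

```python
def fill_value(known, t, policy, fixed=None):
    """Value emitted for the empty window t, or None when nothing is filled.

    known maps window offset -> downsampled value.
    """
    before = [k for k in known if k < t]
    after = [k for k in known if k > t]
    if policy == "none":
        return None
    if policy == "zero":
        return 0.0
    if policy == "fixed":
        return fixed
    if policy == "previous":
        return known[max(before)] if before else None
    if policy == "after":
        return known[min(after)] if after else None
    if policy == "linear":
        if not before or not after:
            return None  # assumed: no extrapolation past the ends
        t0, t1 = max(before), min(after)
        return known[t0] + (known[t1] - known[t0]) * (t - t0) / (t1 - t0)
    raise ValueError(f"unsupported policy: {policy}")

# Hypothetical downsampled values at t+0, t+20, and t+30.
known = {0: 10.0, 20: 30.0, 30: 40.0}

for policy in ("none", "zero", "previous", "after", "linear"):
    print(policy, fill_value(known, 10, policy))
# none None
# zero 0.0
# previous 10.0
# after 30.0
# linear 20.0
print("fixed", fill_value(known, 10, "fixed", fixed=1.0))
# fixed 1.0
```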