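Use real-time materialized views to accelerate queries with variable parameters
Materialized views and real-time materialized views must be created in advance for the queries they serve. When a query contains variable parameters (for example, counting all orders within a given day or a given hour), you can combine a real-time materialized view with automatic query rewriting to accelerate it.
Procedure
The following example uses the TPC-H Q1 query to show how a real-time materialized view and automatic query rewriting accelerate a query with a variable parameter.
The query is taken from TPC-H.
The TPC-H implementation in this topic is based on the TPC-H benchmark, but its results cannot be compared with published TPC-H benchmark results, because the tests in this topic do not meet all requirements of the TPC-H benchmark.
The table is created with the following statement: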
CREATE TABLE lineitem (
l_orderkey bigint not null,
l_partkey integer not null,
l_suppkey integer not null,
l_linenumber integer not null,
l_quantity numeric not null,
l_extendedprice numeric not null,
l_discount numeric not null,
l_tax numeric not null,
l_returnflag "char" not null,
l_linestatus "char" not null,
l_shipdate date not null,
l_commitdate date not null,
l_receiptdate date not null,
l_shipinstruct char(25) not null,
l_shipmode char(10) not null,
l_comment varchar(44) not null
) DISTRIBUTED BY (l_orderkey);
The TPC-H Q1 query is as follows:
SELECT l_returnflag,
l_linestatus,
sum(l_quantity) as sum_qty,
sum(l_extendedprice) as sum_base_price,
sum(l_extendedprice * (1 - l_discount)) as sum_disc_price,
sum(l_extendedprice * (1 - l_discount) * (1 + l_tax)) as sum_charge,
avg(l_quantity) as avg_qty,
avg(l_extendedprice) as avg_price,
avg(l_discount) as avg_disc,
count(*) as count_order
FROM
lineitem
WHERE
l_shipdate <= date '1998-12-01' - interval '$1' day -- dynamic condition: $1 takes a value in [60,120]
GROUP BY
l_returnflag,
l_linestatus
ORDER BY
l_returnflag,
l_linestatus
LIMIT 1;
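As the TPC-H Q1 query shows, the upper bound on l_shipdate in the WHERE clause changes with business needs: the number of days subtracted, $1, varies within [60,120].
Based on the TPC-H Q1 query, the materialized view is designed as follows: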
CREATE INCREMENTAL MATERIALIZED VIEW q1_mv
AS
SELECT
l_returnflag,
l_linestatus,
l_shipdate,
sum(l_quantity) as sum_qty,
sum(l_extendedprice) as sum_base_price,
sum(l_extendedprice * (1 - l_discount)) as sum_disc_price,
sum(l_extendedprice * (1 - l_discount) * (1 + l_tax)) as sum_charge,
count(*) as count_order,
sum(l_extendedprice) as sum_price,
sum(l_discount) as sum_disc,
count(l_quantity) as count_qty,
count(l_extendedprice) as count_price,
count(l_discount) as count_disc
FROM
lineitem
WHERE
l_shipdate <= date '1998-12-01' - interval '60' day
GROUP BY
l_returnflag,
l_linestatus,
l_shipdate
DISTRIBUTED BY (l_returnflag, l_linestatus);
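In this example, automatic query rewriting mainly relies on the following features:
Re-aggregation compensation
Under SQL execution rules, the WHERE filter is applied before GROUP BY. The original TPC-H Q1 query groups by only two columns, l_returnflag and l_linestatus. To support a variable parameter in the WHERE clause, the real-time materialized view groups by three columns: l_returnflag, l_linestatus, and l_shipdate. When the actual query runs, the dynamic WHERE condition is applied to the materialized view, and the filtered rows are then re-aggregated to produce the result grouped by l_returnflag and l_linestatus.
Note: The avg function does not support re-aggregation. This example therefore relies on the expression compensation mechanism of automatic query rewriting: the materialized view stores sum and count values, which do support re-aggregation, and the rewrite uses avg = sum/count.
WHERE condition range matching
Automatic query rewriting can also add the missing condition when the materialized view is created without a WHERE clause. In this workload, however, the dynamic parameter is limited to [60,120]. When you create the materialized view, you can therefore restrict its WHERE condition with the value 60 (l_shipdate <= date '1998-12-01' - interval '60' day), and the materialized view still covers any parameter value from 60 to 120. Automatic query rewriting checks whether the condition of the actual query selects a subset of the materialized view's data. If it does, the query is rewritten and the additional condition that is needed is added as compensation.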
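To make the subset relationship concrete, the following query is a small sketch that computes the l_shipdate upper bound for the parameter values 60, 100, and 120. The column aliases are illustrative names and are not part of the original example.

```sql
-- Illustrative only: upper bounds on l_shipdate for three parameter values.
-- mv_bound, query_bound, and min_bound are hypothetical aliases.
SELECT
    date '1998-12-01' - interval '60' day  AS mv_bound,    -- bound used when creating q1_mv
    date '1998-12-01' - interval '100' day AS query_bound, -- bound of a query with $1 = 100
    date '1998-12-01' - interval '120' day AS min_bound;   -- bound of a query with $1 = 120
```

The three bounds are 1998-10-02, 1998-08-23, and 1998-08-03. A larger parameter value gives an earlier bound, so every query with a parameter in [60,120] selects a subset of the rows that the materialized view retains.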
Run EXPLAIN on the query to check whether the materialized view is used. For example:
EXPLAIN SELECT
l_returnflag,
l_linestatus,
sum(l_quantity) as sum_qty,
sum(l_extendedprice) as sum_base_price,
sum(l_extendedprice * (1 - l_discount)) as sum_disc_price,
sum(l_extendedprice * (1 - l_discount) * (1 + l_tax)) as sum_charge,
avg(l_quantity) as avg_qty,
avg(l_extendedprice) as avg_price,
avg(l_discount) as avg_disc,
count(*) as count_order
FROM
lineitem
WHERE
l_shipdate <= date '1998-12-01' - interval '100' day
GROUP BY
l_returnflag,
l_linestatus
ORDER BY
l_returnflag,
l_linestatus
LIMIT 1;
Sample output:
QUERY PLAN
------------------------------------------------------------------------------------------------------------
Limit (cost=1.01..1.13 rows=1 width=234)
-> Gather Motion 3:1 (slice1; segments: 3) (cost=1.01..1.13 rows=1 width=234)
Merge Key: l_returnflag, l_linestatus
-> Limit (cost=1.01..1.11 rows=1 width=234)
-> GroupAggregate (cost=1.01..1.11 rows=1 width=234)
Group Key: l_returnflag, l_linestatus
-> Sort (cost=1.01..1.01 rows=1 width=194)
Sort Key: l_returnflag, l_linestatus
-> Seq Scan on q1_mv (cost=0.00..1.00 rows=1 width=194)
Filter: (l_shipdate <= '1998-08-23 00:00:00'::timestamp without time zone)
Optimizer: Postgres query optimizer
(11 rows)
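The plan scans q1_mv, applies the date filter, and then aggregates by l_returnflag and l_linestatus. The query below is a hand-written sketch of a statement that is roughly equivalent to what the rewrite produces, assuming q1_mv is defined as shown earlier in this topic. The exact internal form chosen by the optimizer is not documented here, so treat this as an illustration of re-aggregation and of the avg = sum/count compensation rather than as the actual rewritten SQL.

```sql
-- Hand-written sketch, not the optimizer's actual output.
-- Re-aggregates the per-l_shipdate rows of q1_mv into the Q1 result.
SELECT
    l_returnflag,
    l_linestatus,
    sum(sum_qty)                       AS sum_qty,
    sum(sum_base_price)                AS sum_base_price,
    sum(sum_disc_price)                AS sum_disc_price,
    sum(sum_charge)                    AS sum_charge,
    sum(sum_qty)   / sum(count_qty)    AS avg_qty,     -- avg = sum / count
    sum(sum_price) / sum(count_price)  AS avg_price,   -- avg = sum / count
    sum(sum_disc)  / sum(count_disc)   AS avg_disc,    -- avg = sum / count
    sum(count_order)                   AS count_order  -- counts are re-aggregated with sum
FROM
    q1_mv
WHERE
    -- Compensating condition: narrows the view's 60-day bound to the query's 100-day bound.
    l_shipdate <= date '1998-12-01' - interval '100' day
GROUP BY
    l_returnflag,
    l_linestatus
ORDER BY
    l_returnflag,
    l_linestatus
LIMIT 1;
```

Because l_shipdate is kept as a grouping column in q1_mv, the filter can be evaluated on the view's rows before they are combined, which is what allows one view to serve every parameter value in the supported range.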
Test results
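With a real-time materialized view and automatic query rewriting, the TPC-H Q1 query runs dramatically faster. In a TPC-H test on 1 TB of data, run on an AnalyticDB for PostgreSQL instance with 16 compute nodes, using a real-time materialized view together with automatic query rewriting reduced the Q1 query time from about 340 s to 0.04 s. This is a speedup of roughly 8,500 times, close to ten thousand times.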