
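Starting from V2.0, Hologres supports Runtime Filter, which automatically optimizes the filtering behavior of the join process in multi-table join scenarios to improve join query performance. This topic describes how to use Runtime Filter in Hologres.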

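Background information

Scenarios

Runtime Filter is available in Hologres V2.0 and later. It typically applies to hash joins over two or more tables, especially joins between a large table and a small table. No manual configuration is required: the optimizer and the execution engine automatically optimize filtering during the join at query time, which reduces I/O overhead and improves join query performance.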

How it works

Before looking at how Runtime Filter works, you need to understand the join process. The following SQL statement joins two tables:

select * from test1 join test2 on test1.x = test2.x;

The corresponding execution plan is shown in the following figure.

[Figure: execution plan of the join between test1 and test2]

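As the execution plan shows, when the two tables are joined, a hash table is built from the test2 table, the data of the test1 table is matched against it, and the result is returned. Two terms are involved in this process:

build side: When two tables (or subqueries) are hash joined, the data of one table (or subquery) is built into a hash table. This part is called the build side and corresponds to the Hash node in the plan.
probe side: The other side of the hash join, which mainly reads data and matches it against the hash table of the build side. This part is called the probe side.

In general, when the execution plan is correct, the small table is the build side and the large table is the probe side.

Runtime Filter works as follows: during a hash join, the data distribution of the build side is used to generate a lightweight filter, which is sent to the probe side to prune the probe-side data. This reduces the amount of probe-side data that actually participates in the hash join and that is transferred over the network, which improves hash join performance.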
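The principle can be illustrated with a minimal Python sketch. It is not how the Hologres engine is implemented: the function name `hash_join_with_runtime_filter` is hypothetical, and a plain Python set stands in for the lightweight filter. It only shows that rows are pruned at the probe-side scan before they reach the join.

```python
def hash_join_with_runtime_filter(probe_rows, build_rows, key=0):
    """Join two lists of tuples on column `key`, pruning the probe side first."""
    # Build phase: hash table on the (small) build side.
    hash_table = {}
    for row in build_rows:
        hash_table.setdefault(row[key], []).append(row)

    # Derive a lightweight filter from the build-side keys.
    # Assumption: a plain set is used here purely for illustration.
    runtime_filter = set(hash_table)

    # Probe-side scan: rows whose key cannot match are dropped before the join.
    surviving = [row for row in probe_rows if row[key] in runtime_filter]

    # Probe phase: only the surviving rows are matched against the hash table.
    result = [p + b for p in surviving for b in hash_table[p[key]]]
    return result, len(surviving)


test1 = [(t, t) for t in range(1, 100001)]  # large table, probe side
test2 = [(t, t) for t in range(1, 1001)]    # small table, build side
joined, probed = hash_join_with_runtime_filter(test1, test2)
print(len(test1), probed, len(joined))  # prints: 100000 1000 1000
```

Of the 100,000 probe-side rows, only 1,000 reach the probe phase, which is the reduction the paragraph above describes.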

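Runtime Filter is therefore best suited to joins between a large table and a small table whose data volumes differ greatly. In such cases it yields a larger performance gain over an ordinary join.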

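Limits and trigger conditions

Limits

Only Hologres V2.0 and later support Runtime Filter.
In versions earlier than Hologres V2.1, Runtime Filter is triggered only when the join condition contains a single column. A join condition with multiple columns does not trigger it. Starting from Hologres V2.1, Runtime Filter supports joins on multiple columns, and it is triggered if the join columns meet the trigger conditions.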

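Trigger conditions

Hologres already provides high-performance joins, so Runtime Filter is triggered automatically at the underlying layer based on the query conditions. It is triggered only if the SQL statement meets all of the following conditions:

The probe side contains 100,000 or more rows.
Ratio of scanned data: build side / probe side <= 0.1 (the smaller the ratio, the more likely Runtime Filter is triggered).
Ratio of joined data: build side / probe side <= 0.1 (the smaller the ratio, the more likely Runtime Filter is triggered).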
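The three conditions can be written down as a small Python check. This is only a sketch of the rule as stated above: the function and parameter names are hypothetical, and how the engine actually estimates the scanned and joined row counts is not described in this topic.

```python
def meets_trigger_conditions(scan_build_rows, scan_probe_rows,
                             join_build_rows, join_probe_rows,
                             min_probe_rows=100_000, max_ratio=0.1):
    """Return True only if all three documented conditions hold.

    All row counts are assumed to be estimates supplied by the caller.
    """
    if scan_probe_rows < min_probe_rows:
        return False  # the probe side is too small
    if scan_build_rows / scan_probe_rows > max_ratio:
        return False  # scanned-data ratio is too high
    if join_build_rows / join_probe_rows > max_ratio:
        return False  # joined-data ratio is too high
    return True


# 1,000 build-side rows against 100,000 probe-side rows: both ratios are 0.01.
print(meets_trigger_conditions(1_000, 100_000, 1_000, 100_000))  # True
# The same ratio with a probe side below 100,000 rows does not qualify.
print(meets_trigger_conditions(500, 50_000, 500, 50_000))        # False
```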

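Types of Runtime Filter

Runtime Filters can be classified along the following two dimensions.

Based on whether the probe side of the hash join needs to be shuffled, Runtime Filters are classified into the Local and Global types.
- Local type: supported in Hologres V2.0 and later. When the probe side of the hash join does not need to be shuffled, a Local Runtime Filter can be used in any of the following three cases for the build-side data:
  - The join keys of the build side and the probe side use the same distribution.
  - The build-side data is broadcast to the probe side.
  - The build-side data is shuffled to the probe side according to the distribution of the probe-side data.
- Global type: supported in Hologres V2.2 and later. When the probe-side data needs to be shuffled, the Runtime Filters must be merged before they can be used. This case requires a Global Runtime Filter.
A Local Runtime Filter can only reduce the amount of data scanned and the amount of data that participates in the hash join computation. With a Global Runtime Filter, the probe-side data is shuffled, so filtering before the shuffle can also reduce the amount of data transferred over the network. Neither type needs to be specified manually. The engine selects the type adaptively.
Based on the filter type, Runtime Filters are classified into Bloom Filter, In Filter, and MinMax Filter.
- Bloom Filter: supported in Hologres V2.0 and later. A Bloom Filter produces some false positives, so it filters out slightly less data than it ideally could. However, it is widely applicable and still filters efficiently when the build side contains a large amount of data, which improves query performance.
- In Filter: supported in Hologres V2.0 and later. An In Filter is used when the NDV (number of distinct values in a column) of the build-side data is small. It builds a HashSet from the build-side data and sends it to the probe side for filtering. Its advantage is that it filters out all the data that should be filtered out, and it can be used together with bitmap indexes.
- MinMax Filter: supported in Hologres V2.0 and later. A MinMax Filter obtains the maximum and minimum values of the build-side data and sends them to the probe side for filtering. Its advantage is that it may filter out an entire file or an entire batch of data directly based on metadata, which reduces I/O costs.
You do not need to specify any of the three filter types manually. Hologres adaptively uses each type based on the join at run time.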
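The difference between the three filter types can be seen in a short Python sketch. It is an illustration only: the `BloomFilter` class, its bit-array size, and its hashing scheme are assumptions made for this example and do not reflect the data structures Hologres uses internally.

```python
import hashlib


class BloomFilter:
    """A tiny Bloom filter: no false negatives, occasional false positives."""

    def __init__(self, num_bits=1 << 16, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits // 8)

    def _positions(self, value):
        # Derive `num_hashes` bit positions from seeded SHA-256 digests.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{value}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, value):
        for pos in self._positions(value):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, value):
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(value))


build_keys = list(range(1, 2001, 2))   # 1,000 odd keys on the build side
probe_keys = list(range(1, 100001))    # 100,000 keys on the probe side

bloom = BloomFilter()
for k in build_keys:
    bloom.add(k)
in_filter = set(build_keys)                     # exact membership
lo, hi = min(build_keys), max(build_keys)       # value range only

bloom_pass = sum(1 for k in probe_keys if bloom.might_contain(k))
in_pass = sum(1 for k in probe_keys if k in in_filter)
minmax_pass = sum(1 for k in probe_keys if lo <= k <= hi)
print(bloom_pass, in_pass, minmax_pass)
```

The In Filter passes exactly the 1,000 matching keys, the MinMax Filter passes every key in the range 1 to 1999 (including the even keys that will not join), and the Bloom Filter passes the 1,000 matching keys plus any false positives.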

Verify Runtime Filter

The following examples help you better understand Runtime Filter.

Example 1: The join condition contains only one column, and a Local Runtime Filter is used

Sample code:

begin; 
create table test1(x int, y int);
call set_table_property('test1', 'distribution_key', 'x');

create table test2(x int, y int);
call set_table_property('test2', 'distribution_key', 'x');
end;

insert into test1 select t, t from generate_series(1, 100000) t;
insert into test2 select t, t from generate_series(1, 1000) t;
analyze test1;
analyze test2;

explain analyze select * from test1 join test2 on test1.x = test2.x;

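Execution plan:
- The test2 table has only 1,000 rows and the test1 table has 100,000 rows. The ratio of build-side to probe-side data is 0.01, which is less than 0.1, and the build-side to probe-side ratio of the joined data is also 0.01, which is less than 0.1. The default trigger conditions of Runtime Filter are met, so the engine automatically uses Runtime Filter.
- The test1 table on the probe side has a Runtime Filter Target Expr node, which indicates that Runtime Filter pushdown is used on the probe side.
- On the probe side, scan_rows is the number of rows read from storage (100,000 rows), and rows is the number of rows output by the scan operator after Runtime Filter is applied (1,000 rows). Comparing these two values shows the filtering effect of Runtime Filter.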
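As a quick way to read the two values together, the share of scanned rows removed at the scan operator can be computed as follows. The helper name `filtered_fraction` is hypothetical and is not a Hologres function.

```python
def filtered_fraction(scan_rows, rows):
    """Fraction of rows read from storage that the filter removed."""
    return (scan_rows - rows) / scan_rows


# Values taken from the plan described above: scan_rows=100000, rows=1000.
print(f"{filtered_fraction(100_000, 1_000):.0%}")  # prints: 99%
```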

Example 2: The join condition contains multiple columns (supported in Hologres V2.1), and a Local Runtime Filter is used

Sample code:

drop table if exists test1, test2;
begin;
create table test1(x int, y int);
create table test2(x int, y int);
end;
insert into test1 select t, t from generate_series(1, 1000000) t;
insert into test2 select t, t from generate_series(1, 1000) t;
analyze test1;
analyze test2;

explain analyze select * from test1 join test2 on test1.x = test2.x and test1.y = test2.y;

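Execution plan:
- The join condition contains multiple columns, and the Runtime Filter is also generated on multiple columns.
- The build side is broadcast, so a Local Runtime Filter can be used.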

Example 3: Global Runtime Filter for shuffle joins (supported in Hologres V2.2)

Sample code:

SET hg_experimental_enable_result_cache = OFF;

drop table if exists test1, test2;
begin;
create table test1(x int, y int);
create table test2(x int, y int);
end;
insert into test1 select t, t from generate_series(1, 100000) t;
insert into test2 select t, t from generate_series(1, 1000) t;
analyze test1;
analyze test2;

explain analyze select * from test1 join test2 on test1.x = test2.x;

Execution plan:
The execution plan shows that the probe-side data is shuffled to the Hash Join operator, and the engine automatically uses a Global Runtime Filter to accelerate the query.
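Why a Global Runtime Filter must be merged before use can be sketched in Python. The sketch assumes four build-side workers that each see only part of the build keys. The function names and the dictionary layout of a filter are hypothetical and only illustrate the idea of merging partial filters before the probe side is shuffled.

```python
def build_partial_filter(keys):
    """Filter built by one build-side worker from its local keys only."""
    return {"min": min(keys), "max": max(keys), "values": set(keys)}


def merge_filters(partials):
    """Combine the partial filters into one filter that covers all build keys."""
    return {
        "min": min(p["min"] for p in partials),
        "max": max(p["max"] for p in partials),
        "values": set().union(*(p["values"] for p in partials)),
    }


def apply_before_shuffle(probe_keys, flt):
    """Drop probe keys that cannot match, before they are sent over the network."""
    return [k for k in probe_keys
            if flt["min"] <= k <= flt["max"] and k in flt["values"]]


# Assumption: 1,000 build keys are spread over 4 workers by key modulo 4.
shards = [[k for k in range(1, 1001) if k % 4 == i] for i in range(4)]
partials = [build_partial_filter(s) for s in shards]
merged = merge_filters(partials)

probe_keys = range(1, 100001)
print(len(apply_before_shuffle(probe_keys, partials[0])))  # 250: would lose matches
print(len(apply_before_shuffle(probe_keys, merged)))       # 1000: all matches kept
```

A single partial filter would discard probe rows that match keys held by other workers, so only the merged filter is safe to apply before the shuffle.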

Example 4: In Filter combined with a bitmap index (supported in Hologres V2.2)

Sample code:

set hg_experimental_enable_result_cache=off;

drop table if exists test1, test2;

begin;
create table test1(x text, y text);
call set_table_property('test1', 'distribution_key', 'x');
call set_table_property('test1', 'bitmap_columns', 'x');
call set_table_property('test1', 'dictionary_encoding_columns', '');

create table test2(x text, y text);
call set_table_property('test2', 'distribution_key', 'x');
end;

insert into test1 select t::text, t::text from generate_series(1, 10000000) t;

insert into test2 select t::text, t::text from generate_series(1, 50) t;

analyze test1;
analyze test2;

explain analyze select * from test1 join test2 on test1.x = test2.x;

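Execution plan:
The execution plan shows that a bitmap is used on the probe-side scan operator. Because an In Filter filters exactly, 50 rows remain after filtering. The scan_rows value of the scan operator is a little over 7 million, which is less than the original 10 million rows. This is because the In Filter can be pushed down to the storage engine, which may reduce I/O costs, so less data is read from the storage engine. An In Runtime Filter combined with a bitmap index usually has a noticeable effect when the join key is of the STRING type.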
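The interaction between an In Filter and a bitmap index can be pictured with the following Python sketch. It is a simplification under stated assumptions: a dictionary of row-position sets stands in for real bitmaps, the table is reduced to 100,000 rows to keep the example fast, and the function names are hypothetical.

```python
from collections import defaultdict


def build_bitmap_index(column):
    """Map each distinct value to the set of row positions that hold it."""
    index = defaultdict(set)
    for pos, value in enumerate(column):
        index[value].add(pos)
    return index


def rows_matching_in_filter(index, in_filter):
    """Look up each value of the In Filter in the index instead of scanning rows."""
    matched = set()
    for value in in_filter:
        matched |= index.get(value, set())
    return sorted(matched)


# Text join keys, as in the example above, on a smaller table.
column = [str(t) for t in range(1, 100001)]
index = build_bitmap_index(column)

in_filter = {str(t) for t in range(1, 51)}  # 50 distinct build-side keys
positions = rows_matching_in_filter(index, in_filter)
print(len(positions))  # prints: 50
```

Only 50 index lookups are needed to locate the matching rows, and no string comparison is made against the remaining rows.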

Example 5: MinMax Filter reduces I/O (supported in Hologres V2.2)

Sample code:

set hg_experimental_enable_result_cache=off;

drop table if exists test1, test2;

begin;
create table test1(x int, y int);
call set_table_property('test1', 'distribution_key', 'x');

create table test2(x int, y int);
call set_table_property('test2', 'distribution_key', 'x');
end;

insert into test1 select t::int, t::int from generate_series(1, 10000000) t;
insert into test2 select t::int, t::int from generate_series(1, 100000) t;

analyze test1;
analyze test2;

explain analyze select * from test1 join test2 on test1.x = test2.x;

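Execution plan:
The execution plan shows that the probe-side scan operator reads a little over 320,000 rows from the storage engine, far fewer than the original 10 million rows. This is because the Runtime Filter is pushed down to the storage engine and uses the metadata of each batch to filter out entire batches, which may greatly reduce I/O costs. The effect is usually noticeable when the join key is of a numeric type and the value range of the build side is smaller than that of the probe side.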
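Batch-level pruning with min/max metadata can be sketched in Python as follows. The batch size of 8,192 rows, the in-order layout of the keys, and the function names are assumptions made for this illustration, so the row counts do not reproduce the figures in the plan above.

```python
def make_batches(keys, batch_size):
    """Split keys into batches and record per-batch min/max metadata."""
    batches = []
    for start in range(0, len(keys), batch_size):
        chunk = keys[start:start + batch_size]
        batches.append({"min": min(chunk), "max": max(chunk), "rows": chunk})
    return batches


def scan_with_minmax(batches, lo, hi):
    """Read only batches whose [min, max] range overlaps the filter range."""
    rows_read = 0
    kept = []
    for batch in batches:
        if batch["max"] < lo or batch["min"] > hi:
            continue  # skipped using metadata only, no rows are read
        rows_read += len(batch["rows"])
        kept.extend(k for k in batch["rows"] if lo <= k <= hi)
    return rows_read, kept


probe_keys = list(range(1, 1_000_001))      # probe side, stored in key order
batches = make_batches(probe_keys, 8192)    # hypothetical batch size

# Build-side keys span 1..10000, so the MinMax Filter is [1, 10000].
rows_read, kept = scan_with_minmax(batches, 1, 10_000)
print(rows_read, len(kept))  # prints: 16384 10000
```

Only the first two batches overlap the filter range, so 16,384 of the 1,000,000 rows are read. The benefit depends on the data being clustered by the join key: if every batch spanned the full value range, no batch could be skipped.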
