This topic uses an example to show how to quickly write data at random into the local tables on each node of a ClickHouse cluster.
Prerequisites
A ClickHouse cluster has been created. For details, see Create a ClickHouse cluster.
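Procedure
Log on to the ClickHouse cluster over SSH. For details, see Log on to a cluster.
Run the following command to download the official sample dataset.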
curl https://datasets.clickhouse.com/hits/tsv/hits_v1.tsv.xz | unxz --threads=`nproc` > hits_v1.tsv
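Before importing, it can help to confirm that the download and decompression finished cleanly. The following is a minimal sketch and not part of the original procedure: `tsv_summary` is a hypothetical helper name, and the sketch assumes the file was saved as hits_v1.tsv in the current directory.

```shell
# Hypothetical helper (not from the original guide): print the number of
# rows in a TSV file and the number of tab-separated fields in its first row.
tsv_summary() {
  awk -F'\t' 'NR == 1 { cols = NF } END { printf "rows=%d cols=%d\n", NR, cols }' "$1"
}

# Run the check only if the sample file is present in the current directory.
if [ -f hits_v1.tsv ]; then
  tsv_summary hits_v1.tsv
fi
```

A row count of zero suggests that the download or decompression was interrupted and should be repeated.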
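Run the following command to start the ClickHouse client.
clickhouse-client -h core-1-1 -m
Note: This example connects to the core-1-1 node. If the cluster has multiple Core nodes, you can connect to any of them.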
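Run the following command to create a database.
You can use the ON CLUSTER clause to create the database on all nodes of the cluster. The default cluster identifier is cluster_emr.
CREATE DATABASE IF NOT EXISTS demo ON CLUSTER cluster_emr;
The command returns the execution status for each node in the cluster.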
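Create a replicated table (Replicated table) on all nodes of the cluster.
A replicated table keeps multiple copies of the data, one per configured replica, and provides eventual consistency across those copies.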
CREATE TABLE demo.hits_local ON CLUSTER cluster_emr ( `WatchID` UInt64, `JavaEnable` UInt8, `Title` String, `GoodEvent` Int16, `EventTime` DateTime, `EventDate` Date, `CounterID` UInt32, `ClientIP` UInt32, `ClientIP6` FixedString(16), `RegionID` UInt32, `UserID` UInt64, `CounterClass` Int8, `OS` UInt8, `UserAgent` UInt8, `URL` String, `Referer` String, `URLDomain` String, `RefererDomain` String, `Refresh` UInt8, `IsRobot` UInt8, `RefererCategories` Array(UInt16), `URLCategories` Array(UInt16), `URLRegions` Array(UInt32), `RefererRegions` Array(UInt32), `ResolutionWidth` UInt16, `ResolutionHeight` UInt16, `ResolutionDepth` UInt8, `FlashMajor` UInt8, `FlashMinor` UInt8, `FlashMinor2` String, `NetMajor` UInt8, `NetMinor` UInt8, `UserAgentMajor` UInt16, `UserAgentMinor` FixedString(2), `CookieEnable` UInt8, `JavascriptEnable` UInt8, `IsMobile` UInt8, `MobilePhone` UInt8, `MobilePhoneModel` String, `Params` String, `IPNetworkID` UInt32, `TraficSourceID` Int8, `SearchEngineID` UInt16, `SearchPhrase` String, `AdvEngineID` UInt8, `IsArtifical` UInt8, `WindowClientWidth` UInt16, `WindowClientHeight` UInt16, `ClientTimeZone` Int16, `ClientEventTime` DateTime, `SilverlightVersion1` UInt8, `SilverlightVersion2` UInt8, `SilverlightVersion3` UInt32, `SilverlightVersion4` UInt16, `PageCharset` String, `CodeVersion` UInt32, `IsLink` UInt8, `IsDownload` UInt8, `IsNotBounce` UInt8, `FUniqID` UInt64, `HID` UInt32, `IsOldCounter` UInt8, `IsEvent` UInt8, `IsParameter` UInt8, `DontCountHits` UInt8, `WithHash` UInt8, `HitColor` FixedString(1), `UTCEventTime` DateTime, `Age` UInt8, `Sex` UInt8, `Income` UInt8, `Interests` UInt16, `Robotness` UInt8, `GeneralInterests` Array(UInt16), `RemoteIP` UInt32, `RemoteIP6` FixedString(16), `WindowName` Int32, `OpenerName` Int32, `HistoryLength` Int16, `BrowserLanguage` FixedString(2), `BrowserCountry` FixedString(2), `SocialNetwork` String, `SocialAction` String, `HTTPError` UInt16, `SendTiming` Int32, `DNSTiming` Int32, `ConnectTiming` Int32, 
`ResponseStartTiming` Int32, `ResponseEndTiming` Int32, `FetchTiming` Int32, `RedirectTiming` Int32, `DOMInteractiveTiming` Int32, `DOMContentLoadedTiming` Int32, `DOMCompleteTiming` Int32, `LoadEventStartTiming` Int32, `LoadEventEndTiming` Int32, `NSToDOMContentLoadedTiming` Int32, `FirstPaintTiming` Int32, `RedirectCount` Int8, `SocialSourceNetworkID` UInt8, `SocialSourcePage` String, `ParamPrice` Int64, `ParamOrderID` String, `ParamCurrency` FixedString(3), `ParamCurrencyID` UInt16, `GoalsReached` Array(UInt32), `OpenstatServiceName` String, `OpenstatCampaignID` String, `OpenstatAdID` String, `OpenstatSourceID` String, `UTMSource` String, `UTMMedium` String, `UTMCampaign` String, `UTMContent` String, `UTMTerm` String, `FromTag` String, `HasGCLID` UInt8, `RefererHash` UInt64, `URLHash` UInt64, `CLID` UInt32, `YCLID` UInt64, `ShareService` String, `ShareURL` String, `ShareTitle` String, `ParsedParams` Nested(Key1 String,Key2 String,Key3 String,Key4 String,Key5 String,ValueDouble Float64), `IslandID` FixedString(16), `RequestNum` UInt32, `RequestTry` UInt8 ) ENGINE = ReplicatedMergeTree('/clickhouse/tables/{shard}/{database}/hits_local', '{replica}') PARTITION BY toYYYYMM(EventDate) ORDER BY (CounterID, EventDate, intHash32(UserID)) SAMPLE BY intHash32(UserID);
Note: {shard} and {replica} are macros that Alibaba Cloud EMR generates automatically for the ClickHouse cluster. You can use them directly.
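Run the following command to create a distributed (Distributed) table.
A distributed table does not store data itself. It acts only as a view over the underlying local tables, and it lets queries run across multiple servers. This example uses the random function rand() as the sharding key, so data is written at random to the local tables on the nodes.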
CREATE TABLE demo.hits_all on CLUSTER cluster_emr AS demo.hits_local ENGINE = Distributed(cluster_emr, demo, hits_local, rand());
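To build intuition for what a random sharding key does to row counts, the following sketch simulates assigning rows uniformly at random to a fixed number of shards. It does not talk to ClickHouse: `simulate_random_sharding` is a hypothetical helper, and the row count, shard count, and equal shard weights are assumptions chosen for illustration.

```shell
# Hypothetical simulation (not ClickHouse itself): assign each of ROWS rows
# to one of SHARDS shards uniformly at random, then print per-shard counts.
# Arguments: ROWS SHARDS [SEED]. Equal shard weights are assumed.
simulate_random_sharding() {
  awk -v rows="$1" -v shards="$2" -v seed="${3:-$(date +%s)}" 'BEGIN {
    srand(seed)
    for (i = 0; i < rows; i++) {
      # rand() is in [0, 1), so the shard index is in [0, shards - 1].
      count[int(rand() * shards)]++
    }
    for (s = 0; s < shards; s++) {
      printf "shard_%d %d\n", s, count[s]
    }
  }'
}

# Example: 100000 rows spread over 3 shards.
simulate_random_sharding 100000 3
```

The per-shard counts come out close to each other but rarely identical, while their sum always equals the number of rows written. The same pattern is what the count queries later in this topic are expected to show.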
Exit the ClickHouse client, and then run the following command in the directory that contains the sample data to import the data.
clickhouse-client -h core-1-1 --query "INSERT INTO demo.hits_all FORMAT TSV" --max_insert_block_size=100000 < hits_v1.tsv;
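Start the ClickHouse client again and check the data.
Because the data is written at random, the row counts may differ from node to node.
Check the row count of demo.hits_all on the core-1-1 node.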
select count(*) from demo.hits_all;
Check the row count of demo.hits_local on the core-1-1 node.
select count(*) from demo.hits_local;
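Check the row count of demo.hits_local on the core-1-2 node.
Note: You can follow the same steps to check the row count of demo.hits_local on the other nodes. Node names are listed on the node management page of the EMR console.
Run the following command to start the ClickHouse client on core-1-2.
clickhouse-client -h core-1-2 -m
In the ClickHouse client, run the following command to check the row count of demo.hits_local.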
select count(*) from demo.hits_local;
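The per-node checks above can also be run in one pass from the shell instead of connecting to each node in turn. This is a minimal sketch under stated assumptions: `local_counts` and the `CH_CLIENT` variable are hypothetical names introduced here, and the node names passed to the function must match those of your own cluster.

```shell
# Client binary to use; can be overridden through the environment.
CH_CLIENT="${CH_CLIENT:-clickhouse-client}"

# Hypothetical helper (not from the original guide): for every host name
# given as an argument, print the host and the row count of its local table.
local_counts() {
  for host in "$@"; do
    count=$("$CH_CLIENT" -h "$host" --query "SELECT count() FROM demo.hits_local")
    printf '%s %s\n' "$host" "$count"
  done
}

# Usage on a cluster node (left commented out because it needs a live cluster):
# local_counts core-1-1 core-1-2
```

Each output line pairs a node name with the row count of demo.hits_local on that node, which makes uneven distribution across nodes easy to spot.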