本文介紹如何使用Impala或Presto查詢JindoFS上的數據。

JindoFS配置

以EMR-3.35版本為例,創建名為emr-jfs的命名空間,相關配置參數示例如下:
  • jfs.namespaces=emr-jfs
  • jfs.namespaces.emr-jfs.oss.uri=oss://oss-bucket/oss-dir
  • jfs.namespaces.emr-jfs.mode=block

使用JindoFS

目前, 在E-MapReduce 3.22.0及以上版本,Impala或Presto支持Hive元數據的讀取,對于存儲在JindoFS的Hive數據表,E-MapReduce Impala/Presto可以直接讀取。

同樣,也可以將建表語句的Location設置為JindoFS路徑,即可實現表數據落在JindoFS上。

例如,原有HDFS上的建表語句如下:

Create external table lineitem (L_ORDERKEY INT, L_PARTKEY INT, L_SUPPKEY INT, L_LINENUMBER INT, L_QUANTITY DOUBLE, L_EXTENDEDPRICE DOUBLE, L_DISCOUNT DOUBLE, L_TAX DOUBLE, L_RETURNFLAG STRING, L_LINESTATUS STRING, L_SHIPDATE STRING, L_COMMITDATE STRING, L_RECEIPTDATE STRING, L_SHIPINSTRUCT STRING, L_SHIPMODE STRING, L_COMMENT STRING) ROW FORMAT DELIMITED FIELDS TERMINATED BY '|' STORED AS TEXTFILE LOCATION 'hdfs:///tpch_impala/lineitem';

相應的改成如下命令即可:

Create external table lineitem (L_ORDERKEY INT, L_PARTKEY INT, L_SUPPKEY INT, L_LINENUMBER INT, L_QUANTITY DOUBLE, L_EXTENDEDPRICE DOUBLE, L_DISCOUNT DOUBLE, L_TAX DOUBLE, L_RETURNFLAG STRING, L_LINESTATUS STRING, L_SHIPDATE STRING, L_COMMITDATE STRING, L_RECEIPTDATE STRING, L_SHIPINSTRUCT STRING, L_SHIPMODE STRING, L_COMMENT STRING) ROW FORMAT DELIMITED FIELDS TERMINATED BY '|' STORED AS TEXTFILE LOCATION 'jfs://emr-jfs/tpch_impala/lineitem';