This topic describes how to migrate data from Object Storage Service (OSS) to LindormDFS.
Preparations
Activate LindormDFS. For details, see the activation guide.
Set up a Hadoop cluster. Hadoop 2.7.3 or later is recommended; Apache Hadoop 2.7.3 is used in this topic. Modify the Hadoop configuration as described in Use the open source HDFS client to access LindormDFS.
Install a JDK on all nodes of the Hadoop cluster. JDK 1.8 or later is required.
Install the JindoFS SDK (the OSS client) on the Hadoop cluster. For details about the JindoFS SDK, see JindoFS SDK.
Download jindofs-sdk.jar and copy it into the Hadoop HDFS library directory:
cp ./jindofs-sdk-*.jar ${HADOOP_HOME}/share/hadoop/hdfs/lib/
Create the JindoFS SDK configuration file on all nodes of the Hadoop cluster.
Add the following environment variable to the /etc/profile file.
export B2SDK_CONF_DIR=/etc/jindofs-sdk-conf
Create the OSS storage tool configuration file /etc/jindofs-sdk-conf/bigboot.cfg.
[bigboot]
logger.dir=/tmp/bigboot-log

[bigboot-client]
client.oss.retry=5
client.oss.upload.threads=4
client.oss.upload.queue.size=5
client.oss.upload.max.parallelism=16
client.oss.timeout.millisecond=30000
client.oss.connection.timeout.millisecond=4000
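Because the configuration file must exist on every node, the step above can be scripted: write bigboot.cfg with a heredoc and then copy it out. This is a minimal sketch; the nodes.txt host list and the scp distribution loop are assumptions you should adapt to your cluster (it writes to /tmp here so it can be run safely anywhere).

```shell
#!/usr/bin/env bash
set -euo pipefail

# Use /etc/jindofs-sdk-conf on the actual cluster nodes.
CONF_DIR=/tmp/jindofs-sdk-conf
mkdir -p "$CONF_DIR"

# Write bigboot.cfg with the settings shown above.
cat > "$CONF_DIR/bigboot.cfg" <<'EOF'
[bigboot]
logger.dir=/tmp/bigboot-log

[bigboot-client]
client.oss.retry=5
client.oss.upload.threads=4
client.oss.upload.queue.size=5
client.oss.upload.max.parallelism=16
client.oss.timeout.millisecond=30000
client.oss.connection.timeout.millisecond=4000
EOF

# Distribute to every node listed in nodes.txt (hypothetical file), e.g.:
# while read -r node; do scp "$CONF_DIR/bigboot.cfg" "$node:/etc/jindofs-sdk-conf/"; done < nodes.txt
echo "wrote $CONF_DIR/bigboot.cfg"
```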
Load the environment variable so that it takes effect.
source /etc/profile
Verify that OSS can be accessed from the Hadoop cluster.
${HADOOP_HOME}/bin/hadoop fs -ls oss://<accessKeyId>:<accessKeySecret>@<bucket-name>.<endpoint>/
Migrate data from OSS to LindormDFS
Check and confirm the size of the data to be migrated.
${HADOOP_HOME}/bin/hadoop fs -du -h oss://<accessKeyId>:<accessKeySecret>@<bucket-name>.<endpoint>/test_data
Start a Hadoop MapReduce job (DistCp) to migrate the test data to LindormDFS.
${HADOOP_HOME}/bin/hadoop distcp \
oss://<accessKeyId>:<accessKeySecret>@<bucket-name>.<endpoint>/test_data.txt \
hdfs://${實例Id}/
Replace ${實例Id} with the ID of your Lindorm instance.
The following table describes the parameters:
Parameter | Description
accessKeyId | The AccessKey ID used to access the OSS API. For how to obtain it, see Create an AccessKey.
accessKeySecret | The AccessKey secret used to access the OSS API. For how to obtain it, see Create an AccessKey.
bucket-name.endpoint | The OSS access domain, which consists of the bucket name and the corresponding region endpoint address.
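For larger migrations, the DistCp invocation above can be tuned with standard DistCp options. This is a sketch only; the flag values are assumptions to adjust to your cluster capacity and data volume, and it requires a live Hadoop cluster to run.

```shell
# Sketch: DistCp with common tuning flags (all standard DistCp options):
#   -m 20            run at most 20 concurrent map tasks
#   -update          copy only files missing or changed at the destination
#   -bandwidth 100   cap each map task at roughly 100 MB/s
${HADOOP_HOME}/bin/hadoop distcp \
  -m 20 -update -bandwidth 100 \
  oss://<accessKeyId>:<accessKeySecret>@<bucket-name>.<endpoint>/test_data \
  hdfs://${實例Id}/test_data
```

The -update flag also makes the job safe to re-run after a partial failure, since already-copied files are skipped.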
任務執行完成后,查看遷移結果。
如果回顯包含如下類似信息,說明遷移成功。
20/09/29 12:23:59 INFO mapreduce.Job:  map 100% reduce 0%
20/09/29 12:23:59 INFO mapreduce.Job: Job job_1601195105349_0015 completed successfully
20/09/29 12:23:59 INFO mapreduce.Job: Counters: 38
        File System Counters
                FILE: Number of bytes read=0
                FILE: Number of bytes written=122343
                FILE: Number of read operations=0
                FILE: Number of large read operations=0
                FILE: Number of write operations=0
                HDFS: Number of bytes read=470
                HDFS: Number of bytes written=47047709
                HDFS: Number of read operations=15
                HDFS: Number of large read operations=0
                HDFS: Number of write operations=4
                OSS: Number of bytes read=0
                OSS: Number of bytes written=0
                OSS: Number of read operations=0
                OSS: Number of large read operations=0
                OSS: Number of write operations=0
        Job Counters
                Launched map tasks=1
                Other local map tasks=1
                Total time spent by all maps in occupied slots (ms)=5194
                Total time spent by all reduces in occupied slots (ms)=0
                Total time spent by all map tasks (ms)=5194
                Total vcore-milliseconds taken by all map tasks=5194
                Total megabyte-milliseconds taken by all map tasks=5318656
        Map-Reduce Framework
                Map input records=1
                Map output records=0
                Input split bytes=132
                Spilled Records=0
                Failed Shuffles=0
                Merged Map outputs=0
                GC time elapsed (ms)=64
                CPU time spent (ms)=2210
                Physical memory (bytes) snapshot=222294016
                Virtual memory (bytes) snapshot=2672074752
                Total committed heap usage (bytes)=110100480
        File Input Format Counters
                Bytes Read=338
        File Output Format Counters
                Bytes Written=0
        org.apache.hadoop.tools.mapred.CopyMapper$Counter
                BYTESCOPIED=47047709
                BYTESEXPECTED=47047709
                COPY=1
20/09/29 12:23:59 INFO common.AbstractJindoFileSystem: Read total statistics: oss read average -1 us, cache read average -1 us, read oss percent 0%
Verify the migration result.
Check the size of the test data that was migrated to LindormDFS.
${HADOOP_HOME}/bin/hadoop fs -du -s -h hdfs://${實例Id}/
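Beyond comparing total sizes, you can compare file and byte counts on both sides with the standard hadoop fs -count command. This is a sketch that requires a live cluster; each command prints DIR_COUNT, FILE_COUNT, CONTENT_SIZE, and the path, and the file counts and sizes should match if the migration is complete.

```shell
# Count files and bytes at the source (OSS) and the destination (LindormDFS).
${HADOOP_HOME}/bin/hadoop fs -count \
  oss://<accessKeyId>:<accessKeySecret>@<bucket-name>.<endpoint>/test_data
${HADOOP_HOME}/bin/hadoop fs -count hdfs://${實例Id}/test_data
```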