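diff --git a/README.md b/README.md
index be6d370..bca123d 100644
--- a/README.md
+++ b/README.md
@@ -86,7 +86,7 @@ TODO
 ## 7. Sqoop
-1. Introduction to Sqoop
+1. [Introduction to Sqoop and Installation](https://github.com/heibaiying/BigData-Notes/blob/master/notes/Sqoop简介与安装.md)
 2. Basic Usage of Sqoop
diff --git a/notes/Sqoop简介与安装.md b/notes/Sqoop简介与安装.md
new file mode 100644
index 0000000..dcd333a
--- /dev/null
+++ b/notes/Sqoop简介与安装.md
@@ -0,0 +1,149 @@
+# Introduction to Sqoop and Installation
+
+## 1. Introduction to Sqoop
+
+In short, Sqoop is a data migration tool. Its main job is importing and exporting data:
+
++ Import: transfer data from relational databases such as MySQL and Oracle into Hadoop-ecosystem storage such as HDFS, Hive, and HBase.
+
++ Export: transfer data from the distributed file system back into a relational database.
+
+Under the hood, Sqoop translates each command into a MapReduce job that carries out the transfer. The figure below illustrates Sqoop's functionality and how it works.
+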
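+
+## 2. Installation
+
+First, a note on choosing a version:
+
+Sqoop currently comes in two versions, Sqoop 1 and Sqoop 2. At the time of writing, however, the project does not recommend Sqoop 2, because it is not compatible with Sqoop 1 and is not yet feature-complete. Sqoop 1 is therefore the preferred choice here.
+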
+
+### 2.1 Download and Extract
+
+Download the Sqoop version you need. I use the `cdh5.15.2` release here, which corresponds to Sqoop `1.4.6`. Download location: http://archive.cloudera.com/cdh5/cdh/5/
+
+```shell
+# Extract the archive after downloading
+tar -zxvf sqoop-1.4.6-cdh5.15.2.tar.gz
+```
+
+### 2.2 Configure Environment Variables
+
+```shell
+# vim /etc/profile
+```
+
+Add the environment variables:
+
+```shell
+export SQOOP_HOME=/usr/app/sqoop-1.4.6-cdh5.15.2
+export PATH=$SQOOP_HOME/bin:$PATH
+```
+
+Make the new environment variables take effect immediately:
+
+```shell
+# source /etc/profile
+```
+
+### 2.3 Edit the Configuration
+
+Go to the `conf/` directory under the installation directory and copy Sqoop's environment configuration template, `sqoop-env-template.sh`:
+
+```shell
+# cp sqoop-env-template.sh sqoop-env.sh
+```
+
+Edit `sqoop-env.sh` and add the following settings (`HADOOP_COMMON_HOME` and `HADOOP_MAPRED_HOME` are required; the rest are optional):
+
+```shell
+# Set Hadoop-specific environment variables here.
+#Set path to where bin/hadoop is available
+export HADOOP_COMMON_HOME=/usr/app/hadoop-2.6.0-cdh5.15.2
+
+#Set path to where hadoop-*-core.jar is available
+export HADOOP_MAPRED_HOME=/usr/app/hadoop-2.6.0-cdh5.15.2
+
+#set the path to where bin/hbase is available
+export HBASE_HOME=/usr/app/hbase-1.2.0-cdh5.15.2
+
+#Set the path to where bin/hive is available
+export HIVE_HOME=/usr/app/hive-1.1.0-cdh5.15.2
+
+#Set the path for where zookeeper config dir is
+export ZOOCFGDIR=/usr/app/zookeeper-3.4.13/conf
+```
+
+### 2.4 Copy the Database Driver
+
+Copy the MySQL driver into the `lib` directory of the Sqoop installation. The MySQL driver can be downloaded from https://dev.mysql.com/downloads/connector/j/ . I have also uploaded a copy to the [resources](https://github.com/heibaiying/BigData-Notes/tree/master/resources) directory of this repository, which you can download if needed.
+
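The driver copy in section 2.4 can be sketched as a small shell function. This is only a sketch under assumptions: `install_jdbc_driver` is a hypothetical helper name that is not part of Sqoop, and it falls back to the `SQOOP_HOME` variable from section 2.2 when no installation directory is passed.

```shell
# Hypothetical helper (not part of Sqoop): copy a JDBC driver jar into
# the lib directory of a Sqoop installation.
install_jdbc_driver() {
  jar="$1"                         # path to the downloaded driver jar
  sqoop_home="${2:-$SQOOP_HOME}"   # defaults to SQOOP_HOME from section 2.2

  # Refuse to continue if the jar itself is missing.
  if [ ! -f "$jar" ]; then
    echo "Error: driver jar not found: $jar" >&2
    return 1
  fi

  # Refuse to continue if the target lib directory does not exist.
  if [ ! -d "$sqoop_home/lib" ]; then
    echo "Error: $sqoop_home/lib is not a directory" >&2
    return 1
  fi

  cp "$jar" "$sqoop_home/lib/" || return 1
  echo "Installed $(basename "$jar") into $sqoop_home/lib"
}
```

A call would look like `install_jdbc_driver ./mysql-connector-java-<version>.jar`, where the jar name is a placeholder for whichever driver version you downloaded. Checking both paths first means a mistyped path fails with a clear message instead of a bare `cp` error.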
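+
+### 2.5 Verification
+
+Because Sqoop's `bin` directory has already been added to the `PATH`, you can run the following command directly to verify the installation:
+
+```shell
+# sqoop version
+```
+
+If the corresponding version information is printed, the configuration succeeded.
+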
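If `sqoop version` is not found, the environment from section 2.2 is the first thing to check. The function below is a minimal sketch of such a check; `check_sqoop_env` is a hypothetical name, not a Sqoop command, and it only inspects `SQOOP_HOME` and `PATH` without invoking Sqoop itself.

```shell
# Hypothetical pre-flight check (not part of Sqoop): confirm that SQOOP_HOME
# points at a directory and that its bin directory is on PATH.
check_sqoop_env() {
  if [ -z "$SQOOP_HOME" ] || [ ! -d "$SQOOP_HOME" ]; then
    echo "SQOOP_HOME is unset or not a directory" >&2
    return 1
  fi

  # Wrap PATH in colons so the first and last entries match the same pattern.
  case ":$PATH:" in
    *":$SQOOP_HOME/bin:"*) ;;
    *)
      echo "$SQOOP_HOME/bin is not on PATH" >&2
      return 1
      ;;
  esac

  echo "SQOOP_HOME=$SQOOP_HOME and its bin directory is on PATH"
}
```

Running `check_sqoop_env && sqoop version` separates an environment problem from a problem inside Sqoop's own startup scripts.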
+
+The two `Warning` messages in the output appear because this setup does not use `HCatalog` or `Accumulo`, so they can be ignored. By default, Sqoop checks whether these components are configured in the environment. The checks live in the `bin/configure-sqoop` file, so to get rid of the warnings, comment out the checks you do not need.
+
+```shell
+# Check: If we can't find our dependencies, give up here.
+if [ ! -d "${HADOOP_COMMON_HOME}" ]; then
+  echo "Error: $HADOOP_COMMON_HOME does not exist!"
+  echo 'Please set $HADOOP_COMMON_HOME to the root of your Hadoop installation.'
+  exit 1
+fi
+if [ ! -d "${HADOOP_MAPRED_HOME}" ]; then
+  echo "Error: $HADOOP_MAPRED_HOME does not exist!"
+  echo 'Please set $HADOOP_MAPRED_HOME to the root of your Hadoop MapReduce installation.'
+  exit 1
+fi
+
+## Moved to be a runtime check in sqoop.
+if [ ! -d "${HBASE_HOME}" ]; then
+  echo "Warning: $HBASE_HOME does not exist! HBase imports will fail."
+  echo 'Please set $HBASE_HOME to the root of your HBase installation.'
+fi
+
+## Moved to be a runtime check in sqoop.
+if [ ! -d "${HCAT_HOME}" ]; then
+  echo "Warning: $HCAT_HOME does not exist! HCatalog jobs will fail."
+  echo 'Please set $HCAT_HOME to the root of your HCatalog installation.'
+fi
+
+if [ ! -d "${ACCUMULO_HOME}" ]; then
+  echo "Warning: $ACCUMULO_HOME does not exist! Accumulo imports will fail."
+  echo 'Please set $ACCUMULO_HOME to the root of your Accumulo installation.'
+fi
+if [ ! -d "${ZOOKEEPER_HOME}" ]; then
+  echo "Warning: $ZOOKEEPER_HOME does not exist! Accumulo imports will fail."
+  echo 'Please set $ZOOKEEPER_HOME to the root of your Zookeeper installation.'
+fi
+```
+
diff --git a/pictures/sqoop-mysql-connect.png b/pictures/sqoop-mysql-connect.png
new file mode 100644
index 0000000..86ffd00
Binary files /dev/null and b/pictures/sqoop-mysql-connect.png differ
diff --git a/pictures/sqoop-mysql-jar.png b/pictures/sqoop-mysql-jar.png
new file mode 100644
index 0000000..25ab79d
Binary files /dev/null and b/pictures/sqoop-mysql-jar.png differ
diff --git a/pictures/sqoop-tool.png b/pictures/sqoop-tool.png
new file mode 100644
index 0000000..32c189e
Binary files /dev/null and b/pictures/sqoop-tool.png differ
diff --git a/pictures/sqoop-version-selected.png b/pictures/sqoop-version-selected.png
new file mode 100644
index 0000000..56ae7af
Binary files /dev/null and b/pictures/sqoop-version-selected.png differ
diff --git a/pictures/sqoop-version.png b/pictures/sqoop-version.png
new file mode 100644
index 0000000..217cb69
Binary files /dev/null and b/pictures/sqoop-version.png differ