BigData-Notes/notes/Sqoop简介与安装.md
2019-04-07 17:35:20 +08:00

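# Sqoop Introduction and Installation
<nav>
<a href="#1-sqoop-overview">1. Sqoop Overview</a><br/>
<a href="#2-installation">2. Installation</a><br/>
&nbsp;&nbsp;&nbsp;&nbsp;<a href="#21-download-and-extract">2.1 Download and Extract</a><br/>
&nbsp;&nbsp;&nbsp;&nbsp;<a href="#22-configure-environment-variables">2.2 Configure Environment Variables</a><br/>
&nbsp;&nbsp;&nbsp;&nbsp;<a href="#23-modify-the-configuration">2.3 Modify the Configuration</a><br/>
&nbsp;&nbsp;&nbsp;&nbsp;<a href="#24-copy-the-database-driver">2.4 Copy the Database Driver</a><br/>
&nbsp;&nbsp;&nbsp;&nbsp;<a href="#25-verification">2.5 Verification</a><br/>
</nav>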
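## 1. Sqoop Overview
In short, Sqoop is a data migration tool. Its main job is importing and exporting data:
+ Import: load data from relational databases such as MySQL and Oracle into Hadoop-ecosystem storage such as HDFS, Hive, and HBase.
+ Export: write data from the distributed file system back into a relational database.

Under the hood, Sqoop translates each command into a MapReduce job that performs the transfer. The figure below illustrates both what Sqoop does and how it works.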
<div align="center"> <img src="https://github.com/heibaiying/BigData-Notes/blob/master/pictures/sqoop-tool.png"/> </div>
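To make the import and export directions concrete, here is a minimal sketch of what the two invocations could look like. The JDBC URL, user, table names, and HDFS paths are hypothetical placeholders, not values from these notes, and the `run` wrapper is an assumption added so that the commands are only printed by default.

```shell
# Dry-run wrapper: prints the command unless DRY_RUN=0 is set.
DRY_RUN="${DRY_RUN:-1}"
run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Hypothetical connection details; replace with your own.
JDBC_URL="jdbc:mysql://localhost:3306/demo_db"
DB_USER="root"

# Import: relational table -> HDFS directory (executed as a MapReduce job).
run sqoop import --connect "$JDBC_URL" --username "$DB_USER" -P \
  --table orders --target-dir /sqoop/orders -m 1

# Export: HDFS directory -> relational table (the table must already exist).
run sqoop export --connect "$JDBC_URL" --username "$DB_USER" -P \
  --table orders_copy --export-dir /sqoop/orders -m 1
```

Setting `DRY_RUN=0` would execute the commands for real, which requires the installation steps in the rest of these notes to be complete.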
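## 2. Installation
First, a note on choosing a version:
Sqoop currently comes in two versions, Sqoop 1 and Sqoop 2. At the time of writing, the project does not recommend Sqoop 2, because it is not compatible with Sqoop 1 and is not yet feature complete. Sqoop 1 is therefore the recommended choice here.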
<div align="center"> <img src="https://github.com/heibaiying/BigData-Notes/blob/master/pictures/sqoop-version-selected.png"/> </div>
### 2.1 Download and Extract
Download the Sqoop release you need. Here I use the `cdh5.15.2` release, which corresponds to Sqoop `1.4.6`. Download address: http://archive.cloudera.com/cdh5/cdh/5/
```shell
# Extract the archive after downloading
tar -zxvf sqoop-1.4.6-cdh5.15.2.tar.gz
```
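The later steps set `SQOOP_HOME` to a directory under `/usr/app`, so the archive needs to end up there. A small sketch, assuming you want to extract straight into the target directory rather than the current one (`extract_to` is a hypothetical helper name):

```shell
# extract_to ARCHIVE DEST_DIR: unpack a .tar.gz archive into DEST_DIR,
# creating DEST_DIR first if it does not exist.
extract_to() {
  mkdir -p "$2" && tar -zxf "$1" -C "$2"
}

# Example, commented out because it needs the real archive and write access:
# extract_to sqoop-1.4.6-cdh5.15.2.tar.gz /usr/app
```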
### 2.2 Configure Environment Variables
```shell
# vim /etc/profile
```
Add the environment variables:
```shell
export SQOOP_HOME=/usr/app/sqoop-1.4.6-cdh5.15.2
export PATH=$SQOOP_HOME/bin:$PATH
```
Make the environment variables take effect immediately:
```shell
# source /etc/profile
```
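A quick way to confirm that the variables took effect in the current shell is sketched below. `check_sqoop_env` is a hypothetical helper, not part of Sqoop; it only inspects `SQOOP_HOME` and `PATH`.

```shell
# check_sqoop_env: succeed only if SQOOP_HOME is set and its bin/ is on PATH.
check_sqoop_env() {
  [ -n "$SQOOP_HOME" ] || { echo "SQOOP_HOME is not set"; return 1; }
  case ":$PATH:" in
    *":$SQOOP_HOME/bin:"*) echo "OK: $SQOOP_HOME/bin is on PATH" ;;
    *) echo "$SQOOP_HOME/bin is not on PATH"; return 1 ;;
  esac
}
```

Call `check_sqoop_env` after `source /etc/profile`; a non-zero exit status means one of the two `export` lines is missing or was not reloaded.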
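### 2.3 Modify the Configuration
Go to the `conf/` directory under the installation directory and copy Sqoop's environment configuration template `sqoop-env-template.sh`: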
```shell
# cp sqoop-env-template.sh sqoop-env.sh
```
Edit `sqoop-env.sh` and add the following settings (`HADOOP_COMMON_HOME` and `HADOOP_MAPRED_HOME` are required; the others are optional):
```shell
# Set Hadoop-specific environment variables here.
#Set path to where bin/hadoop is available
export HADOOP_COMMON_HOME=/usr/app/hadoop-2.6.0-cdh5.15.2
#Set path to where hadoop-*-core.jar is available
export HADOOP_MAPRED_HOME=/usr/app/hadoop-2.6.0-cdh5.15.2
#set the path to where bin/hbase is available
export HBASE_HOME=/usr/app/hbase-1.2.0-cdh5.15.2
#Set the path to where bin/hive is available
export HIVE_HOME=/usr/app/hive-1.1.0-cdh5.15.2
#Set the path for where zookeper config dir is
export ZOOCFGDIR=/usr/app/zookeeper-3.4.13/conf
```
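Because the two Hadoop variables are required, it can help to check that they point at real directories before running Sqoop. The following is a sketch under that assumption; `check_required_homes` is a hypothetical helper that mirrors the directory test in Sqoop's own startup script.

```shell
# check_required_homes: verify the two variables these notes mark as required.
check_required_homes() {
  rc=0
  for name in HADOOP_COMMON_HOME HADOOP_MAPRED_HOME; do
    # Look up the value of the variable whose name is stored in $name.
    eval "dir=\${$name:-}"
    if [ -d "$dir" ]; then
      echo "$name ok: $dir"
    else
      echo "$name is unset or not a directory: '$dir'"
      rc=1
    fi
  done
  return "$rc"
}

# Typical use (path assumed from SQOOP_HOME in section 2.2):
# . "$SQOOP_HOME/conf/sqoop-env.sh" && check_required_homes
```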
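### 2.4 Copy the Database Driver
Copy the MySQL driver into the `lib` directory of the Sqoop installation. The driver can be downloaded from https://dev.mysql.com/downloads/connector/j/ . I have also uploaded a copy to the [resources](https://github.com/heibaiying/BigData-Notes/tree/master/resources) directory of this repository, which you can download if needed.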
<div align="center"> <img src="https://github.com/heibaiying/BigData-Notes/blob/master/pictures/sqoop-mysql-jar.png"/> </div>
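The copy step can be sketched as a small helper. `install_driver` is a hypothetical name, and the jar file name in the example is a placeholder, since the exact driver version is not stated in the text.

```shell
# install_driver JAR: copy a JDBC driver jar into Sqoop's lib directory.
install_driver() {
  [ -f "$1" ] || { echo "driver jar not found: $1"; return 1; }
  [ -d "$SQOOP_HOME/lib" ] || { echo "no lib directory under SQOOP_HOME: $SQOOP_HOME"; return 1; }
  cp "$1" "$SQOOP_HOME/lib/" && echo "installed $(basename "$1")"
}

# Example with a placeholder jar name; use the file you actually downloaded:
# install_driver ./mysql-connector-java-x.y.z.jar
```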
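### 2.5 Verification
Since Sqoop's `bin` directory has already been added to the environment variables, run the following command directly to verify the setup: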
```shell
# sqoop version
```
If the corresponding version information is printed, the configuration succeeded:
<div align="center"> <img src="https://github.com/heibaiying/BigData-Notes/blob/master/pictures/sqoop-version.png"/> </div>
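The two `Warning` messages appear because `HCatalog` and `Accumulo` are not used in this setup, so they can be ignored. By default Sqoop checks whether environment variables for these components are configured. The checks live in `bin/configure-sqoop`; to get rid of the warnings, comment out the checks you do not need: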
```shell
# Check: If we can't find our dependencies, give up here.
if [ ! -d "${HADOOP_COMMON_HOME}" ]; then
  echo "Error: $HADOOP_COMMON_HOME does not exist!"
  echo 'Please set $HADOOP_COMMON_HOME to the root of your Hadoop installation.'
  exit 1
fi
if [ ! -d "${HADOOP_MAPRED_HOME}" ]; then
  echo "Error: $HADOOP_MAPRED_HOME does not exist!"
  echo 'Please set $HADOOP_MAPRED_HOME to the root of your Hadoop MapReduce installation.'
  exit 1
fi

## Moved to be a runtime check in sqoop.
if [ ! -d "${HBASE_HOME}" ]; then
  echo "Warning: $HBASE_HOME does not exist! HBase imports will fail."
  echo 'Please set $HBASE_HOME to the root of your HBase installation.'
fi

## Moved to be a runtime check in sqoop.
if [ ! -d "${HCAT_HOME}" ]; then
  echo "Warning: $HCAT_HOME does not exist! HCatalog jobs will fail."
  echo 'Please set $HCAT_HOME to the root of your HCatalog installation.'
fi

if [ ! -d "${ACCUMULO_HOME}" ]; then
  echo "Warning: $ACCUMULO_HOME does not exist! Accumulo imports will fail."
  echo 'Please set $ACCUMULO_HOME to the root of your Accumulo installation.'
fi
if [ ! -d "${ZOOKEEPER_HOME}" ]; then
  echo "Warning: $ZOOKEEPER_HOME does not exist! Accumulo imports will fail."
  echo 'Please set $ZOOKEEPER_HOME to the root of your Zookeeper installation.'
fi
```