Set up a single-node Spark environment

This commit is contained in:
罗祥 2019-03-22 13:31:05 +08:00
parent a5b8c61ce3
commit b05b437421
11 changed files with 193 additions and 23 deletions

View File

@ -1,20 +1,20 @@
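# Installing the JDK on Linux
**System environment**: CentOS 7.6
**JDK version**: jdk 1.8.0_20
>**System environment**: CentOS 7.6
>
>**JDK version**: jdk 1.8.0_20
## Installation steps
### 1. Download the JDK package
Download the required JDK version from the [official site](https://www.oracle.com/technetwork/java/javase/downloads/index.html) and upload it to the server (here we download [jdk1.8](https://www.oracle.com/technetwork/java/javase/downloads/jdk8-downloads-2133151.html) and upload it to the /usr/java/ directory on the server)
Download the required JDK version from the [official site](https://www.oracle.com/technetwork/java/javase/downloads/index.html) and upload it to the server (here we download [jdk1.8](https://www.oracle.com/technetwork/java/javase/downloads/jdk8-downloads-2133151.html) and upload it to the /usr/java/ directory on the server)
### 2. Extract the jdk-8u201-linux-x64.tar.gz package
### 2. Extract the package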
```shell
[root@ java]# tar -zxvf jdk-8u201-linux-x64.tar.gz

View File

@ -0,0 +1,121 @@
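# Single-node Spark environment setup
>**System environment**: CentOS 7.6
>
>**Spark version**: spark-2.2.3-bin-hadoop2.6
### 1. Download the Spark package
Official download page: http://spark.apache.org/downloads.html
Spark is often used together with Hadoop, so select both the Spark version and the matching Hadoop version before downloading.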
<div align="center"> <img width="600px" src="https://github.com/heibaiying/BigData-Notes/blob/master/pictures/spark-download.png"/> </div>
### 2. Extract the package
```shell
# tar -zxvf spark-2.2.3-bin-hadoop2.6.tgz
```
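The `SPARK_HOME` configured in the next step points under `/usr/app`, so the archive has to end up there. A minimal sketch, assuming the tarball sits in the current directory; `extract_into` is a hypothetical helper name, not part of Spark:

```shell
# Hypothetical helper: extract a gzip-compressed tarball into a target directory,
# creating the directory first if it does not exist.
extract_into() {
  archive=$1
  dest=$2
  mkdir -p "$dest" && tar -zxf "$archive" -C "$dest"
}

# Example call (the paths are assumptions based on SPARK_HOME in this guide):
# extract_into spark-2.2.3-bin-hadoop2.6.tgz /usr/app
```

Using `-C` avoids having to `cd` into the target directory before extracting.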
### 3. Configure environment variables
```shell
# vim /etc/profile
```
Add the environment variables:
```shell
export SPARK_HOME=/usr/app/spark-2.2.3-bin-hadoop2.6
export PATH=${SPARK_HOME}/bin:$PATH
```
Apply the new environment variables:
```shell
# source /etc/profile
```
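After sourcing the profile, it can be worth confirming that the variables took effect in the current shell. A small sketch; `spark_env_ok` is a hypothetical helper name, and it only inspects the environment without calling Spark:

```shell
# Hypothetical helper: succeed only if SPARK_HOME is set and
# ${SPARK_HOME}/bin appears as an entry on PATH.
spark_env_ok() {
  [ -n "${SPARK_HOME:-}" ] || return 1
  case ":$PATH:" in
    *":${SPARK_HOME}/bin:"*) return 0 ;;
    *) return 1 ;;
  esac
}

# Example call:
# spark_env_ok && echo "Spark environment looks fine" || echo "SPARK_HOME or PATH is not set up"
```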
### 4. Start Spark in Standalone mode
Go to the `${SPARK_HOME}/conf/` directory, copy the configuration template, and edit the copy:
```shell
# cp spark-env.sh.template spark-env.sh
```
Add the following settings to `spark-env.sh`:
```shell
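# Host of the master node
SPARK_MASTER_HOST=hadoop001
# Number of cores a Worker may use (caps the number of concurrent tasks)
SPARK_WORKER_CORES=2
# Maximum amount of memory a Worker may use
SPARK_WORKER_MEMORY=1g
# Number of Worker instances to start on each machine
SPARK_WORKER_INSTANCES=1
# JDK installation path
JAVA_HOME=/usr/java/jdk1.8.0_201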
```
Go to the `${SPARK_HOME}/sbin/` directory and start the services:
```shell
# ./start-all.sh
```
### 5. Verify the startup
Open port 8080 to view the Spark web UI:
<div align="center"> <img width="600px" src="https://github.com/heibaiying/BigData-Notes/blob/master/pictures/spark-web-ui.png"/> </div>
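Besides the web UI, the daemons can also be checked from the command line. The JDK's `jps` tool lists running JVMs, and a standalone deployment is expected to show a `Master` and a `Worker` process. A minimal sketch; `has_spark_daemons` is a hypothetical helper that reads `jps` output from standard input:

```shell
# Hypothetical helper: read `jps` output on stdin and succeed only if
# both a Master and a Worker process are listed (whole-word match).
has_spark_daemons() {
  jps_output=$(cat)
  printf '%s\n' "$jps_output" | grep -qw 'Master' &&
    printf '%s\n' "$jps_output" | grep -qw 'Worker'
}

# Example call on the Spark host:
# jps | has_spark_daemons && echo "Master and Worker are running"
```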
## Appendix: a simple word-count example to get a feel for Spark
#### 1. Prepare a sample file for the word count, wc.txt, with the following content:
```txt
hadoop,spark,hadoop
spark,flink,flink,spark
hadoop,hadoop
```
#### 2. Start spark-shell, specifying the address of the Spark master node
```shell
# spark-shell --master spark://hadoop001:7077
```
#### 3. Run the following commands in the Scala interactive shell
```scala
val file = spark.sparkContext.textFile("file:///usr/app/wc.txt")
val wordCounts = file.flatMap(line => line.split(",")).map(word => (word, 1)).reduceByKey(_ + _)
wordCounts.collect
```
The execution looks like this:
<div align="center"> <img src="https://github.com/heibaiying/BigData-Notes/blob/master/pictures/spark-shell.png"/> </div>
The spark-shell web UI shows how the job is executing; it is served on port 4040.
<div align="center"> <img width="600px" src="https://github.com/heibaiying/BigData-Notes/blob/master/pictures/spark-shell-web-ui.png"/> </div>
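To cross-check the counts Spark returns for `wc.txt`, the same tally can be computed with standard shell tools. This is only a sanity check outside Spark, not how Spark computes the result; `count_words` is a hypothetical helper name:

```shell
# Hypothetical helper: split a comma-separated file into one word per line,
# drop empty lines, then count occurrences and print "word count".
count_words() {
  tr ',' '\n' < "$1" | grep -v '^$' | sort | uniq -c | awk '{print $2, $1}'
}

# Example call with the sample file from step 1:
# count_words /usr/app/wc.txt
# flink 2
# hadoop 4
# spark 3
```

The output is sorted alphabetically, whereas `wordCounts.collect` in spark-shell returns the pairs in no guaranteed order.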

View File

@ -2,29 +2,23 @@
>**System environment**: CentOS 7.6
>
>**JDK version**: jdk 1.8.0_20
>
>**Hadoop version**: hadoop-2.6.0-cdh5.15.2
<nav>
<a href="#一安装JDK">1. Install the JDK</a><br/>
<a href="#二配置-SSH-免密登录">2. Configure passwordless SSH login</a><br/>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<a href="#21-配置ip地址和主机名映射在配置文件末尾添加ip地址和主机名映射">2.1 Configure the IP address and hostname mapping by appending it to the end of the configuration file</a><br/>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<a href="#22--执行下面命令行一路回车生成公匙和私匙"> 2.2 Run the command below and press Enter at every prompt to generate the public and private keys</a><br/>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<a href="#33-进入`~ssh`目录下查看生成的公匙和私匙并将公匙写入到授权文件">3.3 Go to the `~/.ssh` directory, check the generated public and private keys, and write the public key to the authorization file</a><br/>
<a href="#三HadoopHDFS环境搭建">3. Hadoop (HDFS) environment setup</a><br/>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<a href="#31-下载CDH-版本的Hadoop">3.1 Download the CDH version of Hadoop</a><br/>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<a href="#32-解压软件压缩包">3.2 Extract the archive</a><br/>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<a href="#33-修改Hadoop相关配置文件">3.3 Edit the Hadoop configuration files</a><br/>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<a href="#34-关闭防火墙">3.4 Disable the firewall</a><br/>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<a href="#35-启动HDFS">3.5 Start HDFS</a><br/>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<a href="#36-验证是否启动成功">3.6 Verify the startup</a><br/>
<a href="#四HadoopYARN环境搭建">4. Hadoop (YARN) environment setup</a><br/>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<a href="#41-修改Hadoop配置文件指明mapreduce运行在YARN上">4.1 Edit the Hadoop configuration files so that MapReduce runs on YARN</a><br/>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<a href="#42-在sbin目录下启动YARN">4.2 Start YARN from the sbin directory</a><br/>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<a href="#43-验证是否启动成功">4.3 Verify the startup</a><br/>
</nav>
## 1. Install the JDK
Hadoop runs on Java, so the JDK must be installed first. For the steps, see [Installing the JDK on Linux](https://github.com/heibaiying/BigData-Notes/blob/master/notes/installation/JDK%E5%AE%89%E8%A3%85.md)

View File

@ -0,0 +1,38 @@
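# Configuring a static IP for a virtual machine
> VM environment: CentOS 7.6
### 1. Check the current network interface name
On this machine the interface is named `enp0s3`: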
<div align="center"> <img src="https://github.com/heibaiying/BigData-Notes/blob/master/pictures/en0s3.png"/> </div>
### 2. Edit the network configuration file
```shell
# vim /etc/sysconfig/network-scripts/ifcfg-enp0s3
```
Add the following settings to specify the static IP and DNS:
```shell
BOOTPROTO=static
IPADDR=192.168.200.226
NETMASK=255.255.255.0
GATEWAY=192.168.200.254
DNS1=114.114.114.114
```
The complete configuration after the change:
<div align="center"> <img src="https://github.com/heibaiying/BigData-Notes/blob/master/pictures/ifconfig.png"/> </div>
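A common mistake with static addressing is choosing a gateway outside the subnet implied by `IPADDR` and `NETMASK`. The check below is a rough sketch for catching that before the network service is restarted; `ip_to_int` and `same_subnet` are hypothetical helper names, and the code assumes plain dotted-quad IPv4 values without leading zeros:

```shell
# Hypothetical helper: convert a dotted-quad IPv4 address into a 32-bit integer.
ip_to_int() {
  old_ifs=$IFS
  IFS=.
  set -- $1          # split the address on dots into $1..$4
  IFS=$old_ifs
  echo $(( ($1 << 24) | ($2 << 16) | ($3 << 8) | $4 ))
}

# Hypothetical helper: succeed if the address and the gateway share
# the same network part under the given netmask.
same_subnet() {
  ip_n=$(ip_to_int "$1")
  gw_n=$(ip_to_int "$2")
  mask_n=$(ip_to_int "$3")
  [ $(( ip_n & mask_n )) -eq $(( gw_n & mask_n )) ]
}

# Example call with the values from this guide:
# same_subnet 192.168.200.226 192.168.200.254 255.255.255.0 && echo "gateway is reachable on this subnet"
```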
### 3. Restart the network service
```shell
# systemctl restart network
```

View File

@ -1,9 +1,26 @@
## Big data environment setup guide
### 1. JDK
1. [Installing the JDK on linux](https://github.com/heibaiying/BigData-Notes/blob/master/notes/installation/JDK安装.md)
### 2. Hadoop
## 1. JDK
1. [Installing the JDK on Linux](https://github.com/heibaiying/BigData-Notes/blob/master/notes/installation/JDK安装.md)
## 2. Hadoop
1. [Single-node Hadoop environment setup](https://github.com/heibaiying/BigData-Notes/blob/master/notes/installation/Hadoop单机版本环境搭建.md)
## 3. Spark
1. [Single-node Spark environment setup](https://github.com/heibaiying/BigData-Notes/blob/master/notes/installation/Spark单机版本环境搭建.md)
## Network configuration
+ [Configuring a static IP for a virtual machine](https://github.com/heibaiying/BigData-Notes/blob/master/notes/installation/虚拟机静态IP配置.md)
1. [Single-node hadoop environment setup](https://github.com/heibaiying/BigData-Notes/blob/master/notes/installation/hadoop单机版本环境搭建.md)

BIN pictures/en0s3.png Normal file, binary file not shown, size: 26 KiB

BIN pictures/ifconfig.png Normal file, binary file not shown, size: 21 KiB

BIN pictures/spark-download.png Normal file, binary file not shown, size: 20 KiB

BIN (file name missing in source), binary file not shown, size: 49 KiB

BIN pictures/spark-shell.png Normal file, binary file not shown, size: 22 KiB

BIN pictures/spark-web-ui.png Normal file, binary file not shown, size: 73 KiB