diff --git a/notes/installation/JDK安装.md b/notes/installation/JDK安装.md
index e6c58f0..e7bace6 100644
--- a/notes/installation/JDK安装.md
+++ b/notes/installation/JDK安装.md
@@ -1,20 +1,20 @@
 # Installing the JDK on Linux
 
-**System environment**: CentOS 7.6
-
-**JDK version**: jdk 1.8.0_20
+> **System environment**: CentOS 7.6
+>
+> **JDK version**: jdk 1.8.0_201
+
 
-## Installation steps:
 
 ### 1. Download the JDK package
 
-Download the required JDK version from the [official site](https://www.oracle.com/technetwork/java/javase/downloads/index.html) and upload it to the appropriate location on the server. (Here we download [jdk1.8](https://www.oracle.com/technetwork/java/javase/downloads/jdk8-downloads-2133151.html) and upload it to the /usr/java/ directory on the server.)
+Download the required JDK version from the [official site](https://www.oracle.com/technetwork/java/javase/downloads/index.html) and upload it to the appropriate location on the server (here we download [jdk1.8](https://www.oracle.com/technetwork/java/javase/downloads/jdk8-downloads-2133151.html) and upload it to the /usr/java/ directory on the server).
 
-### 2. Extract the jdk-8u201-linux-x64.tar.gz package
+### 2. Extract the package
 
 ```shell
 [root@ java]# tar -zxvf jdk-8u201-linux-x64.tar.gz
diff --git a/notes/installation/Spark单机版本环境搭建.md b/notes/installation/Spark单机版本环境搭建.md
new file mode 100644
index 0000000..5feab8c
--- /dev/null
+++ b/notes/installation/Spark单机版本环境搭建.md
@@ -0,0 +1,121 @@
+# Setting Up a Single-Node Spark Environment
+
+
+
+> **System environment**: CentOS 7.6
+>
+> **Spark version**: spark-2.2.3-bin-hadoop2.6
+
+
+
+### 1. Download the Spark package
+
+Official download page: http://spark.apache.org/downloads.html
+
+Because Spark is usually used together with Hadoop, choose the Spark version and the matching Hadoop version before downloading.
+
+
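As a convenience for the download step above, the same build can usually be fetched straight from the Apache release archive; the URL below just follows the archive's standard layout for this version, so treat it as an assumption and fall back to the download page if the path has moved.

```shell
# Fetch the Spark 2.2.3 build compiled against Hadoop 2.6
wget https://archive.apache.org/dist/spark/spark-2.2.3/spark-2.2.3-bin-hadoop2.6.tgz

# Sanity check: a healthy download is a few hundred MB, not a few KB
# (a tiny file usually means an HTML error page was saved instead)
ls -lh spark-2.2.3-bin-hadoop2.6.tgz
```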
+
+
+### 2. Extract the package
+
+```shell
+# tar -zxvf spark-2.2.3-bin-hadoop2.6.tgz
+```
+
+
+
+### 3. Configure environment variables
+
+```shell
+# vim /etc/profile
+```
+
+Add the environment variables:
+
+```shell
+export SPARK_HOME=/usr/app/spark-2.2.3-bin-hadoop2.6
+export PATH=${SPARK_HOME}/bin:$PATH
+```
+
+Make the new configuration take effect:
+
+```shell
+# source /etc/profile
+```
+
+
+
+### 4. Start Spark in standalone mode
+
+Go to the `${SPARK_HOME}/conf/` directory, copy the configuration template, and edit it:
+
+```shell
+# cp spark-env.sh.template spark-env.sh
+```
+
+Add the following settings to `spark-env.sh`:
+
+```shell
+# Address of the master node
+SPARK_MASTER_HOST=hadoop001
+# Number of cores a Worker may use (maximum number of concurrent tasks)
+SPARK_WORKER_CORES=2
+# Maximum amount of memory a Worker may use
+SPARK_WORKER_MEMORY=1g
+# Number of Worker instances started on each machine
+SPARK_WORKER_INSTANCES=1
+# JDK installation path
+JAVA_HOME=/usr/java/jdk1.8.0_201
+```
+
+Go to the `${SPARK_HOME}/sbin/` directory and start the services:
+
+```shell
+# ./start-all.sh
+```
+
+
+
+### 5. Verify that the startup succeeded
+
+Visit port 8080 to view Spark's Web UI.
+
+
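Besides the Web UI on port 8080, a quick command-line check also works; a minimal sketch, assuming the `hadoop001` hostname from `spark-env.sh` resolves from the machine you run it on:

```shell
# The standalone daemons run as separate JVMs named Master and Worker
jps | grep -E 'Master|Worker'

# The master's Web UI should answer on port 8080
curl -sI http://hadoop001:8080 | head -n 1
```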
+
+
+
+
+## Appendix: a simple word-count example to get a feel for Spark
+
+#### 1. Prepare a sample file wc.txt for the word count, with the following content:
+
+```txt
+hadoop,spark,hadoop
+spark,flink,flink,spark
+hadoop,hadoop
+```
+
+#### 2. Start spark-shell, specifying the Spark master address
+
+```shell
+# spark-shell --master spark://hadoop001:7077
+```
+
+#### 3. Run the following commands in the Scala interactive shell
+
+```scala
+val file = spark.sparkContext.textFile("file:///usr/app/wc.txt")
+val wordCounts = file.flatMap(line => line.split(",")).map(word => (word, 1)).reduceByKey(_ + _)
+wordCounts.collect
+```
+
+The execution looks like this:
+
+
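The screenshot referenced above shows the interactive run; to repeat it without typing into the REPL, the same job can be fed to spark-shell on standard input. A sketch under the same paths and master address as above; with this wc.txt the counts should come out as hadoop=4, spark=3 and flink=2, though the order of the tuples may vary.

```shell
# Save the word-count job to a script file
cat > /usr/app/wordcount.scala <<'EOF'
val file = spark.sparkContext.textFile("file:///usr/app/wc.txt")
val wordCounts = file.flatMap(line => line.split(",")).map(word => (word, 1)).reduceByKey(_ + _)
wordCounts.collect.foreach(println)
EOF

# Feed the script to spark-shell on stdin; the REPL exits when it reaches EOF
spark-shell --master spark://hadoop001:7077 < /usr/app/wordcount.scala
```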
+
+The execution of the job can also be inspected in the spark-shell Web UI, which is served on port 4040.
+
+
\ No newline at end of file
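Once you are done experimenting, the standalone daemons started by `start-all.sh` have a matching shutdown script in the same directory; a minimal sketch, assuming the `${SPARK_HOME}` configured earlier:

```shell
# Stop the Master and Worker started earlier
cd ${SPARK_HOME}/sbin
./stop-all.sh

# Afterwards, jps should no longer list Master or Worker processes
jps
```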
diff --git a/notes/installation/hadoop单机版本环境搭建.md b/notes/installation/hadoop单机版本环境搭建.md
index cde0777..b8351a1 100644
--- a/notes/installation/hadoop单机版本环境搭建.md
+++ b/notes/installation/hadoop单机版本环境搭建.md
@@ -2,29 +2,23 @@
+> **System environment**: CentOS 7.6
+>
+> **JDK version**: jdk 1.8.0_201
+>
+> **Hadoop version**: hadoop-2.6.0-cdh5.15.2
+
+
-
 ## 1. Install the JDK
 
 Hadoop needs a Java environment to run, so the JDK has to be installed first; the steps are described in [Installing the JDK on Linux](https://github.com/heibaiying/BigData-Notes/blob/master/notes/installation/JDK%E5%AE%89%E8%A3%85.md).
diff --git a/notes/installation/虚拟机静态IP配置.md b/notes/installation/虚拟机静态IP配置.md
new file mode 100644
index 0000000..a10db6a
--- /dev/null
+++ b/notes/installation/虚拟机静态IP配置.md
@@ -0,0 +1,38 @@
+# Configuring a Static IP for a Virtual Machine
+
+> Virtual machine environment: CentOS 7.6
+
+
+
+### 1. Check the name of the current network interface
+
+On this machine the network interface is named `enp0s3`.
+
+
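If your interface is not called `enp0s3`, the standard CentOS 7 commands below list the interfaces present on the machine; `enp0s3` is simply the name used throughout this guide.

```shell
# Show every interface together with its current addresses
ip addr show

# Or list just the device names known to the kernel
ls /sys/class/net
```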
+
+### 2. Edit the network configuration file
+
+```shell
+# vim /etc/sysconfig/network-scripts/ifcfg-enp0s3
+```
+
+Add the following settings to specify the static IP and DNS:
+
+```shell
+BOOTPROTO=static
+IPADDR=192.168.200.226
+NETMASK=255.255.255.0
+GATEWAY=192.168.200.254
+DNS1=114.114.114.114
+```
+
+The complete configuration after the changes is as follows:
+
+
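The full file referenced above appears in the repository as a screenshot; for orientation, a complete `ifcfg-enp0s3` usually combines the installer-generated defaults with the static entries from the previous step, roughly as in the sketch below (machine-specific keys such as UUID and HWADDR are omitted).

```shell
TYPE=Ethernet
NAME=enp0s3
DEVICE=enp0s3
ONBOOT=yes
BOOTPROTO=static
IPADDR=192.168.200.226
NETMASK=255.255.255.0
GATEWAY=192.168.200.254
DNS1=114.114.114.114
```

ONBOOT=yes matters here: without it the interface is not brought up automatically after the restart in the next step.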
+
+### 3. Restart the network service
+
+```shell
+# systemctl restart network
+```
+
diff --git a/notes/linux下大数据常用软件安装指南.md b/notes/linux下大数据常用软件安装指南.md
index f3c9c5a..2d40719 100644
--- a/notes/linux下大数据常用软件安装指南.md
+++ b/notes/linux下大数据常用软件安装指南.md
@@ -1,9 +1,26 @@
 ## Big Data Environment Setup Guide
-### 1. JDK
-1. [Installing the JDK on Linux](https://github.com/heibaiying/BigData-Notes/blob/master/notes/installation/JDK安装.md)
-### 2. Hadoop
+
+## 1. JDK
+
+1. [Installing the JDK on Linux](https://github.com/heibaiying/BigData-Notes/blob/master/notes/installation/JDK安装.md)
+
+
+
+## 2. Hadoop
+
+1. [Hadoop single-node environment setup](https://github.com/heibaiying/BigData-Notes/blob/master/notes/installation/hadoop单机版本环境搭建.md)
+
+
+
+## 3. Spark
+
+1. [Spark single-node environment setup](https://github.com/heibaiying/BigData-Notes/blob/master/notes/installation/Spark单机版本环境搭建.md)
+
+
+
+## Network Configuration
+
++ [Static IP configuration for virtual machines](https://github.com/heibaiying/BigData-Notes/blob/master/notes/installation/虚拟机静态IP配置.md)
-1. [Hadoop single-node environment setup](https://github.com/heibaiying/BigData-Notes/blob/master/notes/installation/hadoop单机版本环境搭建.md)
\ No newline at end of file
diff --git a/pictures/en0s3.png b/pictures/en0s3.png
new file mode 100644
index 0000000..b35b0d8
Binary files /dev/null and b/pictures/en0s3.png differ
diff --git a/pictures/ifconfig.png b/pictures/ifconfig.png
new file mode 100644
index 0000000..8f7c017
Binary files /dev/null and b/pictures/ifconfig.png differ
diff --git a/pictures/spark-download.png b/pictures/spark-download.png
new file mode 100644
index 0000000..4a62662
Binary files /dev/null and b/pictures/spark-download.png differ
diff --git a/pictures/spark-shell-web-ui.png b/pictures/spark-shell-web-ui.png
new file mode 100644
index 0000000..235c98a
Binary files /dev/null and b/pictures/spark-shell-web-ui.png differ
diff --git a/pictures/spark-shell.png b/pictures/spark-shell.png
new file mode 100644
index 0000000..2f0d095
Binary files /dev/null and b/pictures/spark-shell.png differ
diff --git a/pictures/spark-web-ui.png b/pictures/spark-web-ui.png
new file mode 100644
index 0000000..5e9fc57
Binary files /dev/null and b/pictures/spark-web-ui.png differ