修复部分失效链接

2019-07-31 22:27:25 +08:00
parent f6084a8851
commit e53293af37
14 changed files with 37 additions and 97 deletions
--- a/notes/installation/Azkaban_3.x_编译及部署.md
+++ b/notes/installation/Azkaban_3.x_编译及部署.md
@ -29,7 +29,7 @@ tar -zxvf azkaban-3.70.0.tar.gz

 Azkaban 编译依赖 JDK 1.8+ ，JDK 安装方式见本仓库：

-> [Linux 环境下 JDK 安装](https://github.com/heibaiying/BigData-Notes/blob/master/notes/installation/Linux 下 JDK 安装.md)
+> [Linux 环境下 JDK 安装](https://github.com/heibaiying/BigData-Notes/blob/master/notes/installation/Linux下JDK安装.md)

 #### 2. Gradle

@ -38,7 +38,6 @@ Azkaban 3.70.0 编译需要依赖 `gradle-4.6-all.zip`。Gradle 是一个项目
 需要注意的是不同版本的 Azkaban 依赖 Gradle 版本不同，可以在解压后的 `/gradle/wrapper/gradle-wrapper.properties` 文件查看

 <div align="center"> <img  src="https://github.com/heibaiying/BigData-Notes/blob/master/pictures/azkaban-gradle-wrapper.png"/> </div>
-
 在编译时程序会自动去图中所示的地址进行下载，但是下载速度很慢。为避免影响编译过程，建议先手动下载至 `/gradle/wrapper/` 目录下：

 ```shell
@ -48,7 +47,6 @@ Azkaban 3.70.0 编译需要依赖 `gradle-4.6-all.zip`。Gradle 是一个项目
 然后修改配置文件 `gradle-wrapper.properties` 中的 `distributionUrl` 属性，指明使用本地的 gradle。

 <div align="center"> <img  src="https://github.com/heibaiying/BigData-Notes/blob/master/pictures/azkaban-gradle-wrapper-2.png"/> </div>
-
 #### 3. Git

 Azkaban 的编译过程需要用 Git 下载部分 JAR 包，所以需要预先安装 Git：
@ -101,7 +99,6 @@ tar -zxvf  azkaban-solo-server-3.70.0.tar.gz
 这一步不是必须的。但是因为 Azkaban 默认采用的时区是 `America/Los_Angeles`，如果你的调度任务中有定时任务的话，就需要进行相应的更改，这里我更改为常用的 `Asia/Shanghai`

 <div align="center"> <img  src="https://github.com/heibaiying/BigData-Notes/blob/master/pictures/azkaban-setting.png"/> </div>
-
 ### 2.3 启动

 执行启动命令，需要注意的是一定要在根目录下执行，不能进入 `bin` 目录下执行，不然会抛出 `Cannot find 'database.properties'` 异常。
@ -115,7 +112,6 @@ tar -zxvf  azkaban-solo-server-3.70.0.tar.gz
 验证方式一：使用 `jps` 命令查看是否有 `AzkabanSingleServer` 进程：

 <div align="center"> <img  src="https://github.com/heibaiying/BigData-Notes/blob/master/pictures/akaban-jps.png"/> </div>
-
 <br/>

 验证方式二：访问 8081 端口，查看 Web UI 界面，默认的登录名密码都是 `azkaban`，如果需要修改或新增用户，可以在 `conf/azkaban-users.xml ` 文件中进行配置：
@ -123,4 +119,3 @@ tar -zxvf  azkaban-solo-server-3.70.0.tar.gz
 <div align="center"> <img width="700px" src="https://github.com/heibaiying/BigData-Notes/blob/master/pictures/azkaban-web-ui.png"/> </div>


-
--- a/notes/installation/HBase单机环境搭建.md
+++ b/notes/installation/HBase单机环境搭建.md
@ -12,7 +12,7 @@

 HBase 需要依赖 JDK 环境，同时 HBase 2.0+ 以上版本不再支持 JDK 1.7 ，需要安装 JDK 1.8+ 。JDK 安装方式见本仓库：

-> [Linux 环境下 JDK 安装](https://github.com/heibaiying/BigData-Notes/blob/master/notes/installation/Linux 下 JDK 安装.md)
+> [Linux 环境下 JDK 安装](https://github.com/heibaiying/BigData-Notes/blob/master/notes/installation/Linux下JDK安装.md)

 ### 1.2 Standalone模式和伪集群模式的区别

@ -109,14 +109,13 @@ export JAVA_HOME=/usr/java/jdk1.8.0_201
 <div align="center"> <img src="https://github.com/heibaiying/BigData-Notes/blob/master/pictures/hbase-web-ui.png"/> </div>


-
 ## 三、伪集群模式安装（Pseudo-Distributed）

 ### 3.1 Hadoop单机伪集群安装

 这里我们采用 HDFS 作为 HBase 的存储方案，需要预先安装 Hadoop。Hadoop 的安装方式单独整理至：

-> [Hadoop 单机伪集群搭建](https://github.com/heibaiying/BigData-Notes/blob/master/notes/installation/Hadoop 单机版本环境搭建.md)
+> [Hadoop 单机伪集群搭建](https://github.com/heibaiying/BigData-Notes/blob/master/notes/installation/Hadoop单机版本环境搭建.md)

 ### 3.2 Hbase版本选择

--- a/notes/installation/HBase集群环境搭建.md
+++ b/notes/installation/HBase集群环境搭建.md
@ -23,14 +23,13 @@
 这里搭建一个 3 节点的 HBase 集群，其中三台主机上均为 `Regin Server`。同时为了保证高可用，除了在 hadoop001 上部署主 `Master` 服务外，还在 hadoop002 上部署备用的 `Master` 服务。Master 服务由 Zookeeper 集群进行协调管理，如果主 `Master` 不可用，则备用 `Master` 会成为新的主 `Master`。

 <div align="center"> <img  src="https://github.com/heibaiying/BigData-Notes/blob/master/pictures/hbase集群规划.png"/> </div>
-
 ## 二、前置条件

 HBase 的运行需要依赖 Hadoop 和 JDK(`HBase 2.0+` 对应 `JDK 1.8+`) 。同时为了保证高可用，这里我们不采用 HBase 内置的 Zookeeper 服务，而采用外置的 Zookeeper 集群。相关搭建步骤可以参阅：

- [Linux 环境下 JDK 安装](https://github.com/heibaiying/BigData-Notes/blob/master/notes/installation/Linux 下 JDK 安装.md)
- [Zookeeper 单机环境和集群环境搭建](https://github.com/heibaiying/BigData-Notes/blob/master/notes/installation/Zookeeper 单机环境和集群环境搭建.md)
- [Hadoop 集群环境搭建](https://github.com/heibaiying/BigData-Notes/blob/master/notes/installation/Hadoop 集群环境搭建.md)
+- [Linux 环境下 JDK 安装](https://github.com/heibaiying/BigData-Notes/blob/master/notes/installation/Linux下JDK安装.md)
+- [Zookeeper 单机环境和集群环境搭建](https://github.com/heibaiying/BigData-Notes/blob/master/notes/installation/Zookeeper单机环境和集群环境搭建.md)
+- [Hadoop 集群环境搭建](https://github.com/heibaiying/BigData-Notes/blob/master/notes/installation/Hadoop集群环境搭建.md)



@ -191,7 +190,6 @@ start-hbase.sh
 访问 HBase 的 Web-UI 界面，这里我安装的 HBase 版本为 1.2，访问端口为 `60010`，如果你安装的是 2.0 以上的版本，则访问端口号为 `16010`。可以看到 `Master` 在 hadoop001 上，三个 `Regin Servers` 分别在 hadoop001，hadoop002，和 hadoop003 上，并且还有一个 `Backup Matser` 服务在 hadoop002 上。

 <div align="center"> <img  src="https://github.com/heibaiying/BigData-Notes/blob/master/pictures/hbase-集群搭建1.png"/> </div>
-
 <br/>

 hadoop002 上的 HBase 出于备用状态：
--- a/notes/installation/Hadoop单机环境搭建.md
+++ b/notes/installation/Hadoop单机环境搭建.md
@ -14,7 +14,7 @@

 Hadoop 的运行依赖 JDK，需要预先安装，安装步骤见：

-+ [Linux 下 JDK 的安装](https://github.com/heibaiying/BigData-Notes/blob/master/notes/installation/JDK%E5%AE%89%E8%A3%85.md)
+ [Linux 下 JDK 的安装](https://github.com/heibaiying/BigData-Notes/blob/master/notes/installation/Linux下JDK安装.md)



@ -198,7 +198,6 @@ sudo systemctl stop firewalld.service
 <div align="center"> <img width="700px" src="https://github.com/heibaiying/BigData-Notes/blob/master/pictures/hadoop安装验证.png"/> </div>


-
 ## 四、Hadoop(YARN)环境搭建

 ### 4.1 修改配置
--- a/notes/installation/Hadoop集群环境搭建.md
+++ b/notes/installation/Hadoop集群环境搭建.md
@ -24,12 +24,11 @@
 这里搭建一个 3 节点的 Hadoop 集群，其中三台主机均部署 `DataNode` 和 `NodeManager` 服务，但只有 hadoop001 上部署 `NameNode` 和 `ResourceManager` 服务。

 <div align="center"> <img  src="https://github.com/heibaiying/BigData-Notes/blob/master/pictures/hadoop集群规划.png"/> </div>
-
 ## 二、前置条件

 Hadoop 的运行依赖 JDK，需要预先安装。其安装步骤单独整理至：

-+ [Linux 下 JDK 的安装](https://github.com/heibaiying/BigData-Notes/blob/master/notes/installation/JDK%E5%AE%89%E8%A3%85.md)
+ [Linux 下 JDK 的安装](https://github.com/heibaiying/BigData-Notes/blob/master/notes/installation/Linux下JDK安装.md)



@ -211,13 +210,11 @@ start-yarn.sh
 在每台服务器上使用 `jps` 命令查看服务进程，或直接进入 Web-UI 界面进行查看，端口为 `50070`。可以看到此时有三个可用的 `Datanode`：

 <div align="center"> <img  src="https://github.com/heibaiying/BigData-Notes/blob/master/pictures/hadoop-集群环境搭建.png"/> </div>
-
 <BR/>

 点击 `Live Nodes` 进入，可以看到每个 `DataNode` 的详细情况：

 <div align="center"> <img  src="https://github.com/heibaiying/BigData-Notes/blob/master/pictures/hadoop-集群搭建2.png"/> </div>
-
 <BR/>

 接着可以查看 Yarn 的情况，端口号为 `8088` ：
@ -225,7 +222,6 @@ start-yarn.sh
 <div align="center"> <img  src="https://github.com/heibaiying/BigData-Notes/blob/master/pictures/hadoop-集群搭建3.png"/> </div>


-
 ## 五、提交服务到集群

 提交作业到集群的方式和单机环境完全一致，这里以提交 Hadoop 内置的计算 Pi 的示例程序为例，在任何一个节点上执行都可以，命令如下：
--- a/notes/installation/Linux下Flume的安装.md
+++ b/notes/installation/Linux下Flume的安装.md
@ -5,7 +5,7 @@

 Flume 需要依赖 JDK 1.8+，JDK 安装方式见本仓库：

-> [Linux 环境下 JDK 安装](https://github.com/heibaiying/BigData-Notes/blob/master/notes/installation/Linux 下 JDK 安装.md)
+> [Linux 环境下 JDK 安装](https://github.com/heibaiying/BigData-Notes/blob/master/notes/installation/Linux下JDK安装.md)



--- a/notes/installation/Storm单机环境搭建.md
+++ b/notes/installation/Storm单机环境搭建.md
@ -9,9 +9,9 @@

 按照[官方文档](http://storm.apache.org/releases/1.2.2/Setting-up-a-Storm-cluster.html) 的说明：storm 运行依赖于 Java 7+ 和 Python 2.6.6 +，所以需要预先安装这两个软件。由于这两个软件在多个框架中都有依赖，其安装步骤单独整理至  ：

-+ [Linux 环境下 JDK 安装](https://github.com/heibaiying/BigData-Notes/blob/master/notes/installation/Linux 下 JDK 安装.md)
+ [Linux 环境下 JDK 安装](https://github.com/heibaiying/BigData-Notes/blob/master/notes/installation/Linux下JDK安装.md)

-+ [Linux 环境下 Python 安装](https://github.com/heibaiying/BigData-Notes/blob/master/notes/installation/Linux 下 Python 安装.md)
+ [Linux 环境下 Python 安装](https://github.com/heibaiying/BigData-Notes/blob/master/notes/installation/Linux下Python安装.md)

  

--- a/notes/installation/Storm集群环境搭建.md
+++ b/notes/installation/Storm集群环境搭建.md
@ -26,14 +26,13 @@
 这里搭建一个 3 节点的 Storm 集群：三台主机上均部署 `Supervisor` 和 `LogViewer` 服务。同时为了保证高可用，除了在 hadoop001 上部署主 `Nimbus` 服务外，还在 hadoop002 上部署备用的 `Nimbus` 服务。`Nimbus` 服务由 Zookeeper 集群进行协调管理，如果主 `Nimbus` 不可用，则备用 `Nimbus` 会成为新的主 `Nimbus`。

 <div align="center"> <img  src="https://github.com/heibaiying/BigData-Notes/blob/master/pictures/storm-集群规划.png"/> </div>
-
 ## 二、前置条件

 Storm 运行依赖于 Java 7+ 和 Python 2.6.6 +，所以需要预先安装这两个软件。同时为了保证高可用，这里我们不采用 Storm 内置的 Zookeeper，而采用外置的 Zookeeper 集群。由于这三个软件在多个框架中都有依赖，其安装步骤单独整理至 ：

- [Linux 环境下 JDK 安装](https://github.com/heibaiying/BigData-Notes/blob/master/notes/installation/Linux 下 JDK 安装.md)
- [Linux 环境下 Python 安装](https://github.com/heibaiying/BigData-Notes/blob/master/notes/installation/Linux 下 Python 安装.md)
- [Zookeeper 单机环境和集群环境搭建](https://github.com/heibaiying/BigData-Notes/blob/master/notes/installation/Zookeeper 单机环境和集群环境搭建.md)
+- [Linux 环境下 JDK 安装](https://github.com/heibaiying/BigData-Notes/blob/master/notes/installation/Linux下JDK安装.md)
+- [Linux 环境下 Python 安装](https://github.com/heibaiying/BigData-Notes/blob/master/notes/installation/Linux下Python安装.md)
+- [Zookeeper 单机环境和集群环境搭建](https://github.com/heibaiying/BigData-Notes/blob/master/notes/installation/Zookeeper单机环境和集群环境搭建.md)



@ -153,7 +152,6 @@ nohup sh storm logviewer &
 <div align="center"> <img  src="https://github.com/heibaiying/BigData-Notes/blob/master/pictures/storm-集群-shell.png"/> </div>


-
 <br/>

 访问 hadoop001 或 hadoop002 的 `8080` 端口，界面如下。可以看到有一主一备 2 个 `Nimbus` 和 3 个 `Supervisor`，并且每个 `Supervisor` 有四个 `slots`，即四个可用的 `worker` 进程，此时代表集群已经搭建成功。
@ -161,7 +159,6 @@ nohup sh storm logviewer &
 <div align="center"> <img  src="https://github.com/heibaiying/BigData-Notes/blob/master/pictures/storm-集群搭建1.png"/> </div>


-
 ## 五、高可用验证

 这里手动模拟主 `Nimbus` 异常的情况，在 hadoop001 上使用 `kill` 命令杀死 `Nimbus` 的线程，此时可以看到 hadoop001 上的 `Nimbus` 已经处于 `offline` 状态，而 hadoop002 上的 `Nimbus` 则成为新的 `Leader`。
--- a/notes/installation/基于Zookeeper搭建Hadoop高可用集群.md
+++ b/notes/installation/基于Zookeeper搭建Hadoop高可用集群.md
@ -21,7 +21,6 @@ Hadoop 高可用 (High Availability) 分为 HDFS 高可用和 YARN 高可用，
 HDFS 高可用架构如下：

 <div align="center"> <img  src="https://github.com/heibaiying/BigData-Notes/blob/master/pictures/HDFS-HA-Architecture-Edureka.png"/> </div>
-
 > *图片引用自：https://www.edureka.co/blog/how-to-set-up-hadoop-cluster-with-hdfs-high-availability/*

 HDFS 高可用架构主要由以下组件所构成：
@ -43,13 +42,11 @@ HDFS 高可用架构主要由以下组件所构成：
 需要说明的是向 JournalNode 集群写入 EditLog 是遵循 “过半写入则成功” 的策略，所以你至少要有 3 个 JournalNode 节点，当然你也可以继续增加节点数量，但是应该保证节点总数是奇数。同时如果有 2N+1 台 JournalNode，那么根据过半写的原则，最多可以容忍有 N 台 JournalNode 节点挂掉。

 <div align="center"> <img  src="https://github.com/heibaiying/BigData-Notes/blob/master/pictures/hadoop-QJM-同步机制.png"/> </div>
-
 ### 1.3 NameNode 主备切换

 NameNode 实现主备切换的流程下图所示：

 <div align="center"> <img  src="https://github.com/heibaiying/BigData-Notes/blob/master/pictures/hadoop-namenode主备切换.png"/> </div>
-
 1. HealthMonitor 初始化完成之后会启动内部的线程来定时调用对应 NameNode 的 HAServiceProtocol RPC 接口的方法，对 NameNode 的健康状态进行检测。
 2. HealthMonitor 如果检测到 NameNode 的健康状态发生变化，会回调 ZKFailoverController 注册的相应方法进行处理。
 3. 如果 ZKFailoverController 判断需要进行主备切换，会首先使用 ActiveStandbyElector 来进行自动的主备选举。
@ -67,7 +64,6 @@ YARN ResourceManager 的高可用与 HDFS NameNode 的高可用类似，但是 R
 <div align="center"> <img  src="https://github.com/heibaiying/BigData-Notes/blob/master/pictures/hadoop-rm-ha-overview.png"/> </div>


-
 ## 二、集群规划

 按照高可用的设计目标：需要保证至少有两个 NameNode (一主一备)  和 两个 ResourceManager (一主一备)  ，同时为满足“过半写入则成功”的原则，需要至少要有 3 个 JournalNode 节点。这里使用三台主机进行搭建，集群规划如下：
@ -75,11 +71,10 @@ YARN ResourceManager 的高可用与 HDFS NameNode 的高可用类似，但是 R
 <div align="center"> <img  src="https://github.com/heibaiying/BigData-Notes/blob/master/pictures/hadoop高可用集群规划.png"/> </div>


-
 ## 三、前置条件

-+ 所有服务器都安装有 JDK，安装步骤可以参见：[Linux 下 JDK 的安装](https://github.com/heibaiying/BigData-Notes/blob/master/notes/installation/JDK%E5%AE%89%E8%A3%85.md)；
-+ 搭建好 ZooKeeper 集群，搭建步骤可以参见：[Zookeeper 单机环境和集群环境搭建](https://github.com/heibaiying/BigData-Notes/blob/master/notes/installation/Zookeeper 单机环境和集群环境搭建.md)
+ 所有服务器都安装有 JDK，安装步骤可以参见：[Linux 下 JDK 的安装](https://github.com/heibaiying/BigData-Notes/blob/master/notes/installation/Linux下JDK安装.md)；
+ 搭建好 ZooKeeper 集群，搭建步骤可以参见：[Zookeeper 单机环境和集群环境搭建](https://github.com/heibaiying/BigData-Notes/blob/master/notes/installation/Zookeeper单机环境和集群环境搭建.md)
 + 所有服务器之间都配置好 SSH 免密登录。


@ -452,13 +447,11 @@ HDFS 和 YARN 的端口号分别为 `50070` 和 `8080`，界面应该如下：
 此时 hadoop001 上的 `NameNode` 处于可用状态：

 <div align="center"> <img  src="https://github.com/heibaiying/BigData-Notes/blob/master/pictures/hadoop高可用集群1.png"/> </div>
-
 而 hadoop002 上的 `NameNode` 则处于备用状态：

 <br/>

 <div align="center"> <img  src="https://github.com/heibaiying/BigData-Notes/blob/master/pictures/hadoop高可用集群3.png"/> </div>
-
 <br/>

 hadoop002 上的 `ResourceManager` 处于可用状态：
@ -466,7 +459,6 @@ hadoop002 上的 `ResourceManager` 处于可用状态：
 <br/>

 <div align="center"> <img  src="https://github.com/heibaiying/BigData-Notes/blob/master/pictures/hadoop高可用集群4.png"/> </div>
-
 <br/>

 hadoop003 上的 `ResourceManager` 则处于备用状态：
@ -474,7 +466,6 @@ hadoop003 上的 `ResourceManager` 则处于备用状态：
 <br/>

 <div align="center"> <img  src="https://github.com/heibaiying/BigData-Notes/blob/master/pictures/hadoop高可用集群5.png"/> </div>
-
 <br/>

 同时界面上也有 `Journal Manager` 的相关信息：
@ -482,7 +473,6 @@ hadoop003 上的 `ResourceManager` 则处于备用状态：
 <br/>

 <div align="center"> <img  src="https://github.com/heibaiying/BigData-Notes/blob/master/pictures/hadoop高可用集群2.png"/> </div>
-
 ## 七、集群的二次启动

 上面的集群初次启动涉及到一些必要初始化操作，所以过程略显繁琐。但是集群一旦搭建好后，想要再次启用它是比较方便的，步骤如下（首选需要确保 ZooKeeper 集群已经启动）：