Fix some broken links

2019-07-31 22:27:25 +08:00
parent f6084a8851
commit e53293af37
14 changed files with 37 additions and 97 deletions
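
Most hunks in this commit remove whitespace from link targets, since a space inside `(...)` generally stops a Markdown renderer from treating the text as a link. A minimal sketch of how such links could be found automatically, assuming plain inline links without titles; the helper names `links_with_spaces` and `strip_spaces` are hypothetical and not part of this repository:

```python
import re

# Matches inline Markdown links of the form [text](target).
# Assumption: targets contain no ")" and carry no quoted title,
# which would also contain whitespace and produce a false positive.
LINK_RE = re.compile(r"\[([^\]]*)\]\(([^)]*)\)")


def links_with_spaces(markdown):
    """Return (text, target) pairs whose target contains whitespace."""
    return [
        (text, target)
        for text, target in LINK_RE.findall(markdown)
        if re.search(r"\s", target.strip())
    ]


def strip_spaces(target):
    """Remove all whitespace from a link target."""
    return re.sub(r"\s+", "", target)
```

Running `links_with_spaces` over each file under `notes/` would list the candidates; whether `strip_spaces` yields the right target still has to be checked against the actual file names, because a few links in this commit pointed at a differently named file.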
--- a/notes/Hbase的SQL中间层_Phoenix.md
+++ b/notes/Hbase的SQL中间层_Phoenix.md
@ -30,7 +30,6 @@
 <div align="center"> <img width="600px"  src="https://github.com/heibaiying/BigData-Notes/blob/master/pictures/Phoenix-hadoop.png"/> </div>
 ## 2. Installing Phoenix
 > We can follow the official installation instructions, which are as follows:
@ -89,7 +88,6 @@ start-hbase.sh
 <div align="center"> <img  src="https://github.com/heibaiying/BigData-Notes/blob/master/pictures/phoenix-shell.png"/> </div>
 ## 3. Basic Phoenix Usage
 ### 3.1 Creating a Table
@ -103,11 +101,9 @@ CREATE TABLE IF NOT EXISTS us_population (
 ```
 <div align="center"> <img  src="https://github.com/heibaiying/BigData-Notes/blob/master/pictures/Phoenix-create-table.png"/> </div>
 A newly created table is converted into an HBase table according to specific rules; details about the table can be viewed in the HBase Web UI:
 <div align="center"> <img  src="https://github.com/heibaiying/BigData-Notes/blob/master/pictures/hbase-web-ui-phoenix.png"/> </div>
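 ### 3.2 Inserting Data
 Phoenix inserts data with `UPSERT` rather than `INSERT`. Phoenix has no separate update operation: writing a row whose primary key already exists counts as an update, so `UPSERT` is equivalent to `UPDATE` + `INSERT`.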
@ -133,7 +129,6 @@ UPSERT INTO us_population VALUES('NY','New York',999999);
 ```
 <div align="center"> <img  src="https://github.com/heibaiying/BigData-Notes/blob/master/pictures/Phoenix-update.png"/> </div>
 ### 3.4 Deleting Data
 ```sql
@ -141,7 +136,6 @@ DELETE FROM us_population WHERE city='Dallas';
 ```
 <div align="center"> <img  src="https://github.com/heibaiying/BigData-Notes/blob/master/pictures/Phoenix-delete.png"/> </div>
 ### 3.5 Querying Data
 ```sql
@ -154,7 +148,6 @@ ORDER BY sum(population) DESC;
 <div align="center"> <img  src="https://github.com/heibaiying/BigData-Notes/blob/master/pictures/Phoenix-select.png"/> </div>
 ### 3.6 Exit Command
 ```sql
@ -199,7 +192,6 @@ ORDER BY sum(population) DESC;
 For an ordinary project, locate the corresponding JAR in the extracted Phoenix directory and add it manually:
 <div align="center"> <img  src="https://github.com/heibaiying/BigData-Notes/blob/master/pictures/phoenix-core-jar.png"/> </div>
 ### 4.2 A Simple Java API Example
 ```java
@ -242,8 +234,7 @@ public class PhoenixJavaApi {
 <div align="center"> <img  src="https://github.com/heibaiying/BigData-Notes/blob/master/pictures/Phoenix-java-api-result.png"/> </div>
-
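+In real-world development we usually work with the database through a third-party framework such as `mybatis`, `Hibernate`, or `Spring Data`. For the steps to integrate Phoenix with these frameworks, see the next article: [Spring/Spring Boot + Mybatis + Phoenix](https://github.com/heibaiying/BigData-Notes/blob/master/notes/Spring+Mybtais+Phoenix整合.md)
 In real-world development we usually work with the database through a third-party framework such as `mybatis`, `Hibernate`, or `Spring Data`. For the steps to integrate Phoenix with these frameworks, see the next article: [Spring/Spring Boot + Mybatis + Phoenix](https://github.com/heibaiying/BigData-Notes/blob/master/notes/Spring+Mybtais+Phoenix 整合.md)
 # References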
--- a/notes/Hive数据查询详解.md
+++ b/notes/Hive数据查询详解.md
@ -251,8 +251,6 @@ Hive supports inner joins, outer joins, left outer joins, right outer joins, Cartesian joins
 <div align="center"> <img width="600px"  src="https://github.com/heibaiying/BigData-Notes/blob/master/pictures/sql-join.jpg"/> </div>
 ### 3.1 INNER JOIN
 ```sql
@ -289,7 +287,6 @@ ON e.deptno = d.deptno;
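 After the right join runs, department 40 has no employees, so its employee columns are NULL. This query illustrates the point made above: the join condition of a JOIN statement must be specified with ON, not WHERE. If you change ON to WHERE, you will find that the row for department 40 can never be returned, because a Cartesian product never produces a (NULL, 40) pair.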
 <div align="center"> <img width="700px"   src="https://github.com/heibaiying/BigData-Notes/blob/master/pictures/hive-right-join.png"/> </div>
 ### 3.4 FULL OUTER  JOIN 
 ```sql
@ -394,14 +391,6 @@ SET hive.exec.mode.local.auto=true;
 ## References
 1. [LanguageManual Select](https://cwiki.apache.org/confluence/display/Hive/LanguageManual+Select)
 2. [LanguageManual Joins](https://cwiki.apache.org/confluence/display/Hive/LanguageManual+Joins)
 3. [LanguageManual GroupBy](https://cwiki.apache.org/confluence/display/Hive/LanguageManual+GroupBy)
-4. [LanguageManual SortBy](https://cwiki.apache.org/confluence/display/Hive/LanguageManual+SortBy)
+4. [LanguageManual SortBy](https://cwiki.apache.org/confluence/display/Hive/LanguageManual+SortBy)
--- a/notes/Spark_Streaming基本操作.md
+++ b/notes/Spark_Streaming基本操作.md
@ -63,7 +63,6 @@ storm storm flink azkaban
 The console output is shown below: the data has been received and word counts are computed line by line.
 <div align="center"> <img  src="https://github.com/heibaiying/BigData-Notes/blob/master/pictures/spark-streaming-word-count-v1.png"/> </div>
 <br/>
 The sample code is explained below:
@ -92,7 +91,7 @@ streamingContext.fileStream[KeyClass, ValueClass, InputFormatClass](dataDirector
 The monitored directory can be a specific directory, such as `hdfs://host:8040/logs/`, or a wildcard pattern, such as `hdfs://host:8040/logs/2017/*`.
-> Integration with advanced data sources is covered separately in [Integrating Spark Streaming with Flume](https://github.com/heibaiying/BigData-Notes/blob/master/notes/Spark_Streaming 整合 Flume.md) and [Integrating Spark Streaming with Kafka](https://github.com/heibaiying/BigData-Notes/blob/master/notes/Spark_Streaming 整合 Kafka.md)
+> Integration with advanced data sources is covered separately in [Integrating Spark Streaming with Flume](https://github.com/heibaiying/BigData-Notes/blob/master/notes/Spark_Streaming整合Flume.md) and [Integrating Spark Streaming with Kafka](https://github.com/heibaiying/BigData-Notes/blob/master/notes/Spark_Streaming整合Kafka.md)
 ### 3.3 Starting and Stopping the Service
@ -107,7 +106,6 @@ streamingContext.fileStream[KeyClass, ValueClass, InputFormatClass](dataDirector
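 DStream is the basic abstraction provided by Spark Streaming. It represents a continuous stream of data. Internally, a DStream is represented by a continuous series of RDDs, so any operation applied to a DStream is translated into operations on the underlying RDDs. For example, the flatMap operator in the sample code is actually applied to each RDD (see the figure below). For this reason, DStream supports most of the RDD *transformation* operators.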
 <div align="center"> <img  src="https://github.com/heibaiying/BigData-Notes/blob/master/pictures/spark-streaming-dstream-ops.png"/> </div>
 ### 2.2 updateStateByKey
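 Besides the RDD operators, DStream has some *transformation* operators of its own, of which `updateStateByKey` is one of the most commonly used. The word-count program at the start of this article only counts word occurrences within each batch of input text; to count occurrences across all historical input, use the `updateStateByKey` operator. The code is as follows: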
@ -169,7 +167,6 @@ storm storm flink azkaban
 The console output is now as follows, with word counts accumulated across all input:
 <div align="center"> <img  src="https://github.com/heibaiying/BigData-Notes/blob/master/pictures/spark-streaming-word-count-v2.png"/> </div>
 The output log also shows information about the checkpoint operations:
 ```shell
@ -326,7 +323,6 @@ storm storm flink azkaban
 Viewing the written results in Redis Manager (see the figure below) shows that they match the results computed with the `updateStateByKey` operator.
 <div align="center"> <img  src="https://github.com/heibaiying/BigData-Notes/blob/master/pictures/spark-streaming-word-count-v3.png"/> </div>  
 <br/>
 > All source code for this article is in this repository: [spark-streaming-basis](https://github.com/heibaiying/BigData-Notes/tree/master/code/spark/spark-streaming-basis)
--- a/notes/Storm编程模型详解.md
+++ b/notes/Storm编程模型详解.md
@ -23,7 +23,6 @@
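 The figure below shows how Storm runs. When developing a Storm stream-processing program, we use built-in or custom implementations of `spout` (data source) and `bolt` (processing unit) and wire them together with `TopologyBuilder` to form a `Topology`.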
 <div align="center"> <img  src="https://github.com/heibaiying/BigData-Notes/blob/master/pictures/spout-bolt.png"/> </div>
 ## 2. The IComponent Interface
 The `IComponent` interface defines the methods common to all components (spout/bolt) in a Topology; a custom spout or bolt must implement it, directly or indirectly.
@ -102,7 +101,6 @@ public interface ISpout extends Serializable {
 **In most cases, a custom Spout does not implement the `ISpout` interface directly but extends `BaseRichSpout` instead.** `BaseRichSpout` extends `BaseComponent` and implements the `IRichSpout` interface.
 <div align="center"> <img  src="https://github.com/heibaiying/BigData-Notes/blob/master/pictures/storm-baseRichSpout.png"/> </div>
 The `IRichSpout` interface extends `ISpout` and `IComponent` and defines no methods of its own:
 ```java
@ -193,7 +191,6 @@ public interface IBolt extends Serializable {
 Likewise, a custom bolt is usually implemented by extending the `BaseRichBolt` abstract class. `BaseRichBolt` extends the `BaseComponent` abstract class and implements the `IRichBolt` interface.
 <div align="center"> <img  src="https://github.com/heibaiying/BigData-Notes/blob/master/pictures/storm-baseRichbolt.png"/> </div>
 The `IRichBolt` interface extends `IBolt` and `IComponent` and defines no methods of its own:
 ```
@ -217,7 +214,6 @@ public interface IRichBolt extends IBolt, IComponent {
 Here a custom `DataSourceSpout` generates the word data, and custom `SplitBolt` and `CountBolt` classes perform the word count.
 <div align="center"> <img  src="https://github.com/heibaiying/BigData-Notes/blob/master/pictures/storm-word-count-p.png"/> </div>
 > Source code for this example: [storm-word-count](https://github.com/heibaiying/BigData-Notes/tree/master/code/Storm/storm-word-count)
 ### 5.2 Implementation
@ -385,7 +381,6 @@ public class LocalWordCountApp {
 <div align="center"> <img  src="https://github.com/heibaiying/BigData-Notes/blob/master/pictures/storm-word-count-console.png"/> </div>
 ## 6. Submitting to a Server Cluster
 ### 6.1 Code Changes
@ -439,7 +434,6 @@ storm jar /usr/appjar/storm-word-count-1.0.jar  com.heibaiying.wordcount.Cluster
 If `successfully` appears in the output, the submission succeeded:
 <div align="center"> <img  src="https://github.com/heibaiying/BigData-Notes/blob/master/pictures/storm-submit-success.png"/> </div>
 ### 6.4 Viewing and Stopping a Topology (Command Line)
 ```shell
@ -451,7 +445,6 @@ storm kill ClusterWordCountApp -w 3
 ```
 <div align="center"> <img  src="https://github.com/heibaiying/BigData-Notes/blob/master/pictures/storm-list-kill.png"/> </div>
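 ### 6.5 Viewing and Stopping a Topology (Web UI)
 A topology can also be stopped from the UI: open the Web UI (port 8080) and click the topology under `Topology Summary` to reach its detail page, where the operation is available.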
@ -465,7 +458,6 @@ storm kill ClusterWordCountApp -w 3
 ## 7. Additional Notes on Project Packaging
 ### Limitations of mvn package
@ -475,7 +467,6 @@ storm kill ClusterWordCountApp -w 3
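 At this point you may wonder: doesn't our project use the `storm-core` dependency? The run above succeeded because the Storm cluster environment already provides this JAR, in the lib directory under the installation directory: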
 <div align="center"> <img  src="https://github.com/heibaiying/BigData-Notes/blob/master/pictures/storm-lib.png"/> </div>
 To illustrate the problem, I added a third-party JAR to the Maven build and changed the method that generates the data:
 ```xml
@ -504,10 +495,9 @@ private String productData() {
 <div align="center"> <img  src="https://github.com/heibaiying/BigData-Notes/blob/master/pictures/storm-package-error.png"/> </div>
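 To bundle the dependencies into the final JAR, Maven offers two plugins: `maven-assembly-plugin` and `maven-shade-plugin`. Because this article is already long and there is much more to say about packaging Storm applications, the packaging approaches are covered separately in the next article:
-[Comparison of Three Storm Packaging Approaches](https://github.com/heibaiying/BigData-Notes/blob/master/notes/Storm 三种打包方式对比分析.md)
+[Comparison of Three Storm Packaging Approaches](https://github.com/heibaiying/BigData-Notes/blob/master/notes/Storm三种打包方式对比分析.md)
 ## References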
--- a/notes/installation/Azkaban_3.x_编译及部署.md
+++ b/notes/installation/Azkaban_3.x_编译及部署.md
@ -29,7 +29,7 @@ tar -zxvf azkaban-3.70.0.tar.gz
 Building Azkaban requires JDK 1.8+. For JDK installation, see this repository:
-> [Installing JDK on Linux](https://github.com/heibaiying/BigData-Notes/blob/master/notes/installation/Linux 下 JDK 安装.md)
+> [Installing JDK on Linux](https://github.com/heibaiying/BigData-Notes/blob/master/notes/installation/Linux下JDK安装.md)
 #### 2. Gradle
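@ -38,7 +38,6 @@ Building Azkaban 3.70.0 requires `gradle-4.6-all.zip`. Gradle is a project
 Note that different Azkaban versions depend on different Gradle versions; the required version is listed in the `/gradle/wrapper/gradle-wrapper.properties` file of the extracted source.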
 <div align="center"> <img  src="https://github.com/heibaiying/BigData-Notes/blob/master/pictures/azkaban-gradle-wrapper.png"/> </div>
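 During the build, Gradle is downloaded automatically from the address shown in the figure, but the download is slow. To keep it from holding up the build, download it manually into the `/gradle/wrapper/` directory first: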
 ```shell
@ -48,7 +47,6 @@ Building Azkaban 3.70.0 requires `gradle-4.6-all.zip`. Gradle is a project
 Then edit the `distributionUrl` property in `gradle-wrapper.properties` to point at the local Gradle distribution.
 <div align="center"> <img  src="https://github.com/heibaiying/BigData-Notes/blob/master/pictures/azkaban-gradle-wrapper-2.png"/> </div>
 #### 3. Git
 The Azkaban build uses Git to download some JARs, so Git must be installed beforehand:
@ -101,7 +99,6 @@ tar -zxvf  azkaban-solo-server-3.70.0.tar.gz
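 This step is optional. However, Azkaban's default time zone is `America/Los_Angeles`, so if your scheduled jobs include time-based triggers you need to change it accordingly; here I change it to the commonly used `Asia/Shanghai`.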
 <div align="center"> <img  src="https://github.com/heibaiying/BigData-Notes/blob/master/pictures/azkaban-setting.png"/> </div>
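 ### 2.3 Startup
 Run the start command. Note that it must be run from the installation root directory, not from inside the `bin` directory; otherwise a `Cannot find 'database.properties'` exception is thrown.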
@ -115,7 +112,6 @@ tar -zxvf  azkaban-solo-server-3.70.0.tar.gz
 Verification method 1: use the `jps` command to check for an `AzkabanSingleServer` process:
 <div align="center"> <img  src="https://github.com/heibaiying/BigData-Notes/blob/master/pictures/akaban-jps.png"/> </div>
 <br/>
 Verification method 2: open the Web UI on port 8081. The default username and password are both `azkaban`; to modify or add users, edit the `conf/azkaban-users.xml` file:
@ -123,4 +119,3 @@ tar -zxvf  azkaban-solo-server-3.70.0.tar.gz
 <div align="center"> <img width="700px" src="https://github.com/heibaiying/BigData-Notes/blob/master/pictures/azkaban-web-ui.png"/> </div>
--- a/notes/installation/HBase单机环境搭建.md
+++ b/notes/installation/HBase单机环境搭建.md
@ -12,7 +12,7 @@
 HBase depends on a JDK. HBase 2.0 and later no longer support JDK 1.7, so JDK 1.8+ must be installed. For JDK installation, see this repository:
-> [Installing JDK on Linux](https://github.com/heibaiying/BigData-Notes/blob/master/notes/installation/Linux 下 JDK 安装.md)
+> [Installing JDK on Linux](https://github.com/heibaiying/BigData-Notes/blob/master/notes/installation/Linux下JDK安装.md)
 ### 1.2 Differences Between Standalone and Pseudo-Distributed Mode
@ -109,14 +109,13 @@ export JAVA_HOME=/usr/java/jdk1.8.0_201
 <div align="center"> <img src="https://github.com/heibaiying/BigData-Notes/blob/master/pictures/hbase-web-ui.png"/> </div>
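 ## 3. Pseudo-Distributed Mode Installation
 ### 3.1 Installing Hadoop in Single-Node Pseudo-Distributed Mode
 Here HDFS is used as the storage layer for HBase, so Hadoop must be installed beforehand. Hadoop installation is covered separately in:
-> [Setting Up a Single-Node Pseudo-Distributed Hadoop Cluster](https://github.com/heibaiying/BigData-Notes/blob/master/notes/installation/Hadoop 单机版本环境搭建.md)
+> [Setting Up a Single-Node Pseudo-Distributed Hadoop Cluster](https://github.com/heibaiying/BigData-Notes/blob/master/notes/installation/Hadoop单机版本环境搭建.md)
 ### 3.2 Choosing an HBase Version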
--- a/notes/installation/HBase集群环境搭建.md
+++ b/notes/installation/HBase集群环境搭建.md
@ -23,14 +23,13 @@
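 Here we build a 3-node HBase cluster in which all three hosts run a `Region Server`. For high availability, in addition to the primary `Master` service on hadoop001, a backup `Master` service is deployed on hadoop002. The Master services are coordinated by the ZooKeeper cluster: if the primary `Master` becomes unavailable, the backup `Master` becomes the new primary `Master`.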
 <div align="center"> <img  src="https://github.com/heibaiying/BigData-Notes/blob/master/pictures/hbase集群规划.png"/> </div>
 ## 2. Prerequisites
 HBase requires Hadoop and a JDK (`HBase 2.0+` needs `JDK 1.8+`). For high availability, we use an external ZooKeeper cluster rather than the ZooKeeper service bundled with HBase. For the setup steps, see:
-- [Installing JDK on Linux](https://github.com/heibaiying/BigData-Notes/blob/master/notes/installation/Linux 下 JDK 安装.md)
+- [Installing JDK on Linux](https://github.com/heibaiying/BigData-Notes/blob/master/notes/installation/Linux下JDK安装.md)
-- [Setting Up Zookeeper in Standalone and Cluster Mode](https://github.com/heibaiying/BigData-Notes/blob/master/notes/installation/Zookeeper 单机环境和集群环境搭建.md)
+- [Setting Up Zookeeper in Standalone and Cluster Mode](https://github.com/heibaiying/BigData-Notes/blob/master/notes/installation/Zookeeper单机环境和集群环境搭建.md)
-- [Setting Up a Hadoop Cluster](https://github.com/heibaiying/BigData-Notes/blob/master/notes/installation/Hadoop 集群环境搭建.md)
+- [Setting Up a Hadoop Cluster](https://github.com/heibaiying/BigData-Notes/blob/master/notes/installation/Hadoop集群环境搭建.md)
@ -191,7 +190,6 @@ start-hbase.sh
 Open the HBase Web UI. The HBase version installed here is 1.2 and the port is `60010`; if you installed version 2.0 or later, the port is `16010`. You can see that the `Master` is on hadoop001, the three `Region Servers` are on hadoop001, hadoop002, and hadoop003, and a `Backup Master` service is running on hadoop002.
 <div align="center"> <img  src="https://github.com/heibaiying/BigData-Notes/blob/master/pictures/hbase-集群搭建1.png"/> </div>
 <br/>
 HBase on hadoop002 is in standby state:
--- a/notes/installation/Hadoop单机环境搭建.md
+++ b/notes/installation/Hadoop单机环境搭建.md
@ -14,7 +14,7 @@
 Hadoop depends on a JDK, which must be installed beforehand. For installation steps, see:
-+ [Installing JDK on Linux](https://github.com/heibaiying/BigData-Notes/blob/master/notes/installation/JDK%E5%AE%89%E8%A3%85.md)
++ [Installing JDK on Linux](https://github.com/heibaiying/BigData-Notes/blob/master/notes/installation/Linux下JDK安装.md)
@ -198,7 +198,6 @@ sudo systemctl stop firewalld.service
 <div align="center"> <img width="700px" src="https://github.com/heibaiying/BigData-Notes/blob/master/pictures/hadoop安装验证.png"/> </div>
 ## 4. Setting Up Hadoop (YARN)
 ### 4.1 Modifying the Configuration
--- a/notes/installation/Hadoop集群环境搭建.md
+++ b/notes/installation/Hadoop集群环境搭建.md
@ -24,12 +24,11 @@
 Here we build a 3-node Hadoop cluster. All three hosts run the `DataNode` and `NodeManager` services, but only hadoop001 runs the `NameNode` and `ResourceManager` services.
 <div align="center"> <img  src="https://github.com/heibaiying/BigData-Notes/blob/master/pictures/hadoop集群规划.png"/> </div>
 ## 2. Prerequisites
 Hadoop depends on a JDK, which must be installed beforehand. The installation steps are covered separately in:
-+ [Installing JDK on Linux](https://github.com/heibaiying/BigData-Notes/blob/master/notes/installation/JDK%E5%AE%89%E8%A3%85.md)
++ [Installing JDK on Linux](https://github.com/heibaiying/BigData-Notes/blob/master/notes/installation/Linux下JDK安装.md)
@ -211,13 +210,11 @@ start-yarn.sh
 Use the `jps` command on each server to check the service processes, or open the Web UI directly on port `50070`. Three `DataNode` instances are now available:
 <div align="center"> <img  src="https://github.com/heibaiying/BigData-Notes/blob/master/pictures/hadoop-集群环境搭建.png"/> </div>
 <BR/>
 Click `Live Nodes` to see the details of each `DataNode`:
 <div align="center"> <img  src="https://github.com/heibaiying/BigData-Notes/blob/master/pictures/hadoop-集群搭建2.png"/> </div>
 <BR/>
 Next, check the status of YARN on port `8088`:
@ -225,7 +222,6 @@ start-yarn.sh
 <div align="center"> <img  src="https://github.com/heibaiying/BigData-Notes/blob/master/pictures/hadoop-集群搭建3.png"/> </div>
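 ## 5. Submitting a Job to the Cluster
 Submitting a job to the cluster works exactly as it does in a single-node environment. The example here submits Hadoop's built-in Pi calculation program; it can be run from any node with the following command: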
--- a/notes/installation/Linux下Flume的安装.md
+++ b/notes/installation/Linux下Flume的安装.md
@ -5,7 +5,7 @@
 Flume requires JDK 1.8+. For JDK installation, see this repository:
-> [Installing JDK on Linux](https://github.com/heibaiying/BigData-Notes/blob/master/notes/installation/Linux 下 JDK 安装.md)
+> [Installing JDK on Linux](https://github.com/heibaiying/BigData-Notes/blob/master/notes/installation/Linux下JDK安装.md)
--- a/notes/installation/Storm单机环境搭建.md
+++ b/notes/installation/Storm单机环境搭建.md
@ -9,9 +9,9 @@
 According to the [official documentation](http://storm.apache.org/releases/1.2.2/Setting-up-a-Storm-cluster.html), Storm requires Java 7+ and Python 2.6.6+, so both must be installed beforehand. Because several frameworks depend on them, their installation steps are covered separately in:
-+ [Installing JDK on Linux](https://github.com/heibaiying/BigData-Notes/blob/master/notes/installation/Linux 下 JDK 安装.md)
++ [Installing JDK on Linux](https://github.com/heibaiying/BigData-Notes/blob/master/notes/installation/Linux下JDK安装.md)
-+ [Installing Python on Linux](https://github.com/heibaiying/BigData-Notes/blob/master/notes/installation/Linux 下 Python 安装.md)
++ [Installing Python on Linux](https://github.com/heibaiying/BigData-Notes/blob/master/notes/installation/Linux下Python安装.md)
--- a/notes/installation/Storm集群环境搭建.md
+++ b/notes/installation/Storm集群环境搭建.md
@ -26,14 +26,13 @@
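 Here we build a 3-node Storm cluster in which all three hosts run the `Supervisor` and `LogViewer` services. For high availability, in addition to the primary `Nimbus` service on hadoop001, a backup `Nimbus` service is deployed on hadoop002. The `Nimbus` services are coordinated by the ZooKeeper cluster: if the primary `Nimbus` becomes unavailable, the backup `Nimbus` becomes the new primary `Nimbus`.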
 <div align="center"> <img  src="https://github.com/heibaiying/BigData-Notes/blob/master/pictures/storm-集群规划.png"/> </div>
 ## 2. Prerequisites
 Storm requires Java 7+ and Python 2.6.6+, so both must be installed beforehand. For high availability, we use an external ZooKeeper cluster rather than the ZooKeeper bundled with Storm. Because several frameworks depend on these three pieces of software, their installation steps are covered separately in:
-- [Installing JDK on Linux](https://github.com/heibaiying/BigData-Notes/blob/master/notes/installation/Linux 下 JDK 安装.md)
+- [Installing JDK on Linux](https://github.com/heibaiying/BigData-Notes/blob/master/notes/installation/Linux下JDK安装.md)
-- [Installing Python on Linux](https://github.com/heibaiying/BigData-Notes/blob/master/notes/installation/Linux 下 Python 安装.md)
+- [Installing Python on Linux](https://github.com/heibaiying/BigData-Notes/blob/master/notes/installation/Linux下Python安装.md)
-- [Setting Up Zookeeper in Standalone and Cluster Mode](https://github.com/heibaiying/BigData-Notes/blob/master/notes/installation/Zookeeper 单机环境和集群环境搭建.md)
+- [Setting Up Zookeeper in Standalone and Cluster Mode](https://github.com/heibaiying/BigData-Notes/blob/master/notes/installation/Zookeeper单机环境和集群环境搭建.md)
@ -153,7 +152,6 @@ nohup sh storm logviewer &
 <div align="center"> <img  src="https://github.com/heibaiying/BigData-Notes/blob/master/pictures/storm-集群-shell.png"/> </div>
 <br/>
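 Open port `8080` on hadoop001 or hadoop002; the UI looks as follows. It shows 2 `Nimbus` instances (one primary, one backup) and 3 `Supervisor` instances, each `Supervisor` having four `slots`, that is, four available `worker` processes. This indicates that the cluster has been set up successfully.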
@ -161,7 +159,6 @@ nohup sh storm logviewer &
 <div align="center"> <img  src="https://github.com/heibaiying/BigData-Notes/blob/master/pictures/storm-集群搭建1.png"/> </div>
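 ## 5. Verifying High Availability
 Here we simulate a failure of the primary `Nimbus` by killing the `Nimbus` process on hadoop001 with the `kill` command. The `Nimbus` on hadoop001 then shows as `offline`, and the `Nimbus` on hadoop002 becomes the new `Leader`.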
--- a/notes/installation/基于Zookeeper搭建Hadoop高可用集群.md
+++ b/notes/installation/基于Zookeeper搭建Hadoop高可用集群.md
@ -21,7 +21,6 @@ Hadoop high availability (HA) consists of HDFS high availability and YARN high availability,
 The HDFS high-availability architecture is as follows:
 <div align="center"> <img  src="https://github.com/heibaiying/BigData-Notes/blob/master/pictures/HDFS-HA-Architecture-Edureka.png"/> </div>
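 > *Image source: https://www.edureka.co/blog/how-to-set-up-hadoop-cluster-with-hdfs-high-availability/*
 The HDFS high-availability architecture consists mainly of the following components:
@ -43,13 +42,11 @@ The HDFS high-availability architecture consists mainly of the following components:
 Note that writing the EditLog to the JournalNode cluster follows a "succeeds once a majority has written" policy, so you need at least 3 JournalNode nodes. You can add more nodes, but the total should stay odd. With 2N+1 JournalNodes, the majority-write rule tolerates the failure of at most N JournalNodes.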
 <div align="center"> <img  src="https://github.com/heibaiying/BigData-Notes/blob/master/pictures/hadoop-QJM-同步机制.png"/> </div>
 ### 1.3 NameNode Active/Standby Failover
 The NameNode active/standby failover flow is shown in the figure below:
 <div align="center"> <img  src="https://github.com/heibaiying/BigData-Notes/blob/master/pictures/hadoop-namenode主备切换.png"/> </div>
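 1. After HealthMonitor finishes initializing, it starts an internal thread that periodically calls methods of the corresponding NameNode's HAServiceProtocol RPC interface to check the NameNode's health.
 2. If HealthMonitor detects a change in the NameNode's health state, it calls back the corresponding method registered by ZKFailoverController to handle it.
 3. If ZKFailoverController decides that a failover is needed, it first uses ActiveStandbyElector to run an automatic active/standby election.
@ -67,7 +64,6 @@ YARN ResourceManager high availability is similar to HDFS NameNode high availability, but R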
 <div align="center"> <img  src="https://github.com/heibaiying/BigData-Notes/blob/master/pictures/hadoop-rm-ha-overview.png"/> </div>
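 ## 2. Cluster Planning
 The high-availability design requires at least two NameNodes (one active, one standby) and two ResourceManagers (one active, one standby). To satisfy the "succeeds once a majority has written" rule, at least 3 JournalNode nodes are also needed. Three hosts are used here, and the cluster is planned as follows:
@ -75,11 +71,10 @@ YARN ResourceManager high availability is similar to HDFS NameNode high availability, but R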
 <div align="center"> <img  src="https://github.com/heibaiying/BigData-Notes/blob/master/pictures/hadoop高可用集群规划.png"/> </div>
 ## 3. Prerequisites
-+ A JDK is installed on every server; for installation steps, see [Installing JDK on Linux](https://github.com/heibaiying/BigData-Notes/blob/master/notes/installation/JDK%E5%AE%89%E8%A3%85.md);
++ A JDK is installed on every server; for installation steps, see [Installing JDK on Linux](https://github.com/heibaiying/BigData-Notes/blob/master/notes/installation/Linux下JDK安装.md);
-+ A ZooKeeper cluster is set up; for setup steps, see [Setting Up Zookeeper in Standalone and Cluster Mode](https://github.com/heibaiying/BigData-Notes/blob/master/notes/installation/Zookeeper 单机环境和集群环境搭建.md)
++ A ZooKeeper cluster is set up; for setup steps, see [Setting Up Zookeeper in Standalone and Cluster Mode](https://github.com/heibaiying/BigData-Notes/blob/master/notes/installation/Zookeeper单机环境和集群环境搭建.md)
 + Passwordless SSH login is configured between all servers.
@ -452,13 +447,11 @@ The HDFS and YARN ports are `50070` and `8080` respectively; the UI should look as follows:
 The `NameNode` on hadoop001 is now in the active state:
 <div align="center"> <img  src="https://github.com/heibaiying/BigData-Notes/blob/master/pictures/hadoop高可用集群1.png"/> </div>
 The `NameNode` on hadoop002 is in the standby state:
 <br/>
 <div align="center"> <img  src="https://github.com/heibaiying/BigData-Notes/blob/master/pictures/hadoop高可用集群3.png"/> </div>
 <br/>
 The `ResourceManager` on hadoop002 is in the active state:
@ -466,7 +459,6 @@ The `ResourceManager` on hadoop002 is in the active state:
 <br/>
 <div align="center"> <img  src="https://github.com/heibaiying/BigData-Notes/blob/master/pictures/hadoop高可用集群4.png"/> </div>
 <br/>
 The `ResourceManager` on hadoop003 is in the standby state:
@ -474,7 +466,6 @@ The `ResourceManager` on hadoop003 is in the standby state:
 <br/>
 <div align="center"> <img  src="https://github.com/heibaiying/BigData-Notes/blob/master/pictures/hadoop高可用集群5.png"/> </div>
 <br/>
 The UI also shows information about the `Journal Manager`:
@ -482,7 +473,6 @@ The `ResourceManager` on hadoop003 is in the standby state:
 <br/>
 <div align="center"> <img  src="https://github.com/heibaiying/BigData-Notes/blob/master/pictures/hadoop高可用集群2.png"/> </div>
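 ## 7. Restarting the Cluster
 The first start of the cluster involves some required initialization, which makes the process somewhat tedious. Once the cluster is set up, however, starting it again is straightforward. The steps are as follows (first make sure the ZooKeeper cluster is running):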
--- a/notes/大数据常用软件安装指南.md
+++ b/notes/大数据常用软件安装指南.md
@ -4,50 +4,50 @@
 ### 1. Basic Software Installation
-1. [Installing JDK on Linux](https://github.com/heibaiying/BigData-Notes/blob/master/notes/installation/Linux 下 JDK 安装.md)
+1. [Installing JDK on Linux](https://github.com/heibaiying/BigData-Notes/blob/master/notes/installation/Linux下JDK安装.md)
-2. [Installing Python on Linux](https://github.com/heibaiying/BigData-Notes/blob/master/notes/installation/Linux 下 Python 安装.md)
+2. [Installing Python on Linux](https://github.com/heibaiying/BigData-Notes/blob/master/notes/installation/Linux下Python安装.md)
-3. [Configuring Static and Multiple IPs on a Virtual Machine](https://github.com/heibaiying/BigData-Notes/blob/master/notes/installation/虚拟机静态 IP 及多 IP 配置.md)
+3. [Configuring Static and Multiple IPs on a Virtual Machine](https://github.com/heibaiying/BigData-Notes/blob/master/notes/installation/虚拟机静态IP及多IP配置.md)
 ### 2. Hadoop
-1. [Setting Up a Single-Node Hadoop Environment](https://github.com/heibaiying/BigData-Notes/blob/master/notes/installation/Hadoop 单机环境搭建.md)
+1. [Setting Up a Single-Node Hadoop Environment](https://github.com/heibaiying/BigData-Notes/blob/master/notes/installation/Hadoop单机环境搭建.md)
-2. [Setting Up a Hadoop Cluster](https://github.com/heibaiying/BigData-Notes/blob/master/notes/installation/Hadoop 集群环境搭建.md)
+2. [Setting Up a Hadoop Cluster](https://github.com/heibaiying/BigData-Notes/blob/master/notes/installation/Hadoop集群环境搭建.md)
-3. [Building a Highly Available Hadoop Cluster with Zookeeper](https://github.com/heibaiying/BigData-Notes/blob/master/notes/installation/基于 Zookeeper 搭建 Hadoop 高可用集群.md)
+3. [Building a Highly Available Hadoop Cluster with Zookeeper](https://github.com/heibaiying/BigData-Notes/blob/master/notes/installation/基于Zookeeper搭建Hadoop高可用集群.md)
 ### 3. Spark
-1. [Setting Up a Spark Development Environment](https://github.com/heibaiying/BigData-Notes/blob/master/notes/installation/SparkSpark 开发环境搭建.md)
+1. [Setting Up a Spark Development Environment](https://github.com/heibaiying/BigData-Notes/blob/master/notes/installation/SparkSpark开发环境搭建.md)
-2. [Building a Highly Available Spark Cluster with Zookeeper](https://github.com/heibaiying/BigData-Notes/blob/master/notes/installation/Spark 集群环境搭建.md)
+2. [Building a Highly Available Spark Cluster with Zookeeper](https://github.com/heibaiying/BigData-Notes/blob/master/notes/installation/Spark集群环境搭建.md)
 ### 4. Storm
-1. [Setting Up a Single-Node Storm Environment](https://github.com/heibaiying/BigData-Notes/blob/master/notes/installation/Storm 单机环境搭建.md)
+1. [Setting Up a Single-Node Storm Environment](https://github.com/heibaiying/BigData-Notes/blob/master/notes/installation/Storm单机环境搭建.md)
-2. [Setting Up a Storm Cluster](https://github.com/heibaiying/BigData-Notes/blob/master/notes/installation/Storm 集群环境搭建.md)
+2. [Setting Up a Storm Cluster](https://github.com/heibaiying/BigData-Notes/blob/master/notes/installation/Storm集群环境搭建.md)
 ### 5. HBase
-1. [Setting Up a Single-Node HBase Environment](https://github.com/heibaiying/BigData-Notes/blob/master/notes/installation/HBase 单机环境搭建.md)
+1. [Setting Up a Single-Node HBase Environment](https://github.com/heibaiying/BigData-Notes/blob/master/notes/installation/HBase单机环境搭建.md)
-2. [Setting Up an HBase Cluster](https://github.com/heibaiying/BigData-Notes/blob/master/notes/installation/HBase 集群环境搭建.md)
+2. [Setting Up an HBase Cluster](https://github.com/heibaiying/BigData-Notes/blob/master/notes/installation/HBase集群环境搭建.md)
 ### 6. Flume
-1. [Installing and Deploying Flume on Linux](https://github.com/heibaiying/BigData-Notes/blob/master/notes/installation/Linux 下 Flume 的安装.md)
+1. [Installing and Deploying Flume on Linux](https://github.com/heibaiying/BigData-Notes/blob/master/notes/installation/Linux下Flume的安装.md)
 ### 7. Azkaban
-1. [Building and Deploying Azkaban 3.x](https://github.com/heibaiying/BigData-Notes/blob/master/notes/installation/Azkaban_3.x_ 编译及部署.md)
+1. [Building and Deploying Azkaban 3.x](https://github.com/heibaiying/BigData-Notes/blob/master/notes/installation/Azkaban_3.x_编译及部署.md)
 ### 8. Hive
-1. [Installing and Deploying Hive on Linux](https://github.com/heibaiying/BigData-Notes/blob/master/notes/installation/Linux 环境下 Hive 的安装部署.md)
+1. [Installing and Deploying Hive on Linux](https://github.com/heibaiying/BigData-Notes/blob/master/notes/installation/Linux环境下Hive的安装部署.md)
 ### 9. Zookeeper
-1. [Setting Up Zookeeper in Standalone and Cluster Mode](https://github.com/heibaiying/BigData-Notes/blob/master/notes/installation/Zookeeper 单机环境和集群环境搭建.md) 
+1. [Setting Up Zookeeper in Standalone and Cluster Mode](https://github.com/heibaiying/BigData-Notes/blob/master/notes/installation/Zookeeper单机环境和集群环境搭建.md) 
 ### 10. Kafka
-1. [Building a Highly Available Kafka Cluster with Zookeeper](https://github.com/heibaiying/BigData-Notes/blob/master/notes/installation/基于 Zookeeper 搭建 Kafka 高可用集群.md)
+1. [Building a Highly Available Kafka Cluster with Zookeeper](https://github.com/heibaiying/BigData-Notes/blob/master/notes/installation/基于Zookeeper搭建Kafka高可用集群.md)
 ### Version Notes