Replace image source

罗祥
2020-05-25 11:05:03 +08:00
parent 0d6f0c8cc6
commit 7bcf53a7b2
85 changed files with 391 additions and 391 deletions


@@ -34,7 +34,7 @@ A MapReduce job splits the input dataset into independent blocks, which are
Here, taking word-frequency counting as an example, the MapReduce processing flow is as follows:
-<div align="center"> <img width="600px" src="../pictures/mapreduceProcess.png"/> </div>
+<div align="center"> <img width="600px" src="https://gitee.com/heibaiying/BigData-Notes/raw/master/pictures/mapreduceProcess.png"/> </div>
1. **input**: read the text file;
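The word-count flow described above can be sketched as a minimal Python simulation; the repository's actual examples are in Java, and the input string here is made up for illustration:

```python
from collections import defaultdict

# Hypothetical input; the real job reads a text file from HDFS.
text = "hadoop spark hadoop hive spark hadoop"

# splitting: break the input into independent records (here: one line).
records = text.split("\n")

# mapping: emit a (word, 1) pair for every word in every record.
pairs = [(word, 1) for record in records for word in record.split()]

# shuffling: group the values of identical keys together.
groups = defaultdict(list)
for word, count in pairs:
    groups[word].append(count)

# reducing: sum each word's grouped counts to get the final tally.
result = {word: sum(counts) for word, counts in groups.items()}
print(result)  # {'hadoop': 3, 'spark': 2, 'hive': 1}
```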
@@ -50,7 +50,7 @@ In the MapReduce programming model, the `splitting` and `shuffling` operations are implemented by the framework
## 3. combiner & partitioner
-<div align="center"> <img width="600px" src="../pictures/Detailed-Hadoop-MapReduce-Data-Flow-14.png"/> </div>
+<div align="center"> <img width="600px" src="https://gitee.com/heibaiying/BigData-Notes/raw/master/pictures/Detailed-Hadoop-MapReduce-Data-Flow-14.png"/> </div>
### 3.1 InputFormat & RecordReaders
@@ -68,11 +68,11 @@ In the MapReduce programming model, the `splitting` and `shuffling` operations are implemented by the framework
Without a combiner:
-<div align="center"> <img width="600px" src="../pictures/mapreduce-without-combiners.png"/> </div>
+<div align="center"> <img width="600px" src="https://gitee.com/heibaiying/BigData-Notes/raw/master/pictures/mapreduce-without-combiners.png"/> </div>
With a combiner:
-<div align="center"> <img width="600px" src="../pictures/mapreduce-with-combiners.png"/> </div>
+<div align="center"> <img width="600px" src="https://gitee.com/heibaiying/BigData-Notes/raw/master/pictures/mapreduce-with-combiners.png"/> </div>
@@ -145,7 +145,7 @@ public class WordCountMapper extends Mapper<LongWritable, Text, Text, IntWritabl
`WordCountMapper` corresponds to the Mapping operation in the figure below:
-<div align="center"> <img src="../pictures/hadoop-code-mapping.png"/> </div>
+<div align="center"> <img src="https://gitee.com/heibaiying/BigData-Notes/raw/master/pictures/hadoop-code-mapping.png"/> </div>
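A rough Python stand-in for the `WordCountMapper.map()` logic (the real class is Java and extends Hadoop's `Mapper`): for one input line, the key is the byte offset (ignored here) and the value is the line text, and a `(word, 1)` pair is emitted per word.

```python
# Hypothetical stand-in for WordCountMapper.map(); not the Hadoop API.
def word_count_map(offset, line):
    # Emit one (word, 1) pair for each whitespace-separated word.
    return [(word, 1) for word in line.split()]

print(word_count_map(0, "spark hadoop spark"))
# [('spark', 1), ('hadoop', 1), ('spark', 1)]
```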
@@ -187,7 +187,7 @@ public class WordCountReducer extends Reducer<Text, IntWritable, Text, IntWritab
As shown below, the output of `shuffling` is the input of reduce. Here the key is each word, and values is an iterable collection like `(1,1,1,...)`.
-<div align="center"> <img src="../pictures/hadoop-code-reducer.png"/> </div>
+<div align="center"> <img src="https://gitee.com/heibaiying/BigData-Notes/raw/master/pictures/hadoop-code-reducer.png"/> </div>
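The reduce step above can be sketched as a Python stand-in for `WordCountReducer.reduce()` (the real class is Java and extends Hadoop's `Reducer`): it receives a word plus its iterable of counts and sums them.

```python
# Hypothetical stand-in for WordCountReducer.reduce(); not the Hadoop API.
def word_count_reduce(word, values):
    # values is an iterable such as (1, 1, 1, ...); sum it for the total.
    return (word, sum(values))

print(word_count_reduce("hadoop", (1, 1, 1)))  # ('hadoop', 3)
```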
### 4.4 WordCountApp
@@ -290,7 +290,7 @@ hadoop fs -ls /wordcount/output/WordCountApp
hadoop fs -cat /wordcount/output/WordCountApp/part-r-00000
```
-<div align="center"> <img src="../pictures/hadoop-wordcountapp.png"/> </div>
+<div align="center"> <img src="https://gitee.com/heibaiying/BigData-Notes/raw/master/pictures/hadoop-wordcountapp.png"/> </div>
@@ -311,11 +311,11 @@ job.setCombinerClass(WordCountReducer.class);
Log output without the `combiner`:
-<div align="center"> <img src="../pictures/hadoop-no-combiner.png"/> </div>
+<div align="center"> <img src="https://gitee.com/heibaiying/BigData-Notes/raw/master/pictures/hadoop-no-combiner.png"/> </div>
Log output after adding the `combiner`:
-<div align="center"> <img src="../pictures/hadoop-combiner.png"/> </div>
+<div align="center"> <img src="https://gitee.com/heibaiying/BigData-Notes/raw/master/pictures/hadoop-combiner.png"/> </div>
Here there is only one input file and it is smaller than 128M, so only one Map task handles it. As the logs show, after the combiner the number of records drops from `3519` to `6` (the sample contains only 6 distinct words), so in this case the combiner greatly reduces the amount of data that needs to be transferred.
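The effect described above can be simulated in a few lines of Python: the combiner applies the reducer's summing logic locally on each map task's output, so only one record per distinct word is transferred. The word counts below are made up for illustration, not the `3519`/`6` figures from the logs.

```python
from collections import Counter

# Hypothetical map output: many (word, 1) pairs from a single map task.
words = ["spark"] * 1000 + ["hadoop"] * 800 + ["hive"] * 500
map_output = [(w, 1) for w in words]

# The combiner locally pre-aggregates with the same logic as the reducer,
# collapsing the output to one (word, total) record per distinct word.
combined = list(Counter(w for w, _ in map_output).items())

print(len(map_output), "->", len(combined))  # 2300 -> 3
```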
@@ -368,7 +368,7 @@ job.setNumReduceTasks(WordCountDataUtils.WORD_LIST.size());
The results are as follows: 6 files are generated, each containing the statistics for the corresponding word:
-<div align="center"> <img src="../pictures/hadoop-wordcountcombinerpartition.png"/> </div>
+<div align="center"> <img src="https://gitee.com/heibaiying/BigData-Notes/raw/master/pictures/hadoop-wordcountcombinerpartition.png"/> </div>
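The one-file-per-word result comes from a custom partitioner that routes each word to its own reduce task, with the number of reducers set to the word list's size (`job.setNumReduceTasks(WordCountDataUtils.WORD_LIST.size())` above). A hedged Python sketch of that routing logic; the actual contents of `WordCountDataUtils.WORD_LIST` are not shown in this section, so the list here is assumed:

```python
# Assumed sample word list; the real one lives in WordCountDataUtils.WORD_LIST.
WORD_LIST = ["hadoop", "spark", "hive", "hbase", "kafka", "flink"]

# Hypothetical partitioner: each word maps to the partition matching its
# index in the word list, so each reducer (and output file) gets one word.
def partition(word, num_partitions=len(WORD_LIST)):
    return WORD_LIST.index(word) % num_partitions

partitions = {w: partition(w) for w in ["spark", "flink", "hadoop"]}
print(partitions)  # {'spark': 1, 'flink': 5, 'hadoop': 0}
```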