## 4. MapReduce Word Count Example
> Source code download: [hadoop-word-count](https://github.com/heibaiying/BigData-Notes/tree/master/code/Hadoop/hadoop-word-count)
### 4.1 Project Overview
This is a classic example: word frequency counting, i.e. counting how many times each word appears in a sample data set.
To make development easier, the project source includes a utility class, `WordCountDataUtils`, for generating sample files for the word count:
```java
import java.util.Arrays;
import java.util.List;

/**
 * Generates mock data for the word frequency count example.
 */
public class WordCountDataUtils {

    // word list abbreviated here; see the project source for the full version
    public static final List<String> WORD_LIST = Arrays.asList("Spark", "Hadoop", "HBase", "Hive");

    // methods that write out the generated sample file are omitted in this excerpt
}
```
`WordCountMapper` implements the mapping step. Only its class declaration survives in this excerpt; a sketch of its `map()` follows the block, and the full implementation is in the linked project source:

```java
public class WordCountMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
    // map() implementation elided in this excerpt
}
```
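A minimal sketch of what the elided `map()` typically looks like for word counting: split each input line into words and emit `(word, 1)`. This mirrors the standard pattern rather than the exact project code, and the whitespace delimiter is an assumption:

```java
@Override
protected void map(LongWritable key, Text value, Context context)
        throws IOException, InterruptedException {
    // key is the byte offset of the current line, value is the line itself
    for (String word : value.toString().trim().split("\\s+")) {  // assumed whitespace-separated words
        // emit each word with an initial count of 1
        context.write(new Text(word), new IntWritable(1));
    }
}
```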
**Code notes**

`WordCountMapper` corresponds to the Mapping operation of the word-count workflow diagram. `WordCountMapper` extends the `Mapper` class, which is a generic class defined as follows:
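For reference, a trimmed-down sketch of the `org.apache.hadoop.mapreduce.Mapper` signature (the nested `Context` class, the `run()` method, and the method bodies are omitted here):

```java
public class Mapper<KEYIN, VALUEIN, KEYOUT, VALUEOUT> {

    // called once at the start of the task
    protected void setup(Context context) throws IOException, InterruptedException { }

    // called once for every key/value pair of the input split; subclasses override this
    protected void map(KEYIN key, VALUEIN value, Context context) throws IOException, InterruptedException { }

    // called once at the end of the task
    protected void cleanup(Context context) throws IOException, InterruptedException { }
}
```

In `WordCountMapper` the four type parameters are bound to `LongWritable, Text, Text, IntWritable`: with the default `TextInputFormat`, the input key is the byte offset of the current line and the input value is the line of text, while the output is a word (`Text`) paired with a count (`IntWritable`).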
`WordCountReducer` implements the reduction step, summing the counts for each word. Only its class declaration survives in this excerpt; the full implementation is in the linked project source, and a sketch of its `reduce()` is given after the figure below:

```java
public class WordCountReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
    // reduce() implementation elided in this excerpt
}
```
**Code notes**

Here the key is an individual word, and `values` is an iterable type, because the data output by the shuffling phase actually has the form `key(1,1,1,1,1,1,1,.....)` shown in the figure below:
<div align="center"> <img src="https://github.com/heibaiying/BigData-Notes/blob/master/pictures/hadoop-code-reducer.png"/> </div>
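A minimal sketch of the summing `reduce()` such a reducer typically performs; this mirrors the standard word-count reducer and is an illustration rather than the exact project code:

```java
@Override
protected void reduce(Text key, Iterable<IntWritable> values, Context context)
        throws IOException, InterruptedException {
    int count = 0;
    // values contains every 1 emitted for this word during the map phase
    for (IntWritable value : values) {
        count += value.get();
    }
    // write out the word together with its total number of occurrences
    context.write(key, new IntWritable(count));
}
```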