add pictures
parent 41624ec068
commit 5249b0476e
@ -223,3 +223,6 @@ memCheck.enabled=false
|
||||
|
||||
|
||||
|
||||
|
||||
|
||||
<div align="center"> <img src="https://gitee.com/heibaiying/BigData-Notes/raw/master/pictures/weixin-desc.png"/> </div>
|
@ -294,3 +294,6 @@ nodes:
|
||||
1. [Azkaban Flow 2.0 Design](https://github.com/azkaban/azkaban/wiki/Azkaban-Flow-2.0-Design)
|
||||
2. [Getting started with Azkaban Flow 2.0](https://github.com/azkaban/azkaban/wiki/Getting-started-with-Azkaban-Flow-2.0)
|
||||
|
||||
|
||||
|
||||
<div align="center"> <img src="https://gitee.com/heibaiying/BigData-Notes/raw/master/pictures/weixin-desc.png"/> </div>
|
@ -74,3 +74,6 @@ Azkaban and Oozie are currently the most widely used workflow schedulers; their
|
||||
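+ **Configuration**: Azkaban Flow 1.0 defines workflows in Properties files, which is comparatively restrictive. Flow 2.0 adds support for YAML, whose syntax is simpler and more flexible; the well-known microservice framework Spring Boot likewise adopted YAML in place of heavyweight XML.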
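
As a rough illustration of the difference: in Flow 1.0 each job lives in its own `.job` Properties file with keys such as `type`, `command`, and `dependencies`, whereas Flow 2.0 can describe several jobs and their dependencies in one YAML file. The sketch below is a minimal example under stated assumptions: the file name, the job names `jobA` and `jobB`, and the commands are hypothetical and do not come from the original notes.

```yaml
# basic.flow (hypothetical file name): a two-job Flow 2.0 definition
nodes:
  - name: jobA              # hypothetical job name
    type: command
    config:
      command: echo "hello from jobA"

  - name: jobB              # hypothetical job name; runs after jobA
    type: command
    dependsOn:
      - jobA
    config:
      command: echo "hello from jobB"
```

Because the dependency is declared inline with `dependsOn`, the whole flow can be read top to bottom in one file instead of being pieced together from several `.job` files.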
|
||||
|
||||
|
||||
|
||||
|
||||
<div align="center"> <img src="https://gitee.com/heibaiying/BigData-Notes/raw/master/pictures/weixin-desc.png"/> </div>
|
@ -266,3 +266,6 @@ env.execute();
|
||||
2. Streaming Connectors: https://ci.apache.org/projects/flink/flink-docs-release-1.9/dev/connectors/index.html
|
||||
3. Apache Kafka Connector: https://ci.apache.org/projects/flink/flink-docs-release-1.9/dev/connectors/kafka.html
|
||||
|
||||
|
||||
|
||||
<div align="center"> <img src="https://gitee.com/heibaiying/BigData-Notes/raw/master/pictures/weixin-desc.png"/> </div>
|
@ -282,3 +282,6 @@ bin/kafka-console-producer.sh --broker-list hadoop001:9092 --topic flink-stream-
|
||||
1. data-sources: https://ci.apache.org/projects/flink/flink-docs-release-1.9/dev/datastream_api.html#data-sources
|
||||
2. Streaming Connectors: https://ci.apache.org/projects/flink/flink-docs-release-1.9/dev/connectors/index.html
|
||||
3. Apache Kafka Connector: https://ci.apache.org/projects/flink/flink-docs-release-1.9/dev/connectors/kafka.html
|
||||
|
||||
|
||||
<div align="center"> <img src="https://gitee.com/heibaiying/BigData-Notes/raw/master/pictures/weixin-desc.png"/> </div>
|
@ -309,3 +309,6 @@ someStream.filter(...).slotSharingGroup("slotSharingGroupName");
|
||||
## References
|
||||
|
||||
Flink Operators: https://ci.apache.org/projects/flink/flink-docs-release-1.9/dev/stream/operators/
|
||||
|
||||
|
||||
<div align="center"> <img src="https://gitee.com/heibaiying/BigData-Notes/raw/master/pictures/weixin-desc.png"/> </div>
|
@ -126,3 +126,6 @@ public WindowedStream<T, KEY, GlobalWindow> countWindow(long size, long slide) {
|
||||
## References
|
||||
|
||||
Flink Windows: https://ci.apache.org/projects/flink/flink-docs-release-1.9/dev/stream/operators/windows.html
|
||||
|
||||
|
||||
<div align="center"> <img src="https://gitee.com/heibaiying/BigData-Notes/raw/master/pictures/weixin-desc.png"/> </div>
|
@ -302,3 +302,6 @@ Most Flink releases provide both a Scala 2.11 and a Scala 2.12 version of the
|
||||
|
||||
|
||||
|
||||
|
||||
|
||||
<div align="center"> <img src="https://gitee.com/heibaiying/BigData-Notes/raw/master/pictures/weixin-desc.png"/> </div>
|
@ -171,3 +171,6 @@ All Flink components communicate through an Actor System. An Actor System is a
|
||||
|
||||
|
||||
|
||||
|
||||
|
||||
<div align="center"> <img src="https://gitee.com/heibaiying/BigData-Notes/raw/master/pictures/weixin-desc.png"/> </div>
|
@ -368,3 +368,6 @@ state.checkpoints.dir: hdfs://namenode:40010/flink/checkpoints
|
||||
|
||||
|
||||
|
||||
|
||||
|
||||
<div align="center"> <img src="https://gitee.com/heibaiying/BigData-Notes/raw/master/pictures/weixin-desc.png"/> </div>
|
@ -114,3 +114,6 @@ flume-ng agent \
|
||||
You can see that the consumer of the `flume-kafka` topic has received the corresponding messages:
|
||||
|
||||
<div align="center"> <img src="https://gitee.com/heibaiying/BigData-Notes/raw/master/pictures/flume-kafka-2.png"/> </div>
|
||||
|
||||
|
||||
<div align="center"> <img src="https://gitee.com/heibaiying/BigData-Notes/raw/master/pictures/weixin-desc.png"/> </div>
|
@ -373,3 +373,6 @@ flume-ng agent \
|
||||
You can see that content has been received on port 8888 and printed to the console:
|
||||
|
||||
<div align="center"> <img src="https://gitee.com/heibaiying/BigData-Notes/raw/master/pictures/flume-example-9.png"/> </div>
|
||||
|
||||
|
||||
<div align="center"> <img src="https://gitee.com/heibaiying/BigData-Notes/raw/master/pictures/weixin-desc.png"/> </div>
|
@ -386,3 +386,6 @@ public void getFileBlockLocations() throws Exception {
|
||||
<br/>
|
||||
|
||||
**Download all of the test cases above**: [HDFS Java API](https://github.com/heibaiying/BigData-Notes/tree/master/code/Hadoop/hdfs-java-api)
|
||||
|
||||
|
||||
<div align="center"> <img src="https://gitee.com/heibaiying/BigData-Notes/raw/master/pictures/weixin-desc.png"/> </div>
|
@ -139,3 +139,6 @@ hadoop fs -test - [defsz] URI
|
||||
# Example
|
||||
hadoop fs -test -e filename
|
||||
```
|
||||
|
||||
|
||||
<div align="center"> <img src="https://gitee.com/heibaiying/BigData-Notes/raw/master/pictures/weixin-desc.png"/> </div>
|
@ -174,3 +174,6 @@ HDFS is highly portable across platforms, which allows other big data computing frameworks to
|
||||
2. Tom White . Hadoop: The Definitive Guide (Chinese edition) [M] . Tsinghua University Press . 2017.
|
||||
3. [A translated classic comic explaining how HDFS works](https://blog.csdn.net/hudiefenmu/article/details/37655491)
|
||||
|
||||
|
||||
|
||||
<div align="center"> <img src="https://gitee.com/heibaiying/BigData-Notes/raw/master/pictures/weixin-desc.png"/> </div>
|
@ -382,3 +382,6 @@ job.setNumReduceTasks(WordCountDataUtils.WORD_LIST.size());
|
||||
|
||||
|
||||
|
||||
|
||||
|
||||
<div align="center"> <img src="https://gitee.com/heibaiying/BigData-Notes/raw/master/pictures/weixin-desc.png"/> </div>
|
@ -126,3 +126,6 @@ Tasks in YARN report their progress and status (including counters) back to the application master
|
||||
|
||||
|
||||
|
||||
|
||||
|
||||
<div align="center"> <img src="https://gitee.com/heibaiying/BigData-Notes/raw/master/pictures/weixin-desc.png"/> </div>
|
@ -759,3 +759,6 @@ connection = ConnectionFactory.createConnection(config);
|
||||
1. [The Right Way to Connect to HBase](https://yq.aliyun.com/articles/581702?spm=a2c4e.11157919.spm-cont-list.1.146c27aeFxoMsN%20%E8%BF%9E%E6%8E%A5HBase%E7%9A%84%E6%AD%A3%E7%A1%AE%E5%A7%BF%E5%8A%BF)
|
||||
2. [Apache HBase ™ Reference Guide](http://hbase.apache.org/book.html)
|
||||
|
||||
|
||||
|
||||
<div align="center"> <img src="https://gitee.com/heibaiying/BigData-Notes/raw/master/pictures/weixin-desc.png"/> </div>
|
@ -277,3 +277,6 @@ scan 'Student', FILTER=>"PrefixFilter('wr')"
|
||||
|
||||
|
||||
|
||||
|
||||
|
||||
<div align="center"> <img src="https://gitee.com/heibaiying/BigData-Notes/raw/master/pictures/weixin-desc.png"/> </div>
|
@ -488,3 +488,6 @@ hbase > get 'magazine','rowkey1','article:content'
|
||||
1. [Apache HBase Coprocessors](http://hbase.apache.org/book.html#cp)
|
||||
2. [Apache HBase Coprocessor Introduction](https://blogs.apache.org/hbase/entry/coprocessor_introduction)
|
||||
3. [Advanced HBase Knowledge](https://www.itread01.com/content/1546245908.html)
|
||||
|
||||
|
||||
<div align="center"> <img src="https://gitee.com/heibaiying/BigData-Notes/raw/master/pictures/weixin-desc.png"/> </div>
|
@ -194,3 +194,6 @@ hbase> restore_snapshot 'snapshot_name'
|
||||
|
||||
1. [Online Apache HBase Backups with CopyTable](https://blog.cloudera.com/blog/2012/06/online-hbase-backups-with-copytable-2/)
|
||||
2. [Apache HBase ™ Reference Guide](http://hbase.apache.org/book.html)
|
||||
|
||||
|
||||
<div align="center"> <img src="https://gitee.com/heibaiying/BigData-Notes/raw/master/pictures/weixin-desc.png"/> </div>
|
@ -239,3 +239,6 @@ public class PhoenixJavaApi {
|
||||
# References
|
||||
|
||||
1. http://phoenix.apache.org/
|
||||
|
||||
|
||||
<div align="center"> <img src="https://gitee.com/heibaiying/BigData-Notes/raw/master/pictures/weixin-desc.png"/> </div>
|
@ -86,3 +86,6 @@ HBase tables have the following characteristics:
|
||||
|
||||
|
||||
|
||||
|
||||
|
||||
<div align="center"> <img src="https://gitee.com/heibaiying/BigData-Notes/raw/master/pictures/weixin-desc.png"/> </div>
|
@ -220,3 +220,6 @@ The HBase system follows a Master/Slave architecture and consists of three different types of components:
|
||||
|
||||
|
||||
|
||||
|
||||
|
||||
<div align="center"> <img src="https://gitee.com/heibaiying/BigData-Notes/raw/master/pictures/weixin-desc.png"/> </div>
|
@ -443,3 +443,6 @@ scan.setFilter(filterList);
|
||||
## References
|
||||
|
||||
[HBase: The Definitive Guide _> Chapter 4. Client API: Advanced Features](https://www.oreilly.com/library/view/hbase-the-definitive/9781449314682/ch04.html)
|
||||
|
||||
|
||||
<div align="center"> <img src="https://gitee.com/heibaiying/BigData-Notes/raw/master/pictures/weixin-desc.png"/> </div>
|
@ -277,3 +277,6 @@ Hive has a great many optional configuration parameters; consult the official documentation when you need them [Admin
|
||||
1. [HiveServer2 Clients](https://cwiki.apache.org/confluence/display/Hive/HiveServer2+Clients)
|
||||
2. [LanguageManual Cli](https://cwiki.apache.org/confluence/display/Hive/LanguageManual+Cli)
|
||||
3. [AdminManual Configuration](https://cwiki.apache.org/confluence/display/Hive/AdminManual+Configuration)
|
||||
|
||||
|
||||
<div align="center"> <img src="https://gitee.com/heibaiying/BigData-Notes/raw/master/pictures/weixin-desc.png"/> </div>
|
@ -166,3 +166,6 @@ SELECT * FROM page_view WHERE dt='2009-02-25';
|
||||
## References
|
||||
|
||||
1. [LanguageManual DDL BucketedTables](https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL+BucketedTables)
|
||||
|
||||
|
||||
<div align="center"> <img src="https://gitee.com/heibaiying/BigData-Notes/raw/master/pictures/weixin-desc.png"/> </div>
|
@ -448,3 +448,6 @@ SHOW CREATE TABLE ([db_name.]table_name|view_name);
|
||||
## References
|
||||
|
||||
[LanguageManual DDL](https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL)
|
||||
|
||||
|
||||
<div align="center"> <img src="https://gitee.com/heibaiying/BigData-Notes/raw/master/pictures/weixin-desc.png"/> </div>
|
@ -327,3 +327,6 @@ SELECT * FROM emp_ptn;
|
||||
|
||||
1. [Hive Transactions](https://cwiki.apache.org/confluence/display/Hive/Hive+Transactions)
|
||||
2. [Hive Data Manipulation Language](https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DML)
|
||||
|
||||
|
||||
<div align="center"> <img src="https://gitee.com/heibaiying/BigData-Notes/raw/master/pictures/weixin-desc.png"/> </div>
|
@ -392,3 +392,6 @@ SET hive.exec.mode.local.auto=true;
|
||||
2. [LanguageManual Joins](https://cwiki.apache.org/confluence/display/Hive/LanguageManual+Joins)
|
||||
3. [LanguageManual GroupBy](https://cwiki.apache.org/confluence/display/Hive/LanguageManual+GroupBy)
|
||||
4. [LanguageManual SortBy](https://cwiki.apache.org/confluence/display/Hive/LanguageManual+SortBy)
|
||||
|
||||
|
||||
<div align="center"> <img src="https://gitee.com/heibaiying/BigData-Notes/raw/master/pictures/weixin-desc.png"/> </div>
|
@ -200,3 +200,6 @@ CREATE TABLE page_view(viewTime INT, userid BIGINT)
|
||||
3. [LanguageManual DDL](https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL)
|
||||
4. [LanguageManual Types](https://cwiki.apache.org/confluence/display/Hive/LanguageManual+Types)
|
||||
5. [Managed vs. External Tables](https://cwiki.apache.org/confluence/display/Hive/Managed+vs.+External+Tables)
|
||||
|
||||
|
||||
<div align="center"> <img src="https://gitee.com/heibaiying/BigData-Notes/raw/master/pictures/weixin-desc.png"/> </div>
|
@ -234,3 +234,6 @@ SHOW INDEX ON emp;
|
||||
2. [Materialized views](https://cwiki.apache.org/confluence/display/Hive/Materialized+views)
|
||||
3. [Hive Indexes](http://lxw1234.com/archives/2015/05/207.htm)
|
||||
4. [Overview of Hive Indexes](https://cwiki.apache.org/confluence/display/Hive/LanguageManual+Indexing)
|
||||
|
||||
|
||||
<div align="center"> <img src="https://gitee.com/heibaiying/BigData-Notes/raw/master/pictures/weixin-desc.png"/> </div>
|
@ -390,3 +390,6 @@ The time the broker waits before returning data to the consumer; the default is 500ms.
|
||||
|
||||
1. Neha Narkhede, Gwen Shapira, Todd Palino (authors), Xue Mingdeng (translator) . Kafka: The Definitive Guide . Posts & Telecom Press . 2017-12-26
|
||||
|
||||
|
||||
|
||||
<div align="center"> <img src="https://gitee.com/heibaiying/BigData-Notes/raw/master/pictures/weixin-desc.png"/> </div>
|
@ -159,3 +159,6 @@ Exception: Replication factor: 3 larger than available brokers: 1.
|
||||
|
||||
1. Neha Narkhede, Gwen Shapira, Todd Palino (authors), Xue Mingdeng (translator) . Kafka: The Definitive Guide . Posts & Telecom Press . 2017-12-26
|
||||
2. [Kafka's High-Performance Architecture](http://www.jasongj.com/kafka/high_throughput/)
|
||||
|
||||
|
||||
<div align="center"> <img src="https://gitee.com/heibaiying/BigData-Notes/raw/master/pictures/weixin-desc.png"/> </div>
|
@ -362,3 +362,6 @@ The acks parameter specifies how many partition replicas must receive a message before the producer
|
||||
## References
|
||||
|
||||
1. Neha Narkhede, Gwen Shapira, Todd Palino (authors), Xue Mingdeng (translator) . Kafka: The Definitive Guide . Posts & Telecom Press . 2017-12-26
|
||||
|
||||
|
||||
<div align="center"> <img src="https://gitee.com/heibaiying/BigData-Notes/raw/master/pictures/weixin-desc.png"/> </div>
|
@ -65,3 +65,6 @@ A broker is a component of a cluster. Every cluster elects one
|
||||
## References
|
||||
|
||||
Neha Narkhede, Gwen Shapira, Todd Palino (authors), Xue Mingdeng (translator) . Kafka: The Definitive Guide . Posts & Telecom Press . 2017-12-26
|
||||
|
||||
|
||||
<div align="center"> <img src="https://gitee.com/heibaiying/BigData-Notes/raw/master/pictures/weixin-desc.png"/> </div>
|
@ -310,3 +310,6 @@ object ScalaApp extends App {
|
||||
|
||||
|
||||
|
||||
|
||||
|
||||
<div align="center"> <img src="https://gitee.com/heibaiying/BigData-Notes/raw/master/pictures/weixin-desc.png"/> </div>
|
@ -540,3 +540,6 @@ object ScalaApp extends App {
|
||||
|
||||
1. Martin Odersky . Programming in Scala (3rd Edition) [M] . Publishing House of Electronics Industry . 2018-1-1
|
||||
2. Cay S. Horstmann . Scala for the Impatient (2nd Edition) [M] . Publishing House of Electronics Industry . 2017-7
|
||||
|
||||
|
||||
<div align="center"> <img src="https://gitee.com/heibaiying/BigData-Notes/raw/master/pictures/weixin-desc.png"/> </div>
|
@ -272,3 +272,6 @@ res6: Boolean = true
|
||||
## References
|
||||
|
||||
1. Martin Odersky . Programming in Scala (3rd Edition) [M] . Publishing House of Electronics Industry . 2018-1-1
|
||||
|
||||
|
||||
<div align="center"> <img src="https://gitee.com/heibaiying/BigData-Notes/raw/master/pictures/weixin-desc.png"/> </div>
|
@ -191,3 +191,6 @@ object ScalaApp extends App {
|
||||
|
||||
1. Martin Odersky . Programming in Scala (3rd Edition) [M] . Publishing House of Electronics Industry . 2018-1-1
|
||||
2. Cay S. Horstmann . Scala for the Impatient (2nd Edition) [M] . Publishing House of Electronics Industry . 2017-7
|
||||
|
||||
|
||||
<div align="center"> <img src="https://gitee.com/heibaiying/BigData-Notes/raw/master/pictures/weixin-desc.png"/> </div>
|
@ -280,3 +280,6 @@ object ScalaApp extends App {
|
||||
|
||||
1. Martin Odersky . Programming in Scala (3rd Edition) [M] . Publishing House of Electronics Industry . 2018-1-1
|
||||
2. Cay S. Horstmann . Scala for the Impatient (2nd Edition) [M] . Publishing House of Electronics Industry . 2017-7
|
||||
|
||||
|
||||
<div align="center"> <img src="https://gitee.com/heibaiying/BigData-Notes/raw/master/pictures/weixin-desc.png"/> </div>
|
@ -170,3 +170,6 @@ object ScalaApp extends App {
|
||||
1. Martin Odersky . Programming in Scala (3rd Edition) [M] . Publishing House of Electronics Industry . 2018-1-1
|
||||
2. Cay S. Horstmann . Scala for the Impatient (2nd Edition) [M] . Publishing House of Electronics Industry . 2017-7
|
||||
|
||||
|
||||
|
||||
<div align="center"> <img src="https://gitee.com/heibaiying/BigData-Notes/raw/master/pictures/weixin-desc.png"/> </div>
|
@ -209,3 +209,6 @@ println(s"Hello, ${name}! Next year, you will be ${age + 1}.")
|
||||
|
||||
1. Martin Odersky . Programming in Scala (3rd Edition) [M] . Publishing House of Electronics Industry . 2018-1-1
|
||||
2. Cay S. Horstmann . Scala for the Impatient (2nd Edition) [M] . Publishing House of Electronics Industry . 2017-7
|
||||
|
||||
|
||||
<div align="center"> <img src="https://gitee.com/heibaiying/BigData-Notes/raw/master/pictures/weixin-desc.png"/> </div>
|
@ -131,3 +131,6 @@ IDEA does not support Scala development by default; support has to be added through a plugin.
|
||||
|
||||
1. Martin Odersky (author), Gao Yuxiang (translator) . Programming in Scala (3rd Edition) [M] . Publishing House of Electronics Industry . 2018-1-1
|
||||
2. https://www.scala-lang.org/download/
|
||||
|
||||
|
||||
<div align="center"> <img src="https://gitee.com/heibaiying/BigData-Notes/raw/master/pictures/weixin-desc.png"/> </div>
|
@ -410,3 +410,6 @@ true
|
||||
1. Martin Odersky . Programming in Scala (3rd Edition) [M] . Publishing House of Electronics Industry . 2018-1-1
|
||||
2. Cay S. Horstmann . Scala for the Impatient (2nd Edition) [M] . Publishing House of Electronics Industry . 2017-7
|
||||
|
||||
|
||||
|
||||
<div align="center"> <img src="https://gitee.com/heibaiying/BigData-Notes/raw/master/pictures/weixin-desc.png"/> </div>
|
@ -465,3 +465,6 @@ def min[T <: SuperComparable[T]](p: Pair[T]) = {}
|
||||
1. Martin Odersky . Programming in Scala (3rd Edition) [M] . Publishing House of Electronics Industry . 2018-1-1
|
||||
2. Cay S. Horstmann . Scala for the Impatient (2nd Edition) [M] . Publishing House of Electronics Industry . 2017-7
|
||||
|
||||
|
||||
|
||||
<div align="center"> <img src="https://gitee.com/heibaiying/BigData-Notes/raw/master/pictures/weixin-desc.png"/> </div>
|
@ -416,3 +416,6 @@ class Employee extends Person with InfoLogger with ErrorLogger {...}
|
||||
|
||||
|
||||
|
||||
|
||||
|
||||
<div align="center"> <img src="https://gitee.com/heibaiying/BigData-Notes/raw/master/pictures/weixin-desc.png"/> </div>
|
@ -354,3 +354,6 @@ object Pair extends App {
|
||||
|
||||
|
||||
|
||||
|
||||
|
||||
<div align="center"> <img src="https://gitee.com/heibaiying/BigData-Notes/raw/master/pictures/weixin-desc.png"/> </div>
|
@ -257,3 +257,6 @@ res8: Boolean = false
|
||||
1. https://docs.scala-lang.org/overviews/collections/overview.html
|
||||
2. https://docs.scala-lang.org/overviews/collections/trait-traversable.html
|
||||
3. https://docs.scala-lang.org/overviews/collections/trait-iterable.html
|
||||
|
||||
|
||||
<div align="center"> <img src="https://gitee.com/heibaiying/BigData-Notes/raw/master/pictures/weixin-desc.png"/> </div>
|
@ -145,3 +145,6 @@ DataFrame, DataSet, and Spark SQL all follow the same actual execution flow:
|
||||
2. [Spark SQL, DataFrames and Datasets Guide](https://spark.apache.org/docs/latest/sql-programming-guide.html)
|
||||
3. [A Tale of Three Apache Spark APIs: RDDs, DataFrames, and Datasets (Chinese translation)](https://www.infoq.cn/article/three-apache-spark-apis-rdds-dataframes-and-datasets)
|
||||
4. [A Tale of Three Apache Spark APIs: RDDs vs DataFrames and Datasets (original)](https://databricks.com/blog/2016/07/14/a-tale-of-three-apache-spark-apis-rdds-dataframes-and-datasets.html)
|
||||
|
||||
|
||||
<div align="center"> <img src="https://gitee.com/heibaiying/BigData-Notes/raw/master/pictures/weixin-desc.png"/> </div>
|
@ -497,3 +497,6 @@ df.write.option("maxRecordsPerFile", 5000)
|
||||
1. Matei Zaharia, Bill Chambers . Spark: The Definitive Guide[M] . 2018-02
|
||||
2. https://spark.apache.org/docs/latest/sql-data-sources.html
|
||||
|
||||
|
||||
|
||||
<div align="center"> <img src="https://gitee.com/heibaiying/BigData-Notes/raw/master/pictures/weixin-desc.png"/> </div>
|
@ -337,3 +337,6 @@ object SparkSqlApp {
|
||||
## References
|
||||
|
||||
1. Matei Zaharia, Bill Chambers . Spark: The Definitive Guide[M] . 2018-02
|
||||
|
||||
|
||||
<div align="center"> <img src="https://gitee.com/heibaiying/BigData-Notes/raw/master/pictures/weixin-desc.png"/> </div>
|
@ -183,3 +183,6 @@ empDF.join(broadcast(deptDF), joinExpression).show()
|
||||
## References
|
||||
|
||||
1. Matei Zaharia, Bill Chambers . Spark: The Definitive Guide[M] . 2018-02
|
||||
|
||||
|
||||
<div align="center"> <img src="https://gitee.com/heibaiying/BigData-Notes/raw/master/pictures/weixin-desc.png"/> </div>
|
@ -235,3 +235,6 @@ RDDs and the dependencies between them form a DAG (directed acyclic graph); the DAG defines
|
||||
|
||||
|
||||
|
||||
|
||||
|
||||
<div align="center"> <img src="https://gitee.com/heibaiying/BigData-Notes/raw/master/pictures/weixin-desc.png"/> </div>
|
@ -77,3 +77,6 @@ Storm and Flink are both true stream processing frameworks, but Spark Streaming
|
||||
|
||||
1. [Spark Streaming Programming Guide](https://spark.apache.org/docs/latest/streaming-programming-guide.html)
|
||||
2. [What is stream processing?](https://www.ververica.com/what-is-stream-processing)
|
||||
|
||||
|
||||
<div align="center"> <img src="https://gitee.com/heibaiying/BigData-Notes/raw/master/pictures/weixin-desc.png"/> </div>
|
@ -333,3 +333,6 @@ storm storm flink azkaban
|
||||
## References
|
||||
|
||||
Official Spark documentation: http://spark.apache.org/docs/latest/streaming-programming-guide.html
|
||||
|
||||
|
||||
<div align="center"> <img src="https://gitee.com/heibaiying/BigData-Notes/raw/master/pictures/weixin-desc.png"/> </div>
|
@ -357,3 +357,6 @@ spark-submit \
|
||||
|
||||
- [streaming-flume-integration](https://spark.apache.org/docs/latest/streaming-flume-integration.html)
|
||||
- For common ways of packaging big data applications, see: [Common Packaging Methods for Big Data Applications](https://github.com/heibaiying/BigData-Notes/blob/master/notes/大数据应用常用打包方式.md)
|
||||
|
||||
|
||||
<div align="center"> <img src="https://gitee.com/heibaiying/BigData-Notes/raw/master/pictures/weixin-desc.png"/> </div>
|
@ -319,3 +319,6 @@ bin/kafka-console-producer.sh --broker-list hadoop001:9092 --topic spark-streami
|
||||
## References
|
||||
|
||||
1. https://spark.apache.org/docs/latest/streaming-kafka-0-10-integration.html
|
||||
|
||||
|
||||
<div align="center"> <img src="https://gitee.com/heibaiying/BigData-Notes/raw/master/pictures/weixin-desc.png"/> </div>
|
@ -242,3 +242,6 @@ spark.sql("SELECT ename,job FROM global_temp.gemp").show()
|
||||
## References
|
||||
|
||||
[Spark SQL, DataFrames and Datasets Guide > Getting Started](https://spark.apache.org/docs/latest/sql-getting-started.html)
|
||||
|
||||
|
||||
<div align="center"> <img src="https://gitee.com/heibaiying/BigData-Notes/raw/master/pictures/weixin-desc.png"/> </div>
|
@ -416,3 +416,6 @@ sc.parallelize(list).saveAsTextFile("/usr/file/temp")
|
||||
|
||||
[RDD Programming Guide](http://spark.apache.org/docs/latest/rdd-programming-guide.html#rdd-programming-guide)
|
||||
|
||||
|
||||
|
||||
<div align="center"> <img src="https://gitee.com/heibaiying/BigData-Notes/raw/master/pictures/weixin-desc.png"/> </div>
|
@ -92,3 +92,6 @@ MLlib is Spark's machine learning library. Its design goal is to make machine learning
|
||||
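GraphX is a new Spark component for graphs and graph-parallel computation. At a high level, GraphX extends RDD by introducing a new graph abstraction: a directed multigraph with properties attached to each vertex and edge. To support graph computation, GraphX provides a set of fundamental operators (such as subgraph, joinVertices, and aggregateMessages) as well as an optimized Pregel API. In addition, GraphX includes a growing collection of graph algorithms and builders that simplify graph analytics tasks.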
|
||||
|
||||
##
|
||||
|
||||
|
||||
<div align="center"> <img src="https://gitee.com/heibaiying/BigData-Notes/raw/master/pictures/weixin-desc.png"/> </div>
|
@ -103,3 +103,6 @@ sc.parallelize(broadcastVar.value).map(_ * 10).collect()
|
||||
|
||||
[RDD Programming Guide](http://spark.apache.org/docs/latest/rdd-programming-guide.html#rdd-programming-guide)
|
||||
|
||||
|
||||
|
||||
<div align="center"> <img src="https://gitee.com/heibaiying/BigData-Notes/raw/master/pictures/weixin-desc.png"/> </div>
|
@ -246,3 +246,6 @@ spark-submit \
|
||||
|
||||
|
||||
|
||||
|
||||
|
||||
<div align="center"> <img src="https://gitee.com/heibaiying/BigData-Notes/raw/master/pictures/weixin-desc.png"/> </div>
|
@ -384,3 +384,6 @@ UPSERT INTO us_population VALUES('CA','San Diego',1255540);
|
||||
UPSERT INTO us_population VALUES('CA','San Jose',912332);
|
||||
```
|
||||
|
||||
|
||||
|
||||
<div align="center"> <img src="https://gitee.com/heibaiying/BigData-Notes/raw/master/pictures/weixin-desc.png"/> </div>
|
@ -385,3 +385,6 @@ $ sqoop import ... --map-column-java id=String,value=Integer
|
||||
## References
|
||||
|
||||
[Sqoop User Guide (v1.4.7)](http://sqoop.apache.org/docs/1.4.7/SqoopUserGuide.html)
|
||||
|
||||
|
||||
<div align="center"> <img src="https://gitee.com/heibaiying/BigData-Notes/raw/master/pictures/weixin-desc.png"/> </div>
|
@ -145,3 +145,6 @@ if [ ! -d "${ZOOKEEPER_HOME}" ]; then
|
||||
fi
|
||||
```
|
||||
|
||||
|
||||
|
||||
<div align="center"> <img src="https://gitee.com/heibaiying/BigData-Notes/raw/master/pictures/weixin-desc.png"/> </div>
|
@ -313,3 +313,6 @@ jar:file:/usr/appjar/storm-hdfs-integration-1.0.jar!/defaults.yaml]
|
||||
## References
|
||||
|
||||
For more maven-shade-plugin configuration options, see: [A Beginner's Guide to maven-shade-plugin](https://www.jianshu.com/p/7a0e20b30401)
|
||||
|
||||
|
||||
<div align="center"> <img src="https://gitee.com/heibaiying/BigData-Notes/raw/master/pictures/weixin-desc.png"/> </div>
|
@ -96,3 +96,6 @@ Storm and Flink are both true real-time computing frameworks. They compare as follows:
|
||||
|
||||
1. [What is stream processing?](https://www.ververica.com/what-is-stream-processing)
|
||||
2. [Performance Comparison of the Stream Processing Frameworks Flink and Storm](http://bigdata.51cto.com/art/201711/558416.htm)
|
||||
|
||||
|
||||
<div align="center"> <img src="https://gitee.com/heibaiying/BigData-Notes/raw/master/pictures/weixin-desc.png"/> </div>
|
@ -157,3 +157,6 @@ A Task is the unit of code that makes up a Component. After a Topology starts, a single Component
|
||||
3. [Understanding the Parallelism of a Storm Topology](http://storm.apache.org/releases/1.2.2/Understanding-the-parallelism-of-a-Storm-topology.html)
|
||||
4. [Handling a Single-Node Storm Nimbus Outage](https://blog.csdn.net/daiyutage/article/details/52049519)
|
||||
|
||||
|
||||
|
||||
<div align="center"> <img src="https://gitee.com/heibaiying/BigData-Notes/raw/master/pictures/weixin-desc.png"/> </div>
|
@ -509,3 +509,6 @@ private String productData() {
|
||||
|
||||
1. [Running Topologies on a Production Cluster](http://storm.apache.org/releases/2.0.0-SNAPSHOT/Running-topologies-on-a-production-cluster.html)
|
||||
2. [Pre-defined Descriptor Files](http://maven.apache.org/plugins/maven-assembly-plugin/descriptor-refs.html)
|
||||
|
||||
|
||||
<div align="center"> <img src="https://gitee.com/heibaiying/BigData-Notes/raw/master/pictures/weixin-desc.png"/> </div>
|
@ -487,3 +487,6 @@ SimpleHBaseMapper mapper = new SimpleHBaseMapper()
|
||||
|
||||
1. [Apache HDFS Integration](http://storm.apache.org/releases/2.0.0-SNAPSHOT/storm-hdfs.html)
|
||||
2. [Apache HBase Integration](http://storm.apache.org/releases/2.0.0-SNAPSHOT/storm-hbase.html)
|
||||
|
||||
|
||||
<div align="center"> <img src="https://gitee.com/heibaiying/BigData-Notes/raw/master/pictures/weixin-desc.png"/> </div>
|
@ -365,3 +365,6 @@ public class DefaultRecordTranslator<K, V> implements RecordTranslator<K, V> {
|
||||
## References
|
||||
|
||||
1. [Storm Kafka Integration (0.10.x+)](http://storm.apache.org/releases/2.0.0-SNAPSHOT/storm-kafka-client.html)
|
||||
|
||||
|
||||
<div align="center"> <img src="https://gitee.com/heibaiying/BigData-Notes/raw/master/pictures/weixin-desc.png"/> </div>
|
@ -653,3 +653,6 @@ public class CustomRedisCountApp {
|
||||
## References
|
||||
|
||||
1. [Storm Redis Integration](http://storm.apache.org/releases/2.0.0-SNAPSHOT/storm-redis.html)
|
||||
|
||||
|
||||
<div align="center"> <img src="https://gitee.com/heibaiying/BigData-Notes/raw/master/pictures/weixin-desc.png"/> </div>
|
@ -281,3 +281,6 @@ public class AclOperation {
|
||||
```
|
||||
|
||||
> The complete source code is available in this repository: https://github.com/heibaiying/BigData-Notes/tree/master/code/Zookeeper/curator
|
||||
|
||||
|
||||
<div align="center"> <img src="https://gitee.com/heibaiying/BigData-Notes/raw/master/pictures/weixin-desc.png"/> </div>
|
@ -334,3 +334,6 @@ public void permanentChildrenNodesWatch() throws Exception {
|
||||
Thread.sleep(1000 * 1000); // sleep so the test results can be observed
|
||||
}
|
||||
```
|
||||
|
||||
|
||||
<div align="center"> <img src="https://gitee.com/heibaiying/BigData-Notes/raw/master/pictures/weixin-desc.png"/> </div>
|
@ -263,3 +263,6 @@ Mode: standalone
|
||||
Node count: 167
|
||||
```
|
||||
|
||||
|
||||
|
||||
<div align="center"> <img src="https://gitee.com/heibaiying/BigData-Notes/raw/master/pictures/weixin-desc.png"/> </div>
|
@ -205,3 +205,6 @@ ZooKeeper can also solve most of the problems found in distributed systems:
|
||||
|
||||
|
||||
|
||||
|
||||
|
||||
<div align="center"> <img src="https://gitee.com/heibaiying/BigData-Notes/raw/master/pictures/weixin-desc.png"/> </div>
|
@ -122,3 +122,6 @@ tar -zxvf azkaban-solo-server-3.70.0.tar.gz
|
||||
<div align="center"> <img width="700px" src="https://gitee.com/heibaiying/BigData-Notes/raw/master/pictures/azkaban-web-ui.png"/> </div>
|
||||
|
||||
|
||||
|
||||
|
||||
<div align="center"> <img src="https://gitee.com/heibaiying/BigData-Notes/raw/master/pictures/weixin-desc.png"/> </div>
|
@ -268,3 +268,6 @@ the classpath/dependencies.
|
||||
+ [Standalone Cluster](https://ci.apache.org/projects/flink/flink-docs-release-1.9/ops/deployment/cluster_setup.html#standalone-cluster)
|
||||
+ [JobManager High Availability (HA)](https://ci.apache.org/projects/flink/flink-docs-release-1.9/ops/jobmanager_high_availability.html)
|
||||
|
||||
|
||||
|
||||
<div align="center"> <img src="https://gitee.com/heibaiying/BigData-Notes/raw/master/pictures/weixin-desc.png"/> </div>
|
@ -225,3 +225,6 @@ hadoop001
|
||||
Verification method 2: open the HBase Web UI. Note that HBase 1.2 serves it on port `60010`.
|
||||
|
||||
<div align="center"> <img src="https://gitee.com/heibaiying/BigData-Notes/raw/master/pictures/hbase-60010.png"/> </div>
|
||||
|
||||
|
||||
<div align="center"> <img src="https://gitee.com/heibaiying/BigData-Notes/raw/master/pictures/weixin-desc.png"/> </div>
|
@ -198,3 +198,6 @@ HBase on hadoop002 is in standby state:
|
||||
<br/>
|
||||
|
||||
<div align="center"> <img src="https://gitee.com/heibaiying/BigData-Notes/raw/master/pictures/hbase-集群搭建2.png"/> </div>
|
||||
|
||||
|
||||
<div align="center"> <img src="https://gitee.com/heibaiying/BigData-Notes/raw/master/pictures/weixin-desc.png"/> </div>
|
@ -260,3 +260,6 @@ cp mapred-site.xml.template mapred-site.xml
|
||||
Method 2: check the Web UI on port `8088`:
|
||||
|
||||
<div align="center"> <img width="700px" src="https://gitee.com/heibaiying/BigData-Notes/raw/master/pictures/hadoop-yarn安装验证.png"/> </div>
|
||||
|
||||
|
||||
<div align="center"> <img src="https://gitee.com/heibaiying/BigData-Notes/raw/master/pictures/weixin-desc.png"/> </div>
|
@ -231,3 +231,6 @@ start-yarn.sh
|
||||
hadoop jar /usr/app/hadoop-2.6.0-cdh5.15.2/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.6.0-cdh5.15.2.jar pi 3 3
|
||||
```
|
||||
|
||||
|
||||
|
||||
<div align="center"> <img src="https://gitee.com/heibaiying/BigData-Notes/raw/master/pictures/weixin-desc.png"/> </div>
|
@ -66,3 +66,6 @@ export JAVA_HOME=/usr/java/jdk1.8.0_201
|
||||
|
||||

|
||||
|
||||
|
||||
|
||||
<div align="center"> <img src="https://gitee.com/heibaiying/BigData-Notes/raw/master/pictures/weixin-desc.png"/> </div>
|
@ -53,3 +53,6 @@ Java(TM) SE Runtime Environment (build 1.8.0_201-b09)
|
||||
Java HotSpot(TM) 64-Bit Server VM (build 25.201-b09, mixed mode)
|
||||
|
||||
```
|
||||
|
||||
|
||||
<div align="center"> <img src="https://gitee.com/heibaiying/BigData-Notes/raw/master/pictures/weixin-desc.png"/> </div>
|
@ -69,3 +69,6 @@ Type "help", "copyright", "credits" or "license" for more information.
|
||||
[root@hadoop001 app]#
|
||||
```
|
||||
|
||||
|
||||
|
||||
<div align="center"> <img src="https://gitee.com/heibaiying/BigData-Notes/raw/master/pictures/weixin-desc.png"/> </div>
|
@ -179,3 +179,6 @@ Hive ships with the HiveServer and HiveServer2 services, both of which allow clients to
|
||||
```
|
||||
|
||||
<div align="center"> <img src="https://gitee.com/heibaiying/BigData-Notes/raw/master/pictures/hive-beeline-cli.png"/> </div>
|
||||
|
||||
|
||||
<div align="center"> <img src="https://gitee.com/heibaiying/BigData-Notes/raw/master/pictures/weixin-desc.png"/> </div>
|
@ -176,3 +176,6 @@ IDEA does not support Scala development by default; support has to be added through a plugin.
|
||||
|
||||
**Also, running a Spark project in local mode from IDEA does not require a Spark or Hadoop environment on the local machine.**
|
||||
|
||||
|
||||
|
||||
<div align="center"> <img src="https://gitee.com/heibaiying/BigData-Notes/raw/master/pictures/weixin-desc.png"/> </div>
|
@ -188,3 +188,6 @@ spark-submit \
|
||||
100
|
||||
```
|
||||
|
||||
|
||||
|
||||
<div align="center"> <img src="https://gitee.com/heibaiying/BigData-Notes/raw/master/pictures/weixin-desc.png"/> </div>
|
@ -79,3 +79,6 @@ nohup sh storm logviewer &
|
||||
Verification method 2: open port 8080 to view the Web UI:
|
||||
|
||||
<div align="center"> <img src="https://gitee.com/heibaiying/BigData-Notes/raw/master/pictures/storm-web-ui.png"/> </div>
|
||||
|
||||
|
||||
<div align="center"> <img src="https://gitee.com/heibaiying/BigData-Notes/raw/master/pictures/weixin-desc.png"/> </div>
|
@ -165,3 +165,6 @@ nohup sh storm logviewer &
|
||||
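Here we manually simulate a failure of the leader `Nimbus`: on hadoop001, use the `kill` command to kill the `Nimbus` process. You can then see that the `Nimbus` on hadoop001 is in the `offline` state, while the `Nimbus` on hadoop002 has become the new `Leader`.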
|
||||
|
||||
<div align="center"> <img src="https://gitee.com/heibaiying/BigData-Notes/raw/master/pictures/storm集群搭建2.png"/> </div>
|
||||
|
||||
|
||||
<div align="center"> <img src="https://gitee.com/heibaiying/BigData-Notes/raw/master/pictures/weixin-desc.png"/> </div>
|
@ -185,3 +185,6 @@ echo "3" > /usr/local/zookeeper-cluster/data/myid
|
||||
<div align="center"> <img src="https://gitee.com/heibaiying/BigData-Notes/raw/master/pictures/zookeeper-hadoop002.png"/> </div>
|
||||
|
||||
<div align="center"> <img src="https://gitee.com/heibaiying/BigData-Notes/raw/master/pictures/zookeeper-hadoop003.png"/> </div>
|
||||
|
||||
|
||||
<div align="center"> <img src="https://gitee.com/heibaiying/BigData-Notes/raw/master/pictures/weixin-desc.png"/> </div>
|
@ -512,3 +512,6 @@ yarn-daemon.sh start resourcemanager
|
||||
|
||||
[An Analysis of the Hadoop NameNode High Availability Implementation](https://www.ibm.com/developerworks/cn/opensource/os-cn-hadoop-name-node/index.html)
|
||||
|
||||
|
||||
|
||||
<div align="center"> <img src="https://gitee.com/heibaiying/BigData-Notes/raw/master/pictures/weixin-desc.png"/> </div>
|
@ -237,3 +237,6 @@ bin/kafka-topics.sh --describe --bootstrap-server hadoop001:9092 --topic my-repl
|
||||
|
||||
|
||||
|
||||
|
||||
|
||||
<div align="center"> <img src="https://gitee.com/heibaiying/BigData-Notes/raw/master/pictures/weixin-desc.png"/> </div>
|
@ -116,3 +116,6 @@ DEVICE=enp0s8
|
||||
In use, simply enable the network adapter that matches your current network environment; adapters you are not using are best left disabled.
|
||||
|
||||
<div align="center"> <img src="https://gitee.com/heibaiying/BigData-Notes/raw/master/pictures/virtualbox启用网络.png"/> </div>
|
||||
|
||||
|
||||
<div align="center"> <img src="https://gitee.com/heibaiying/BigData-Notes/raw/master/pictures/weixin-desc.png"/> </div>
|
@ -167,3 +167,6 @@ Scala combines object-oriented and functional programming concepts in a statically typed
|
||||
|
||||
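That concludes my personal notes on learning big data and the learning path I recommend. This article defines the big data technology stack rather narrowly; as your studies deepen, you can gradually add Python, recommender systems, machine learning, and more to your own big data stack.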
|
||||
|
||||
|
||||
|
||||
<div align="center"> <img src="https://gitee.com/heibaiying/BigData-Notes/raw/master/pictures/weixin-desc.png"/> </div>
|
@ -65,3 +65,6 @@ hadoop-2.6.0-cdh5.15.2.tar.gz
|
||||
hbase-1.2.0-cdh5.15.2
|
||||
hive-1.1.0-cdh5.15.2.tar.gz
|
||||
```
|
||||
|
||||
|
||||
<div align="center"> <img src="https://gitee.com/heibaiying/BigData-Notes/raw/master/pictures/weixin-desc.png"/> </div>
|
@ -304,3 +304,6 @@ The Running Topologies on a Production Cluster chapter of the official Storm documentation:
|
||||
+ maven-dependency-plugin : http://maven.apache.org/components/plugins/maven-dependency-plugin/
|
||||
|
||||
For more maven-shade-plugin configuration options, you can also refer to this blog post: [A Beginner's Guide to maven-shade-plugin](https://www.jianshu.com/p/7a0e20b30401)
|
||||
|
||||
|
||||
<div align="center"> <img src="https://gitee.com/heibaiying/BigData-Notes/raw/master/pictures/weixin-desc.png"/> </div>
|
@ -1,2 +1,5 @@
|
||||
<div align="center"> <img src="https://gitee.com/heibaiying/BigData-Notes/raw/master/pictures/大数据技术栈思维导图.png"/> </div>
|
||||
|
||||
|
||||
|
||||
<div align="center"> <img src="https://gitee.com/heibaiying/BigData-Notes/raw/master/pictures/weixin-desc.png"/> </div>
|
@ -54,3 +54,6 @@ ProcessOn is an online diagramming platform that is very convenient to use and can be used for
|
||||
|
||||
Official website: https://www.processon.com/
|
||||
|
||||
|
||||
|
||||
<div align="center"> <img src="https://gitee.com/heibaiying/BigData-Notes/raw/master/pictures/weixin-desc.png"/> </div>
|
Some files were not shown because too many files have changed in this diff.