# BigData-Notes
> Java 程序员的大数据学习之路(2019) — — 持续更新中
Hadoop Hive Spark Flink Storm Flume Sqoop Azkaban HBase Kafka Zookeeper Scala
> 为方便查阅,所有软件的安装步骤统一整理至:[Linux下大数据常用软件安装指南](https://github.com/heibaiying/BigData-Notes/blob/master/notes/Linux中大数据常用软件安装指南.md) ## 一、Hadoop 1. [分布式文件存储系统——HDFS](https://github.com/heibaiying/BigData-Notes/blob/master/notes/Hadoop-HDFS.md) 2. [分布式计算框架——MapReduce](https://github.com/heibaiying/BigData-Notes/blob/master/notes/Hadoop-MapReduce.md) 3. [集群资源管理器——YARN](https://github.com/heibaiying/BigData-Notes/blob/master/notes/Hadoop-YARN.md) 4. [Hadoop单机伪集群环境搭建](https://github.com/heibaiying/BigData-Notes/blob/master/notes/installation/hadoop%E5%8D%95%E6%9C%BA%E7%89%88%E6%9C%AC%E7%8E%AF%E5%A2%83%E6%90%AD%E5%BB%BA.md) 5. [HDFS常用Shell命令](https://github.com/heibaiying/BigData-Notes/blob/master/notes/HDFS常用Shell命令.md) 6. HDFS Java API详解 ## 二、Hive 1. [数据仓库Hive简介](https://github.com/heibaiying/BigData-Notes/blob/master/notes/Hive.md) 2. [Linux环境下Hive的安装部署](https://github.com/heibaiying/BigData-Notes/blob/master/notes/installation/Linux%E7%8E%AF%E5%A2%83%E4%B8%8BHive%E7%9A%84%E5%AE%89%E8%A3%85%E9%83%A8%E7%BD%B2.md) 3. 连接Hive的三种方式 4. [HiveCLI和Beeline命令行的基本使用](https://github.com/heibaiying/BigData-Notes/blob/master/notes/HiveCLI和Beeline命令行的基本使用.md) 5. [Hive 核心概念讲解](https://github.com/heibaiying/BigData-Notes/blob/master/notes/Hive核心概念讲解.md) 6. [Hive 常用DDL操作](https://github.com/heibaiying/BigData-Notes/blob/master/notes/Hive常用DDL操作.md) 7. [Hive 分区表和分桶表](https://github.com/heibaiying/BigData-Notes/blob/master/notes/Hive分区表和分桶表.md) 8. [Hive 视图和索引](https://github.com/heibaiying/BigData-Notes/blob/master/notes/Hive数据查询详解.md) 9. [Hive常用DML操作](https://github.com/heibaiying/BigData-Notes/blob/master/notes/Hive常用DML操作.md) 10. [Hive 数据查询详解](https://github.com/heibaiying/BigData-Notes/blob/master/notes/Hive数据查询详解.md) ## 三、Spark 1. Spark简介 2. [Spark单机版本环境搭建](https://github.com/heibaiying/BigData-Notes/blob/master/notes/installation/Spark%E5%8D%95%E6%9C%BA%E7%89%88%E6%9C%AC%E7%8E%AF%E5%A2%83%E6%90%AD%E5%BB%BA.md) 3. RDD详解 4. Spark Transformation 和 Action ## 四、Flink TODO ## 五、Storm 1. [Storm和流处理简介](https://github.com/heibaiying/BigData-Notes/blob/master/notes/Storm和流处理简介.md) 2. [Storm核心概念详解](https://github.com/heibaiying/BigData-Notes/blob/master/notes/Storm核心概念详解.md) 3. [Storm单机版本环境搭建](https://github.com/heibaiying/BigData-Notes/blob/master/notes/installation/Storm%E5%8D%95%E6%9C%BA%E7%89%88%E6%9C%AC%E7%8E%AF%E5%A2%83%E6%90%AD%E5%BB%BA.md) 4. [Storm编程模型详解](https://github.com/heibaiying/BigData-Notes/blob/master/notes/Storm编程模型详解.md) 5. [Storm项目三种打包方式对比分析](https://github.com/heibaiying/BigData-Notes/blob/master/notes/Storm三种打包方式对比分析.md) 6. [Storm集成Redis详解](https://github.com/heibaiying/BigData-Notes/blob/master/notes/Storm集成Redis详解.md) 7. [Storm集成HDFS/HBase](https://github.com/heibaiying/BigData-Notes/blob/master/notes/Storm集成HBase和HDFS.md) 8. [Storm集成Kafka](https://github.com/heibaiying/BigData-Notes/blob/master/notes/Storm集成Kakfa.md) ## 六、Flume 1. [Flume简介及基本使用](https://github.com/heibaiying/BigData-Notes/blob/master/notes/Flume简介及基本使用.md) 2. [Linux环境下Flume的安装部署](https://github.com/heibaiying/BigData-Notes/blob/master/notes/installation/Linux%E4%B8%8BFlume%E7%9A%84%E5%AE%89%E8%A3%85.md) 3. [Flume整合Kafka](https://github.com/heibaiying/BigData-Notes/blob/master/notes/Flume整合Kafka.md) ## 七、Sqoop 1. [Sqoop简介与安装](https://github.com/heibaiying/BigData-Notes/blob/master/notes/Sqoop简介与安装.md) 2. [Sqoop的基本使用](https://github.com/heibaiying/BigData-Notes/blob/master/notes/Sqoop基本使用.md) ## 八、Azkaban 1. [Azkaban简介](https://github.com/heibaiying/BigData-Notes/blob/master/notes/Azkaban简介.md) 2. [Azkaban3.x 编译及部署](https://github.com/heibaiying/BigData-Notes/blob/master/notes/installation/Azkaban%203.x%20%E7%BC%96%E8%AF%91%E5%8F%8A%E9%83%A8%E7%BD%B2.md) 3. [Azkaban Flow 1.0 的使用](https://github.com/heibaiying/BigData-Notes/blob/master/notes/Azkaban%20Flow%201.0%20%E7%9A%84%E4%BD%BF%E7%94%A8.md) 4. [Azkaban Flow 2.0 的使用](https://github.com/heibaiying/BigData-Notes/blob/master/notes/Azkaban%20Flow%202.0%20%E7%9A%84%E4%BD%BF%E7%94%A8.md) ## 九、HBase 1. [Hbase 简介](https://github.com/heibaiying/BigData-Notes/blob/master/notes/Hbase简介.md) 2. [HBase系统架构及数据结构](https://github.com/heibaiying/BigData-Notes/blob/master/notes/Hbase%E7%B3%BB%E7%BB%9F%E6%9E%B6%E6%9E%84%E5%8F%8A%E6%95%B0%E6%8D%AE%E7%BB%93%E6%9E%84.md) 3. [HBase基本环境搭建(Standalone /pseudo-distributed mode)](https://github.com/heibaiying/BigData-Notes/blob/master/notes/installation/Hbase%E5%9F%BA%E6%9C%AC%E7%8E%AF%E5%A2%83%E6%90%AD%E5%BB%BA.md) 4. [HBase常用Shell命令](https://github.com/heibaiying/BigData-Notes/blob/master/notes/Hbase%20Shell.md) 5. [HBase Java API](https://github.com/heibaiying/BigData-Notes/blob/master/notes/Hbase%20Java%20API.md) 6. [Hbase 过滤器详解](https://github.com/heibaiying/BigData-Notes/blob/master/notes/Hbase过滤器详解.md) 7. [HBase 协处理器详解](https://github.com/heibaiying/BigData-Notes/blob/master/notes/Hbase协处理器详解.md) 8. [HBase 容灾与备份](https://github.com/heibaiying/BigData-Notes/blob/master/notes/Hbase%E5%AE%B9%E7%81%BE%E4%B8%8E%E5%A4%87%E4%BB%BD.md) 9. [HBase的SQL中间层——Phoenix](https://github.com/heibaiying/BigData-Notes/blob/master/notes/Hbase%E7%9A%84SQL%E5%B1%82%E2%80%94%E2%80%94Phoenix.md) 10. [Spring/Spring Boot 整合 Mybatis + Phoenix](https://github.com/heibaiying/BigData-Notes/blob/master/notes/Spring%2BMybtais%2BPhoenix%E6%95%B4%E5%90%88.md) ## 十、Kafka 1. [Kafka 核心概念介绍](https://github.com/heibaiying/BigData-Notes/blob/master/notes/Kafka核心概念介绍.md) 2. 基于Zookeeper搭建Kafka高可用集群 3. Kafka生产者详解 4. Kafka消费者详解 5. Kafka 副本机制以及选举原理剖析 6. Kafka的数据可靠性 ## 十一、Zookeeper 1. Zookeeper 简介及核心概念 2. Zookeeper集群搭建Zookeeper 3. Zookeeper分布式锁实现方案 4. 集群升级、迁移深入分析 Zookeeper 5. Zab协议及选举机制 ## 十二、Scala