Update Hive分区表和分桶表.md

heibaiying 2019-04-27 10:32:25 +08:00 committed by GitHub
parent 74c068c166
commit 59ab83c403


@ -79,7 +79,7 @@ LOAD DATA LOCAL INPATH "/usr/file/emp30.txt" OVERWRITE INTO TABLE emp_partition
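In a HashMap, when a key and value are passed to put(), hashCode() is first called on the key, and the returned hash code is used to locate the bucket. The key-value pair is then stored in that bucket's linked list, which is converted to a red-black tree once it reaches a certain threshold (JDK 1.8+). The figure below shows the data structure of HashMap: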
<div align="center"> <img src="https://github.com/heibaiying/BigData-Notes/blob/master/pictures/HashMap-HashTable.png"/> </div>
<div align="center"> <img width="600px" src="https://github.com/heibaiying/BigData-Notes/blob/master/pictures/HashMap-HashTable.png"/> </div>
> Image source: [HashMap vs. Hashtable](http://www.itcuties.com/java/hashmap-hashtable/)
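
The bucket lookup described above can be sketched roughly as follows. This is a simplified illustration rather than the JDK source: the class name `BucketIndexSketch`, the table size of 16 and the sample keys are assumptions made for the example, and resizing and treeification are left out. The bit-spreading step and the `(n - 1) & hash` index follow the approach used by JDK 8's HashMap.

```java
import java.util.ArrayList;
import java.util.LinkedList;
import java.util.List;

// Hypothetical sketch: how a hash code is mapped to a bucket position.
public class BucketIndexSketch {

    // Mix the high 16 bits of hashCode() into the low 16 bits,
    // so that small tables still see the influence of the high bits.
    static int spread(Object key) {
        int h;
        return (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16);
    }

    // For a power-of-two table size n, (n - 1) & hash picks a slot in [0, n).
    static int bucketIndex(Object key, int tableSize) {
        return (tableSize - 1) & spread(key);
    }

    public static void main(String[] args) {
        int tableSize = 16; // assumed table size, must be a power of two
        List<List<String>> buckets = new ArrayList<>();
        for (int i = 0; i < tableSize; i++) {
            buckets.add(new LinkedList<>());
        }

        // Each key is appended to the list of the bucket it hashes to.
        for (String key : new String[] {"emp10", "emp20", "emp30"}) {
            buckets.get(bucketIndex(key, tableSize)).add(key);
        }

        for (int i = 0; i < tableSize; i++) {
            if (!buckets.get(i).isEmpty()) {
                System.out.println("bucket " + i + " -> " + buckets.get(i));
            }
        }
    }
}
```

Keys with the same bucket index end up in the same list, which is the collision case the linked list (and later the red-black tree) handles.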
@ -113,9 +113,6 @@ LOAD DATA LOCAL INPATH "/usr/file/emp30.txt" OVERWRITE INTO TABLE emp_partition
```sql
set hive.enforce.bucketing = true; -- not needed in Hive 2.x
```
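Rows have to be hashed when they are written into buckets, **which means that inserting data into a bucketed table always triggers a MapReduce job.**
In Hive 0.x and 1.x you must set `hive.enforce.bucketing = true` to enforce bucketing, which lets Hive automatically choose the correct number of reducers and the CLUSTER BY column from the table definition.
#### 2. Loading data with CTAS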
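
To make the hashing step concrete, here is a minimal sketch of how rows could be assigned to buckets. It assumes an INT clustering column whose hash is the value itself and a bucket number of `hash mod num_buckets`, as described in the Hive bucketed-tables documentation. The class name `HiveBucketSketch`, the sample employee numbers, the bucket count of 4 and the masking with `Integer.MAX_VALUE` (used here to keep the result non-negative) are assumptions for illustration, not Hive's actual code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch of bucket assignment for a bucketed table.
public class HiveBucketSketch {

    // Assumption: for an INT column the hash is the value itself.
    static int hashInt(int value) {
        return value;
    }

    // Clear the sign bit so negative hashes still map into [0, numBuckets).
    static int bucketFor(int hash, int numBuckets) {
        return (hash & Integer.MAX_VALUE) % numBuckets;
    }

    public static void main(String[] args) {
        int numBuckets = 4;                       // assumed bucket count
        int[] empNos = {7369, 7499, 7521, 7566};  // made-up sample keys

        // Group each key under the bucket it would be written to.
        Map<Integer, List<Integer>> buckets = new TreeMap<>();
        for (int empNo : empNos) {
            int bucket = bucketFor(hashInt(empNo), numBuckets);
            buckets.computeIfAbsent(bucket, b -> new ArrayList<>()).add(empNo);
        }

        buckets.forEach((bucket, rows) ->
                System.out.println("bucket " + bucket + " -> " + rows));
    }
}
```

Because every row has to pass through this hash-and-route step, a plain file copy is not enough to populate a bucketed table, which is why a job with one reducer per bucket is involved.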
@ -168,4 +165,4 @@ SELECT * FROM page_view WHERE dt='2009-02-25';
## References
1. [LanguageManual DDL BucketedTables](https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL+BucketedTables)
1. [LanguageManual DDL BucketedTables](https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL+BucketedTables)