heibaiying e77082ade7 Update HiveCLI和Beeline命令行的基本使用.md

2019-04-30 15:47:00 +08:00

Basic Usage of the Hive CLI and Beeline Command Lines

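1. Hive CLI
        1.1 Help
        1.2 Interactive command line
        1.3 Executing SQL commands
        1.4 Executing SQL scripts
        1.5 Setting Hive configuration variables
        1.6 Starting with a configuration file
        1.7 User-defined variables
2. Beeline
        2.1 HiveServer2
        2.2 Beeline
        2.3 Common options
3. Hive configuration
        3.1 Configuration files
        3.2 hiveconf
        3.3 set
        3.4 Configuration precedence
        3.5 Configuration parameters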

1. Hive CLI

1.1 Help

Run hive -H or hive --help to see help for all options. The output is as follows:

usage: hive
 -d,--define <key=value>          Variable subsitution to apply to hive 
                                  commands. e.g. -d A=B or --define A=B  -- define a user variable
    --database <databasename>     Specify the database to use  -- select the database to use
 -e <quoted-query-string>         SQL from command line   -- run the given SQL
 -f <filename>                    SQL from files   -- run a SQL script
 -H,--help                        Print help information  -- print help
    --hiveconf <property=value>   Use value for given property    -- set a configuration property
    --hivevar <key=value>         Variable subsitution to apply to hive  -- define a user variable
                                  commands. e.g. --hivevar A=B
 -i <filename>                    Initialization SQL file  -- run an initialization script before entering interactive mode
 -S,--silent                      Silent mode in interactive shell    -- silent mode
 -v,--verbose                     Verbose mode (echo executed SQL to the  console)  -- verbose mode

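1.2 Interactive command line

Run the hive command with no arguments to enter the interactive command line.

1.3 Executing SQL commands

Use hive -e to execute a SQL command without entering the interactive command line.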

hive -e 'select * from emp';
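In batch jobs it is often convenient to capture a query's result in a file. The following is a minimal sketch, not a definitive recipe: export_query is a hypothetical helper name, and it combines -e with the -S (silent) flag from the help listing above on the assumption that this keeps most informational messages out of the captured output.

```shell
# Hypothetical helper: run one query non-interactively and save its output.
# $1 = SQL text, $2 = path of the file that receives the result rows.
export_query() {
    query=$1
    out_file=$2
    # -S asks for silent mode, -e passes the SQL on the command line.
    hive -S -e "$query" > "$out_file"
}
```

A call such as export_query 'select * from emp' /tmp/emp.txt would then leave the rows in /tmp/emp.txt (the path is only an example).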

1.4 Executing SQL scripts

The SQL script to execute can reside on the local file system or on HDFS.

# Local file system
hive -f /usr/file/simple.sql;

# HDFS
hive -f hdfs://hadoop001:8020/tmp/simple.sql;

The content of simple.sql is:

select * from emp;
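When hive -f is called from a larger shell job, it helps to check its exit status before moving on. The sketch below assumes that hive returns a non-zero status when the script fails; run_script_checked is a hypothetical helper name.

```shell
# Hypothetical helper: run a SQL script and report success or failure.
# $1 = path (local or hdfs://...) of the script passed to hive -f.
run_script_checked() {
    script_path=$1
    if hive -f "$script_path"; then
        echo "OK: $script_path"
    else
        # In the else branch, $? still holds the exit status of hive.
        status=$?
        echo "FAILED ($status): $script_path" >&2
        return "$status"
    fi
}
```

For example, run_script_checked /usr/file/simple.sql || exit 1 would stop the calling job as soon as the script fails.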

1.5 Setting Hive configuration variables

Use --hiveconf to set Hive configuration properties for a run.

hive -e 'select * from emp' \
--hiveconf hive.exec.scratchdir=/tmp/hive_scratch  \
--hiveconf mapred.reduce.tasks=4;

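hive.exec.scratchdir: the HDFS directory used to store the execution plans of the different map/reduce stages and the intermediate output of those stages.

1.6 Starting with a configuration file

Use -i to run an initialization script before entering interactive mode, which is effectively the same as starting Hive with a configuration file.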

hive -i /usr/file/hive-init.conf;

The content of hive-init.conf is:

set hive.exec.mode.local.auto = true;

hive.exec.mode.local.auto defaults to false; setting it to true enables automatic local mode.
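An initialization file can hold several set statements. The sketch below generates such a file from the shell; write_init_file is a hypothetical helper name, and hive.cli.print.header (which prints column names in CLI query output) is added here only as an illustrative second setting that the original file does not contain.

```shell
# Hypothetical helper: write a small Hive initialization file.
# $1 = path of the file to create (it is overwritten if it exists).
write_init_file() {
    init_path=$1
    # The quoted 'EOF' keeps the shell from expanding anything in the body.
    cat > "$init_path" <<'EOF'
set hive.exec.mode.local.auto = true;
set hive.cli.print.header = true;
EOF
}
```

After write_init_file /usr/file/hive-init.conf, the session would be started with hive -i /usr/file/hive-init.conf as shown above.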

1.7 User-defined variables

--define <key=value> and --hivevar <key=value> are functionally equivalent; both define custom variables. An example:

Define the variables:

hive --define n=ename --hivevar j=job;

Reference the variables in queries:

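# The following two statements are equivalent
hive> select ${n} from emp;
hive> select ${hivevar:n} from emp;

# The following two statements are equivalent
hive> select ${j} from emp;
hive> select ${hivevar:j} from emp;

Within each pair, both statements select the same column: ename for n and job for j.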
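The same variables can be used non-interactively, with one shell-specific caveat: the ${...} reference must reach Hive unexpanded. The sketch below, using the hypothetical helper name select_column and variable name col, wraps the query in single quotes for that reason.

```shell
# Hypothetical helper: select one column of emp, passing its name as a hivevar.
# $1 = column name, e.g. ename or job.
select_column() {
    column=$1
    # Single quotes stop the shell from interpreting ${hivevar:col};
    # Hive performs the substitution itself.
    hive --hivevar col="$column" -e 'select ${hivevar:col} from emp'
}
```

With double quotes the shell would try to expand ${hivevar:col} before Hive ever sees it, so the query would not arrive as written.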

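2. Beeline

2.1 HiveServer2

HiveServer2 (HS2) lets remote clients written in a variety of programming languages submit requests to Hive and retrieve the results, and it supports concurrent multi-client access and authentication. HS2 is a single process composed of several services, including the Thrift-based Hive service (TCP or HTTP) and a Jetty web server for the Web UI.

HiveServer2 has its own CLI, Beeline, a JDBC client based on SQLLine. Because HiveServer2 is the focus of Hive development, the Hive CLI described above is no longer recommended; the official documentation recommends Beeline instead.

2.2 Beeline

Beeline offers more options than the Hive CLI. Run beeline --help to list them; the full list is: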

Usage: java org.apache.hive.cli.beeline.BeeLine
   -u <database url>               the JDBC URL to connect to
   -r                              reconnect to last saved connect url (in conjunction with !save)
   -n <username>                   the username to connect as
   -p <password>                   the password to connect as
   -d <driver class>               the driver class to use
   -i <init file>                  script file for initialization
   -e <query>                      query that should be executed
   -f <exec file>                  script file that should be executed
   -w (or) --password-file <password file>  the password file to read password from
   --hiveconf property=value       Use value for given property
   --hivevar name=value            hive variable name and value
                                   This is Hive specific settings in which variables
                                   can be set at session level and referenced in Hive
                                   commands or queries.
   --property-file=<property-file> the file to read connection properties (url, driver, user, password) from
   --color=[true/false]            control whether color is used for display
   --showHeader=[true/false]       show column names in query results
   --headerInterval=ROWS;          the interval between which heades are displayed
   --fastConnect=[true/false]      skip building table/column list for tab-completion
   --autoCommit=[true/false]       enable/disable automatic transaction commit
   --verbose=[true/false]          show verbose error messages and debug info
   --showWarnings=[true/false]     display connection warnings
   --showNestedErrs=[true/false]   display nested errors
   --numberFormat=[pattern]        format numbers using DecimalFormat pattern
   --force=[true/false]            continue running script even after errors
   --maxWidth=MAXWIDTH             the maximum width of the terminal
   --maxColumnWidth=MAXCOLWIDTH    the maximum width to use when displaying columns
   --silent=[true/false]           be more silent
   --autosave=[true/false]         automatically save preferences
   --outputformat=[table/vertical/csv2/tsv2/dsv/csv/tsv]  format mode for result display
   --incrementalBufferRows=NUMROWS the number of rows to buffer when printing rows on stdout,
                                   defaults to 1000; only applicable if --incremental=true
                                   and --outputformat=table
   --truncateTable=[true/false]    truncate table column when it exceeds length
   --delimiterForDSV=DELIMITER     specify the delimiter for delimiter-separated values output format (default: |)
   --isolation=LEVEL               set the transaction isolation level
   --nullemptystring=[true/false]  set to true to get historic behavior of printing null as empty string
   --maxHistoryRows=MAXHISTORYROWS The maximum number of rows to store beeline history.
   --convertBinaryArrayToString=[true/false]    display binary column data as string or as byte array
   --help                          display this message

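2.3 Common options

Beeline provides counterparts for most Hive CLI options, although some letters differ in meaning (for example, -d selects the driver class in Beeline rather than defining a variable). The commonly used options are listed below. For more detail, see Beeline Command Options in the official documentation.

Option	Description
-u <database URL>	JDBC URL of the database
-n <username>	Username
-p <password>	Password
-d <driver class>	Driver class (optional)
-e <query>	Execute a SQL command
-f <file>	Execute a SQL script
-i (or)--init <file or files>	Run initialization script(s) before entering interactive mode
--property-file <file>	Read connection properties (url, driver, user, password) from a file
--hiveconf property=value	Set a configuration property
--hivevar name=value	Define a user variable, valid at session level

Example: connect to Hive with a username and password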

$ beeline -u jdbc:hive2://localhost:10000  -n username -p password
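For scripted use, several options from the help listing above can be combined. The sketch below is one possible arrangement, with beeline_csv as a hypothetical helper name: it reads the password from a file with -w instead of putting it on the command line, and requests CSV output through --outputformat=csv2.

```shell
# Hypothetical helper: run one query through Beeline and print CSV.
# $1 = JDBC URL, $2 = username, $3 = password file, $4 = SQL text.
beeline_csv() {
    jdbc_url=$1
    user=$2
    password_file=$3
    query=$4
    beeline -u "$jdbc_url" -n "$user" -w "$password_file" \
        --silent=true --showHeader=true --outputformat=csv2 \
        -e "$query"
}
```

A call might look like beeline_csv jdbc:hive2://localhost:10000 username /path/to/password-file 'select * from emp' > emp.csv, where the password file path is a placeholder.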

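3. Hive configuration

Hive properties can be configured in three ways, described below.

3.1 Configuration files

The first way is to use configuration files; settings made in configuration files are permanent. Hive has the following three optional configuration files:

hive-site.xml: the main Hive configuration file;
hivemetastore-site.xml: metastore-related configuration;
hiveserver2-site.xml: HiveServer2-related configuration.

For example, to configure hive.exec.scratchdir in hive-site.xml: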

<property>
    <name>hive.exec.scratchdir</name>
    <value>/tmp/mydir</value>
    <description>Scratch space for Hive jobs</description>
</property>

3.2 hiveconf

The second way is to pass --hiveconf when starting the command line (Hive CLI / Beeline); settings made this way apply to the whole session.

hive --hiveconf hive.exec.scratchdir=/tmp/mydir

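3.3 set

The third way is to use the set command in the interactive environment (Hive CLI / Beeline). The setting is also session-scoped and applies to every command executed after it. set both assigns and displays a parameter, as follows: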

0: jdbc:hive2://hadoop001:10000> set hive.exec.scratchdir=/tmp/mydir;
No rows affected (0.025 seconds)
0: jdbc:hive2://hadoop001:10000> set hive.exec.scratchdir;
+----------------------------------+--+
|               set                |
+----------------------------------+--+
| hive.exec.scratchdir=/tmp/mydir  |
+----------------------------------+--+

3.4 Configuration precedence

Configuration precedence, from lowest to highest:
hive-site.xml -> hivemetastore-site.xml -> hiveserver2-site.xml -> --hiveconf -> set
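The last two steps of this ordering can be checked from the shell. The sketch below is only an illustration, with show_override as a hypothetical helper name and both directory values invented for the test: it sets hive.exec.scratchdir through --hiveconf, overrides it with set, and then displays it.

```shell
# Hypothetical helper: set the same property via --hiveconf and via set,
# then display it to see which value is in effect.
show_override() {
    hive --hiveconf hive.exec.scratchdir=/tmp/from_hiveconf \
        -e 'set hive.exec.scratchdir=/tmp/from_set; set hive.exec.scratchdir;'
}
```

If the precedence described above holds, the displayed value should be the one assigned with set, /tmp/from_set.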

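3.5 Configuration parameters

Hive has a very large number of configuration parameters; consult the official documentation, AdminManual Configuration, when you need them.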
