Hive 2.1.0-用mysql作为metastore数据库-安装配置和问题
环境
- 虚拟机:
[root@master whm]# cat /etc/issue
CentOS release 6.5 (Final) Kernel \r on an \m
- Hadoop
伪分布式的hadoop 2.7.3
下载hive 2.1,解压,放到喜欢的文件夹下面
下载apache-hive-2.1.0-bin.tar.gz,解压,放到喜欢的位置.本文放在用户whm的home文件夹.
[whm@master ~]$ ll
total 80
drwxrwxr-x. 10 whm whm 4096 Jan 13 23:16 apache-hive-2.1.0-bin
配置HOME_HIVE和PATH
[whm@master ~]$ su root
Password:
[root@master whm]# vim /etc/profile
---------------
export HIVE_HOME=/home/whm/apache-hive-2.1.0-bin
export PATH=PATH:HIVE_HOME/bin
在hadoop里建立tmp和warehouse文件夹,更改权限
[whm@master ~]$ hdfs dfs -mkdir /tmp
[whm@master ~]$ hdfs dfs -mkdir /hive/warehouse
[whm@master ~]$ hdfs dfs -chmod 777 /tmp
[whm@master ~]$ hdfs dfs -chmod 777 /hive/warehouse
hive的hive-site.xml中,默认的warehouse地址是/user/hive/warehouse,也可以使用默认值.
安装mysql
使用mysql作为metastore数据库.
[root@master ~]$ yum search mysql
里面有一条:
mysql-server.x86_64 : The MySQL server and related files
安装它:
[root@master ~]$ yum install mysql-server.x86_64
安装mysql-connector并复制到hive的lib
刚才yum search的时候,有一条:
mysql-connector-java.noarch : Official JDBC driver for MySQL
安装它:
[root@master ~]$ yum install mysql-connector-java.noarch
安装好后在/usr/share/java/
文件夹.
拷贝到hive的lib文件夹:
[whm@master ~]$ cp /usr/share/java/mysql-connector-java ~/apache-hive-2.1.0-bin/lib/
在mysql中,添加hive metastore数据库,初始化
[whm@master ~]$ su root
Password:
[root@master whm]# service mysqld start
Starting mysqld: [ OK ]
[root@master whm]# mysql -u root -p
Enter password:
Welcome to the MySQL monitor. Commands end with ; or \g.
创建hive metastore数据库,本文这个数据库叫hive:
mysql> create database hive;
mysql> use hive;
使用hive的schematool初始化(在hive的bin目录):
schematool -dbType mysql -initSchema
The Hive distribution now includes an offline tool for Hive metastore schema manipulation. This tool can be used to initialize the metastore schema for the current Hive version. It can also handle upgrading the schema from an older version to current. It tries to find the current schema from the metastore if it is available. This will be applicable to future upgrades like 0.12.0 to 0.13.0. In case of upgrades from older releases like 0.7.0 or 0.10.0, you can specify the schema version of the existing metastore as a command line option to the tool.
The schematool figures out the SQL scripts required to initialize or upgrade the schema and then executes those scripts against the backend database. The metastore DB connection information like JDBC URL, JDBC driver and DB credentials are extracted from the Hive configuration. You can provide alternate DB credentials if needed.
注意 :
不知道可不可以不初始化,但是,如果初始化,一定要使用这个schemetool...一开始参考的一篇文章是手动初始化:
mysql> SOURCE $HIVE_HOME/scripts/metastore/upgrade/mysql/hive-schema-0.14.0.mysql.sql;
查看$HIVE_HOME/scripts/metastore/upgrade/mysql/,确实有这些文件:
[whm@master ~]$ ll apache-hive-2.1.0-bin/scripts/metastore/upgrade/mysql/
total 760
...
-rw-r--r--. 1 whm whm 35192 Jun 3 2016 hive-schema-1.2.0.mysql.sql
-rw-r--r--. 1 whm whm 34845 Jun 3 2016 hive-schema-1.3.0.mysql.sql
-rw-r--r--. 1 whm whm 34845 Jun 3 2016 hive-schema-2.0.0.mysql.sql
-rw-r--r--. 1 whm whm 35601 Jan 14 01:33 hive-schema-2.1.0.mysql.sql
-rw-r--r--. 1 whm whm 2845 Jun 3 2016 hive-txn-schema-0.13.0.mysql.sql
-rw-r--r--. 1 whm whm 2845 Jun 3 2016 hive-txn-schema-0.14.0.mysql.sql
-rw-r--r--. 1 whm whm 4139 Jun 3 2016 hive-txn-schema-1.3.0.mysql.sql
-rw-r--r--. 1 whm whm 3770 Jun 3 2016 hive-txn-schema-2.0.0.mysql.sql
-rw-r--r--. 1 whm whm 4327 Jan 14 01:32 hive-txn-schema-2.1.0.mysql.sql
...
但如果如法炮制2.1.0的话,就会报错:
Failed to open file 'hive-txn-schema-2.1.0.mysql.sql', error: 2
因为参考的0.14版本,虽然也有txn那个文件,但是scheme文件中完全没有用到txn文件
而2.1.0的scheme文件中有一句:
SOURCE hive-txn-schema-2.1.0.mysql.sql;
都没有绝对路径当然找不到了...
如果试图手动改成绝对路径重新执行的话,因为其他的sql语句里有一些insert,就会报一些Duplicate key name,Duplicate entry等错误...而且txn那个文件里建表不检查是否exists,多执行几遍又会报表已经存在错误..
**使用schemetool就可以避免这些麻烦,而且很明确的知道是否初始化成功. **
在mysql中,添加连接metastore使用的user,授权
mysql> create user 'hiveuser'@'%' identified by 'hivepassword';
mysql> grant all on *.* to 'hiveuser'@localhost identified by 'hivepassword';
mysql> flush privileges;
此处hiveuser替换为喜欢的名字,hivepassword替换为喜欢的密码.本文直接使用这俩...
配置hive-env.sh
复制template:
[whm@master ~]$ cp apache-hive-2.1.0-bin/conf/hive-env.sh.template apache-hive-2.1.0-bin/conf/hive-env.sh
配置:
# Set HADOOP_HOME to point to a specific hadoop install directory
# HADOOP_HOME=${bin}/../../hadoop
HADOOP_HOME=/home/whm/hadoop-2.7.3
# Hive Configuration Directory can be controlled by:
# export HIVE_CONF_DIR=
export HIVE_CONF_DIR=/home/whm/apache-hive-2.1.0-bin/conf
配置hive-site.sml
复制template:
[whm@master ~]$ cp apache-hive-2.1.0-bin/conf/hive-default.xml.template apache-hive-2.1.0-bin/conf/hive-site.xml
配置:
<configuration>
<property>
<name>hive.metastore.warehouse.dir</name>
<!--之前hdfs中创建的warehouse文件夹-->
<value>/hive/warehouse</value>
<description>location of default database for the warehouse</description>
</property>
<property>
<name>javax.jdo.option.ConnectionURL</name>
<!--这里的hive就是刚才建立的metastore数据库名字-->
<value>jdbc:mysql://localhost/hive?createDatabaseIfNotExist=true</value>
<description>metadata is stored in a MySQL server</description>
</property>
<property>
<name>javax.jdo.option.ConnectionDriverName</name>
<value>com.mysql.jdbc.Driver</value>
<description>MySQL JDBC driver class</description>
</property>
<property>
<name>javax.jdo.option.ConnectionUserName</name>
<!--刚才建的mysql用户-->
<value>hiveuser</value>
<description>user name for connecting to mysql server</description>
</property>
<property>
<name>javax.jdo.option.ConnectionPassword</name>
<!--刚才建的mysql用户的密码-->
<value>hivepassword</value>
<description>password for connecting to mysql server</description>
</property>
<property>
<name>hive.exec.scratchdir</name>
<!--之前在hdfs中建的tmp文件夹-->
<value>/tmp/hive</value>
<description>HDFS root scratch dir for Hive jobs which gets created with write all (733) permission. For each connecting user, an HDFS scratch dir: ${hive.exec.scratchdir}/<username> is created, with ${hive.scratch.dir.permission}.</description>
</property>
<!--以下5个属性在default文件中都是通过${system:java.io.tmpdir}/${system:user.name}定义的,改成自己在hive目录下建的iotmp-->
<property>
<name>hive.querylog.location</name>
<value>/home/whm/apache-hive-2.1.0-bin/iotmp/querylog</value>
<description>Location of Hive run time structured log file</description>
</property>
<property>
<name>hive.server2.logging.operation.log.location</name>
<value>/home/whm/apache-hive-2.1.0-bin/iotmp/operation_logs</value>
<description>Top level directory where operation logs are stored if logging functionality is enabled</description>
</property>
<property>
<name>hive.server2.logging.operation.log.location</name>
<value>/home/whm/apache-hive-2.1.0-bin/iotmp/operation_logs</value>
<description>Top level directory where operation logs are stored if logging functionality is enabled</description>
</property>
<property>
<name>hive.downloaded.resources.dir</name>
<value>/home/whm/apache-hive-2.1.0-bin/iotmp/resource_dir</value>
<description>Temporary local directory for added resources in the remote file system.</description>
</property>
<property>
<name>hive.exec.local.scratchdir</name>
<value>/home/whm/apache-hive-2.1.0-bin/iotmp/scratchdir</value>
<description>Local scratch space for Hive jobs</description>
</property>
</configuration>
改后面5个属性的文件夹,是因为我的报过错,然后在网上找的解决办法:
Exception in thread "main" java.lang.RuntimeException: java.lang.IllegalArgumentException: java.net.URISyntaxException: Relative path in absolute URI: ${system:java.io.tmpdir%7D/$%7Bsystem:user.name%7D
如果幸运的话,就可以用了
Hive console
[whm@master ~]$ hive
SLF4J: Class path contains multiple SLF4J bindings.
...
hive> create table test (id int, name string);
hive> show tables;
OK
test
Time taken: 0.163 seconds, Fetched: 1 row(s)
查看metastore:
mysql> use hive;
Reading table information for completion of table and column names
You can turn off this feature to get a quicker startup with -A
Database changed
mysql> select * from TBLS;
+--------+-------------+-------+------------------+-------+-----------+-------+----------+---------------+--------------------+--------------------+
| TBL_ID | CREATE_TIME | DB_ID | LAST_ACCESS_TIME | OWNER | RETENTION | SD_ID | TBL_NAME | TBL_TYPE | VIEW_EXPANDED_TEXT | VIEW_ORIGINAL_TEXT |
+--------+-------------+-------+------------------+-------+-----------+-------+----------+---------------+--------------------+--------------------+
| 1 | 1484329787 | 1 | 0 | whm | 0 | 1 | test | MANAGED_TABLE | NULL | NULL |
+--------+-------------+-------+------------------+-------+-----------+-------+----------+---------------+--------------------+--------------------+
1 row in set (0.00 sec)
beeline连接hiveserver2
连接之前要先设置代理用户,否则会报错.
在hadoop的core-site.xml中,设置如下属性(proxyuser后面是运行hive的超级用户,本文用whm运行):
<property>
<name>hadoop.proxyuser.whm.hosts</name>
<value>*</value>
</property>
<property>
<name>hadoop.proxyuser.whm.groups</name>
<value>*</value>
</property>
设置了以后, 无论使用什么用户登陆 (后面举例使用whatever),都使用hive超级用户 (本文是whm用户, 其启动了hiveserver2) 来代理whatever, 使whatever以whm的权限进行操作, 但所建立的表还是属于whatever.后面有例子.
在一个窗口中启动hiveserver2:
[whm@master ~]$ hiveserver2
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/home/whm/apache-hive-2.1.0-bin/lib/log4j-slf4j-impl-2.4.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/home/whm/hadoop-2.7.3/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]
此时jps,多了一个RunJar:
[whm@master ~]$ jps
9456 SecondaryNameNode
9824 Jps
9282 DataNode
9149 NameNode
9583 RunJar
另一个窗口启动beeline,用户名写随便什么都可以,密码空,直接回车.(能做到这一点和我们前面把hdfs上的文件夹/tmp和/hive/warehouse权限设置为777有关.)
比如先用whm连接,建表, 再用随便起的whatever连接, 建表,效果如下:
whm连接:
[whm@master ~]$ beeline
Beeline version 1.2.1.spark2 by Apache Hive
beeline> !connect jdbc:hive2://localhost:10000
Connecting to jdbc:hive2://localhost:10000
Enter username for jdbc:hive2://localhost:10000: whm
Enter password for jdbc:hive2://localhost:10000:
17/01/15 18:57:29 INFO jdbc.Utils: Supplied authorities: localhost:10000
17/01/15 18:57:29 INFO jdbc.Utils: Resolved authority: localhost:10000
17/01/15 18:57:29 INFO jdbc.HiveConnection: Will try to open client transport with JDBC Uri: jdbc:hive2://localhost:10000
Connected to: Apache Hive (version 2.1.0)
Driver: Hive JDBC (version 1.2.1.spark2)
Transaction isolation: TRANSACTION_REPEATABLE_READ
0: jdbc:hive2://localhost:10000> show tables;
+-----------+--+
| tab_name |
+-----------+--+
| test |
+-----------+--+
1 row selected (1.832 seconds)
0: jdbc:hive2://localhost:10000> create table whmbeelineteset (id int, name string);
No rows affected (0.491 seconds)
0: jdbc:hive2://localhost:10000> show tables;
+-------------------+--+
| tab_name |
+-------------------+--+
| test |
| whmbeelineteset |
+-------------------+--+
2 rows selected (0.238 seconds)
0: jdbc:hive2://localhost:10000>
whatever连接:
[whm@master ~]$ beeline
Beeline version 1.2.1.spark2 by Apache Hive
beeline> !connect jdbc:hive2://localhost:10000
Connecting to jdbc:hive2://localhost:10000
Enter username for jdbc:hive2://localhost:10000: whatever
Enter password for jdbc:hive2://localhost:10000:
17/01/15 20:53:48 INFO jdbc.Utils: Supplied authorities: localhost:10000
17/01/15 20:53:48 INFO jdbc.Utils: Resolved authority: localhost:10000
17/01/15 20:53:48 INFO jdbc.HiveConnection: Will try to open client transport with JDBC Uri: jdbc:hive2://localhost:10000
Connected to: Apache Hive (version 2.1.0)
Driver: Hive JDBC (version 1.2.1.spark2)
Transaction isolation: TRANSACTION_REPEATABLE_READ
0: jdbc:hive2://localhost:10000> create table whatevertest (id int, name string);
No rows affected (0.116 seconds)
0: jdbc:hive2://localhost:10000> show tables;
+-------------------+--+
| tab_name |
+-------------------+--+
| test |
| whatevertest |
| whmbeelinetest |
+-------------------+--+
4 rows selected (0.079 seconds)
如果之前warehouse没有设置为777,会报错:
Error: Error while processing statement: FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. MetaException(message:Got exception: org.apache.hadoop.security.AccessControlException Permission denied: user=hiveuser, access=WRITE, inode="/hive/warehouse/hiveusertest":whm:supergroup:drwxrwxr-x
当然如果只用hive超级用户连接不必设置为777,设置为775就可以了.
此时hdfs上,warehouse:
[whm@master ~]$ hdfs dfs -ls /hive/warehouse
Found 3 items
drwxrwxrwx - whm supergroup 0 2017-01-14 01:49 /hive/warehouse/test
drwxrwxrwx - whatever supergroup 0 2017-01-15 20:54 /hive/warehouse/whatevertest
drwxrwxrwx - whm supergroup 0 2017-01-15 20:45 /hive/warehouse/whmbeelinetest
表whatevertest属于用户whatever, 其他的属于whm.
删除表:
0: jdbc:hive2://localhost:10000> drop table whmbeelinetest;
No rows affected (0.206 seconds)
0: jdbc:hive2://localhost:10000> show tables;
+---------------+--+
| tab_name |
+---------------+--+
| test |
| whatevertest |
+---------------+--+
2 rows selected (0.066 seconds)
warehouse:
[whm@master ~]$ hdfs dfs -ls /hive/warehouse
Found 2 items
drwxrwxrwx - whm supergroup 0 2017-01-14 01:49 /hive/warehouse/test
drwxrwxrwx - whatever supergroup 0 2017-01-15 20:54 /hive/warehouse/whatevertest
但还有一个疑问
为什么不设置代理用户,用whm连接会报错呢?不输密码报错:
Error: Failed to open new session: java.lang.RuntimeException: org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.authorize.AuthorizationException): User: whm is not allowed to impersonate whm (state=,code=0)
User: whm is not allowed to impersonate whm
whm不允许模拟whm...
输密码 (输了用户在linux的密码) 也报错:
Error: Could not open client transport with JDBC Uri: jdbc:hive2://localhost:10000/default;auth=noSasl: java.net.ConnectException: Connection refused (state=08S01,code=0)
beeline也是whm启动的,hiveserver2也是whm启动的,whm都是它们的超级用户.
使用whm作为beeline连接时的用户名,为什么也需要设置代理用户呢...
其他
最初安装的时候不知怎么搞得把mysql搞坏了,hive那边可以正常建表,hdfs上也能看到,但是mysql的TBLS表里就是没有信息,Empty Set.于是完全卸载重装mysql:
yum remove mysql mysql-server mysql-libs compat-mysql51
rm -rf /var/lib/mysql #这步很重要
rm /etc/my.cnf
参考
http://bit1129.iteye.com/blog/2169918
http://mangocool.com/1465288694998.html
https://dzone.com/articles/how-configure-mysql-metastore
https://cwiki.apache.org/confluence/display/Hive/AdminManual+MetastoreAdmin#AdminManualMetastoreAdmin-RemoteMetastoreDatabase
http://blog.csdn.net/reesun/article/details/8556078