Hadoop Cluster HA Environment Setup
- Prerequisites
- VMware 12 installer, serial number:
5A02H-AU243-TZJ49-GTC7K-3C61N
- Ubuntu 14.04 installer (if a VM is cloned or copied directly, VMware must regenerate its MAC address)
- hadoop-2.7.1
- zookeeper-3.4.8
- 7 virtual machines
- The firewall must be disabled on every VM; in fully distributed mode all machines must have their firewalls turned off, otherwise the ZooKeeper cluster cannot start (see the sketch below);
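- For example, on Ubuntu 14.04 the firewall can be turned off as sketched below (assuming ufw is the firewall front-end in use; run as root on every node):
ufw disable
ufw status    # should report "Status: inactive"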
- Network mode: use NAT, i.e. all VMs share one external IP address through the host's VMnet8 virtual adapter. The 7 VMs must sit on the same subnet so they can be managed conveniently with Xshell or SecureCRT; make sure the host can ping all 7 VMs
- Deployment mode: fully distributed, with two machines as namenodes, two as YARN ResourceManagers, and three as slaves
- In practice a single YARN ResourceManager also works
- Note: the system clocks of all machines must stay within 10 minutes of each other, otherwise job execution will fail; a time-sync sketch follows
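- One way to keep the clocks in sync is NTP; a sketch assuming the nodes can reach a public NTP server (otherwise set the time manually with date -s):
apt-get install -y ntpdate
ntpdate ntp.ubuntu.com    # sync this node's clock against the NTP server
date                      # verify the time on every node afterwards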
- Change the hostname on every host
vim /etc/hostname
user1
user2
user3
user4
user5
user6
user7
- user1 and user2 are the namenodes, user3 and user4 run YARN, and user5, user6, user7 are the slave nodes; ZooKeeper is also installed on user5, user6 and user7
- Configure the IP addresses
vim /etc/network/interfaces
For user1 the network settings are:
# The loopback network interface
auto lo
iface lo inet loopback
# The primary network interface
auto eth0
#iface eth0 inet dhcp
iface eth0 inet static
address 192.168.18.11
netmask 255.255.255.0
gateway 192.168.18.10
dns-nameservers 192.168.18.10
- user2 through user7 use 192.168.18.12 through .17 respectively (user1 is .11 as above)
- The training VMs already have the firewall disabled; also note that the DNS entry needs the "name" part, i.e. the keyword is dns-nameservers
- Edit /etc/hosts so that hosts can be pinged by hostname
vim /etc/hosts
127.0.0.1 localhost localhost.localdomain localhost4 localhost4.localdomain4
::1 localhost localhost.localdomain localhost6 localhost6.localdomain6
192.168.18.11 user1
192.168.18.12 user2
192.168.18.13 user3
192.168.18.14 user4
192.168.18.15 user5
192.168.18.16 user6
192.168.18.17 user7
- After editing, restart every node with the
reboot
command. You must reboot the machines (or at least restart the networking service), otherwise the changes to /etc/hosts
and the network settings may well not take effect; a sketch of restarting only the network follows
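- If you would rather not reboot, restarting the interface from the VM console is usually enough (a sketch for the static eth0 configuration above; do not run this over an ssh session on eth0, since the link drops in between):
ifdown eth0 && ifup eth0    # re-reads /etc/network/interfaces
ping -c 3 user2             # confirm hostname resolution and connectivity afterwards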
- Passwordless SSH between hosts
- On every virtual machine run the command
ssh-keygen -t rsa
and press Enter through every prompt, which yields the public key, private key and authentication file
- On user1 through user7, copy each node's public key to user7 (7 times in total, including user7 itself) with the command:
ssh-copy-id -i /root/.ssh/id_rsa.pub user7
- Then run
cat /root/.ssh/authorized_keys
on user7 and check that its authentication file contains 7 entries
- Then copy user7's authorized_keys to user1-user6, overwriting their existing files:
scp /root/.ssh/authorized_keys user1:/root/.ssh/authorized_keys
scp /root/.ssh/authorized_keys user2:/root/.ssh/authorized_keys
scp /root/.ssh/authorized_keys user3:/root/.ssh/authorized_keys
scp /root/.ssh/authorized_keys user4:/root/.ssh/authorized_keys
scp /root/.ssh/authorized_keys user5:/root/.ssh/authorized_keys
scp /root/.ssh/authorized_keys user6:/root/.ssh/authorized_keys
- At this point passwordless SSH between the hosts is basically done. Now ssh from every node to every other node once (7 x 6 = 42 logins in total) to accept the first-connection host-key prompts so they cannot disrupt communication later; see the loop sketch below
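- A sketch of accepting all the host keys in one pass (a hypothetical loop; run it as root on each of the seven nodes):
for h in user1 user2 user3 user4 user5 user6 user7; do
  ssh -o StrictHostKeyChecking=no "$h" hostname    # accepts the first-connection prompt and prints the remote hostname
done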
- Install the JDK
- On user1 create a directory:
mkdir /ittest
- Extract the prepared JDK archive into that path:
tar -zxvf jdk-8u72-linux-x64.tar.gz -C /ittest/
- Set the environment variables:
vim /etc/profile
- Add the following to /etc/profile:
export JAVA_HOME=/ittest/jdk1.8.0_72/
export JRE_HOME=/ittest/jdk1.8.0_72/jre
export PATH=$PATH:$JAVA_HOME/bin
export CLASSPATH=.:$JAVA_HOME/lib:$JAVA_HOME/jre/lib
- Then run the command
source /etc/profile
so the environment variables take effect immediately
- Copy user1's JDK installation and /etc/profile to the other six nodes:
scp -r /ittest/ user2:/
scp -r /ittest/ user3:/
scp -r /ittest/ user4:/
scp -r /ittest/ user5:/
scp -r /ittest/ user6:/
scp -r /ittest/ user7:/
scp /etc/profile user2:/etc/profile
scp /etc/profile user3:/etc/profile
scp /etc/profile user4:/etc/profile
scp /etc/profile user5:/etc/profile
scp /etc/profile user6:/etc/profile
scp /etc/profile user7:/etc/profile
- Then run the following on each of the other six nodes:
source /etc/profile
- On all seven nodes run java, javac and jps; if each prints its usage information rather than an error, the JDK is installed correctly on every node. (Hadoop is written in Java, so all of its processes must run on the JVM.)
- Install ZooKeeper
- Extract the prepared ZooKeeper archive into the install path:
tar -zxvf zookeeper-3.4.8.tar.gz -C /ittest/
- Go into the directory
/ittest/zookeeper-3.4.8/conf
- Make a copy of the sample configuration file:
cp zoo_sample.cfg zoo.cfg
- In zoo.cfg change
dataDir=/ittest/zookeeper-3.4.8/tmp
and append at the end of the file:
server.1=user5:2888:3888
server.2=user6:2888:3888
server.3=user7:2888:3888
The "server.id=host:port:port" entries identify the individual ZooKeeper servers; every machine that is part of the cluster should know about the other machines in the ensemble, and it reads that information from these "server.id=host:port:port" lines.
- Then create a folder named tmp under
/ittest/zookeeper-3.4.8
- In the directory pointed to by the dataDir parameter, create a file named myid containing a single line with this server's own id; for example, server "1" writes "1" into its myid file. The id must be unique within the ensemble and lie between 1 and 255. In the server.* lines, the first port is the one followers use to connect to the leader, and the second port is used for leader election; so in this example each machine uses three ports: clientPort 2181, port 2888 and port 3888. A sketch of creating the file is shown below.
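- A minimal sketch of creating the file (run on the node being configured; after the directory is copied out, user5 keeps 1 while user6 and user7 change theirs to 2 and 3 as described below):
mkdir -p /ittest/zookeeper-3.4.8/tmp
echo 1 > /ittest/zookeeper-3.4.8/tmp/myid    # write the server's own id into myid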
- Then copy the ZooKeeper installation directory to user5, user6 and user7:
scp -r /ittest/zookeeper-3.4.8 user5:/ittest/
scp -r /ittest/zookeeper-3.4.8 user6:/ittest/
scp -r /ittest/zookeeper-3.4.8 user7:/ittest/
- Edit the file
/ittest/zookeeper-3.4.8/tmp/myid
on each node: 1 on user5, 2 on user6, 3 on user7
- Use the command
/ittest/zookeeper-3.4.8/bin/zkServer.sh start
to start ZooKeeper on user5, user6 and user7
- After all three are started, use
/ittest/zookeeper-3.4.8/bin/zkServer.sh status
to check them; normally you get two followers and one leader
- Install Hadoop (for now set up a single YARN ResourceManager on user3; user4 will be configured as the second one later)
- Extract the Hadoop archive into
/ittest/
with the command tar -zxvf hadoop-2.7.1.tar.gz -C /ittest/
- Go to
cd /ittest/hadoop-2.7.1/etc/hadoop
and then edit the configuration files:
hadoop-env.sh
core-site.xml
hdfs-site.xml
mapred-site.xml (create it from the shipped template file)
yarn-site.xml
slaves
- Append the JDK environment variable at the end of hadoop-env.sh:
export JAVA_HOME=/ittest/jdk1.8.0_72
- core-site.xml:
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!--
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License. See accompanying LICENSE file.
-->
<!-- Put site-specific property overrides in this file. -->
<configuration>
<!-- Set the HDFS nameservice to ns1 -->
<property>
<name>fs.defaultFS</name>
<value>hdfs://ns1</value>
</property>
<!-- Hadoop temporary directory -->
<property>
<name>hadoop.tmp.dir</name>
<value>/ittest/hadoop-2.7.1/tmp</value>
</property>
<!-- ZooKeeper quorum addresses -->
<property>
<name>ha.zookeeper.quorum</name>
<value>user5:2181,user6:2181,user7:2181</value>
</property>
</configuration>
- hdfs-site.xml:
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!--
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License. See accompanying LICENSE file.
-->
<!-- Put site-specific property overrides in this file. -->
<configuration>
<!-- Set the HDFS nameservice to ns1; must match core-site.xml -->
<property>
<name>dfs.nameservices</name>
<value>ns1</value>
</property>
<!-- ns1 has two NameNodes: nn1 and nn2 -->
<property>
<name>dfs.ha.namenodes.ns1</name>
<value>nn1,nn2</value>
</property>
<!-- RPC address of nn1 -->
<property>
<name>dfs.namenode.rpc-address.ns1.nn1</name>
<value>user1:9000</value>
</property>
<!-- HTTP address of nn1 -->
<property>
<name>dfs.namenode.http-address.ns1.nn1</name>
<value>user1:50070</value>
</property>
<!-- RPC address of nn2 -->
<property>
<name>dfs.namenode.rpc-address.ns1.nn2</name>
<value>user2:9000</value>
</property>
<!-- HTTP address of nn2 -->
<property>
<name>dfs.namenode.http-address.ns1.nn2</name>
<value>user2:50070</value>
</property>
<!-- Where the NameNodes' shared edit log is stored on the JournalNodes -->
<property>
<name>dfs.namenode.shared.edits.dir</name>
<value>qjournal://user5:8485;user6:8485;user7:8485/ns1</value>
</property>
<!-- Where each JournalNode stores its data on local disk -->
<property>
<name>dfs.journalnode.edits.dir</name>
<value>/ittest/hadoop-2.7.1/journal</value>
</property>
<!-- Enable automatic NameNode failover -->
<property>
<name>dfs.ha.automatic-failover.enabled</name>
<value>true</value>
</property>
<!-- Failover proxy provider used by clients -->
<property>
<name>dfs.client.failover.proxy.provider.ns1</name>
<value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
</property>
<!-- Fencing method -->
<property>
<name>dfs.ha.fencing.methods</name>
<value>sshfence</value>
</property>
<!-- sshfence requires passwordless SSH -->
<property>
<name>dfs.ha.fencing.ssh.private-key-files</name>
<value>/root/.ssh/id_rsa</value>
</property>
</configuration>
- mapred-site.xml:
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!--
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License. See accompanying LICENSE file.
-->
<!-- Put site-specific property overrides in this file. -->
<configuration>
<!-- Run MapReduce on YARN -->
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
</configuration>
- yarn-site.xml:
<?xml version="1.0"?>
<!--
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License. See accompanying LICENSE file.
-->
<configuration>
<!-- ResourceManager host -->
<property>
<name>yarn.resourcemanager.hostname</name>
<value>user3</value>
</property>
<!-- NodeManager auxiliary service: the MapReduce shuffle server -->
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
</configuration>
- The slaves file:
user5
user6
user7
- scp the configured Hadoop directory to the other nodes:
scp -r /ittest/hadoop-2.7.1/ user2:/ittest/
scp -r /ittest/hadoop-2.7.1/ user3:/ittest/
scp -r /ittest/hadoop-2.7.1/ user4:/ittest/
scp -r /ittest/hadoop-2.7.1/ user5:/ittest/
scp -r /ittest/hadoop-2.7.1/ user6:/ittest/
scp -r /ittest/hadoop-2.7.1/ user7:/ittest/
- Start the ZooKeeper cluster (start zk on user5, user6 and user7; skip this if it is already running)
cd /ittest/zookeeper-3.4.8/bin/
./zkServer.sh start
- Check the status (only after all three nodes have been started):
./zkServer.sh status
- (One leader and two followers is normal; if it reports errors, simply redo the configuration.)
- Start the JournalNodes (run this on user1 to start all of them)
- If the JournalNodes are not running, the two NameNodes cannot communicate with each other;
cd /ittest/hadoop-2.7.1
sbin/hadoop-daemons.sh start journalnode
- (Run jps on user5, user6 and user7 to verify; a JournalNode process should now appear.)
- Format HDFS
- On user1 run:
hadoop namenode -format
- Formatting creates files under the directory set by hadoop.tmp.dir in core-site.xml,
- which I configured as /ittest/hadoop-2.7.1/tmp; then copy /ittest/hadoop-2.7.1/tmp into /ittest/hadoop-2.7.1/ on user2.
scp -r tmp/ user2:/ittest/hadoop-2.7.1/
- Format ZK (running it on user1 is enough)
hdfs zkfc -formatZK
- After it finishes, test on one of the ZooKeeper hosts (user5, user6 or user7):
cd /ittest/zookeeper-3.4.8/bin
./zkCli.sh
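- Inside the zkCli.sh shell you can check that the format step created the HA znode (a sketch; the child znode name follows the nameservice configured above):
ls /            # a hadoop-ha znode should now be listed
ls /hadoop-ha   # should contain ns1
quit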
- Note: the two formatting steps above (formatting HDFS and formatting ZK) may only be executed once. Running them a second time will prevent the DataNodes from starting later. If you did run them twice, delete the hadoop directory on user2-user7 and everything under the tmp directory of user1's hadoop installation, then scp -r user1's hadoop directory to user2-user7 and format again.
- Note: if both NameNodes end up in standby mode, you can use
hdfs haadmin -transitionToActive --forcemanual nn1
to force one of them to become active;
- Start HDFS (on user1)
sbin/start-dfs.sh
- Start YARN (run on both user1 and user3)
sbin/start-yarn.sh
- At this point Hadoop 2.7.1 is configured and can be checked through a browser:
http://192.168.18.11:50070
NameNode 'user1:9000' (active)
http://192.168.18.12:50070
NameNode 'user2:9000' (standby)
- jps on user1 and user2 should now show the processes
1346 NameNode
1480 DFSZKFailoverController
If the numbers in the Summary section of the web page are all 0, the DataNodes probably did not start; if both pages show standby, it is most likely a ZooKeeper communication problem: check /etc/hosts, and check
/ittest/zookeeper-3.4.8/bin/zkServer.sh status
on user5, user6 and user7; only if they all report follower or leader is everything normal. At this point jps on user5-user7 shows:
1216 JournalNode
3633 Jps
1365 DataNode
1126 QuorumPeerMain
2988 NodeManager
- If you now kill the NameNode process on user1 with
kill -9 <pid>
you can see user2 switch to active! You can also check the state from the command line, as sketched below.
- Use the command
sbin/hadoop-daemon.sh start namenode
to start the NameNode process again; note that it is hadoop-daemon.sh, not hadoop-daemons.sh!
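- The NameNode states can also be queried from the command line instead of the web pages (a sketch using the nn1/nn2 ids defined in hdfs-site.xml):
/ittest/hadoop-2.7.1/bin/hdfs haadmin -getServiceState nn1   # prints active or standby
/ittest/hadoop-2.7.1/bin/hdfs haadmin -getServiceState nn2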
- Verify HDFS HA
- First upload a file to HDFS
/ittest/hadoop-2.7.1/bin/hadoop fs -put /etc/profile /profile
- List the HDFS root directory:
/ittest/hadoop-2.7.1/bin/hadoop fs -ls /
- Then kill the active NameNode
kill -9 <pid of NN>
- Open in a browser:
http://192.168.18.11:50070
NameNode 'user2:9000' (active)
- The NameNode on user2 has now become active
- Run the listing command again:
/ittest/hadoop-2.7.1/bin/hadoop fs -ls /
-rw-r--r-- 3 root supergroup 1926 2014-02-06 15:36 /profile
The file uploaded earlier is still there!!!
- Manually start the NameNode that was killed
/ittest/hadoop-2.7.1/sbin/hadoop-daemon.sh start namenode
- Open http://192.168.18.11:50070 in a browser:
NameNode 'user1:9000' (standby)
- Verify YARN: run the WordCount program from the examples shipped with Hadoop:
/ittest/hadoop-2.7.1/bin/hadoop jar /ittest/hadoop-2.7.1/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.1.jar wordcount /copyprofile.sh /out4
- The run can be monitored at
http://192.168.18.13:8088/cluster
; a sketch for inspecting the job output follows
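- To look at the WordCount result itself (a sketch; it assumes the job wrote its output to /out4 as in the command above, and that a single reduce task produced part-r-00000):
/ittest/hadoop-2.7.1/bin/hadoop fs -ls /out4
/ittest/hadoop-2.7.1/bin/hadoop fs -cat /out4/part-r-00000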
- Hadoop environment variables:
Edit /etc/profile on user1
export JAVA_HOME=/ittest/jdk1.8.0_72
export JRE_HOME=/ittest/jdk1.8.0_72/jre
export HADOOP_HOME=/ittest/hadoop-2.7.1
export PATH=$PATH:$JAVA_HOME/bin:$HADOOP_HOME/bin
export CLASSPATH=.:$JAVA_HOME/lib:$JAVA_HOME/jre/lib
- Then run
source /etc/profile
- Then copy user1's /etc/profile file to the other nodes:
scp /etc/profile user2:/etc/profile
scp /etc/profile user3:/etc/profile
scp /etc/profile user4:/etc/profile
scp /etc/profile user5:/etc/profile
scp /etc/profile user6:/etc/profile
scp /etc/profile user7:/etc/profile
- Then run the following on all the other nodes
source /etc/profile
- Verify:
hadoop fs -ls /
should print the directory listing.
- Turn user4 into a second YARN ResourceManager: edit yarn-site.xml under /ittest/hadoop-2.7.1/etc/hadoop on user1; keep a copy of the original first
mv yarn-site.xml yarn.site.xml.noHA
cp yarn.site.xml.noHA yarn-site.xml
- Configure yarn-site.xml as:
<configuration>
<!-- Site specific YARN configuration properties -->
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<property>
<name>yarn.log-aggregation-enable</name>
<value>true</value>
</property>
<property>
<name>yarn.log-aggregation.retain-seconds</name>
<value>604800</value>
</property>
<property>
<name>yarn.resourcemanager.ha.enabled</name>
<value>true</value>
</property>
<property>
<name>yarn.resourcemanager.ha.automatic-failover.enabled</name>
<value>true</value>
</property>
<property>
<name>yarn.resourcemanager.cluster-id</name>
<value>yarncluster</value>
</property>
<property>
<name>yarn.resourcemanager.ha.rm-ids</name>
<value>rm1,rm2</value>
</property>
<property>
<name>yarn.resourcemanager.hostname.rm1</name>
<value>user3</value>
</property>
<property>
<name>yarn.resourcemanager.hostname.rm2</name>
<value>user4</value>
</property>
<property>
<name>yarn.resourcemanager.webapp.address.rm1</name>
<value>user3:8088</value>
</property>
<property>
<name>yarn.resourcemanager.webapp.address.rm2</name>
<value>user4:8088</value>
</property>
<property>
<name>yarn.resourcemanager.zk-address</name>
<value>user5:2181,user6:2181,user7:2181</value>
</property>
<property>
<name>yarn.resourcemanager.recovery.enabled</name>
<value>true</value>
</property>
<property>
<name>yarn.resourcemanager.store.class</name>
<value>org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore</value>
</property>
<property>
<name>yarn.nodemanager.recovery.enabled</name>
<value>true</value>
</property>
<property>
<name>yarn.nodemanager.address</name>
<value>0.0.0.0:45454</value>
</property>
</configuration>
- Note: the configuration files above originally contained Chinese comments; it is best to remove any non-ASCII comments before deploying, to avoid errors caused by character-encoding problems.
- Now scp the new yarn-site.xml to the other nodes:
scp /ittest/hadoop-2.7.1/etc/hadoop/yarn-site.xml user2:/ittest/hadoop-2.7.1/etc/hadoop/yarn-site.xml
scp /ittest/hadoop-2.7.1/etc/hadoop/yarn-site.xml user3:/ittest/hadoop-2.7.1/etc/hadoop/yarn-site.xml
scp /ittest/hadoop-2.7.1/etc/hadoop/yarn-site.xml user4:/ittest/hadoop-2.7.1/etc/hadoop/yarn-site.xml
scp /ittest/hadoop-2.7.1/etc/hadoop/yarn-site.xml user5:/ittest/hadoop-2.7.1/etc/hadoop/yarn-site.xml
scp /ittest/hadoop-2.7.1/etc/hadoop/yarn-site.xml user6:/ittest/hadoop-2.7.1/etc/hadoop/yarn-site.xml
scp /ittest/hadoop-2.7.1/etc/hadoop/yarn-site.xml user7:/ittest/hadoop-2.7.1/etc/hadoop/yarn-site.xml
- On user1 and user3 run
stop-yarn.sh
- Then on user1, user3 and user4 run
start-yarn.sh
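- The ResourceManager HA state can then be checked with yarn rmadmin (a sketch using the rm1/rm2 ids from yarn-site.xml above; one should report active and the other standby):
yarn rmadmin -getServiceState rm1
yarn rmadmin -getServiceState rm2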
- At this point the Hadoop 2.x HA installation is complete
Hive Deployment (added later; brief notes)
- Hive must be deployed on top of a working Hadoop installation
- Installing MySQL
If the cluster has direct internet access, MySQL can be installed online: CentOS uses the
yum
command and Ubuntu uses sudo apt-get install
; if the machines cannot reach the internet, download the package from the MySQL website. Packages come as source builds or binary tarballs; here the binary tarball is used.
Download the MySQL package and upload it to the
namenode
server (I chose to deploy MySQL on the namenode, so Hive fetches its metadata directly from MySQL there); extract it to the target directory with tar -zxvf
. MySQL needs additional dependency libraries that the base Linux install may not include; search for whatever package is reported missing, for example cmake and libaio.
After extracting, edit the
/etc/profile
file to add the environment variables, then start MySQL:
service mysqld start
- Log in as root
mysql -uroot -p
- There are plenty of guides on how to skip the initial password prompt, so that is not covered here
- Note: from MySQL 5.7 onwards, changing the password with
update mysql.user set password=password('root') where user ='root'
may fail with: ERROR 1054 (42S22) Unknown column 'password' in field list
because the column name in the mysql.user table changed in MySQL 5.7; in that case use:
update mysql.user set authentication_string=password('root') where user='root'
- Most important of all: the privileges of the MySQL root user must be granted!!!
mysql > grant all privileges on *.* to 'root'@'%' with grant option;
mysql > grant all privileges on *.* to 'root'@'%' identified by '123';
mysql > flush privileges;
- Deploying Hive
Extract the downloaded Hive package into the install directory, then add the new environment variables to
/etc/profile
; then edit the
hive-env.sh
and hive-site.xml
files. My configuration files are below (copied from somewhere I no longer remember); they look long, but only a few parameters need changing, so it is fairly simple: hive-env.sh
# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements. See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership. The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# Set Hive and Hadoop environment variables here. These variables can be used
# to control the execution of Hive. It should be used by admins to configure
# the Hive installation (so that users do not have to set environment variables
# or set command line parameters to get correct behavior).
#
# The hive service being invoked (CLI/HWI etc.) is available via the environment
# variable SERVICE
# Hive Client memory usage can be an issue if a large number of clients
# are running at the same time. The flags below have been useful in
# reducing memory usage:
#
# if [ "$SERVICE" = "cli" ]; then
# if [ -z "$DEBUG" ]; then
# export HADOOP_OPTS="$HADOOP_OPTS -XX:NewRatio=12 -Xms10m -XX:MaxHeapFreeRatio=40 -XX:MinHeapFreeRatio=15 -XX:+UseParNewGC -XX:-UseGCOverheadLimit"
# else
# export HADOOP_OPTS="$HADOOP_OPTS -XX:NewRatio=12 -Xms10m -XX:MaxHeapFreeRatio=40 -XX:MinHeapFreeRatio=15 -XX:-UseGCOverheadLimit"
# fi
# fi
# The heap size of the jvm stared by hive shell script can be controlled via:
#
# export HADOOP_HEAPSIZE=1024
#
# Larger heap size may be required when running queries over large number of files or partitions.
# By default hive shell scripts use a heap size of 256 (MB). Larger heap size would also be
# appropriate for hive server (hwi etc).
# Set HADOOP_HOME to point to a specific hadoop install directory
# HADOOP_HOME=${bin}/../../hadoop
HADOOP_HOME=/usr/local/hadoop/hadoop-2.8.1
# Hive Configuration Directory can be controlled by:
export HIVE_CONF_DIR=/usr/local/hive/conf
# Folder containing extra ibraries required for hive compilation/execution can be controlled by:
export HIVE_AUX_JARS_PATH=/usr/local/hive/lib
- For hive-site.xml only the following settings really matter; the rest can simply be left at their defaults:
<property>
<name>javax.jdo.option.ConnectionURL</name>
<value>jdbc:mysql://172.16.244.235:3306/hive?createDatabaseIfNotExist=true&amp;useSSL=false</value>
<description>JDBC connect string for a JDBC metastore</description>
</property>
<property>
<name>javax.jdo.option.ConnectionDriverName</name>
<value>com.mysql.jdbc.Driver</value>
<description>Driver class name for a JDBC metastore</description>
</property>
<property>
<name>javax.jdo.option.ConnectionUserName</name>
<value>root</value>
<description>Username to use against metastore database</description>
</property>
<property>
<name>javax.jdo.option.ConnectionPassword</name>
<value>123</value>
<description>password to use against metastore database</description>
</property>
- Copy the JDBC driver jar
- This jar is important: it is the driver Java processes use to talk to MySQL, and Hive relies on it to read and write its metadata in MySQL;
- Download: http://central.maven.org/maven2/mysql/mysql-connector-java/5.1.38/mysql-connector-java-5.1.38.jar
- After downloading, copy the jar into
$HIVE_HOME/lib
; a sketch is shown below
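- A sketch of that copy step (assuming HIVE_HOME is already exported in /etc/profile and the jar sits in the current directory):
cp mysql-connector-java-5.1.38.jar $HIVE_HOME/lib/
ls $HIVE_HOME/lib/ | grep mysql-connector    # confirm the driver is in place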
- Start Hive and test it...
Afterword
- The Hive notes are only a sketch; the basic installation is not hard, but MySQL has quite a few pitfalls, and Hadoop deployment has even more (enough to make you crash and despair). Learn to read the log files; it matters a lot. A programmer's way forward is to fill in the pits others have stepped in together with your own, smoothing the rough ground until it becomes a broad road...