Configuring High Availability for Hive requires the following components to be fault tolerant:
1. RDBMS underlying the Hive Metastore
2. ZooKeeper
3. Hive Metastore Server
4. HiveServer2
For the sake of simplicity, this blog focuses on enabling HA for the Hive Metastore Server and HiveServer2. We recommend that the RDBMS underlying the Hive Metastore also be configured for High Availability. Multiple ZooKeeper instances are already configured on the current cluster.
Enabling High Availability for Hive Metastore Server
Select Scope > Hive Metastore Server.
Select Category > Advanced.
Locate the Hive Metastore Delegation Token Store property.
Choose org.apache.hadoop.hive.thrift.DBTokenStore
Click Save Changes.
Click on Select Hosts for Hive Metastore Server.
Click OK and Continue.
Click on Restart Stale Service.
# beeline -u "jdbc:hive2://ip-10-7-176-204.ec2.internal:10000"
Enabling Load Balancing and High Availability for HiveServer2
Click on Select Hosts for HiveServer2.
Click OK and Continue.
Select the newly added instances and choose Start.
Add a new property as below:
Name: hive.server2.support.dynamic.service.discovery
Value: true
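Once this property is active, each HiveServer2 instance registers itself as a znode under the ZooKeeper namespace. As a rough sketch, the helper below pulls the host:port pairs out of a znode listing. It assumes the znode names follow the `serverUri=<host>:<port>;version=...;sequence=...` pattern, which is what I have seen in Hive releases but may differ in yours; `hs2_endpoints` is a hypothetical name, not part of Hive.

```shell
# Hypothetical helper: extract host:port from HiveServer2 znode names on stdin.
# Assumes entries look like: serverUri=<host>:<port>;version=<v>;sequence=<n>
hs2_endpoints() {
  # Split a bracketed, comma-separated listing into one entry per line,
  # then keep only the value that follows "serverUri=".
  tr ',' '\n' | sed -n 's/^.*serverUri=\([^;]*\);.*$/\1/p'
}
```

On a cluster you might feed it the output of a ZooKeeper CLI listing, for example `zookeeper-client -server ip-10-7-176-204.ec2.internal:2181 ls /hiveserver2 | hs2_endpoints` (the exact CLI name and the `/hiveserver2` path depend on your distribution and namespace setting).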
Clients connecting to HiveServer2 now go through ZooKeeper. An example JDBC connect string follows. Notice that the JDBC URL now points to a list of nodes that run ZooKeeper rather than to a single HiveServer2 host.
beeline -u "jdbc:hive2://ip-10-7-176-204.ec2.internal:2181,ip-10-229-16-131.ec2.internal:2181,ip-10-179-159-209.ec2.internal:2181/;serviceDiscoveryMode=zooKeeper;zooKeeperNamespace=hiveserver2"
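Typing that URL by hand is error prone, so here is a small sketch that assembles it from a list of ZooKeeper hosts. The function name `hs2_zk_url` and its argument order are my own invention for illustration; only the URL layout comes from the connect string above.

```shell
# Hypothetical helper: build a ZooKeeper-discovery JDBC URL for HiveServer2.
# Usage: hs2_zk_url <namespace> <zk_port> <zk_host>...
# The body runs in a subshell, so its variables do not leak into the caller.
hs2_zk_url() (
  namespace=$1
  port=$2
  shift 2
  quorum=""
  for host in "$@"; do
    # Append "host:port", comma-separated after the first entry.
    quorum="${quorum:+$quorum,}${host}:${port}"
  done
  printf 'jdbc:hive2://%s/;serviceDiscoveryMode=zooKeeper;zooKeeperNamespace=%s\n' \
    "$quorum" "$namespace"
)
```

It could then be used as `beeline -u "$(hs2_zk_url hiveserver2 2181 ip-10-7-176-204.ec2.internal ip-10-229-16-131.ec2.internal ip-10-179-159-209.ec2.internal)"`.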
Issue the following command on the HiveServer2 nodes.
Issue the following command on the HiveServer2 nodes.
Connection to Beeline using command below should work normally.
Connection to Beeline using command below should still work normally.
Connection to Beeline using command below should fail.
Connection to Beeline using command below should work normally again.
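The connection checks in this test sequence can be repeated with a small wrapper, sketched below. It assumes your Beeline version exits with a non-zero status when it cannot connect or the query fails; some older versions may not, in which case inspect the output instead. `hs2_check` and the `BEELINE` override variable are hypothetical names introduced for this example.

```shell
# Hypothetical helper: report whether a trivial query succeeds for a JDBC URL.
# BEELINE may be overridden (for example in tests); it defaults to "beeline".
# Assumes the client exits non-zero on connection or query failure.
hs2_check() {
  if "${BEELINE:-beeline}" -u "$1" -e 'SELECT 1;' >/dev/null 2>&1; then
    echo "OK"
  else
    echo "FAILED"
  fi
}
```

Running `hs2_check` with the ZooKeeper-based JDBC URL after each step should print OK, OK, FAILED and OK, in that order, if failover behaves as described.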
9 Comments
Have you tried this with MapR?
I have not tried this with MapR but it should work the same way.
I liked your post and the way it is organized. Thanks!
My question:
Beeline works fine when one of the two HiveServer2 instances is down.
How do you get this working in Hue? Hue always points to the default HiveServer2. If that service is down, it doesn't switch to the second HiveServer2, and Hive queries fail with a connection error.
Doesn't this still leave the actual datastore as a single point of failure, whether it be Oracle, MySQL, Derby, etc.? Have you tried to use sqlproxy on each metastore server pointing to a Galera-based cluster?
The idea of this blog is to present the HA solution for HiveServer2 only. For the underlying metastore we use either simple master-slave replication or a Galera/PXDB-type solution.
How do you configure the new HiveServer2 connection in Hue? It looks like after enabling HA, Hive is not connecting through Hue.
This article is awesome. It saved me a lot of effort and I was able to fix the issue.
Can we use nginx to load balance the metastore?
Really good article. All the content is so clear that even a person with basic knowledge can understand it.