Cloudera Enterprise 5.15.x

Configuring HiveServer2 High Availability in CDH

To enable high availability for multiple HiveServer2 hosts, configure a load balancer to manage them. To increase stability and security, configure the load balancer on a proxy server.

  Warning:
  • HiveServer2 high availability does not automatically fail over or retry long-running Hive queries. If a HiveServer2 instance fails, all queries running on that instance fail and are not retried. The client application must resubmit them.
  • After you enable HiveServer2 high availability, you must update existing Oozie jobs to use the new HiveServer2 address, that is, the load balancer address.
  • On Kerberos-enabled clusters, you must use the load balancer for all connections. After you enable HiveServer2 high availability, direct connections to HiveServer2 instances fail.
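
Because failed queries are not retried for you, client applications need their own resubmission logic. The following is a minimal sketch of one way to do that, not a Cloudera-provided API. The name run_with_resubmit and its parameters are hypothetical, and the set of exceptions that indicate a lost HiveServer2 instance depends on the client library you use, so the defaults here are only an assumption.

```python
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def run_with_resubmit(
    submit: Callable[[], T],
    max_attempts: int = 3,
    base_delay_s: float = 1.0,
    retry_on: Tuple[Type[BaseException], ...] = (ConnectionError, TimeoutError),
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call submit(); if it fails with a connection-level error, resubmit it.

    submit is a hypothetical callable that opens a new connection through
    the load balancer and runs the whole query from the start.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return submit()
        except retry_on:
            if attempt == max_attempts:
                # Out of attempts: surface the last error to the caller.
                raise
            # Exponential backoff: 1x, 2x, 4x ... the base delay.
            sleep(base_delay_s * 2 ** (attempt - 1))
    raise AssertionError("unreachable")
```

Resubmitting is only safe when the statement can be run again without side effects. A query that writes data may need to be checked or cleaned up before it is resubmitted.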

Enabling HiveServer2 High Availability Using Cloudera Manager

Minimum Required Role: Configurator (also provided by Cluster Administrator, Full Administrator)

  1. Go to the Hive service.
  2. Click the Configuration tab.
  3. Select Scope > HiveServer2.
  4. Select Category > Main.
  5. Locate the HiveServer2 Load Balancer property or search for it by typing its name in the Search box.
  6. Enter the load balancer address in the format hostname:port.
      Note: When you set the HiveServer2 Load Balancer property, Cloudera Manager regenerates the keytabs for HiveServer2 roles. The principal in these keytabs contains the load balancer hostname. If there is a Hue service that depends on this Hive service, it also uses the load balancer to communicate with Hive.
  7. Click Save Changes to commit the changes.
  8. Restart the Hive service.
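
The hostname:port value and its effect on the Kerberos principal can be illustrated with a short sketch. It validates a load balancer value and builds a HiveServer2 JDBC URL whose principal names the load balancer host, as the note in step 6 describes. The function names, the example hostname, and the realm are hypothetical, and the sketch assumes the service principal follows the common hive/<host>@<REALM> form. Check your cluster's actual principal before relying on it.

```python
from typing import Tuple


def parse_load_balancer(value: str) -> Tuple[str, int]:
    """Split a 'hostname:port' string into (hostname, port)."""
    host, sep, port_text = value.strip().rpartition(":")
    if not sep or not host or not (port_text.isascii() and port_text.isdigit()):
        raise ValueError(f"expected hostname:port, got {value!r}")
    port = int(port_text)
    if not 1 <= port <= 65535:
        raise ValueError(f"port out of range: {port}")
    return host, port


def kerberos_jdbc_url(
    load_balancer: str,
    realm: str,
    database: str = "default",
    service: str = "hive",  # assumed service name in the principal
) -> str:
    """Build a JDBC URL that connects through the load balancer.

    The principal uses the load balancer hostname, not the hostname of
    an individual HiveServer2 instance.
    """
    host, port = parse_load_balancer(load_balancer)
    return (
        f"jdbc:hive2://{host}:{port}/{database}"
        f";principal={service}/{host}@{realm}"
    )


if __name__ == "__main__":
    # Hypothetical load balancer address and realm.
    print(kerberos_jdbc_url("hs2-lb.example.com:10000", "EXAMPLE.COM"))
```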

Configuring HiveServer2 to Load Balance Behind a Proxy

For clusters with multiple users and availability requirements, you can configure a proxy server to relay requests to and from each HiveServer2 host. Applications connect to a single well-known host and port, and connection requests to the proxy succeed even when hosts running HiveServer2 become unavailable.

  1. Download and install the load-balancing proxy software of your choice on a single host.
  2. Configure the software, typically by editing a configuration file:
    1. Set the port on which the load balancer listens for client connections and relays HiveServer2 requests.
    2. Set the hostname and port for each HiveServer2 host. These are the hosts that the load balancer chooses from when relaying each query.
  3. Run the load-balancing proxy server and point it at the configuration file.
  4. In Cloudera Manager, set the HiveServer2 Load Balancer property to the hostname and port of the proxy server. See Enabling HiveServer2 High Availability Using Cloudera Manager.
  5. Point all scripts, jobs, or application configurations to the new proxy server instead of any specific HiveServer2 instance.
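
As an illustration of step 2, the sketch below generates a configuration section for HAProxy, which is only one possible choice of proxy software and is not prescribed by this procedure. The function name, section name, hostnames, and ports are hypothetical. The directives it emits (listen, bind, mode tcp, balance source, and server ... check) are standard HAProxy directives, but review them against the HAProxy documentation for your version before use.

```python
from typing import Iterable, Tuple


def render_haproxy_listen(
    listen_port: int,
    backends: Iterable[Tuple[str, int]],
    name: str = "hiveserver2",
) -> str:
    """Render one HAProxy 'listen' section for a set of HiveServer2 hosts."""
    backend_list = list(backends)
    if not backend_list:
        raise ValueError("at least one HiveServer2 host is required")
    lines = [
        f"listen {name}",
        # Port the load balancer listens on (step 2.1).
        f"    bind 0.0.0.0:{listen_port}",
        # Relay raw TCP instead of parsing the traffic as HTTP.
        "    mode tcp",
        # Assumption: keep each client IP on the same HiveServer2 instance.
        "    balance source",
    ]
    # One server line per HiveServer2 host (step 2.2).
    for index, (host, port) in enumerate(backend_list, start=1):
        lines.append(f"    server {name}_{index} {host}:{port} check")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical hosts; adjust names and ports to your cluster.
    print(render_haproxy_listen(
        10001,
        [("hs2-host1.example.com", 10000), ("hs2-host2.example.com", 10000)],
    ))
```

The balance source setting is an assumption made so that a client keeps reaching the same HiveServer2 instance across connections. Other balancing policies may suit your workload better.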

Page generated May 18, 2018.