Configuring the Sentry Service
This topic describes how to enable the Sentry service for Hive and Impala, and how to configure the Hive metastore to communicate with the Sentry service.
- Enabling the Sentry Service Using Cloudera Manager
- Enabling the Sentry Service Using the Command Line
- HiveServer2 Restricted Properties
- Configuring Pig and HCatalog for the Sentry Service
- Securing the Hive Metastore
- Using User-Defined Functions with HiveServer2
Enabling the Sentry Service Using Cloudera Manager
Minimum Required Role: Configurator (also provided by Cluster Administrator, Full Administrator)
- Before Enabling the Sentry Service
- Enabling the Sentry Service for Hive
- Enabling the Sentry Service for Impala
- Enabling the Sentry Service for Solr
- Enabling the Sentry Service for Hue
- Add the Hive, Impala, Spark, and Hue Groups to Sentry's Admin Groups
Before Enabling the Sentry Service
- Verify the prerequisites for the Sentry service: Before You Install Sentry
- Setting Hive Warehouse Directory Permissions
Important: Enabling HDFS/Sentry synchronization obviates the need to explicitly set permissions on the Hive warehouse directory. After synchronization is enabled, all Hive databases and tables are owned by hive:hive and Sentry permissions on tables are automatically translated to HDFS ACLs on the underlying files.
- Using the default Hive warehouse directory - Permissions on the warehouse directory must be set as follows (see following Note for caveats):
- 771 on the directory itself (by default, /user/hive/warehouse)
- 771 on all subdirectories (for example, /user/hive/warehouse/mysubdir)
- All files and subdirectories should be owned by hive:hive
For example:
$ sudo -u hdfs hdfs dfs -chmod -R 771 /user/hive/warehouse
$ sudo -u hdfs hdfs dfs -chown -R hive:hive /user/hive/warehouse
If you have enabled Kerberos on your cluster, you must kinit as the hdfs user before you set permissions. For example:
$ sudo -u hdfs kinit -kt hdfs.keytab hdfs
$ sudo -u hdfs hdfs dfs -chmod -R 771 /user/hive/warehouse
$ sudo -u hdfs hdfs dfs -chown -R hive:hive /user/hive/warehouse
- Using a non-default Hive warehouse: To use a different directory for the Hive warehouse, specify the directory path in the hive.metastore.warehouse.dir property and set the permissions on the new directory, as shown in this example:
$ hdfs dfs -chown hive:hive /data
$ hdfs dfs -chmod 771 /data
Note: Changing the default Hive warehouse to a new location does not move existing tables. Any tables created prior to changing the path remain in the default location, but new tables will be created in the new path. For Sentry/HDFS sync to work as expected, add the new warehouse URL to the list of Sentry Synchronization Path Prefixes.
Note:
- Set hive.warehouse.subdir.inherit.perms to true in hive-site.xml to have permissions set on the warehouse directory applied to all subdirectories.
- If a user has access to any object in the warehouse, that user will be able to execute use default. This ensures that use default commands issued by legacy applications work when Sentry is enabled.
- Modifying permissions on the Hive warehouse directory (as detailed above) overrides the recommendations in the Hive section of the CDH 5 Installation Guide.
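One way to confirm the result is to inspect the listing of the warehouse directory. The following is a minimal sketch, assuming the usual `hdfs dfs -ls -d` output layout (permissions, replication, owner, group, and so on). The function name `check_warehouse_listing` is hypothetical and not part of any Cloudera tool.

```shell
# Hypothetical helper: reads one line of `hdfs dfs -ls -d <dir>` output on
# stdin and reports whether it shows mode 771 (drwxrwx--x) and owner hive:hive.
check_warehouse_listing() {
    # Fields: permissions, replication, owner, group, then the rest of the line
    read -r perms _repl owner group _rest || return 2
    if [ "$perms" = "drwxrwx--x" ] && [ "$owner" = "hive" ] && [ "$group" = "hive" ]; then
        echo "OK"
    else
        echo "MISMATCH: $perms $owner:$group"
        return 1
    fi
}

# Example with a sample listing line. On a cluster you would instead pipe in:
#   sudo -u hdfs hdfs dfs -ls -d /user/hive/warehouse
sample='drwxrwx--x   - hive hive          0 2018-06-01 12:00 /user/hive/warehouse'
echo "$sample" | check_warehouse_listing
```

If HDFS ACLs are set on the directory, the permission string carries a trailing +, which this sketch would report as a mismatch.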
- Disable impersonation for HiveServer2 in the Cloudera Manager Admin Console. HiveServer2 impersonation lets users execute queries and access HDFS files as the connected user rather than as the super user. Access policies are applied at the file level using the HDFS permissions specified in ACLs (access control lists). Enabling HiveServer2 impersonation bypasses Sentry from the end-to-end authorization process. Specifically, although Sentry enforces access control policies on tables and views within the Hive warehouse, it does not control access to the HDFS files that underlie the tables. This means that users without Sentry permissions to tables in the warehouse may nonetheless be able to bypass Sentry authorization checks and execute jobs and queries against tables in the warehouse as long as they have permissions on the HDFS files supporting the table. Use the following instructions to disable impersonation:
- Go to the Hive service.
- Click the Configuration tab.
- Select .
- Select .
- Uncheck the HiveServer2 Enable Impersonation checkbox.
- Click Save Changes to commit the changes.
- If you are using MapReduce, enable the Hive user to submit MapReduce jobs.
- Open the Cloudera Manager Admin Console and go to the MapReduce service.
- Click the Configuration tab.
- Select .
- Select .
- Set the Minimum User ID for Job Submission property to zero (the default is 1000).
- Click Save Changes to commit the changes.
- Repeat steps 1-6 for every TaskTracker role group for the MapReduce service that is associated with Hive.
- Restart the MapReduce service.
- If you are using YARN, enable the Hive user to submit YARN jobs.
- Open the Cloudera Manager Admin Console and go to the YARN service.
- Click the Configuration tab.
- Select .
- Select .
- Ensure the Allowed System Users property includes the hive user. If not, add hive.
- Click Save Changes to commit the changes.
- Repeat steps 1-6 for every NodeManager role group for the YARN service that is associated with Hive.
- Restart the YARN service.
- Block the Hive CLI user from accessing the Hive metastore:
- In the Cloudera Manager Admin Console, select the Hive service.
- On the Hive service page, click the Configuration tab.
- In the search field, search for Hive Metastore Access Control and Proxy User Groups Override to locate the hadoop.proxyuser.hive.groups setting.
- Click the plus sign three times to add the following groups:
- hive
- hue
- sentry
- Click Save Changes.
Setting this parameter blocks access to the Hive metastore for the user running the Hive CLI if they are not part of the hive, hue, or sentry groups. The Hive CLI can still run, but after setting this parameter as described here, the hive user can impersonate only members of the hive, sentry, or hue groups. If you are using Sqoop, the Sqoop user must also have access to the Hive metastore.
Enabling the Sentry Service for Hive
- Go to the Hive service.
- Click the Configuration tab.
- Select .
- Select .
- Locate the Sentry Service property and select Sentry.
- Locate the Enable Stored Notifications in Database property and select it.
- Click Save Changes to commit the changes.
- Restart the Hive service.
Enabling the Sentry Service for Impala
- Enable the Sentry service for Hive (as instructed above).
- Go to the Impala service.
- Click the Configuration tab.
- Select .
- Select .
- Locate the Sentry Service property and select Sentry.
- Click Save Changes to commit the changes.
- Restart Impala.
Enabling the Sentry Service for Solr
Enable the Sentry service as follows:
- Go to the Solr service.
- Click the Configuration tab.
- Select .
- Select .
- Locate the Sentry Service property and select Sentry.
- Click Save Changes to commit the changes.
- Restart Solr.
After enabling Sentry for Solr, you may want to configure authorization as described in Configuring Sentry Authorization for Cloudera Search.
Enabling the Sentry Service for Hue
Hue uses a Security app to make it easier to interact with Sentry. When you set up Hue to manage Sentry permissions, make sure that users and groups are set up correctly. Every Hue user connecting to Sentry must have an equivalent OS-level user account on all hosts so that Sentry can authenticate Hue users. Each OS-level user should also be part of an OS-level group with the same name as the corresponding user's group in Hue.
For more information on using the Security app, see the related blog post.
Enable the Sentry service as follows:
- Enable the Sentry service for Hive and Impala (as instructed above).
- Go to the Hue service.
- Click the Configuration tab.
- Select .
- Select .
- Locate the Sentry Service property and select Sentry.
- Click Save Changes to commit the changes.
- Restart Hue.
Add the Hive, Impala, Spark, and Hue Groups to Sentry's Admin Groups
Add the user groups that need administrative privileges on the Sentry Server.
- Go to the Sentry service.
- Click the Configuration tab.
- Select .
- Select .
- Locate the Admin Groups property and add the hive, impala, spark, and hue groups to the list. If an end user is in one of these admin groups, that user has administrative privileges on the Sentry Server.
- Click Save Changes to commit the changes.
Enabling the Sentry Service Using the Command Line
- Follow these command-line instructions on systems that do not use Cloudera Manager.
- This information applies specifically to CDH 5.15.0. See Cloudera Documentation for information specific to other releases.
Before Enabling the Sentry Service
- Setting Hive Warehouse Directory Permissions
Important: Enabling HDFS/Sentry synchronization obviates the need to explicitly set permissions on the Hive warehouse directory. After synchronization is enabled, all Hive databases and tables are owned by hive:hive and Sentry permissions on tables are automatically translated to HDFS ACLs on the underlying files.
- Using the default Hive warehouse directory - Permissions on the warehouse directory must be set as follows (see following Note for caveats):
- 771 on the directory itself (by default, /user/hive/warehouse)
- 771 on all subdirectories (for example, /user/hive/warehouse/mysubdir)
- All files and subdirectories should be owned by hive:hive
For example:
$ sudo -u hdfs hdfs dfs -chmod -R 771 /user/hive/warehouse
$ sudo -u hdfs hdfs dfs -chown -R hive:hive /user/hive/warehouse
If you have enabled Kerberos on your cluster, you must kinit as the hdfs user before you set permissions. For example:
$ sudo -u hdfs kinit -kt hdfs.keytab hdfs
$ sudo -u hdfs hdfs dfs -chmod -R 771 /user/hive/warehouse
$ sudo -u hdfs hdfs dfs -chown -R hive:hive /user/hive/warehouse
- Using a non-default Hive warehouse: To use a different directory for the Hive warehouse, specify the directory path in the hive.metastore.warehouse.dir property and set the permissions on the new directory, as shown in this example:
$ hdfs dfs -chown hive:hive /data
$ hdfs dfs -chmod 771 /data
Note: Changing the default Hive warehouse to a new location does not move existing tables. Any tables created prior to changing the path remain in the default location, but new tables will be created in the new path. For Sentry/HDFS sync to work as expected, add the new warehouse URL to the list of Sentry Synchronization Path Prefixes.
Note:
- Set hive.warehouse.subdir.inherit.perms to true in hive-site.xml to have permissions set on the warehouse directory applied to all subdirectories.
- If a user has access to any object in the warehouse, that user will be able to execute use default. This ensures that use default commands issued by legacy applications work when Sentry is enabled.
- Modifying permissions on the Hive warehouse directory (as detailed above) overrides the recommendations in the Hive section of the CDH 5 Installation Guide.
- HiveServer2 impersonation must be turned off. HiveServer2 impersonation lets users execute queries and access HDFS files as the connected user rather than as the super user. Access policies are applied at the file level using the HDFS permissions specified in ACLs (access control lists). Enabling HiveServer2 impersonation bypasses Sentry from the end-to-end authorization process. Specifically, although Sentry enforces access control policies on tables and views within the Hive warehouse, it does not control access to the HDFS files that underlie the tables. This means that users without Sentry permissions to tables in the warehouse may nonetheless be able to bypass Sentry authorization checks and execute jobs and queries against tables in the warehouse as long as they have permissions on the HDFS files supporting the table.
- If you are using MapReduce, you must enable the Hive user to submit MapReduce jobs. To do so, set the minimum user ID for job submission to 0: edit the taskcontroller.cfg file and set min.user.id=0.
- If you are using YARN, you must enable the Hive user to submit YARN jobs. Edit the container-executor.cfg file and add hive to the allowed.system.users property. For example:
allowed.system.users = nobody,impala,hive,hbase
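A quick check that the hive user is really present in the property can catch typos before the cluster restart. The following is a minimal sketch, assuming the property is written on a single line as in the example above. The function name `user_allowed` is hypothetical, and the temporary file only stands in for your real container-executor.cfg.

```shell
# Hypothetical helper: succeeds if user $1 appears in the allowed.system.users
# line of the container-executor.cfg-style file $2.
user_allowed() {
    user=$1
    cfg=$2
    # Keep only the value after '=', drop spaces, and use the first match.
    list=$(sed -n 's/^[[:space:]]*allowed\.system\.users[[:space:]]*=[[:space:]]*//p' "$cfg" | tr -d ' ' | head -n 1)
    # Wrap the list in commas so that only whole names match.
    case ",$list," in
        *",$user,"*) return 0 ;;
        *) return 1 ;;
    esac
}

# Demonstration against a temporary file that holds the example line.
tmp=$(mktemp)
echo 'allowed.system.users = nobody,impala,hive,hbase' > "$tmp"
if user_allowed hive "$tmp"; then
    echo "hive may submit YARN jobs"
fi
rm -f "$tmp"
```

Matching on whole comma-separated names avoids a false positive when another entry merely contains the string hive.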
Important: You must restart the cluster and HiveServer2 after changing these values.
- Block the Hive CLI user from accessing the Hive metastore by setting the following property in the cluster's core-site.xml file:
<property>
  <name>hadoop.proxyuser.hive.groups</name>
  <value>hive,hue,sentry</value>
  <description>Sets groups from which the hive user can impersonate other users.</description>
</property>
Setting this parameter blocks access to the Hive metastore for the user running the Hive CLI if they are not part of the hive, hue, or sentry groups. The Hive CLI can still run, but after setting this parameter as described here, the hive user can impersonate only members of the hive, hue, or sentry groups. If you are using Sqoop, the Sqoop user must also have access to the Hive metastore.
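The effect of the group list can be illustrated with a small membership test. This is only a sketch of the rule described above, not how Hadoop implements it: the function name `can_be_impersonated` is hypothetical, and on a real cluster group membership is resolved by Hadoop's group mapping on the server side, which may differ from what `id -Gn` reports locally.

```shell
# Hypothetical helper: succeeds if any of the user's groups ($1, space-separated,
# for example the output of `id -Gn <user>`) is in the comma-separated list $2.
can_be_impersonated() {
    for g in $1; do
        case ",$2," in
            *",$g,"*) return 0 ;;
        esac
    done
    return 1
}

# Value of hadoop.proxyuser.hive.groups from the property above.
allowed="hive,hue,sentry"

# A user in the analysts and hue groups passes; a user only in analysts does not.
can_be_impersonated "analysts hue" "$allowed" && echo "allowed"
can_be_impersonated "analysts" "$allowed" || echo "blocked"
```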
- Add the hive, impala, and hue groups to Sentry's sentry.service.admin.group in the sentry-site.xml file. If an end user is in one of these admin groups, that user has administrative privileges on the Sentry Server.
<property>
  <name>sentry.service.admin.group</name>
  <value>hive,impala,hue</value>
</property>
Configuring the Sentry Server
<property>
  <name>sentry.service.server.rpc-address</name>
  <value>nightly54-1.gce.cloudera.com</value>
</property>
<property>
  <name>sentry.service.server.rpc-port</name>
  <value>8038</value>
</property>
<property>
  <name>sentry.service.admin.group</name>
  <value>hive,impala,hue</value>
</property>
<property>
  <name>sentry.service.allow.connect</name>
  <value>hive,impala,hue,hdfs</value>
</property>
<property>
  <name>sentry.store.group.mapping</name>
  <value>org.apache.sentry.provider.common.HadoopGroupMappingService</value>
</property>
<property>
  <name>sentry.service.server.principal</name>
  <value>sentry/_HOST@GCE.CLOUDERA.COM</value>
</property>
<property>
  <name>sentry.service.security.mode</name>
  <value>kerberos</value>
</property>
<property>
  <name>sentry.service.server.keytab</name>
  <value>sentry.keytab</value>
</property>
<property>
  <name>sentry.store.jdbc.url</name>
  <value>jdbc:<JDBC connection URL for backend database></value>
</property>
<property>
  <name>sentry.store.jdbc.driver</name>
  <value><JDBC Driver class for backend database></value>
</property>
<property>
  <name>sentry.store.jdbc.user</name>
  <value><User ID for backend database user></value>
</property>
<property>
  <name>sentry.store.jdbc.password</name>
  <value><Password for backend database user></value>
</property>
<property>
  <name>sentry.service.processor.factories</name>
  <value>org.apache.sentry.provider.db.service.thrift.SentryPolicyStoreProcessorFactory, org.apache.sentry.hdfs.SentryHDFSServiceProcessorFactory</value>
</property>
<property>
  <name>sentry.policy.store.plugins</name>
  <value>org.apache.sentry.hdfs.SentryPlugin</value>
</property>
<property>
  <name>sentry.hdfs.integration.path.prefixes</name>
  <value>/user/hive/warehouse</value>
</property>
Configuring HiveServer2 for the Sentry Service
<property>
  <name>hive.sentry.server</name>
  <value>server1</value>
</property>
<property>
  <name>sentry.service.server.principal</name>
  <value>sentry/_HOST@EXAMPLE.COM</value>
</property>
<property>
  <name>sentry.service.security.mode</name>
  <value>kerberos</value>
</property>
<property>
  <name>sentry.hive.provider.backend</name>
  <value>org.apache.sentry.provider.db.SimpleDBProviderBackend</value>
</property>
<property>
  <name>sentry.service.client.server.rpc-address</name>
  <value>example.cloudera.com</value>
</property>
<property>
  <name>sentry.service.client.server.rpc-port</name>
  <value>8038</value>
</property>
<property>
  <name>hive.sentry.provider</name>
  <value>org.apache.sentry.provider.file.HadoopGroupResourceAuthorizationProvider</value>
</property>
<property>
  <name>hive.sentry.failure.hooks</name>
  <value>com.cloudera.navigator.audit.hive.HiveSentryOnFailureHook</value>
</property>

<property>
  <name>hive.security.authorization.task.factory</name>
  <value>org.apache.sentry.binding.hive.SentryHiveAuthorizationTaskFactoryImpl</value>
</property>
<property>
  <name>hive.server2.session.hook</name>
  <value>org.apache.sentry.binding.hive.HiveAuthzBindingSessionHook</value>
</property>
<property>
  <name>hive.sentry.conf.url</name>
  <value>file:///{{PATH/TO/DIR}}/sentry-site.xml</value>
</property>
Enabling Sentry on the Hive service places several HiveServer2 properties on a restricted list of properties that cannot be modified at runtime by clients. See HiveServer2 Restricted Properties.
Configuring the Hive Metastore for the Sentry Service
<property>
  <name>hive.metastore.filter.hook</name>
  <value>org.apache.sentry.binding.metastore.SentryMetaStoreFilterHook</value>
</property>
<property>
  <name>hive.metastore.pre.event.listeners</name>
  <value>org.apache.sentry.binding.metastore.MetastoreAuthzBinding</value>
  <description>list of comma separated listeners for metastore events.</description>
</property>
<property>
  <name>hive.metastore.event.listeners</name>
  <value>org.apache.sentry.binding.metastore.SentryMetastorePostEventListener</value>
  <description>list of comma separated listeners for metastore, post events.</description>
</property>
Configuring Impala as a Client for the Sentry Service
<property>
  <name>sentry.service.client.server.rpc-port</name>
  <value>8038</value>
</property>
<property>
  <name>sentry.service.client.server.rpc-address</name>
  <value>hostname</value>
</property>
<property>
  <name>sentry.service.client.server.rpc-connection-timeout</name>
  <value>200000</value>
</property>
<property>
  <name>sentry.service.security.mode</name>
  <value>kerberos</value>
</property>
You must also add the following configuration properties to Impala's /etc/default/impala file. For more information, see Configuring Impala Startup Options through the Command Line.
- On the catalogd and the impalad.
--sentry_config=<absolute path to sentry service configuration file>
- On the impalad.
--server_name=<server name>
If the --authorization_policy_file flag is set, Impala will use the policy file-based approach. Otherwise, the database-backed approach will be used to implement authorization.
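As an illustration, the flags might be appended in /etc/default/impala as sketched below. This assumes the file defines the IMPALA_SERVER_ARGS and IMPALA_CATALOG_ARGS variables, as is typical for package-based installations. The configuration path is a placeholder, and the server name server1 is borrowed from the hive.sentry.server example earlier on this page.

```shell
# Assumed location of the Sentry service configuration file; adjust as needed.
SENTRY_CONFIG=/etc/sentry/conf/sentry-site.xml
# Assumed server name, taken from the hive.sentry.server example value.
SENTRY_SERVER_NAME=server1

# impalad takes both the server name and the Sentry configuration path.
IMPALA_SERVER_ARGS="${IMPALA_SERVER_ARGS:-} --server_name=${SENTRY_SERVER_NAME} --sentry_config=${SENTRY_CONFIG}"

# catalogd takes only the Sentry configuration path.
IMPALA_CATALOG_ARGS="${IMPALA_CATALOG_ARGS:-} --sentry_config=${SENTRY_CONFIG}"
```

Leaving --authorization_policy_file out of both variables keeps Impala on the database-backed approach described above.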
Enabling Solr as a Client for the Sentry Service Using the Command Line
You can enable Sentry using Cloudera Manager or by manually modifying files. For more information on enabling Sentry using Cloudera Manager, see Configuring Sentry Policy File Authorization Using Cloudera Manager and Enabling Sentry Policy File Authorization for Solr.
- In a Cloudera Manager deployment, required properties are added automatically when you click Enable Sentry Authorization in the Solr configuration page in Cloudera Manager.
- If you are using configs, you must configure the proper config=myConfig permissions as described in Using Roles and Privileges with Sentry.
- In a deployment not managed by Cloudera Manager, you must make these changes yourself. The variable SOLR_AUTHORIZATION_SENTRY_SITE specifies the path to sentry-site.xml. The variable SOLR_AUTHORIZATION_SUPERUSER specifies the first part of SOLR_KERBEROS_PRINCIPAL. This is solr for the majority of users, as solr is the default. Settings are of the form:
SOLR_AUTHORIZATION_SENTRY_SITE=/location/to/sentry-site.xml
SOLR_AUTHORIZATION_SUPERUSER=solr
To enable Sentry collection-level authorization checking on a new collection, the instancedir for the collection must use a modified version of solrconfig.xml with Sentry integration. Each collection has a separate solrconfig.xml file, meaning you can define different behavior for each collection. The command solrctl instancedir --generate generates two versions of solrconfig.xml: the standard solrconfig.xml without Sentry integration and the Sentry-integrated version called solrconfig.xml.secure. To use the Sentry-integrated version, replace solrconfig.xml with solrconfig.xml.secure before creating the instancedir.
You can enable Sentry on an existing collection. The process varies depending on whether you are using a config or instancedir.
Enabling Sentry on Collections using configs
If you have a collection that is using a non-secured config, you can enable Sentry security on that collection by modifying the collection to use a secure config. The config in use must not be immutable, otherwise it cannot be changed. To update an existing non-immutable config:
- Delete the existing config using the solrctl config --delete command. For example:
solrctl config --delete myManaged
- Create a new non-immutable config using the solrctl config --create command. Use a sentry-enabled template such as managedTemplateSecure. The new config must have the same name as the config being replaced. For example:
solrctl config --create myManaged managedTemplateSecure -p immutable=false
- Reload the collection using the solrctl collection --reload command. For example:
solrctl collection --reload myCollection
For a list of all available config templates, see Config Templates.
Enabling Sentry on Collections using instancedirs
If you have a collection that is using a non-secured instancedir configuration, you can enable Sentry security on that collection by modifying the settings that are stored in instancedir. For example, you might have an existing collection named foo and a standard solrconfig.xml. By default, collections are stored in instancedirs that use the collection's name, which is foo in this case.
If your collection uses an unmodified solrconfig.xml file, you can enable Sentry by replacing the existing solrconfig.xml file. If your collection uses a solrconfig.xml that contains modifications you want to preserve, you can use a diff tool to find your changes and integrate them into the secure template.
To enable Sentry on an existing collection without preserving customizations
# generate a fresh instancedir
solrctl instancedir --generate foosecure
# download the existing instancedir from ZK into subdirectory foo
solrctl instancedir --get foo foo
# replace the existing solrconfig.xml with the sentry-enabled one
cp foosecure/conf/solrconfig.xml.secure foo/conf/solrconfig.xml
# update the instancedir in ZK
solrctl instancedir --update foo foo
# reload the collection
solrctl collection --reload foo
To enable Sentry on an existing collection and preserve customizations
Generate a new instancedir, compare the differences between the default solrconfig.xml and solrconfig.xml.secure files, and then add the elements that are unique to solrconfig.xml.secure to the file that your environment is using.
- Generate a fresh instancedir:
solrctl instancedir --generate foo
- Compare the solrconfig.xml and solrconfig.xml.secure:
diff foo/conf/solrconfig.xml foo/conf/solrconfig.xml.secure
- Add the elements that are unique to solrconfig.xml.secure to your existing solrconfig.xml file. You might complete this process by manually editing your existing solrconfig.xml file or by using a merge tool.
Note: If you have modified or specified additional request handlers, consider that Sentry:
- Supports protecting additional query request handlers by adding a search component, which should be shown in the diff.
- Supports protecting additional update request handlers with Sentry by adding an updateRequestProcessorChain, which should be shown in the diff.
- Does not support protecting modified or specified additional "special" request handlers like analysis handlers or admin handlers.
- Reload the collection:
solrctl collection --reload foo
After enabling Sentry for Solr, you may want to configure authorization as described in Configuring Sentry Authorization for Cloudera Search.
HiveServer2 Restricted Properties
- hive.enable.spark.execution.engine
- hive.semantic.analyzer.hook
- hive.exec.pre.hooks
- hive.exec.scratchdir
- hive.exec.local.scratchdir
- hive.metastore.uris
- javax.jdo.option.ConnectionURL
- hadoop.bin.path
- hive.session.id
- hive.aux.jars.path
- hive.stats.dbconnectionstring
- hive.scratch.dir.permission
- hive.security.command.whitelist
- hive.security.authorization.task.factory
- hive.entity.capture.transform
- hive.access.conf.url
- hive.sentry.conf.url
- hive.access.subject.name
- hive.sentry.subject.name
- hive.sentry.active.role.set
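Because clients can no longer change these properties at runtime, existing HiveQL scripts that set them may start to fail once Sentry is enabled. The sketch below scans a script for such SET statements before you run it. The function name `find_restricted_sets` is hypothetical, and the simple line-based match assumes each SET statement starts on its own line.

```shell
# Restricted properties, copied from the list above.
RESTRICTED="hive.enable.spark.execution.engine hive.semantic.analyzer.hook
hive.exec.pre.hooks hive.exec.scratchdir hive.exec.local.scratchdir
hive.metastore.uris javax.jdo.option.ConnectionURL hadoop.bin.path
hive.session.id hive.aux.jars.path hive.stats.dbconnectionstring
hive.scratch.dir.permission hive.security.command.whitelist
hive.security.authorization.task.factory hive.entity.capture.transform
hive.access.conf.url hive.sentry.conf.url hive.access.subject.name
hive.sentry.subject.name hive.sentry.active.role.set"

# Hypothetical helper: prints each restricted property that the HiveQL
# script $1 tries to assign with a SET statement.
find_restricted_sets() {
    for prop in $RESTRICTED; do
        # Escape the dots so they match literally in the regular expression.
        pat=$(printf '%s' "$prop" | sed 's/\./\\./g')
        if grep -Eiq "^[[:space:]]*set[[:space:]]+${pat}[[:space:]]*=" "$1"; then
            echo "$prop"
        fi
    done
}

# Demonstration with a throwaway script.
script=$(mktemp)
printf '%s\n' 'SET hive.exec.scratchdir=/tmp/scratch;' 'SET mapreduce.job.queuename=etl;' > "$script"
find_restricted_sets "$script"
rm -f "$script"
```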
Configuring Pig and HCatalog for the Sentry Service
Once you have the Sentry service up and running, and Hive has been configured to use the Sentry service, there are some configuration changes you must make to your cluster to allow Pig, MapReduce (using HCatLoader, HCatStorer) and WebHCat queries to access Sentry-secured data stored in Hive.
- Use HDFS ACLs to define permissions on a specific directory or file of HDFS. This directory/file is generally mapped to a database, table, partition, or a data file.
- Users running these jobs should have the required permissions in Sentry to add new metadata or read metadata from the Hive Metastore Server. For instructions on how to set up the required permissions, see Hive SQL Syntax for Use with Sentry. You can use HiveServer2's command line interface, Beeline to update the Sentry database with the user privileges.
- A user who is using Pig HCatLoader will require read permissions on a specific table or partition. In such a case, you can GRANT read access to the user in Sentry and set the ACL to read and run, on the file being accessed.
- A user who is using Pig HCatStorer will require ALL permissions on a specific table. In this case, you GRANT ALL access to the user in Sentry and set the ACL to write and run on the table being used.
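The ACL half of these two cases can be sketched as a dry run that prints the `hdfs dfs -setfacl` command instead of executing it. The function name `hcat_acl_cmd`, the user names, and the table path are hypothetical. The permission strings simply follow the wording above (read and run as r-x, write and run as -wx), so adjust them if your jobs need more.

```shell
# Hypothetical helper: prints (does not run) the setfacl command for a user.
# $1: "loader" for Pig HCatLoader or "storer" for Pig HCatStorer
# $2: user name
# $3: HDFS path of the table or partition
hcat_acl_cmd() {
    case "$1" in
        loader) perms="r-x" ;;   # read and execute
        storer) perms="-wx" ;;   # write and execute
        *) echo "unknown mode: $1" >&2; return 1 ;;
    esac
    echo "hdfs dfs -setfacl -m user:$2:$perms $3"
}

# Placeholder users and table path for illustration only.
hcat_acl_cmd loader alice /user/hive/warehouse/sales.db/orders
hcat_acl_cmd storer etl /user/hive/warehouse/sales.db/orders
```

Printing the command first makes it easy to review the change before applying it. The matching Sentry privileges are still granted separately, for example through Beeline.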
Securing the Hive Metastore
<property>
  <name>sentry.hive.testing.mode</name>
  <value>true</value>
</property>
Impala does not require this flag to be set.
You can turn on Hive metastore security using the instructions in Cloudera Security. To secure the Hive metastore, see Hive Metastore Server Security Configuration.
Using User-Defined Functions with HiveServer2
©2016 Cloudera, Inc. All rights reserved