Cloudera Enterprise 5.15.x

How to Enable S3 Cloud Storage in Hue

Cloudera S3 Connector in Cloudera Manager securely connects your CDH cluster to Amazon S3.

  Note:
  • C5.11 adds S3Guard for list consistency and support for IAM roles in Cloudera Manager.
  • C5.10 connects Hue, Impala, and Navigator securely with the Cloudera S3 Connector Service.
  • C5.9 adds support for Amazon S3 with plain-text credentials using Cloudera Manager safety valves.

Enable S3 in Hue with the S3 Connector Service

For a secure and fine-grained connection to Amazon S3 (for Hue, Impala, and Navigator), Cloudera recommends its S3 Connector service in Secure Mode with encrypted access keys and Kerberos and Sentry installed.
  Important: Hive is not yet supported in Secure Mode. To connect Hive to S3, use "Unsecure" Mode.
Method          Security   Required            Services
Secure Mode     High       Kerberos, Sentry    Hue, Impala, Navigator
Unsecure Mode   Medium     (none)              Hue, Impala, Navigator, Hive
  1. Log on to Cloudera Manager.
  2. Select Administration > External Accounts.
  3. Click Add Access Key Credentials or Add IAM Role-based Authentication.
      Important: IAM Role-based Authentication is not fine-grained authentication. Also, to use it with Hue, configure the region in hue_safety_valve.ini (see step 12).
  4. Add any Name and enter your S3 credentials:
    1. To connect your AWS root user, add the Access Key ID and Secret Access Key for your root account.
    2. To connect an IAM user, add the Access Key ID and Secret Access Key for a read-only IAM account.
  5. If you have an Amazon DynamoDB database, check Enable S3Guard for consistent read operations.
      Warning: Components writing data to S3 are constrained by the inherent Amazon S3 limitation known as "eventual consistency." This can lead to data loss when a Spark or Hive job writes output directly to S3. Cloudera recommends that you either use S3Guard or write to HDFS and then copy to S3 with distcp.
  6. Click Enable for <cluster name> to give Hue access to S3 and S3-backed tables. Impala must have permissions defined in Sentry.
  7. If using access keys, select Secure or Unsecure mode. Select Unsecure to use Hive.
  8. Click Continue (at Step 1) if your cluster passes validation. You are automatically taken to step 5.
  9. Click Continue (at Step 5) to restart Hive, Impala, Oozie, and Hue.
  10. When finished, click Home to see the S3 Connector.
      Note: A gray status icon means the S3 Connector service was successfully added.


  11. If using S3 Signature Version 4 regions, include the region endpoint name in fs.s3a.endpoint.
    1. Select the S3 Connector Service.
    2. Select Configuration.
    3. Set Default S3 Endpoint with the region endpoint name.

      Valid endpoint names are those listed in the Amazon S3 section of AWS Regions and Endpoints.

    4. Click Save Changes.
    5. Restart Hue: select Cluster > Hue and Actions > Restart.
  12. If using IAM roles, set the region to us-east-1 (N. Virginia) in hue_safety_valve.ini.
      Note: Configuring hue_safety_valve.ini is a temporary Hue workaround for CDH 5.10.
    1. Select Configuration > Advanced Configuration Snippets.
    2. Filter by Scope > Hue.
    3. Set Hue Service Advanced Configuration Snippet (Safety Valve) for hue_safety_valve.ini with the following:
      [aws]
      [[aws_accounts]]
      [[[default]]]
      region=us-east-1
    4. Click Save Changes.
    5. Restart Hue: select Cluster > Hue and Actions > Restart.
      Note: The S3 Connector service is not added when you use IAM roles.
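
For step 11, the value entered in Default S3 Endpoint is a host name, not a bare region identifier. The following Python sketch shows one way to derive it, assuming the common s3.<region>.amazonaws.com naming pattern (with the .cn suffix for China regions). The function name s3_endpoint is hypothetical and not part of any Cloudera or AWS tool; always confirm the result against the AWS Regions and Endpoints table.

```python
import re

# Loose pattern for AWS region identifiers such as "eu-central-1" or "us-gov-west-1".
_REGION_RE = re.compile(r"^[a-z]{2}(-[a-z]+)+-\d+$")


def s3_endpoint(region):
    """Return the regional S3 endpoint host name for an AWS region identifier.

    Assumes the s3.<region>.<suffix> pattern; verify against the AWS table.
    """
    if not _REGION_RE.match(region):
        raise ValueError("not an AWS region identifier: %r" % region)
    # Regions in the China partition use a different DNS suffix.
    suffix = "amazonaws.com.cn" if region.startswith("cn-") else "amazonaws.com"
    return "s3.%s.%s" % (region, suffix)


# eu-central-1 (Frankfurt) is an example of a Signature Version 4 region.
print(s3_endpoint("eu-central-1"))  # prints s3.eu-central-1.amazonaws.com
```

The printed host name is the kind of value that fs.s3a.endpoint expects.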

Related topics: How to Configure AWS Credentials and Configuring the Amazon S3 Connector.

Enable S3 in Hue with Safety Valves

This section assumes an AWS account with access keys, but not necessarily a Kerberized cluster.

You can connect to S3 using three safety valves (also known as Advanced Configuration Snippets):
  • Hue Service Advanced Configuration Snippet (Safety Valve) for hue_safety_valve.ini
  • Cluster-wide Advanced Configuration Snippet (Safety Valve) for core-site.xml
  • Hive Service Advanced Configuration Snippet (Safety Valve) for core-site.xml


  1. Log on to Cloudera Manager and select Clusters > your cluster.
  2. Select Configuration > Advanced Configuration Snippets.
  3. Filter by Scope > Hue.
  4. Set your S3 credentials in Hue Service Advanced Configuration Snippet (Safety Valve) for hue_safety_valve.ini:
      Note: Store your credentials in a script that outputs to stdout. A security_token is optional.
    [aws]
    [[aws_accounts]]
    [[[default]]]
    access_key_id_script=</path/to/access_key_script>
    secret_access_key_script=</path/to/secret_key_script>
    #security_token=<your AWS security token>
    allow_environment_credentials=false
    region=<your region, such as us-east-1> 
    For a proof-of-concept installation, you can instead add the access keys directly:
    access_key_id=<your_access_key_id>
    secret_access_key=<your_secret_access_key>
  5. Clear the scope filters and search on "core-site.xml".
  6. To enable the S3 Browser, set your S3 credentials in Cluster-wide Advanced Configuration Snippet (Safety Valve) for core-site.xml:
    <property>
    <name>fs.s3a.access.key</name>
    <value>AWS access key ID</value>
    </property>
    
    <property>
    <name>fs.s3a.secret.key</name>
    <value>AWS secret key</value>
    </property>
  7. To enable Hive with S3, set the same S3 credential properties (fs.s3a.access.key and fs.s3a.secret.key) in Hive Service Advanced Configuration Snippet (Safety Valve) for core-site.xml.
  8. Click Save Changes.
  9. If using S3 Signature Version 4 regions, include the region endpoint name in fs.s3a.endpoint.
    1. Select the S3 Connector Service.
    2. Select Configuration.
    3. Set Default S3 Endpoint with the region endpoint name.

      Valid endpoint names are those listed in the Amazon S3 section of AWS Regions and Endpoints.

    4. Click Save Changes.
  10. Restart Hue: select Cluster > Hue and Actions > Restart.
  11. Restart Hive: select Cluster > Hive and Actions > Restart.
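
The note in step 4 asks for a script that writes a credential to stdout. A minimal Python 3 sketch of such a script follows. The file path in KEY_FILE and the function name read_secret are assumptions made for illustration, and the owner-only permission check is an extra precaution of this sketch rather than something Hue requires.

```python
#!/usr/bin/env python3
"""Print one AWS credential to stdout for a Hue *_script setting."""
import os
import stat
import sys

# Hypothetical location; keep one file (and one copy of this script) per credential.
KEY_FILE = "/etc/hue/private/aws_access_key_id"


def read_secret(path):
    """Return the file's contents without surrounding whitespace.

    Raises PermissionError if group or others have any access to the file.
    """
    mode = os.stat(path).st_mode
    if mode & (stat.S_IRWXG | stat.S_IRWXO):
        raise PermissionError("%s must be accessible by its owner only" % path)
    with open(path) as handle:
        return handle.read().strip()


if __name__ == "__main__":
    try:
        sys.stdout.write(read_secret(KEY_FILE))
    except OSError as err:
        # Report on stderr so that nothing but the credential ever reaches stdout.
        sys.stderr.write("cannot read credential: %s\n" % err)
```

With this approach, access_key_id_script and secret_access_key_script would each point to an executable copy of the script whose KEY_FILE names the matching credential file.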

Related topics: Amazon Web Services (AWS) Security.

Generate Access Keys in AWS

To integrate Hue with S3, you must have an Amazon Web Services (AWS) account, with access keys for either your root user or a read-only IAM user.

Root Account

  1. Create an AWS account and sign in to the AWS Console.
  2. Create access keys for this AWS root account:
    1. Expand the drop-down menu under your account name and select My Security Credentials.
    2. Click Continue to Security Credentials.
    3. Expand Access Keys (Access Key ID and Secret Access Key).
    4. Click Create New Access Key.
    5. Click Show Access Key or Download Key File. These are your AWS root credentials.

IAM Account

  1. Create two IAM groups (AWS admin and S3 Read-only):
      Important: AWS requires that your first IAM group and associated user has administrator access.
    1. Go to the IAM service.
    2. Click Groups and Create New Group.
    3. Enter a name and click Next Step.
    4. Filter on "admin" and select the AdministratorAccess policy.
    5. Click Next Step and Create Group.
    6. Create a second group with AmazonS3ReadOnlyAccess.
  2. Create two IAM users and assign one to the admin policy and one to the S3 read policy.
    1. Click Users and Add User.
    2. Enter a name, and at a minimum, select Programmatic access.
    3. Click Next: Permissions.
    4. Select the group with administrator permissions.
    5. Click Next: Review and Create User.
    6. Create a second user and assign the group with S3 read-only access.
  3. Create access keys for your read-only IAM user:
    1. Click the name of your read-only IAM user.
    2. Click the Security Credentials tab.
    3. Click Create Access Key.
    4. Click Show Access Key or Download Key File. These are your IAM user credentials.

IAM Permissions Needed for Hue S3 Browser

In AWS, IAM files are used to create policies that control access to resources in a VPC. You can give IAM roles and permissions to your Hue servers to allow the Hue S3 browser to make API requests without the need to use or distribute AWS credentials (accessKey and secretAccessKey). For more information about IAM, see the AWS Identity and Access Management User Guide in the AWS documentation. For instructions on how to create an IAM role, see Creating a Role to Delegate Permissions to an AWS Service in the AWS documentation. For information about granting permission to Amazon S3 resources, see Managing Access Permissions to Your Amazon S3 Resources in the AWS documentation.

Use the AWS Policy Generator to create the IAM file, keeping in mind the following requirements:
  • Only the Hue servers need to have an IAM role applied to them to access S3 with the browser.
  • The Hue S3 browser does not become available until the Amazon S3 connector service is added to the cluster.
  • ListBucket on the Amazon S3 resource is necessary to drill down into that bucket, along with ListAllMyBuckets on all resources.
  • These permissions do not give access to other private buckets in that AWS account, although public buckets are accessible.
  • For the Hue S3 browser, your Hue servers require permissions for the following methods:
    • s3:ListBucket
    • s3:PutObject
    • s3:GetObject
    • s3:DeleteObject
    • s3:PutObjectAcl
    • s3:ListAllMyBuckets
The following example IAM policy shows the format to use for the Hue server permissions. Your Amazon Resource Name (ARN) will be different. For more information on ARNs, see Amazon Resource Names (ARNs) and AWS Service Namespaces in the AWS documentation.
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "S3BucketPermissions",
            "Effect": "Allow",
            "Action": [
                "s3:ListBucket"
            ],
            "Resource": [
                "arn:aws:s3:::<S3BucketARN>"
            ]
        },
        {
            "Sid": "S3ObjectPermissions",
            "Effect": "Allow",
            "Action": [
                "s3:PutObject",
                "s3:GetObject",
                "s3:DeleteObject",
                "s3:PutObjectAcl"
            ],
            "Resource": [
                "arn:aws:s3:::<S3BucketARN>/*"
            ]
        },
        {
            "Sid": "AllS3",
            "Effect": "Allow",
            "Action": [
                "s3:ListAllMyBuckets"
            ],
            "Resource": "*"
        }
    ]
}
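
As an alternative to hand-editing the JSON, a policy of the same shape can be generated and checked programmatically. The Python sketch below is an illustration only: the helper names hue_s3_policy and missing_actions and the bucket name example-bucket are hypothetical, and the required actions are taken from the list of methods above.

```python
import json

# Actions listed above as required for the Hue S3 browser.
REQUIRED_ACTIONS = {
    "s3:ListBucket", "s3:PutObject", "s3:GetObject",
    "s3:DeleteObject", "s3:PutObjectAcl", "s3:ListAllMyBuckets",
}


def hue_s3_policy(bucket):
    """Build a policy document shaped like the example, for one bucket name."""
    bucket_arn = "arn:aws:s3:::%s" % bucket
    return {
        "Version": "2012-10-17",
        "Statement": [
            {"Sid": "S3BucketPermissions", "Effect": "Allow",
             "Action": ["s3:ListBucket"],
             "Resource": [bucket_arn]},
            {"Sid": "S3ObjectPermissions", "Effect": "Allow",
             "Action": ["s3:PutObject", "s3:GetObject",
                        "s3:DeleteObject", "s3:PutObjectAcl"],
             "Resource": [bucket_arn + "/*"]},
            {"Sid": "AllS3", "Effect": "Allow",
             "Action": ["s3:ListAllMyBuckets"],
             "Resource": "*"},
        ],
    }


def missing_actions(policy):
    """Return the required actions that no Allow statement grants."""
    granted = set()
    for statement in policy["Statement"]:
        if statement["Effect"] == "Allow":
            granted.update(statement["Action"])
    return REQUIRED_ACTIONS - granted


policy = hue_s3_policy("example-bucket")  # hypothetical bucket name
assert not missing_actions(policy)
# json.dumps always emits valid JSON, which avoids stray commas or braces.
print(json.dumps(policy, indent=4))
```

The printed document can then be pasted into the IAM console in place of a hand-written policy.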

Page generated May 18, 2018.