Configuring Alluxio with Amazon S3

This guide describes how to configure Alluxio with Amazon S3 as the under storage system. Alluxio natively provides access to S3 with the aws-sdk-java-s3 library through the s3a:// scheme.

Initial Setup

In preparation for using S3 with Alluxio, create a bucket (or use an existing one). Also decide which directory in that bucket to use, either by creating a new directory or by picking an existing one. In this guide, the S3 bucket is called S3_BUCKET, and the directory in that bucket is called S3_DIRECTORY.

Mounting S3

Alluxio unifies access to different storage systems through the unified namespace feature. An S3 location can be either mounted at the root of the Alluxio namespace or at a nested directory.

Root Mount

When installing Alluxio, the under storage address and credentials should be specified in conf/alluxio-site.properties.

alluxio.underfs.address=s3a://<S3_BUCKET>/<S3_DIRECTORY>
aws.accessKeyId=<AWS_ACCESS_KEY_ID>
aws.secretKey=<AWS_SECRET_KEY_ID>

See Amazon’s documentation for more details.

Nested Mount

An S3 location can be mounted at a nested directory in the Alluxio namespace to have unified access to multiple under storage systems. Alluxio’s Command Line Interface can be used for this purpose.

$ ./bin/alluxio fs mount --option aws.accessKeyId=<AWS_ACCESS_KEY_ID> --option aws.secretKey=<AWS_SECRET_KEY_ID> \
  /mnt/s3 s3a://<S3_BUCKET>/<S3_DIRECTORY>
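
When the mount is scripted, it can help to build the command as an argument list so that credentials containing special characters are not reinterpreted by a shell. The following is a minimal sketch that assembles the same command shown above. The helper name build_mount_command and its parameters are hypothetical and are not part of Alluxio.

```python
from typing import Dict, List


def build_mount_command(alluxio_path: str, bucket: str, directory: str,
                        options: Dict[str, str]) -> List[str]:
    """Assemble the argument list for the `alluxio fs mount` command above.

    Hypothetical helper for illustration; it only builds the list and
    does not run anything.
    """
    command = ["./bin/alluxio", "fs", "mount"]
    # Each mount option is passed as its own `--option key=value` pair.
    for key, value in options.items():
        command += ["--option", f"{key}={value}"]
    # The Alluxio path comes first, then the S3 location to mount there.
    command += [alluxio_path, f"s3a://{bucket}/{directory}"]
    return command
```

The resulting list could be passed to subprocess.run(command, check=True) from the Alluxio installation directory, which avoids quoting the credentials for a shell.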

Note: credentials can be specified in different ways, from highest to lowest priority:

  • aws.accessKeyId and aws.secretKey specified as mount options
  • aws.accessKeyId and aws.secretKey specified as Java system properties
  • aws.accessKeyId and aws.secretKey in Alluxio site properties
  • Environment variables AWS_ACCESS_KEY_ID or AWS_ACCESS_KEY (either is acceptable) and AWS_SECRET_ACCESS_KEY or AWS_SECRET_KEY (either is acceptable) on the Alluxio servers
  • Profile file containing credentials at ~/.aws/credentials
  • AWS Instance profile credentials, if you are using an EC2 instance
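
The priority order above can be pictured as a chain of lookups that stops at the first source providing credentials. The sketch below models only the first four sources using plain dictionaries. It is an illustration of the documented order, not Alluxio's implementation: the function name is hypothetical, and requiring both keys to come from the same source is an assumption.

```python
from typing import Mapping, Optional, Tuple


def resolve_s3_credentials(
    mount_options: Mapping[str, str],
    system_properties: Mapping[str, str],
    site_properties: Mapping[str, str],
    environment: Mapping[str, str],
) -> Optional[Tuple[str, str, str]]:
    """Return (source, access_key, secret_key) from the highest-priority source.

    Assumption: a source counts only if it supplies both keys.
    """
    # The first three sources share the same property names.
    for source, props in (
        ("mount options", mount_options),
        ("Java system properties", system_properties),
        ("Alluxio site properties", site_properties),
    ):
        access = props.get("aws.accessKeyId")
        secret = props.get("aws.secretKey")
        if access and secret:
            return source, access, secret
    # Environment variables: either spelling of each name is acceptable.
    access = environment.get("AWS_ACCESS_KEY_ID") or environment.get("AWS_ACCESS_KEY")
    secret = environment.get("AWS_SECRET_ACCESS_KEY") or environment.get("AWS_SECRET_KEY")
    if access and secret:
        return "environment variables", access, secret
    # The profile file and the EC2 instance profile would be consulted next;
    # they are not modeled in this sketch.
    return None
```

For example, if both the Java system properties and the site properties define the keys, the sketch reports the system properties as the winning source.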

When using an AWS Instance profile as the credential provider:

  • Create an IAM Role with access to the mounted bucket
  • Create an Instance profile as a container for the defined IAM Role
  • Launch an EC2 instance using the created profile

See Amazon’s documentation for more details.

Running Alluxio Locally with S3

Tests can be run using the Alluxio Command Line Interface.

$ ./bin/alluxio runTests

If testing a nested mount point, run:

$ ./bin/alluxio runTests --directory /mnt/s3

After the test succeeds, you can visit your S3 directory S3_BUCKET/S3_DIRECTORY to verify the files and directories created by Alluxio exist. For this test, you should see files named like:

S3_BUCKET/S3_DIRECTORY/default_tests_files/BASIC_CACHE_CACHE_THROUGH

Advanced Configurations

Additional properties can be specified by modifying conf/alluxio-site.properties.

Enabling Server Side Encryption

You may encrypt your data stored in S3. The encryption applies only to data at rest in S3; data is transferred in decrypted form when read by clients.

Enable this feature by configuring the property:

alluxio.underfs.s3a.server.side.encryption.enabled=true

If server side encryption is enabled, S3 objects written by Alluxio have Server Side Encryption set to AES-256 in their object properties. This flag does not affect the encryption property of the S3 root directory or bucket mounted to Alluxio.

Accessing S3 Through a Proxy

To communicate with S3 through a proxy, configure the properties:

alluxio.underfs.s3.proxy.host=<PROXY_HOST>
alluxio.underfs.s3.proxy.port=<PROXY_PORT>

Here, <PROXY_HOST> and <PROXY_PORT> should be replaced with the host and port of your proxy.

Using a Specific Amazon S3 Endpoint

To use a specific Amazon S3 endpoint, configure the property:

alluxio.underfs.s3.endpoint=<S3_ENDPOINT>

S3 Access Control

If Alluxio security is enabled, Alluxio enforces the access control inherited from underlying object storage.

The S3 credentials specified in the Alluxio configuration represent an S3 user. The S3 service backend checks that user's permissions on the bucket and the object for access control. If the given S3 user does not have the required access permission for the specified bucket, a permission denied error is thrown. When Alluxio security is enabled, Alluxio loads the bucket ACL into Alluxio permissions the first time the metadata is loaded into the Alluxio namespace.

Mapping from S3 User to Alluxio File Owner

By default, Alluxio tries to extract the S3 user display name from the S3 credential.

Optionally, alluxio.underfs.s3.owner.id.to.username.mapping can be used to specify a static mapping from S3 canonical IDs to Alluxio usernames, in the format “id1=user1;id2=user2”. The AWS S3 canonical ID can be found at the console address. Expand the “Account Identifiers” tab and refer to “Canonical User ID”.
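
To make the mapping format concrete, here is a minimal sketch that parses a value of the form “id1=user1;id2=user2” into a dictionary. The function name parse_owner_mapping is hypothetical, and the handling of whitespace and malformed entries is an assumption; Alluxio's own parsing may differ.

```python
from typing import Dict


def parse_owner_mapping(value: str) -> Dict[str, str]:
    """Parse "id1=user1;id2=user2" into {canonical_id: username}.

    Assumption: surrounding whitespace and empty entries are ignored,
    and an entry without both an id and a username is an error.
    """
    mapping: Dict[str, str] = {}
    for pair in value.split(";"):
        pair = pair.strip()
        if not pair:
            continue  # tolerate a trailing ';'
        canonical_id, separator, username = pair.partition("=")
        canonical_id, username = canonical_id.strip(), username.strip()
        if not separator or not canonical_id or not username:
            raise ValueError(f"expected id=user, got {pair!r}")
        mapping[canonical_id] = username
    return mapping
```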

Mapping from S3 ACL to Alluxio Permission

Alluxio checks the S3 bucket READ/WRITE ACL to determine the owner’s permission mode for an Alluxio file. For example, if the S3 user has read-only access to the underlying bucket, the mounted directory and files have 0500 mode. If the S3 user has full access to the underlying bucket, the mounted directory and files have 0700 mode.
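
The two cases above are consistent with building the owner mode from separate read and write bits. The sketch below shows one such translation. It reproduces the documented 0500 and 0700 results; the function name and the behavior for write-only or no access are assumptions, not documented Alluxio behavior.

```python
def owner_mode(can_read: bool, can_write: bool) -> int:
    """Translate bucket READ/WRITE access into an owner-only permission mode.

    Assumption: read access contributes 0500 (read and execute) and
    write access contributes 0200; group and other bits stay unset.
    """
    mode = 0
    if can_read:
        mode |= 0o500
    if can_write:
        mode |= 0o200
    return mode


# Render a mode in the four-digit octal form used in this guide.
read_only = format(owner_mode(can_read=True, can_write=False), "04o")
full_access = format(owner_mode(can_read=True, can_write=True), "04o")
```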

Mount Point Sharing

If you want to share the S3 mount point with other users in the Alluxio namespace, you can enable alluxio.underfs.object.store.mount.shared.publicly.

Permission Change

Running chown, chgrp, or chmod on Alluxio directories and files does NOT propagate to the underlying S3 buckets or objects.

Troubleshooting

If issues are encountered running against your S3 backend, enable additional logging to track HTTP traffic. Modify conf/log4j.properties to add the following properties:

log4j.logger.com.amazonaws=WARN
log4j.logger.com.amazonaws.request=DEBUG
log4j.logger.org.apache.http.wire=DEBUG

See Amazon’s documentation for more details.