Google Cloud Storage
- Basic Setup
- Running Alluxio Locally with GCS
- Advanced Setup
- GCS Access Control
- GCS with Google Cloud API (GCS version 2)
This guide describes how to configure Alluxio with Google Cloud Storage (GCS) as the under storage system.
In preparation for using GCS with Alluxio, create a bucket (or use an existing bucket). You
should also note the directory you want to use in that bucket, either by creating a new directory in
the bucket or by using an existing one. For the purposes of this guide, the GCS bucket is called
GCS_BUCKET, and the directory in that bucket is called GCS_DIRECTORY.
For more information on GCS, please read its documentation.
A GCS bucket can be mounted to Alluxio either at the root of the namespace or at a nested directory.
Root Mount Point
Configure Alluxio to use under storage systems by modifying
conf/alluxio-site.properties. If it does not exist, create the configuration file from the template:
$ cp conf/alluxio-site.properties.template conf/alluxio-site.properties
Configure Alluxio to use GCS as its root under storage system. The first modification is to
specify an existing GCS bucket and directory as the under storage system by modifying
conf/alluxio-site.properties to include:
The Google credentials must also be specified for the root mount point. In
conf/alluxio-site.properties, replace <GCS_ACCESS_KEY_ID> and <GCS_SECRET_ACCESS_KEY> with actual
GCS interoperable storage access keys,
or other environment variables that contain your credentials.
Note: GCS interoperability is disabled by default. Open the Interoperability tab
in the GCS settings and enable this feature.
Then create a new key to get the access key and secret pair.
After these changes, Alluxio should be configured to work with GCS as its under storage system, and you can Run Alluxio Locally with GCS.
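As a minimal sketch of the credential settings described above, the helper below prints two properties for the root mount. It assumes that the `alluxio.master.mount.table.root.option.` prefix, which this guide uses for root-mount options in the GCS version 2 section, also applies to the `fs.gcs.accessKeyId` and `fs.gcs.secretAccessKey` options used by the mount command in this guide. The function name `gcs_root_credentials` is hypothetical, and the under storage address setting is not covered here.

```shell
# Hypothetical helper: print the root-mount credential properties.
# Assumption: root-mount options take the prefix
# "alluxio.master.mount.table.root.option", as in the version 2 settings.
gcs_root_credentials() {
  access_key_id="$1"
  secret_access_key="$2"
  # Refuse to emit half-configured credentials.
  if [ -z "$access_key_id" ] || [ -z "$secret_access_key" ]; then
    echo "usage: gcs_root_credentials <access_key_id> <secret_access_key>" >&2
    return 1
  fi
  prefix="alluxio.master.mount.table.root.option"
  printf '%s.fs.gcs.accessKeyId=%s\n' "$prefix" "$access_key_id"
  printf '%s.fs.gcs.secretAccessKey=%s\n' "$prefix" "$secret_access_key"
}
```

For example, `gcs_root_credentials "$GCS_ACCESS_KEY_ID" "$GCS_SECRET_ACCESS_KEY" >> conf/alluxio-site.properties` would append both lines to the site properties file.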
Nested Mount Point
A GCS location can be mounted at a nested directory in the Alluxio namespace to give unified access to multiple under storage systems. Alluxio’s Command Line Interface can be used for this purpose.
First, in conf/alluxio-site.properties, specify the master host (the alluxio.master.hostname property).
Then, mount GCS:
$ ./bin/alluxio fs mount \
  --option fs.gcs.accessKeyId=<GCS_ACCESS_KEY_ID> \
  --option fs.gcs.secretAccessKey=<GCS_SECRET_ACCESS_KEY> \
  /gcs gs://GCS_BUCKET/GCS_DIRECTORY
Running Alluxio Locally with GCS
Start up Alluxio locally to see that everything works.
$ ./bin/alluxio format
$ ./bin/alluxio-start.sh local SudoMount
This should start an Alluxio master and an Alluxio worker. You can see the master UI at http://localhost:19999.
Run a simple example program:
$ ./bin/alluxio runTests
Visit your GCS directory
GCS_BUCKET/GCS_DIRECTORY to verify the files
and directories created by Alluxio exist. For this test, you should see files named like:
To stop Alluxio, you can run:
$ ./bin/alluxio-stop.sh local
Customize the Directory Suffix
Directories are represented in GCS as zero-byte objects named with a specified suffix. The directory suffix can be updated with the configuration parameter alluxio.underfs.gcs.directory.suffix.
GCS Access Control
If Alluxio security is enabled, Alluxio enforces the access control inherited from the underlying object storage.
The GCS credentials specified in the Alluxio configuration represent a GCS user. The GCS service backend checks that user's permission to the bucket and the object for access control. If the given GCS user does not have the right access permission to the specified bucket, a permission denied error is thrown. When Alluxio security is enabled, Alluxio loads the bucket ACL into Alluxio permissions the first time the metadata is loaded into the Alluxio namespace.
Mapping from GCS user to Alluxio file owner
By default, Alluxio tries to extract the GCS user ID from the credentials. Optionally,
alluxio.underfs.gcs.owner.id.to.username.mapping can be used to specify a static mapping from
preset GCS owner IDs to Alluxio usernames, in the format
id1=user1;id2=user2. The Google Cloud Storage IDs
can be found at the console address. Please use
the “Owners” one.
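The mapping format can be illustrated with a small sketch. The helper below, whose name `build_owner_mapping` is hypothetical, joins `id=user` pairs with semicolons and prints the resulting property line. The validation rules (non-empty ID and username, no semicolon inside a pair) are assumptions made for this example, not documented Alluxio behavior.

```shell
# Hypothetical helper: join "id=user" pairs into the id1=user1;id2=user2 format.
build_owner_mapping() {
  if [ "$#" -eq 0 ]; then
    echo "usage: build_owner_mapping <id=user>..." >&2
    return 1
  fi
  mapping=""
  for pair in "$@"; do
    case "$pair" in
      *\;*) echo "pair must not contain ';': $pair" >&2; return 1 ;;
      ?*=?*) ;;  # non-empty id, '=', non-empty username
      *) echo "invalid pair: $pair" >&2; return 1 ;;
    esac
    if [ -z "$mapping" ]; then
      mapping="$pair"
    else
      mapping="$mapping;$pair"
    fi
  done
  printf 'alluxio.underfs.gcs.owner.id.to.username.mapping=%s\n' "$mapping"
}
```

Calling `build_owner_mapping id1=user1 id2=user2` prints a line that could be added to the Alluxio configuration.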
Mapping from GCS ACL to Alluxio permission
Alluxio checks the GCS bucket READ/WRITE ACL to determine the owner’s permission mode for an Alluxio
file. For example, if the GCS user has read-only access to the underlying bucket, the mounted
directory and files would have
0500 mode. If the GCS user has full access to the underlying bucket,
the mounted directory and files would have
Mount point sharing
If you want to share the GCS mount point with other users in Alluxio namespace, you can enable
Commands such as
chmod applied to Alluxio directories and files do NOT propagate to the underlying
GCS buckets or objects.
GCS with Google Cloud API (GCS version 2)
The default GCS UFS module is based on the jets3t library, which is designed for AWS S3. It therefore only accepts a Google Cloud Storage interoperability access/secret key pair, which allows full access to all Google Cloud Storage buckets inside a Google Cloud project. No permission or access control can be placed on the interoperability keys. The combination of the Google interoperability API and the jets3t library also reduces the performance of the default GCS UFS module.
The Google Cloud API, on the other hand, accepts Google application credentials. Based on the application credentials, Google Cloud can determine what permissions an authenticated client has for its target Google Cloud Storage bucket. In addition, GCS with the Google Cloud API performs much better than the default module in metadata and read/write operations.
To mount a GCS bucket as the root mount, change
conf/alluxio-site.properties to include:
alluxio.master.mount.table.root.option.alluxio.underfs.gcs.version=2
alluxio.master.mount.table.root.option.alluxio.underfs.address=gs://GCS_BUCKET/GCS_DIRECTORY
alluxio.master.mount.table.root.option.fs.gcs.credential.path=/path/to/<google_application_credentials>.json
- The first property key tells Alluxio to load the Version 2 GCS UFS module which uses the Google cloud API.
- The second property key is the same as when configuring the Version 1 (default) GCS UFS module. It provides the GCS address to use as the Alluxio root UFS.
- The third property key provides the path to the Google application credentials JSON file. Note that this file should be placed on all Alluxio nodes at the same path. If the nodes running the Alluxio processes already contain the GCS credentials, this property may not be needed, but setting it explicitly is always recommended.
After configuring GCS version 2 as the root mount, you can run Alluxio locally.
To mount a GCS bucket as a nested mount, run the following command in the Alluxio home directory:
$ ./bin/alluxio fs mount \
  --option alluxio.underfs.gcs.version=2 \
  --option fs.gcs.credential.path=/path/to/<google_application_credentials>.json \
  /gcs gs://GCS_BUCKET/GCS_DIRECTORY
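Because the credentials file must exist at the given path, it can help to check the inputs before mounting. The sketch below is a hypothetical dry-run helper (`gcs_v2_mount_command` is not part of Alluxio): it verifies that the credentials file is readable on the local node and that the target looks like a `gs://` URI, then prints the mount command rather than running it. Both checks are assumptions chosen for illustration.

```shell
# Hypothetical helper: print the version 2 nested mount command (does not run it).
gcs_v2_mount_command() {
  credential_path="$1"   # path to the Google application credentials JSON file
  alluxio_path="$2"      # mount point in the Alluxio namespace, e.g. /gcs
  gcs_uri="$3"           # e.g. gs://GCS_BUCKET/GCS_DIRECTORY
  if [ ! -r "$credential_path" ]; then
    echo "credentials file not readable: $credential_path" >&2
    return 1
  fi
  case "$gcs_uri" in
    gs://?*) ;;
    *) echo "expected a gs:// URI, got: $gcs_uri" >&2; return 1 ;;
  esac
  printf './bin/alluxio fs mount --option alluxio.underfs.gcs.version=2 --option fs.gcs.credential.path=%s %s %s\n' \
    "$credential_path" "$alluxio_path" "$gcs_uri"
}
```

The printed command can be reviewed and then run from the Alluxio home directory. The check covers only the node where the helper runs, not the other nodes in the cluster.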
Advanced Configuration (Accessing GCS through Proxy)
If the Alluxio cluster is behind a corporate proxy or a firewall, the Alluxio GCS integration may not be able to access the internet with the default settings.
Add the following Java options to
conf/alluxio-env.sh before starting the Alluxio masters and workers:
ALLUXIO_MASTER_JAVA_OPTS+=" -Dhttps.proxyHost=<proxy_host> -Dhttps.proxyPort=<proxy_port> -Dhttp.proxyHost=<proxy_host> -Dhttp.proxyPort=<proxy_port> -Dhttp.nonProxyHosts=<non_proxy_host>"
ALLUXIO_WORKER_JAVA_OPTS+=" -Dhttps.proxyHost=<proxy_host> -Dhttps.proxyPort=<proxy_port> -Dhttp.proxyHost=<proxy_host> -Dhttp.proxyPort=<proxy_port> -Dhttp.nonProxyHosts=<non_proxy_host>"
An example value for
If a username and password are required for the proxy, add the
https.proxyPassword Java options.
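Since the master and worker settings repeat the same proxy flags, the options string can be assembled once. The sketch below uses a hypothetical helper, `gcs_proxy_java_opts`, that builds the string from a host, a port, and a non-proxy host value. The numeric port check is an assumption added for illustration.

```shell
# Hypothetical helper: build the proxy-related Java options string.
gcs_proxy_java_opts() {
  proxy_host="$1"
  proxy_port="$2"
  non_proxy_hosts="$3"
  # Reject an empty or non-numeric port.
  case "$proxy_port" in
    ''|*[!0-9]*) echo "proxy port must be a number: $proxy_port" >&2; return 1 ;;
  esac
  # The leading space lets the result be appended to an existing options string.
  printf ' -Dhttps.proxyHost=%s -Dhttps.proxyPort=%s' "$proxy_host" "$proxy_port"
  printf ' -Dhttp.proxyHost=%s -Dhttp.proxyPort=%s' "$proxy_host" "$proxy_port"
  printf ' -Dhttp.nonProxyHosts=%s\n' "$non_proxy_hosts"
}
```

In conf/alluxio-env.sh, the result could then be appended to both variables, for example `ALLUXIO_MASTER_JAVA_OPTS+="$(gcs_proxy_java_opts <proxy_host> <proxy_port> <non_proxy_host>)"`, and likewise for `ALLUXIO_WORKER_JAVA_OPTS`.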