Configuring Alluxio with Google Cloud Storage
This guide describes how to configure Alluxio with Google Cloud Storage (GCS) as the under storage system.
If you haven’t already done so, create your configuration file with the bootstrapConf command below.
For example, if you are running Alluxio on your local machine,
ALLUXIO_MASTER_HOSTNAME should be set to
./bin/alluxio bootstrapConf <ALLUXIO_MASTER_HOSTNAME> gcs
Alternatively, you can create the configuration file from the template and set its contents manually.
cp conf/alluxio-env.sh.template conf/alluxio-env.sh
In preparation for using GCS with Alluxio, create a bucket (or use an existing bucket). Also
note the directory you want to use in that bucket, either a new directory that you create or
an existing one. For the purposes of this guide, the GCS bucket is called
GCS_BUCKET, and the directory in that bucket is called GCS_DIRECTORY.
If you are new to Google Cloud Storage, please read the GCS documentation first.
You need to configure Alluxio to use GCS as its under storage system.
The first modification is to specify an existing GCS
bucket and directory as the under storage system by modifying
Next, you need to specify the Google credentials for GCS access. In
<GCS_SECRET_ACCESS_KEY> should be replaced with your actual
GCS interoperable storage access keys,
or other environment variables that contain your credentials.
Note: GCS interoperability is disabled by default. Open the Interoperability tab
in the GCS settings and enable this feature.
Then click
Create a new key to get the Access Key and Secret pair.
Alternatively, these configuration settings can be set in the
conf/alluxio-env.sh file. More
details about setting configuration parameters can be found in
After these changes, Alluxio should be configured to work with GCS as its under storage system, and you can try Running Alluxio Locally with GCS.
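The settings described above can be gathered in one place. The following is a minimal sketch, not the official procedure: the helper name render_gcs_properties is hypothetical, the fs.gcs.* property names are taken from the command-line example in this guide, and the alluxio.underfs.address property with a gs:// address is an assumption that you should check against the configuration reference for your Alluxio version.

```shell
#!/bin/sh
# Hypothetical helper: print the under-storage address and GCS credentials
# as properties lines. The alluxio.underfs.address name and the gs:// scheme
# are assumptions; verify them for your Alluxio version.
render_gcs_properties() {
  bucket=$1
  directory=$2
  key_id=$3
  secret=$4
  # Refuse to emit a half-filled configuration.
  if [ -z "$bucket" ] || [ -z "$directory" ] || [ -z "$key_id" ] || [ -z "$secret" ]; then
    echo "usage: render_gcs_properties BUCKET DIRECTORY KEY_ID SECRET" >&2
    return 1
  fi
  printf 'alluxio.underfs.address=gs://%s/%s\n' "$bucket" "$directory"
  printf 'fs.gcs.accessKeyId=%s\n' "$key_id"
  printf 'fs.gcs.secretAccessKey=%s\n' "$secret"
}
```

One possible use is to review the printed lines first and then append them to conf/alluxio-site.properties by redirecting the output of render_gcs_properties with your own bucket, directory, and keys as arguments.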
Configuring Application Dependency
When building your application to use Alluxio, your application must include the
alluxio-core-client module. If you are using Maven, you can add the
dependency to your application with:
<dependency>
  <groupId>org.alluxio</groupId>
  <artifactId>alluxio-core-client</artifactId>
  <version>1.4.0</version>
</dependency>
Configuring Distributed Applications Runtime
When I/O is delegated to Alluxio workers (i.e., the Alluxio configuration property
alluxio.user.ufs.operation.delegation is true;
it has been false by default since Alluxio 1.1), you do not have to do anything special for your applications.
Otherwise, because your Alluxio client runs in a separate JVM from the Alluxio master and workers,
you need to make sure that your Google credentials are provided to the
application JVM processes as well. There are different ways to do this. The first approach is to add them as command-line
options when starting your client JVM process. For example:
java -Xmx3g -Dfs.gcs.accessKeyId=<GCS_ACCESS_KEY_ID> -Dfs.gcs.secretAccessKey=<GCS_SECRET_ACCESS_KEY> -cp my_application.jar com.MyApplicationClass myArgs
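To avoid typing the keys inline, the two -D options can be assembled from environment variables. This is a small sketch under stated assumptions: the function name gcs_java_opts is hypothetical, and it assumes the keys are exported as GCS_ACCESS_KEY_ID and GCS_SECRET_ACCESS_KEY and contain no whitespace.

```shell
#!/bin/sh
# Hypothetical helper: print the two credential options used in the
# command above, or fail if either environment variable is missing.
gcs_java_opts() {
  if [ -z "${GCS_ACCESS_KEY_ID:-}" ] || [ -z "${GCS_SECRET_ACCESS_KEY:-}" ]; then
    echo "GCS_ACCESS_KEY_ID and GCS_SECRET_ACCESS_KEY must be set" >&2
    return 1
  fi
  printf '%s %s\n' \
    "-Dfs.gcs.accessKeyId=${GCS_ACCESS_KEY_ID}" \
    "-Dfs.gcs.secretAccessKey=${GCS_SECRET_ACCESS_KEY}"
}
```

The output can then be spliced into the java command with an unquoted $(gcs_java_opts) in place of the two literal -D options. Keep in mind that command-line arguments are generally visible in process listings, which is one reason to consider the properties-file approach instead.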
Alternatively, you may copy
conf/alluxio-site.properties (having the properties setting credentials) to the classpath
of your application runtime (e.g.,
$SPARK_CLASSPATH for Spark), or append the path of this site properties file to
Running Alluxio Locally with GCS
After everything is configured, you can start up Alluxio locally to see that everything works.
./bin/alluxio format
./bin/alluxio-start.sh local
This should start an Alluxio master and an Alluxio worker. You can see the master UI at http://localhost:19999.
Next, you can run a simple example program:
After this succeeds, you can visit your GCS directory
GCS_BUCKET/GCS_DIRECTORY to verify the files
and directories created by Alluxio exist. For this test, you should see files named like:
To stop Alluxio, you can run:
GCS Access Control
If Alluxio security is enabled, Alluxio enforces the access control inherited from the underlying object storage.
The GCS credentials specified in the Alluxio configuration represent a GCS user. The GCS service backend checks that user's permissions on the bucket and the object for access control. If the given GCS user does not have the required access permission on the specified bucket, a permission denied error is thrown. When Alluxio security is enabled, Alluxio translates the bucket ACL into Alluxio permissions the first time the metadata is loaded into the Alluxio namespace.
Mapping from GCS user to Alluxio file owner
By default, Alluxio tries to extract the GCS user ID from the credentials. Optionally,
alluxio.underfs.gcs.owner.id.to.username.mapping can be used to
specify a static mapping from GCS owner IDs to Alluxio usernames, in the format “id1=user1;id2=user2”.
The Google Cloud Storage IDs can be found at the console address. Please use the “Owners” one.
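To make the mapping format concrete, here is an illustrative sketch of how a string such as “id1=user1;id2=user2” could be read. It is not Alluxio's parser: the function name lookup_gcs_owner is hypothetical, and the handling of malformed entries (they are skipped) is an assumption.

```shell
#!/bin/sh
# Hypothetical helper: print the Alluxio username mapped to a GCS owner ID
# in an "id1=user1;id2=user2" string. Returns 1 if the ID is not listed.
lookup_gcs_owner() {
  mapping=$1
  owner_id=$2
  old_ifs=$IFS
  IFS=';'                       # entries are separated by semicolons
  for pair in $mapping; do
    case $pair in
      *=*) ;;                   # well-formed entry: id=user
      *) continue ;;            # assumption: skip entries without '='
    esac
    id=${pair%%=*}              # text before the first '='
    user=${pair#*=}             # text after the first '='
    if [ "$id" = "$owner_id" ]; then
      IFS=$old_ifs
      printf '%s\n' "$user"
      return 0
    fi
  done
  IFS=$old_ifs
  return 1
}
```

Calling lookup_gcs_owner with the example string and id2 prints user2. An ID that is not listed produces no output and a non-zero exit status.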
Mapping from GCS ACL to Alluxio permission
Alluxio checks the GCS bucket READ/WRITE ACL to determine the owner’s permission mode for an Alluxio file. For example, if the GCS user has read-only access to the underlying bucket, the mounted directory and files have mode 0500. If the GCS user has full access to the underlying bucket, the mounted directory and files have mode 0700.
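The two cases above can be written as a small decision function. This sketch only restates the examples in the text. The function name gcs_acl_to_mode and its "yes"/"no" arguments are hypothetical, and any combination the text does not describe is reported as unspecified rather than guessed.

```shell
#!/bin/sh
# Hypothetical helper: map bucket READ/WRITE access ("yes" or "no") to the
# owner mode described above. Only the two documented cases are covered.
gcs_acl_to_mode() {
  can_read=$1
  can_write=$2
  if [ "$can_read" = yes ] && [ "$can_write" = yes ]; then
    echo 0700                   # full access: owner rwx
  elif [ "$can_read" = yes ] && [ "$can_write" = no ]; then
    echo 0500                   # read-only access: owner r-x
  else
    echo "mode for this ACL combination is not specified in this guide" >&2
    return 1
  fi
}
```

For example, gcs_acl_to_mode yes no prints 0500 and gcs_acl_to_mode yes yes prints 0700.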
Mount point sharing
If you want to share the GCS mount point with other users in Alluxio namespace, you can enable
In addition, chown/chgrp/chmod operations on Alluxio directories and files do NOT propagate to the underlying GCS buckets or objects.