Configuring Alluxio with OSS
This guide describes how to configure Alluxio with Aliyun OSS as the under storage system. Object Storage Service (OSS) is a massive, secure and highly reliable cloud storage service provided by Aliyun.
Initial Setup
To run an Alluxio cluster on a set of machines, you must deploy Alluxio binaries to each of these machines. You can either compile the binaries from Alluxio source code, or download the precompiled binaries directly.
Then, if you haven’t already done so, create your configuration file with the bootstrapConf command. For example, if you are running Alluxio on your local machine, ALLUXIO_MASTER_HOSTNAME should be set to localhost:
./bin/alluxio bootstrapConf <ALLUXIO_MASTER_HOSTNAME>
Alternatively, you can create the configuration file from the template and set its contents manually:
cp conf/alluxio-env.sh.template conf/alluxio-env.sh
In preparation for using OSS with Alluxio, create a bucket or use an existing one. Also decide which directory in that bucket to use, either by creating a new directory or by picking an existing one. For the purposes of this guide, the OSS bucket is called OSS_BUCKET and the directory in that bucket is called OSS_DIRECTORY. To use the OSS service, you must also provide an OSS endpoint, which identifies the region your bucket is in. The endpoint is called OSS_ENDPOINT in this guide. The Aliyun OSS documentation lists the endpoints for each region and describes buckets in more detail.
Configuring Alluxio
You need to configure Alluxio to use OSS as its under storage system. The first modification is to specify an existing OSS bucket and directory as the under storage system by modifying conf/alluxio-site.properties to include:
alluxio.underfs.address=oss://OSS_BUCKET/OSS_DIRECTORY/
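As a quick sanity check, the under storage address can be assembled from the bucket and directory names before it is written to the properties file. The sketch below is illustrative only: the values my-bucket and alluxio/data are hypothetical placeholders, and the helper function oss_ufs_address is not part of Alluxio.

```shell
#!/bin/sh
# Hypothetical placeholder values; substitute your own bucket and directory.
OSS_BUCKET="my-bucket"
OSS_DIRECTORY="alluxio/data"

# Build the oss:// address from a bucket and a directory.
# A single leading or trailing slash on the directory is stripped so the
# result always has the form oss://BUCKET/DIRECTORY/.
oss_ufs_address() {
  bucket="$1"
  dir="${2#/}"
  dir="${dir%/}"
  printf 'oss://%s/%s/\n' "$bucket" "$dir"
}

UFS_ADDRESS="$(oss_ufs_address "$OSS_BUCKET" "$OSS_DIRECTORY")"
echo "alluxio.underfs.address=${UFS_ADDRESS}"
```

The printed line has the same shape as the property shown above and can be compared against what ends up in conf/alluxio-site.properties.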
Next, you need to specify the Aliyun credentials for OSS access. In conf/alluxio-site.properties, add:
fs.oss.accessKeyId=<OSS_ACCESS_KEY_ID>
fs.oss.accessKeySecret=<OSS_ACCESS_KEY_SECRET>
fs.oss.endpoint=<OSS_ENDPOINT>
Here fs.oss.accessKeyId is the Access Key Id string and fs.oss.accessKeySecret is the Access Key Secret string; both are managed under AccessKeys in the Aliyun UI. fs.oss.endpoint is the endpoint of this bucket, which can be found in the Bucket overview, with possible values such as “oss-us-west-1.aliyuncs.com” and “oss-cn-shanghai.aliyuncs.com” (OSS Internet endpoints).
After these changes, Alluxio should be configured to work with OSS as its under storage system, and you can try running Alluxio locally with OSS.
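The property edits described in this section can also be scripted. The following is a minimal sketch, not an official Alluxio tool. It assumes the bucket, directory, credentials and endpoint are available as environment variables (the variable names are chosen here for illustration), and that SITE_PROPS points at your conf/alluxio-site.properties. When SITE_PROPS is unset, the sketch writes to a scratch file, and the defaults are dummy values, so it can be tried safely.

```shell
#!/bin/sh
# Assumed inputs (illustrative names, not defined by Alluxio). The defaults
# are dummy values so the sketch runs without real credentials.
: "${OSS_BUCKET:=my-bucket}"
: "${OSS_DIRECTORY:=alluxio/data}"
: "${OSS_ACCESS_KEY_ID:=dummy-key-id}"
: "${OSS_ACCESS_KEY_SECRET:=dummy-key-secret}"
: "${OSS_ENDPOINT:=oss-cn-shanghai.aliyuncs.com}"

# Target file; set SITE_PROPS=conf/alluxio-site.properties for real use.
SITE_PROPS="${SITE_PROPS:-$(mktemp)}"

# Append the four properties described in this section.
cat >> "$SITE_PROPS" <<EOF
alluxio.underfs.address=oss://${OSS_BUCKET}/${OSS_DIRECTORY}/
fs.oss.accessKeyId=${OSS_ACCESS_KEY_ID}
fs.oss.accessKeySecret=${OSS_ACCESS_KEY_SECRET}
fs.oss.endpoint=${OSS_ENDPOINT}
EOF

# The file now holds a secret, so restrict it to the owner.
chmod 600 "$SITE_PROPS"
echo "Wrote OSS settings to $SITE_PROPS"
```

Because the script appends, running it twice against the same file leaves duplicate entries; review the file afterwards if you rerun it.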
Configuring Distributed Applications
If you are using an Alluxio client that is running separately from the Alluxio Master and Workers (in a separate JVM), then you need to make sure that your Aliyun credentials are provided to the application JVM processes as well. The easiest way to do this is to add them as command line options when starting your client JVM process. For example:
java -Xmx3g -Dfs.oss.accessKeyId=<OSS_ACCESS_KEY_ID> -Dfs.oss.accessKeySecret=<OSS_ACCESS_KEY_SECRET> -Dfs.oss.endpoint=<OSS_ENDPOINT> -cp my_application.jar com.MyApplicationClass myArgs
Alternatively, you may copy conf/alluxio-site.properties (containing the properties that set the credentials) to the classpath of your application runtime (e.g., $SPARK_CLASSPATH for Spark), or append the path to this site properties file to the classpath.
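The command-line approach can be wrapped in a small launcher script so that the credentials are read from the environment instead of being typed inline. This is only a sketch: the environment variable names and the helper oss_jvm_opts are assumptions made for illustration, and my_application.jar and com.MyApplicationClass are the placeholders from the example command.

```shell
#!/bin/sh
# Dummy defaults so the sketch runs as-is; export real values in practice.
: "${OSS_ACCESS_KEY_ID:=dummy-key-id}"
: "${OSS_ACCESS_KEY_SECRET:=dummy-key-secret}"
: "${OSS_ENDPOINT:=oss-cn-shanghai.aliyuncs.com}"

# Print the JVM options that carry the OSS settings, one per line.
oss_jvm_opts() {
  printf '%s\n' \
    "-Dfs.oss.accessKeyId=${OSS_ACCESS_KEY_ID}" \
    "-Dfs.oss.accessKeySecret=${OSS_ACCESS_KEY_SECRET}" \
    "-Dfs.oss.endpoint=${OSS_ENDPOINT}"
}

# Show the command that would be run; remove "echo" to actually launch it.
# The unquoted substitution is split into separate arguments on purpose,
# which assumes the values contain no whitespace.
echo java -Xmx3g $(oss_jvm_opts) -cp my_application.jar com.MyApplicationClass myArgs
```

Options passed on the command line are typically visible to other local users through process listings, which is one reason to prefer the properties file approach on shared machines.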
Running Alluxio Locally with OSS
After everything is configured, you can start up Alluxio locally to see that everything works.
./bin/alluxio format
./bin/alluxio-start.sh local
This should start an Alluxio master and an Alluxio worker. You can see the master UI at http://localhost:19999.
Next, you can run a simple example program:
./bin/alluxio runTests
After this succeeds, you can visit your OSS directory OSS_BUCKET/OSS_DIRECTORY to verify that the files and directories created by Alluxio exist. For this test, you should see files with names like:
OSS_BUCKET/OSS_DIRECTORY/default_tests_files/BasicFile_CACHE_PROMOTE_MUST_CACHE
To stop Alluxio, you can run:
./bin/alluxio-stop.sh all