Running Alluxio on a Cluster


Download Alluxio

First, download the Alluxio tar file and extract it.

tar xvfz alluxio-1.6.1-bin.tar.gz

Configure Alluxio

In the ${ALLUXIO_HOME}/conf directory, create the conf/ configuration file from the template.

cp conf/ conf/

Update alluxio.master.hostname in conf/ to the hostname of the machine you plan to run the Alluxio master on. Add the IP addresses or hostnames of all the worker nodes to the conf/workers file.

You cannot use the local file system as Alluxio’s under storage system if there are multiple nodes in the cluster. Instead, you need to set up shared storage to which all Alluxio servers have access. The shared storage can be a network file system (NFS), HDFS, S3, and so on. For example, you can refer to Configuring Alluxio with S3 and follow its instructions to set up S3 as Alluxio’s under storage.
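As a minimal sketch of the workers file mentioned above, the snippet below writes three worker entries into a scratch copy of conf/workers. The hostnames, the IP address, and the scratch directory are hypothetical, and the one-entry-per-line layout is an assumption; in a real deployment you would edit the file under ${ALLUXIO_HOME}/conf directly.

```shell
#!/bin/sh
# Sketch only: builds a workers file in a scratch directory, not in a real
# Alluxio installation. All hostnames and addresses are hypothetical.
scratch_dir=$(mktemp -d)
mkdir -p "$scratch_dir/conf"

# Assumed layout: one worker hostname or IP address per line.
cat > "$scratch_dir/conf/workers" <<'EOF'
worker1.example.internal
worker2.example.internal
10.0.0.13
EOF

# Count the non-empty lines to confirm what was written.
worker_count=$(grep -c . "$scratch_dir/conf/workers")
echo "wrote $worker_count workers to $scratch_dir/conf/workers"
```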

Finally, sync all the information to the worker nodes. You can use

./bin/alluxio copyDir <dirname>

to sync files and folders to all hosts specified in the alluxio/conf/workers file. If you have downloaded and extracted the Alluxio tar file on the master only, you can use the copyDir command to sync the entire extracted Alluxio directory to the workers. You can also use this command to sync any changes in conf/ to the workers.
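Before syncing, it can help to confirm which hosts copyDir will target. The following is a rough sketch and not part of Alluxio: list_workers is a hypothetical helper that prints the entries of a workers file, and it assumes that blank lines and lines starting with # should be ignored.

```shell
#!/bin/sh
# Hypothetical helper (not an Alluxio command): print the hosts listed in a
# workers file, skipping blank lines and lines that start with '#'.
list_workers() {
  workers_file=$1
  # Drop comment lines and lines that are empty or whitespace-only.
  grep -v -e '^[[:space:]]*#' -e '^[[:space:]]*$' "$workers_file"
}

# Example usage, run from the Alluxio directory on the master:
#   list_workers conf/workers
#   ./bin/alluxio copyDir conf/
```

Reviewing the list first is a cheap way to catch a typo in a hostname before the sync attempts to reach every host.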

Start Alluxio

Now, you can start Alluxio:

cd alluxio
./bin/alluxio format
./bin/ # use the right parameters here, e.g. all Mount
# Note: the Mount and SudoMount parameters will format the existing RamFS.

To verify that Alluxio is running, you can visit http://<alluxio_master_hostname>:19999, check the logs in the alluxio/logs directory, or run a sample program:

./bin/alluxio runTests

Note: If you are using EC2, make sure the security group settings on the master node allow incoming connections on the Alluxio web UI port (19999 in the URL above).
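The web UI check can also be scripted, which is handy when a browser cannot reach the master directly. This is a sketch under stated assumptions: alluxio_ui_url and check_alluxio_ui are hypothetical helper names, the hostname in the usage comment is a placeholder, curl is assumed to be installed, and port 19999 is taken from the URL shown earlier.

```shell
#!/bin/sh
# Hypothetical helper: build the web UI URL from a hostname and an optional
# port. The port defaults to 19999, as in the URL shown in the text.
alluxio_ui_url() {
  printf 'http://%s:%s\n' "$1" "${2:-19999}"
}

# Hypothetical helper: returns 0 if the web UI answers without an HTTP error
# (status below 400) within 5 seconds, and non-zero otherwise.
check_alluxio_ui() {
  curl --silent --fail --max-time 5 --output /dev/null "$(alluxio_ui_url "$@")"
}

# Example usage (hostname is a placeholder):
#   check_alluxio_ui master.example.internal && echo "web UI reachable"
```

If the check fails from a remote machine but succeeds on the master itself, a firewall or security group rule is a likely cause.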