Running Alluxio with Fault Tolerance on EC2

Alluxio with Fault Tolerance can be deployed on Amazon EC2 using the Vagrant scripts that come with Alluxio. The scripts let you create, configure, and destroy clusters that are automatically configured with Apache HDFS.


Install Vagrant and the AWS plugins

Download Vagrant

Install the Vagrant AWS plugin and add the dummy box it requires:

vagrant plugin install vagrant-aws
vagrant box add dummy https://github.com/mitchellh/vagrant-aws/raw/master/dummy.box

Install Alluxio

Download Alluxio to your local machine, and extract it:

tar xvfz alluxio-1.6.1-bin.tar.gz

Install python library dependencies

Install Python 2.7 or later; Python 3 is not supported.

Under the deploy/vagrant directory in your Alluxio home directory, run:

sudo bash bin/

Alternatively, you can manually install pip, and then in deploy/vagrant run:

sudo pip install -r pip-req.txt

Launch a Cluster

To run an Alluxio cluster on EC2, first sign up for an Amazon EC2 account on the Amazon Web Services site.

Then create your access keys and set the shell environment variables AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY:

export AWS_ACCESS_KEY_ID=<your access key>
export AWS_SECRET_ACCESS_KEY=<your secret access key>

Next, generate your EC2 Key Pairs. Make sure to set the permissions of your private key file so that only you can read it:

chmod 400 <your key pair>.pem

Copy deploy/vagrant/conf/ec2.yml.template to deploy/vagrant/conf/ec2.yml:

cp deploy/vagrant/conf/ec2.yml.template deploy/vagrant/conf/ec2.yml

In the configuration file deploy/vagrant/conf/ec2.yml, set the value of Keypair to your keypair name and Key_Path to the path to the pem key.
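
For illustration, the two fields might be set like the following. This is a sketch only: the keypair name and the path are assumed example values, not defaults.

```yaml
# deploy/vagrant/conf/ec2.yml (fragment) -- example values only
Keypair: my-alluxio-keypair                # assumed: your EC2 key pair name
Key_Path: ~/.ssh/my-alluxio-keypair.pem    # assumed: path to the matching .pem file
```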

In the configuration file deploy/vagrant/conf/ufs.yml, either set the value of Type to hadoop or specify an S3 bucket for the Bucket field.

In the configuration file deploy/vagrant/conf/alluxio.yml, set the value of Masters to the number of Alluxio masters you want. In fault tolerant mode, the value of Masters should be larger than 1.
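
As a minimal sketch, the relevant line in alluxio.yml might look like this; 3 is just an example count (any value larger than 1 satisfies the fault tolerant requirement), and other fields in the file are omitted here:

```yaml
# deploy/vagrant/conf/alluxio.yml (fragment)
# Run three Alluxio masters for fault tolerance (must be > 1).
Masters: 3
```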

By default, the Vagrant script creates a security group named alluxio-vagrant-test in region us-east-1 and availability zone us-east-1b. The security group will be set up automatically in that region with all inbound and outbound network traffic opened. You can change the security group, region, and availability zone in ec2.yml.

Now you can launch the Alluxio cluster with Hadoop 2.4.1 as the under filesystem in us-east-1b by running the script under deploy/vagrant:

./create <number of machines> aws

Note that the <number of machines> above should be larger than or equal to the value of Masters set in deploy/vagrant/conf/alluxio.yml.
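
This constraint can be sanity-checked in the shell before launching. The values below are assumptions for illustration; substitute the machine count you plan to pass to ./create and the Masters value from your own alluxio.yml:

```shell
# Assumed example values: 3 machines requested, Masters set to 3 in alluxio.yml.
machines=3
masters=3
if [ "$machines" -ge "$masters" ]; then
  echo "ok: $machines machines >= $masters masters"
else
  echo "error: <number of machines> must be >= Masters" >&2
fi
```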

Each node of the cluster runs an Alluxio worker, and each master node runs an Alluxio master. The leader is one of the master nodes.

Access the cluster

Access through Web UI

After the command ./create <number of machines> aws succeeds, you will see three green lines like the following at the end of the shell output:

>>> Master public IP for Alluxio is xxx, visit xxx:19999 for Alluxio web UI<<<
>>> Master public IP for other softwares is xxx <<<
>>> visit default port of the web UI of what you deployed <<<

The first line shows the public IP of the current leader of all Alluxio masters.

The second line shows the public IP of the master for other software, such as Hadoop.

The default port for the Alluxio Web UI is 19999.

The default port for the Hadoop Web UI is 50070.

Visit http://{MASTER_IP}:{PORT} in the browser to access the Web UIs.

You can also monitor the instances state through AWS web console.
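
The URLs can be assembled in the shell from the printed master IP. This is a sketch: the IP below is an assumed example (substitute the one from the script output), and the commented-out reachability check assumes curl is installed:

```shell
MASTER_IP="203.0.113.10"                # assumed example; use the IP printed by ./create
ALLUXIO_UI="http://${MASTER_IP}:19999"  # Alluxio web UI (default port 19999)
HADOOP_UI="http://${MASTER_IP}:50070"   # Hadoop web UI (default port 50070)
echo "$ALLUXIO_UI"
echo "$HADOOP_UI"
# Optional check once the cluster is up (requires curl):
# curl -sf -o /dev/null "$ALLUXIO_UI" && echo "Alluxio web UI is up"
```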

Access with ssh

The nodes created are placed in one of two categories.

One category contains AlluxioMaster, AlluxioMaster2, and so on, representing all Alluxio masters; one of them is the leader, and the others are standbys. AlluxioMaster is also the master for other software, such as Hadoop. Each of these nodes also runs workers for Alluxio and other software, such as Hadoop.

The other category contains AlluxioWorker1, AlluxioWorker2, and so on. Each of these nodes runs workers for Alluxio and other software, such as Hadoop.

To ssh into a node, run:

vagrant ssh <node name>

For example, you can ssh into AlluxioMaster with:

vagrant ssh AlluxioMaster

All software is installed under the root directory, e.g. Alluxio is installed in /alluxio, Hadoop in /hadoop, and ZooKeeper in /zookeeper.

On the leader master node, you can run tests against Alluxio to check its health:

/alluxio/bin/alluxio runTests

After the tests finish, visit Alluxio web UI at http://{MASTER_IP}:19999 again. Click Browse in the navigation bar, and you should see the files written to Alluxio by the above tests.

You can ssh into the current Alluxio master leader and find the process ID of the AlluxioMaster process with:

jps | grep AlluxioMaster

Then kill the leader with:

kill -9 <leader pid found via the above command>
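
The two steps can be combined into one snippet. This is a sketch that assumes jps prints lines of the form "<pid> AlluxioMaster" and that awk is available on the node:

```shell
# On the leader node: extract the AlluxioMaster PID from jps output and kill it.
leader_pid=$(jps | grep AlluxioMaster | awk '{print $1}')
[ -n "$leader_pid" ] && kill -9 "$leader_pid"
```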

Then you can ssh into AlluxioMaster, where ZooKeeper is running, to find out the new leader, and run the ZooKeeper client via:

/zookeeper/bin/zkCli.sh

In the zookeeper client shell, you can see the leader with the command:

ls /leader

The output of the command shows the new leader. You may need to wait a moment for the new leader to be elected. You can query the public IP of the new leader by its name in the AWS web console.

Visit Alluxio web UI at http://{NEW_LEADER_MASTER_IP}:19999. Click Browse in the navigation bar, and you should see all files are still there.

From a node in the cluster, you can ssh to other nodes in the cluster without a password:

ssh AlluxioWorker1

Destroy the cluster

Under the deploy/vagrant directory, run:

./destroy

to destroy the cluster that you created. Only one cluster can be created at a time. After the command succeeds, the EC2 instances are terminated.