Running Alluxio with Fault Tolerance on EC2
Alluxio with Fault Tolerance can be deployed on Amazon EC2 using the Vagrant scripts that come with Alluxio. The scripts let you create, configure, and destroy clusters that come automatically configured with Apache HDFS.
Prerequisites
Install Vagrant and the AWS plugin
Download Vagrant
Install AWS Vagrant plugin:
vagrant plugin install vagrant-aws
vagrant box add dummy https://github.com/mitchellh/vagrant-aws/raw/master/dummy.box
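You can confirm that the plugin was installed by listing the installed plugins; vagrant-aws should appear in the output:
vagrant plugin list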
Install Alluxio
Download Alluxio to your local machine, and unzip it:
wget http://alluxio.org/downloads/files/1.6.1/alluxio-1.6.1-bin.tar.gz
tar xvfz alluxio-1.6.1-bin.tar.gz
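The archive extracts into a directory named alluxio-1.6.1 (assuming the standard layout of the release tarball). Change into it so that the relative paths used below, such as deploy/vagrant, resolve correctly:
cd alluxio-1.6.1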
Install python library dependencies
Install python>=2.7, not python3.
Under the deploy/vagrant directory in your Alluxio home directory, run:
sudo bash bin/install.sh
Alternatively, you can manually install pip, and then, in deploy/vagrant, run:
sudo pip install -r pip-req.txt
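Before continuing, you can check that a suitable interpreter and pip are available:
python --version
pip --version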
Launch a Cluster
To run an Alluxio cluster on EC2, first sign up for an Amazon EC2 account on the Amazon Web Services site.
Then create access keys and set the shell environment variables AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY:
export AWS_ACCESS_KEY_ID=<your access key>
export AWS_SECRET_ACCESS_KEY=<your secret access key>
Next, generate your EC2 key pair. Make sure to set the permissions of your private key file so that only you can read it:
chmod 400 <your key pair>.pem
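If you have the AWS CLI installed and prefer the command line to the EC2 console, a key pair can also be created like this (a sketch; my-alluxio-key is a placeholder name):
aws ec2 create-key-pair --key-name my-alluxio-key --query 'KeyMaterial' --output text > my-alluxio-key.pem
chmod 400 my-alluxio-key.pem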
Copy deploy/vagrant/conf/ec2.yml.template to deploy/vagrant/conf/ec2.yml:
cp deploy/vagrant/conf/ec2.yml.template deploy/vagrant/conf/ec2.yml
In the configuration file deploy/vagrant/conf/ec2.yml, set the value of Keypair to your key pair name and Key_Path to the path to the pem key.
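For example, the relevant entries might look like the following (illustrative values; use your own key pair name and path):
Keypair: my-alluxio-key
Key_Path: /path/to/my-alluxio-key.pem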
In the configuration file deploy/vagrant/conf/ufs.yml, either set the value of Type to hadoop or specify an S3 bucket for the Bucket field.
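For instance, to use HDFS as the under filesystem, the entry might simply be (an illustrative value):
Type: hadoop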
In the configuration file deploy/vagrant/conf/alluxio.yml, set the value of Masters to the number of Alluxio masters you want. In fault tolerant mode, the value of Masters should be larger than 1.
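For a fault tolerant setup with three masters, the entry might look like (an illustrative value):
Masters: 3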
By default, the Vagrant script creates a Security Group named alluxio-vagrant-test in region us-east-1 and availability zone us-east-1b. The security group will be set up automatically in that region with all inbound/outbound network traffic opened. You can change the security group, region and availability zone in ec2.yml.
Now you can launch the Alluxio cluster with Hadoop 2.4.1 as the under filesystem in us-east-1b by running the script under deploy/vagrant:
./create <number of machines> aws
Note that the <number of machines> above should be larger than or equal to Masters set in deploy/vagrant/conf/alluxio.yml.
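For example, if Masters is set to 2, the following launches a four-machine cluster, which satisfies this requirement (illustrative numbers; adjust them to your configuration):
./create 4 aws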
Each node of the cluster has an Alluxio worker and each master node has an Alluxio master. The leader is in one of the master nodes.
Access the cluster
Access through Web UI
After the command ./create <number of machines> aws succeeds, three green lines like the ones below are shown at the end of the shell output:
>>> Master public IP for Alluxio is xxx, visit xxx:19999 for Alluxio web UI<<<
>>> Master public IP for other softwares is xxx <<<
>>> visit default port of the web UI of what you deployed <<<
The first line shows the public IP of the current leader among the Alluxio masters.
The second line shows the public IP of the master for other software, such as Hadoop.
Default port for Alluxio Web UI is 19999.
Default port for Hadoop Web UI is 50070.
Visit http://{MASTER_IP}:{PORT} in the browser to access the Web UIs.
You can also monitor the state of the instances through the AWS web console.
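If the pages do not load, a quick way to check from your local machine whether the web UI ports respond is (replace {MASTER_IP} with the IP printed above):
curl -I http://{MASTER_IP}:19999
curl -I http://{MASTER_IP}:50070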
Access with ssh
The nodes created are placed in one of two categories.
One category contains AlluxioMaster, AlluxioMaster2 and so on, representing all Alluxio masters; one of them is the leader, and the others are standbys. AlluxioMaster is also the master for other software, like Hadoop. Each node also runs workers for Alluxio and other software like Hadoop.
The other category contains AlluxioWorker1, AlluxioWorker2 and so on. Each node runs workers for Alluxio and other software like Hadoop.
To ssh into a node, run:
vagrant ssh <node name>
For example, you can ssh into AlluxioMaster with:
vagrant ssh AlluxioMaster
All software is installed under the root directory, e.g. Alluxio is installed in /alluxio, Hadoop is installed in /hadoop, and Zookeeper is installed in /zookeeper.
On the leader master node, you can run tests against Alluxio to check its health:
/alluxio/bin/alluxio runTests
After the tests finish, visit the Alluxio web UI at http://{MASTER_IP}:19999 again. Click Browse in the navigation bar, and you should see the files written to Alluxio by the above tests.
You can ssh into the current Alluxio master leader and find the process ID of the AlluxioMaster process with:
jps | grep AlluxioMaster
Then kill the leader with:
kill -9 <leader pid found via the above command>
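The two steps can also be combined into a single command (a sketch that relies on jps printing the process ID in the first column of its output):
kill -9 $(jps | grep AlluxioMaster | awk '{print $1}')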
Then you can ssh into AlluxioMaster, where Zookeeper is running, to find out the new leader, and run the zookeeper client via:
/zookeeper/bin/zkCli.sh
In the zookeeper client shell, you can see the leader with the command:
ls /leader
The output of the command should show the new leader. You may need to wait for a moment for the new leader to be elected. You can query the public IP for the new leader based on its name in the AWS web console.
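Alternatively, the same query can be issued without entering the interactive shell (a sketch that assumes ZooKeeper is listening on its default port 2181 on that node):
/zookeeper/bin/zkCli.sh -server localhost:2181 ls /leader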
Visit the Alluxio web UI at http://{NEW_LEADER_MASTER_IP}:19999. Click Browse in the navigation bar, and you should see all files are still there.
From a node in the cluster, you can ssh to other nodes in the cluster without a password, for example:
ssh AlluxioWorker1
Destroy the cluster
Under the deploy/vagrant directory, you can run
./destroy
to destroy the cluster that you created. Only one cluster can be created at a time. After the command succeeds, the EC2 instances are terminated.