Running Alluxio on a Cluster with High Availability


High availability in Alluxio is based upon a multi-master approach, where multiple master processes run in the system. One of these processes is elected leader and is used by all workers and clients as the primary point of contact. The other masters act as standbys, using the shared journal to maintain the same file system metadata as the leader so that they can rapidly take over if the leader fails.

If the leader fails, a new leader is automatically selected from the available standby masters and Alluxio proceeds as usual. Note that while the switchover to a standby master happens, clients may experience brief delays or transient errors.


Prerequisites

There are two prerequisites for setting up an Alluxio cluster with high availability:

  • ZooKeeper
  • A shared, reliable under filesystem on which to place the journal

Alluxio requires ZooKeeper for leader selection. This ensures that there is at most one leader master for Alluxio at any given time.

Alluxio also requires a shared under filesystem on which to place the journal. This shared filesystem must be accessible by all the masters, and possible options include HDFS, Amazon S3, or GlusterFS. The leader master writes the journal to the shared filesystem, while the other (standby) masters continually replay the journal entries to stay up-to-date.


ZooKeeper

Alluxio uses ZooKeeper to achieve master high availability. Alluxio masters use ZooKeeper for leader election. Alluxio clients also use ZooKeeper to inquire about the identity and address of the current leader.

ZooKeeper must be set up independently (see ZooKeeper Getting Started).

After ZooKeeper is deployed, note the address and port for configuring Alluxio below.

Shared Filesystem for Journal

Alluxio requires a shared filesystem to store the journal. All masters must be able to read and write to the shared filesystem. Only the leader master will be writing to the journal at any given time, but all masters read the shared journal to replay the state of Alluxio.

The shared filesystem must be set up independently from Alluxio, and should already be running before Alluxio is started.

For example, if using HDFS for the shared journal, you need to note the address and port of your NameNode, as you will need them to configure Alluxio below.

Configuring Alluxio

Once you have ZooKeeper and your shared filesystem running, you need to set up the Alluxio configuration appropriately on each host.

Externally Visible Address

In the following sections we refer to an “externally visible address”. This is simply the address of an interface on the machine being configured that can be seen by other nodes in the Alluxio cluster. On EC2, you should use the ip-x-x-x-x address. In particular, don’t use localhost or 127.0.0.1, as other nodes will then be unable to reach your node.

Configuring Fault Tolerant Alluxio

In order to configure high availability for Alluxio, additional configuration settings must be set for Alluxio masters, workers, and clients. In conf/alluxio-env.sh, these Java options must be set:

Property Name             | Value                     | Meaning
alluxio.zookeeper.enabled | true                      | If true, masters will use ZooKeeper to enable fault tolerance.
alluxio.zookeeper.address | [zookeeper_hostname]:2181 | The hostname and port ZooKeeper is running on. Separate multiple addresses with commas.

To set these options, you can configure your conf/alluxio-env.sh to include a line like the following (this assumes the standard ALLUXIO_JAVA_OPTS variable read by Alluxio's launch scripts; the ZooKeeper hostname is a placeholder):
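
ALLUXIO_JAVA_OPTS+=" -Dalluxio.zookeeper.enabled=true -Dalluxio.zookeeper.address=[zookeeper_hostname]:2181"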


If you are using a cluster of ZooKeeper nodes, you can specify multiple addresses by separating them with commas, like:
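
-Dalluxio.zookeeper.address=[zookeeper_hostname1]:2181,[zookeeper_hostname2]:2181,[zookeeper_hostname3]:2181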


Alternatively, these configuration settings can be set in the conf/alluxio-site.properties file. More details about setting configuration parameters can be found in Configuration Settings.
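
For reference, the equivalent alluxio-site.properties entries would look like this (the hostname is again a placeholder):

alluxio.zookeeper.enabled=true
alluxio.zookeeper.address=[zookeeper_hostname]:2181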

Master Configuration

In addition to the above configuration settings, Alluxio masters need additional configuration. For each master, the following property must be set appropriately in conf/alluxio-site.properties:

alluxio.master.hostname=[externally visible address of this machine]

Also, specify the correct journal folder by setting alluxio.master.journal.folder appropriately in conf/alluxio-site.properties. For example, if you are using HDFS for the journal, you can add a line like:
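
alluxio.master.journal.folder=hdfs://[namenodeserver]:[namenodeport]/path/to/alluxio/journal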


Once all the Alluxio masters are configured in this way, they can all be started to achieve highly available Alluxio. One of the masters will become the leader; the others will replay the journal and wait, ready to take over if the current leader fails.
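
As a sketch, assuming the standard launch scripts in Alluxio's bin/ directory, each master can be started locally on its own node with:

./bin/alluxio-start.sh master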

Worker Configuration

As long as the configuration parameters above are correctly set, a worker will be able to consult ZooKeeper to find the current leader master to connect to. Therefore, alluxio.master.hostname does not have to be set for the workers.

Note: When running Alluxio in high availability mode, the default worker heartbeat timeout value may be too short. To handle master failovers gracefully, increase the worker heartbeat timeout by setting alluxio.worker.block.heartbeat.timeout.ms in conf/alluxio-site.properties to a larger value (at least a few minutes).
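
For example (the 5-minute value below is only an illustration; tune it for your environment):

alluxio.worker.block.heartbeat.timeout.ms=300000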

Client Configuration

No additional configuration parameters are required for high availability mode. As long as both:

alluxio.zookeeper.enabled=true
alluxio.zookeeper.address=[zookeeper_hostname]:2181

are set appropriately for your client application, the application will be able to consult ZooKeeper for the current leader master.


HDFS API

When communicating with Alluxio in HA mode using the HDFS API, ensure that the client-side ZooKeeper configuration is properly set. Use alluxio:// as the scheme, but the host and port may be omitted; any host provided in the URL is ignored, and alluxio.zookeeper.address is used instead to find the Alluxio leader master.

hadoop fs -ls alluxio:///directory
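
If the ZooKeeper properties are not already present in the client's Hadoop configuration files, one way to supply them is via Hadoop's generic -D options (a sketch; the hostname is a placeholder):

hadoop fs -Dalluxio.zookeeper.enabled=true -Dalluxio.zookeeper.address=[zookeeper_hostname]:2181 -ls alluxio:///directory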

Automatic Fail Over

To test automatic failover, ssh into the current Alluxio leader master, and find the process ID of the AlluxioMaster process with:

jps | grep AlluxioMaster

Then kill the leader with:

kill -9 <leader pid found via the above command>

Then you can check the current leader with the following command:

./bin/alluxio fs leader

The output of the command should show the new leader. You may need to wait a moment for the new leader to be elected.

Visit the Alluxio web UI at http://{NEW_LEADER_MASTER_IP}:{NEW_LEADER_MASTER_WEB_PORT}. Click Browse in the navigation bar, and you should see that all files are still there.