Alluxio with Fault Tolerance

In fault tolerant mode, an Alluxio cluster has multiple masters. One master is elected the leader and is used by all of the workers and clients in the cluster to perform their functions. The other masters act as standbys and have access to the latest file system metadata so that the cluster can fail over if the current leader becomes nonfunctional.

Alluxio can use either an embedded journal or a UFS-based journal to maintain state across restarts. The embedded journal comes with its own leader election, while UFS journaling relies on ZooKeeper for leader election. This guide discusses how to run Alluxio with a UFS-based journal and ZooKeeper; see the embedded journal documentation for the alternative. The embedded journal is appropriate when no fast, non-object storage such as HDFS or NFS is available, or when no ZooKeeper cluster is available.

Alluxio Clusters with Fault Tolerance

In this guide, we’ll install ZooKeeper, create a shared file system to hold the journal, and then create an Alluxio cluster. Finally, we’ll trigger a failover to watch a standby master take over as leader.

There are several options for the cloud service provider, compute framework, and storage systems you can use with Alluxio. The high level steps for installing Alluxio with fault tolerance are the same regardless, but for the purpose of providing some particulars and command line examples, we’ll co-locate Alluxio and our compute engine on Amazon EC2 instances running Ubuntu Trusty, store our data in Amazon S3, and store the journal in NFS.

Install ZooKeeper

Apache ZooKeeper is a service that allows distributed processes to coordinate with each other. To the uninitiated, ZooKeeper can seem arcane, but don’t worry: only its base functionality is required here, and it’s simple to install and configure.

The leader and standby master processes don’t sync their own copies of the journal but instead rely on a shared journal, created on a shared storage system that all master processes can read from and write to. The leader writes to the journal, while the standbys continually read new entries in order to stay up-to-date. The shared journal may reside in the Alluxio cluster’s under storage; it need not be stored on a separate system.

Alluxio uses ZooKeeper to achieve master fault tolerance: Alluxio masters use ZooKeeper for leader election. Alluxio clients also use ZooKeeper to inquire about the identity and address of the current leader.

ZooKeeper can either be deployed on the same machines as Alluxio or on different machines. In a production environment, the downside to colocating them is that a machine level failure would take down both a ZooKeeper node and an Alluxio node. ZooKeeper could also be shared by other services, in which case it would be more logical to deploy it separately.

For the purpose of providing an example installation, it’s simpler to use the same cluster, since ZooKeeper doesn’t consume many resources anyway. We will, however, enable ZooKeeper redundancy, which requires at least three hosts.

  1. Install ZooKeeper on 3 machines using your favorite package manager. These machines will comprise your ZooKeeper ensemble.
    $ sudo apt-get update -y
    $ sudo apt-get install -y zookeeper zookeeperd
    
  2. Each ZooKeeper server needs to know about all the other servers in the ensemble.
    • On each server, open /etc/zookeeper/conf/zoo.cfg in a text editor, and replace the IP addresses in the following section with the addresses or hostnames of your ZooKeeper servers. Leave everything else the same, and save your changes.
      # specify all zookeeper servers
      # The first port is used by followers to connect to the leader
      # The second one is used for leader election
      # NOTE: In EC2 you must use the private instance address, not the public instance address.
      server.1=zookeeper1:2888:3888
      server.2=zookeeper2:2888:3888
      server.3=zookeeper3:2888:3888
      
  3. Double-check that your security policy allows inbound TCP 2888 and inbound TCP 3888. On Amazon EC2, this is done by adding rules to your security group (an example AWS CLI invocation appears after the note below).
  4. Each ZooKeeper server needs a unique id. On each server in the ensemble, open /etc/zookeeper/conf/myid and replace the existing line with an integer from 1 to 255; the integer must match that host’s server.N entry in zoo.cfg. Save your changes, and exit. (A concrete sketch of this step follows the list.)
  5. On each ZooKeeper server, restart the ZooKeeper service.
    $ sudo service zookeeper restart
    
  6. (Optional) Let’s check which server is the leader and trigger a failover.
    1. Use zkServer.sh status on all 3 servers in the ensemble to see who the leader is.
      $ sudo /usr/share/zookeeper/bin/zkServer.sh status
      JMX enabled by default
      Using config: /etc/zookeeper/conf/zoo.cfg
      Mode: leader
      
    2. Stop ZooKeeper on the leader, and display the status again on the other two servers to discover which one was elected the new leader.
      $ sudo service zookeeper stop
      
    3. Make sure you restart ZooKeeper on the former leader before proceeding to the next section!
      $ sudo service zookeeper start
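
As a concrete sketch of steps 4 and 5, assuming ids 1 through 3 correspond to the server.1 through server.3 entries in zoo.cfg, and that netcat is installed for the health check:

    $ # on zookeeper1; write 2 on zookeeper2 and 3 on zookeeper3
    $ echo 1 | sudo tee /etc/zookeeper/conf/myid
    $ sudo service zookeeper restart
    $ # ZooKeeper answers "imok" if the server is up and serving
    $ echo ruok | nc localhost 2181
    imok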
      

Note that the failover we triggered in this section demonstrated ZooKeeper fault tolerance. This is different from Alluxio fault tolerance, which will be demonstrated once Alluxio is set up.
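
If you prefer the AWS CLI for the security group rules in step 3 (and for the NFS ports in the next section), they can be added with commands along these lines; the security group id and CIDR below are placeholders for your own values:

$ aws ec2 authorize-security-group-ingress --group-id sg-0123456789abcdef0 \
    --protocol tcp --port 2888 --cidr 10.0.0.0/16
$ aws ec2 authorize-security-group-ingress --group-id sg-0123456789abcdef0 \
    --protocol tcp --port 3888 --cidr 10.0.0.0/16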

Set Up Shared Storage System

Alluxio requires a shared storage system to store the journal. All masters must be able to read from and write to the shared system, though only the leader writes to the journal at any given time. The standby masters continually read new entries to stay up to date. The shared system should already be running before your Alluxio cluster is started.

The shared journal may reside in the Alluxio cluster’s under storage; it need not be stored on a separate system. HDFS or GlusterFS are fine choices. S3 will also work, but journal performance will suffer. You might find it convenient to use S3 anyway for the purpose of getting started, in which case you can specify something like s3n://bucket/journal during cluster setup.
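
For example, an S3-backed journal comes down to a single property at cluster setup time (the bucket name is a placeholder, and Alluxio must also be configured with your AWS credentials):

alluxio.master.journal.folder=s3n://bucket/journal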

In this section, in order to get up and running quickly, we’ll use NFS for the shared storage system and reuse one of the machines running an Alluxio master as the NFS server. If you decide to use NFS in your production environment, you’ll want to deploy it independently of Alluxio and add some form of redundancy.

  1. Install the NFS server package.
    $ sudo apt-get update -y
    $ sudo apt-get install -y nfs-kernel-server
    
  2. Create the directory that will hold the journal. Alluxio processes run as a user you’ll specify later when creating the cluster, so either create the directory in a location where that user has read-write access (e.g., its home directory) or modify the permissions of the directory you create. For this example install we use the default user:group for Ubuntu AMIs and change ownership accordingly.
    $ sudo mkdir -p /var/alluxio/journal
    $ sudo chown -R ubuntu:ubuntu /var/alluxio
    
  3. Add the journal directory to /etc/exports along with read-write permissions for all the nodes which need to access the journal. (Replace alluxio_master# with the hostnames of your Alluxio master nodes.)
    /var/alluxio/journal alluxio_master1(rw,sync,no_subtree_check) alluxio_master2(rw,sync,no_subtree_check) alluxio_master3(rw,sync,no_subtree_check)
    
  4. Start the NFS service, export the directory, and restart the service.
    $ sudo service nfs-kernel-server start
    $ sudo exportfs -ra
    $ sudo service nfs-kernel-server restart
    
  5. Double-check that your security policy allows inbound TCP 111 and inbound TCP 2049. NFS needs these ports to be open to operate. On Amazon EC2, this is done by adding rules to your security group.
  6. Install the NFS client package on each Alluxio master node and mount the journal export at the same path on all of them. (A verification sketch follows the mount commands below.)
    $ sudo apt-get install -y nfs-common
    $ sudo mkdir -p /mnt/alluxio/journal
    $ sudo mount JOURNAL_HOST_IP:/var/alluxio/journal /mnt/alluxio/journal
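
To verify the setup, list the export from any client and write a test file through the mount; showmount ships with nfs-common. You can also append the mount to /etc/fstab so that it survives reboots. A sketch:

    $ showmount -e JOURNAL_HOST_IP
    $ touch /mnt/alluxio/journal/.mount-test && rm /mnt/alluxio/journal/.mount-test
    $ # optional: persist the mount across reboots
    $ echo "JOURNAL_HOST_IP:/var/alluxio/journal /mnt/alluxio/journal nfs defaults 0 0" | sudo tee -a /etc/fstab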
    

Create an Alluxio Cluster

Once you have ZooKeeper and your shared storage system running, you can create your Alluxio cluster.

Installing a Fault Tolerant Alluxio Cluster

During the installation and configuration of Alluxio, these are the configuration settings to update for a fault tolerant Alluxio cluster.

On the Alluxio masters and Alluxio workers, in the conf/alluxio-site.properties file, add the following settings:

alluxio.zookeeper.enabled=true
alluxio.zookeeper.address=[zookeeper_hostname1]:2181,[zookeeper_hostname2]:2181,[zookeeper_hostname3]:2181
alluxio.master.journal.folder=/mnt/alluxio/journal
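
If the masters and workers share identical configuration, one convenient way to distribute these settings is Alluxio’s copyDir helper, which copies a directory to every host listed in conf/workers. This assumes passwordless ssh from the node you run it on; masters not listed in that file still need the files copied by hand.

$ ./bin/alluxio copyDir conf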

Also, on each master make sure the property alluxio.master.hostname is set to that master’s externally visible address (not localhost). On EC2, this looks like ip-xxx-xx-x-xxx.ec2.internal.
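
For example, on one master the property might read as follows (the internal address is hypothetical):

alluxio.master.hostname=ip-172-31-10-45.ec2.internal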

Note: if you are using an HDFS location to store the master journal, make sure to set alluxio.underfs.hdfs.version to the Hadoop version of that HDFS. See Configuring Alluxio with HDFS for more details.
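
As a sketch, a journal kept on an HDFS cluster running Hadoop 2.7 might be configured like this (the namenode address and port are placeholders):

alluxio.master.journal.folder=hdfs://namenode:8020/alluxio/journal
alluxio.underfs.hdfs.version=2.7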

Trigger Master Failover

At this point ZooKeeper should be installed and your Alluxio cluster should be running. Now let’s witness an Alluxio master failover.

  1. First, discover who the leader is using the CLI client. You can do this in the alluxio directory on any cluster node.
    $ ./bin/alluxio fs leader
    
  2. Now stop the master process on the leader.
    $ ./bin/alluxio-stop.sh master
    
  3. Look again for the Alluxio leader.
    $ ./bin/alluxio fs leader
    # you can see the failover in master.log too
    $ cat logs/master.log | grep leader
    # the current leader
    INFO  LeaderSelectorClient - The current leader is ec2-35-161-219-93.us-west-2.compute.amazonaws.com:19998
    # the znode for the leader is being re-written
    INFO  LeaderSelectorClient - deleting zk path: /leader/ec2-35-160-129-131.us-west-2.compute.amazonaws.com:19998
    INFO  LeaderSelectorClient - creating zk path: /leader/ec2-35-160-129-131.us-west-2.compute.amazonaws.com:19998
    # the new leader
    INFO  LeaderSelectorClient - ec2-35-160-129-131.us-west-2.compute.amazonaws.com:19998 is now the leader.
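
To restore full redundancy after the demonstration, restart the master process on the former leader; it will rejoin the cluster as a standby. Assuming the standard launch script shipped with Alluxio:

    $ ./bin/alluxio-start.sh master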
    

Configure your Client Application

Your client application must be able to consult ZooKeeper for the leader master. To do this, set the following values in your client application:

alluxio.zookeeper.enabled=true
alluxio.zookeeper.address=[zookeeper_hostname]:2181

If you have multiple nodes in your ZooKeeper ensemble, you can specify them like:

alluxio.zookeeper.enabled=true
alluxio.zookeeper.address=[zookeeper_hostname1]:2181,[zookeeper_hostname2]:2181,[zookeeper_hostname3]:2181

Hadoop Client

When using the Hadoop client, you must use the scheme alluxio-ft instead of alluxio for your URI scheme, e.g. alluxio-ft:///path/to/alluxio/file. This indicates to the client that it needs to consult ZooKeeper to get the leader master address.
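
For example, a directory listing through the Hadoop CLI would look like the following (the path is a placeholder, and the client must also carry the ZooKeeper properties shown above):

$ hadoop fs -ls alluxio-ft:///path/to/alluxio/file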