Alluxio Deployment

Installation

Alluxio creates a distributed file system across one or more nodes which constitute your Alluxio cluster. Cluster nodes assume one of two roles, either ‘master’ or ‘worker’. The master node(s) manage the file system metadata, while the worker node(s) store data and serve read/write requests from Alluxio clients.

NOTE: In the special case of a single node deployment, the same node assumes both the ‘master’ and ‘worker’ roles.

  1. Your cluster nodes should already be running.
  2. Make sure your cluster meets the Alluxio requirements.

Download and Unpack Alluxio

  1. Download Alluxio. You’ll need to sign in or create an account if you don’t already have one.
  2. Unpack the Alluxio tarball to the same directory on each node of your Alluxio cluster. This directory will be referred to as ${ALLUXIO_HOME} in the Alluxio configuration file and in the rest of this documentation.
    $ tar xvfz alluxio-enterprise-1.8.0-<hadoop-version>.tar.gz

Distribute SSH keys to Workers

The Alluxio master node must be able to SSH into the worker nodes to start Alluxio processes.

  1. If you don’t have an SSH key pair on the master node yet, you can generate it using the following command.
    $ ssh-keygen -t rsa
    
  2. Append the Alluxio master node’s public SSH key (e.g. id_rsa.pub) to the .ssh/authorized_keys file on each of the worker nodes in the home directory of the user that will be used for running Alluxio.
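Step 2 can be scripted. The sketch below is one possible approach and is not part of Alluxio: `append_key` is a hypothetical helper that adds a public key to an authorized_keys file only when that exact line is not already present, so running it twice is harmless. The file paths in the example comment are assumptions. On many systems the standard `ssh-copy-id` utility achieves the same result from the master node.

```shell
# Hypothetical helper (not an Alluxio command): append the public key stored
# in $1 to the authorized_keys file $2 unless the identical line is present.
append_key() {
    pubkey_file=$1
    auth_file=$2
    key=$(cat "$pubkey_file")
    mkdir -p "$(dirname "$auth_file")"
    touch "$auth_file"
    # -F: fixed string, -x: match the whole line, -q: quiet
    if ! grep -Fxq -e "$key" "$auth_file"; then
        printf '%s\n' "$key" >> "$auth_file"
    fi
    # OpenSSH may refuse key files that other users can write to.
    chmod 600 "$auth_file"
}

# Example, run on a worker after copying the master's public key there
# (both paths are assumptions):
#   append_key /tmp/master_id_rsa.pub "$HOME/.ssh/authorized_keys"
```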

Initial Configuration

On Alluxio master node:

  1. Go to the ${ALLUXIO_HOME} directory.
  2. You should have received an email from your Alluxio sales representative which includes your Alluxio Enterprise Edition license. Download the attached license, rename the file to license.json, and place it in the root of the ${ALLUXIO_HOME} directory.
  3. Copy conf/alluxio-site.properties.template to conf/alluxio-site.properties. This file can be used to specify Alluxio configuration. The minimal setup requires you to set the alluxio.master.hostname property to match the master node hostname; for single node deployments, this should be localhost.
  4. Edit the conf/workers file, listing the hostnames for all of your worker nodes. For single node deployments, this step can be skipped.
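Editing the properties file by hand works fine. If you prefer to script step 3, the following is a minimal sketch: `set_property` is a hypothetical helper (not an Alluxio command) that replaces an existing `key=value` line or appends a new one, and the hostname `master-1.example.com` is a placeholder.

```shell
# Hypothetical helper: set key=value in a Java-style properties file,
# replacing an existing uncommented entry for the key or appending a new one.
set_property() {
    file=$1
    key=$2
    value=$3
    touch "$file"
    tmp="$file.tmp.$$"
    # Keep every line that does not start with "key=" (exact string match,
    # so the dots in the key are not treated as wildcards).
    awk -v k="$key" 'index($0, k "=") != 1' "$file" > "$tmp"
    printf '%s=%s\n' "$key" "$value" >> "$tmp"
    mv "$tmp" "$file"
}

# Example, run from ${ALLUXIO_HOME} (the hostname is a placeholder):
#   cp conf/alluxio-site.properties.template conf/alluxio-site.properties
#   set_property conf/alluxio-site.properties alluxio.master.hostname master-1.example.com
```

Because the helper rewrites the entry instead of appending blindly, it can be re-run whenever the master hostname changes without leaving duplicate lines.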

On each worker node:

  1. Go to the ${ALLUXIO_HOME} directory.
  2. Copy conf/alluxio-site.properties.template to conf/alluxio-site.properties. In this file, set the alluxio.master.hostname property to match the master node hostname; for single node deployments, this should be localhost.

Alternatively, you can use

$ ./bin/alluxio copyDir <dirname>

to sync files and folders to all hosts specified in the conf/workers file. If you have downloaded and extracted the Alluxio tarball on the master only, you can use the copyDir command to sync the entire extracted Alluxio directory to the workers. You can also use this command to sync any change to conf/alluxio-site.properties to the workers.
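Before syncing, it can be useful to review which hosts are listed in the workers file. The sketch below assumes the file holds one hostname per line and may contain blank lines and `#` comment lines; `list_workers` is a hypothetical helper, not an Alluxio command.

```shell
# Hypothetical helper: print the hostnames listed in a workers file,
# skipping blank lines and lines whose first non-blank character is '#'.
list_workers() {
    sed -e 's/[[:space:]]*$//' \
        -e '/^[[:space:]]*$/d' \
        -e '/^[[:space:]]*#/d' "$1"
}

# Example, run from ${ALLUXIO_HOME}:
#   list_workers conf/workers
```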

Format Alluxio

On the master node:

  1. Go to the ${ALLUXIO_HOME} directory.
  2. Run the Alluxio format command.
    $ ./bin/alluxio format
    

NOTE: This step is only required when you run Alluxio for the first time. If you run this command on an existing Alluxio cluster, all previously stored data and metadata in the Alluxio file system will be erased. However, data in the under storage will not be changed.

  3. Verify that the journal was created.
    $ ls -hal | grep journal
    drwxr-xr-t  5 ubuntu ubuntu 4.0K Nov 26 22:36 journal
    

Start Alluxio

Alluxio provides convenience scripts that let you start all Alluxio processes from the master node.

On the master node:

  1. Go to the ${ALLUXIO_HOME} directory.
  2. Invoke the following command, which starts the Alluxio processes on the master node and on each of the worker nodes. On each worker node, the command also creates a RAM disk.
    $ ./bin/alluxio-start.sh all SudoMount
    

    Alluxio should now be running on each of the cluster nodes.

  3. To verify that Alluxio processes are running on the master node, run the following command.
    $ jps | grep -v Jps
    1000 AlluxioMaster
    ...
    
  4. To verify that Alluxio processes are running on the worker nodes, run the following command on each worker node.
    $ jps | grep -v Jps
    1000 AlluxioWorker
    ...
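The two verification steps above can also be checked by a script. This is a minimal sketch under the assumption that `jps` prints one `<pid> <name>` pair per line, as in the sample output; `has_processes` is a hypothetical helper, not an Alluxio command.

```shell
# Hypothetical helper: read `jps` output on stdin and succeed only if every
# process name passed as an argument appears as the second field of a line.
has_processes() {
    listing=$(cat)
    for name in "$@"; do
        printf '%s\n' "$listing" |
            awk -v n="$name" '$2 == n { found = 1 } END { exit !found }' ||
            return 1
    done
}

# Examples:
#   jps | has_processes AlluxioMaster && echo "master is running"
#   jps | has_processes AlluxioWorker && echo "worker is running"
```

The comparison is on the whole process name, so a partial name such as `Alluxio` does not count as a match.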
    

Run Built-in I/O Tests

To verify that the Alluxio cluster is operational, you can run built-in I/O tests through:

$ ./bin/alluxio runTests
...

NOTE: The built-in I/O tests require that the root UFS folder specified by the alluxio.underfs.address configuration property (${ALLUXIO_HOME}/underFSStorage by default) exists. You cannot use the local file system as Alluxio’s under storage system if there are multiple nodes in the cluster. Instead, you need to set up shared storage to which all Alluxio servers have access, such as a network file system (NFS), HDFS, or S3. For example, you can refer to Configuring Alluxio with S3 and follow its instructions to set up S3 as Alluxio’s under storage.
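For a single node deployment that keeps the default local under storage, a quick pre-check before running the tests might look like the sketch below. `check_ufs_dir` is a hypothetical helper; it assumes that an address without a URI scheme is a local path and deliberately skips addresses such as `hdfs://...`, which it cannot verify.

```shell
# Hypothetical helper: report whether an under storage address that looks
# like a local path exists as a directory. Addresses containing "://" are
# treated as non-local and are not checked.
check_ufs_dir() {
    case $1 in
        *://*)
            echo "non-local under storage: $1 (not checked)"
            return 0
            ;;
    esac
    if [ -d "$1" ]; then
        echo "ok: $1"
    else
        echo "missing: $1"
        return 1
    fi
}

# Example, using the default location named in the note above:
#   check_ufs_dir "${ALLUXIO_HOME}/underFSStorage"
```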

Stop Alluxio

To stop Alluxio and its processes, perform the following steps on the master node:

  1. Go to the ${ALLUXIO_HOME} directory.
  2. Invoke the following command, which will stop all Alluxio processes.
    $ ./bin/alluxio-stop.sh all
    

Manage Alluxio without SSH

If you cannot use SSH on your cluster, you can manually start and stop the Alluxio cluster by running commands on individual nodes.

You will need to format the Alluxio file system before you run Alluxio for the first time.

First, run the following command on one of the master nodes to create a new journal:

$ ./bin/alluxio formatMaster

And then run the following command on each worker node to format the worker storage:

$ ./bin/alluxio formatWorker

NOTE: This step is only required when you run Alluxio for the first time. If you format an existing Alluxio cluster, all previously stored data and metadata in the Alluxio file system will be erased. However, data in the under storage will not be changed.

To start Alluxio, run the following commands on each master node:

$ ./bin/alluxio-start.sh master
$ ./bin/alluxio-start.sh job_master
$ ./bin/alluxio-start.sh proxy

And then run the following commands on each worker node:

$ ./bin/alluxio-start.sh worker
$ ./bin/alluxio-start.sh job_worker
$ ./bin/alluxio-start.sh proxy

To stop Alluxio, run the following commands on each master node:

$ ./bin/alluxio-stop.sh proxy
$ ./bin/alluxio-stop.sh job_master
$ ./bin/alluxio-stop.sh master

And then run the following commands on each worker node:

$ ./bin/alluxio-stop.sh proxy
$ ./bin/alluxio-stop.sh job_worker
$ ./bin/alluxio-stop.sh worker
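Note that the stop sequences above run in the reverse order of the start sequences. The sketch below captures that ordering in one place; `alluxio_commands` is a hypothetical helper that only prints the commands listed in this section and does not run them.

```shell
# Hypothetical helper: print the per-node commands from this section for an
# action (start|stop) and a role (master|worker), in the order listed above.
alluxio_commands() {
    action=$1
    role=$2
    case $role in
        master|worker) ;;
        *) echo "unknown role: $role" >&2; return 1 ;;
    esac
    case $action in
        start) order="$role job_$role proxy" ;;   # main process first
        stop)  order="proxy job_$role $role" ;;   # reverse of start
        *) echo "unknown action: $action" >&2; return 1 ;;
    esac
    for process in $order; do
        echo "./bin/alluxio-$action.sh $process"
    done
}

# Example: show what to run on a worker node.
#   alluxio_commands start worker
```

Printing instead of executing keeps the helper safe to try; the output can be reviewed and then run by hand from the ${ALLUXIO_HOME} directory on the node in question.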

Next Steps

Congratulations on successfully installing Alluxio!

There are several next steps available. You can learn more about the various key features of Alluxio. You can also transparently mount storage systems with the Alluxio unified namespace or configure your applications to work with the Alluxio Filesystem API.