Running Alluxio on Kubernetes

Alluxio can be run on Kubernetes. This guide shows how to deploy it using the Kubernetes specifications included in the Alluxio enterprise tarball.

Prerequisites

  • A Kubernetes cluster (version >= 1.8). Alluxio workers will use emptyDir volumes with a restricted size using the sizeLimit parameter. This is an alpha feature in Kubernetes 1.8. Please ensure the feature is enabled. If using the kubeadm CLI:
    $ kubeadm init --kubernetes-version v1.8.0 --feature-gates=SelfHosting=true
    
  • An Alluxio Docker image. Refer to this page for instructions to build an image. Every Kubernetes host that runs Alluxio processes must be able to pull the image. You can do this by pushing the image to a Docker registry those hosts can access, or by loading it onto each host individually. If using a private Docker registry, refer to the Kubernetes documentation on pulling images from a private registry.
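Before deploying, it may help to confirm that the cluster meets the version requirement above. The sketch below uses a hypothetical helper, `version_at_least`, which is not part of Alluxio or kubectl. It compares two version strings with GNU `sort -V` and assumes you pass in the server version yourself, for example as reported by `kubectl version`.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeeds (exit 0) if version $2 is >= required version $1.
# Assumes GNU coreutils `sort -V` (version sort) is available.
version_at_least() {
  local required="${1#v}" actual="${2#v}"   # drop a leading "v", e.g. v1.8.0 -> 1.8.0
  # Version-sort both strings. If the required version sorts first
  # (or they are equal), the actual version satisfies the requirement.
  [ "$(printf '%s\n%s\n' "$required" "$actual" | sort -V | head -n 1)" = "$required" ]
}

# Example: check a server version against the 1.8 minimum.
if version_at_least 1.8 v1.10.3; then
  echo "Kubernetes version OK"
else
  echo "Kubernetes >= 1.8 required" >&2
fi
```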

Download and unpack Alluxio

  1. Download Alluxio. You’ll need to sign in or create an account if you don’t already have one.
  2. Unpack the Alluxio tarball and change to its integration/kubernetes directory.
    $ tar xvfz alluxio-enterprise-1.8.0-<hadoop distribution>.tar.gz
    $ cd <unpacked Alluxio directory>/integration/kubernetes
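The rest of this guide relies on several templates in integration/kubernetes. As a sanity check after unpacking, you could use a short sketch like the following. The function name `check_templates` is hypothetical. It confirms that the templates referenced later in this guide are present in the given directory.

```shell
#!/usr/bin/env bash
# Hypothetical helper: verify that the templates used later in this guide
# exist in the given directory (default: current directory).
check_templates() {
  local dir="${1:-.}" missing=0 f
  for f in alluxio-journal-volume.yaml.template \
           alluxio-master.yaml.template \
           alluxio-worker.yaml.template \
           conf/alluxio.properties.template; do
    if [ ! -f "$dir/$f" ]; then
      echo "missing: $dir/$f" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

Run `check_templates` from integration/kubernetes. It lists any missing file on stderr and returns non-zero.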
    

Enable short-circuit operations

Short-circuit access lets clients read and write directly against the worker's memory instead of going through the worker process. To enable this mode, set up a directory for a shared domain socket on every host eligible to run an Alluxio worker.

From the host machine, create a directory for the shared domain socket.

$ mkdir /tmp/domain
$ chmod a+w /tmp/domain

This step can be skipped if short-circuit access is not desired or cannot be set up. In that case, disable the feature by setting the property alluxio.user.short.circuit.enabled=false as described in the configuration section below.
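The directory must exist on every host eligible to run a worker, so you could wrap the two commands above in an idempotent helper and run it on each host, for example over ssh. The sketch below is hypothetical. The function name is illustrative, and /tmp/domain is the default path used in this guide.

```shell
#!/usr/bin/env bash
# Hypothetical helper: create the shared domain socket directory if needed and
# make it world-writable, mirroring `mkdir /tmp/domain; chmod a+w /tmp/domain`.
prepare_domain_socket_dir() {
  local dir="${1:-/tmp/domain}"
  mkdir -p "$dir" || return 1      # -p: no error if it already exists
  chmod a+w "$dir" || return 1     # allow all users to write to it
  # Confirm the directory exists and is writable by the current user.
  [ -d "$dir" ] && [ -w "$dir" ]
}
```

Because of `mkdir -p`, running the helper again on a host that is already prepared is harmless.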

Provision a Persistent Volume

The Alluxio master can be configured to store its journal on a persistent volume. Once claimed, the volume persists across restarts of the master process.

Create the persistent volume spec from the template. The access mode ReadWriteMany is used to allow multiple Alluxio master nodes to access the shared volume.

$ cp alluxio-journal-volume.yaml.template alluxio-journal-volume.yaml

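Note: the provided spec uses a hostPath volume, which suits only a single-node demonstration. For a multi-node cluster, use a persistent volume plugin that supports ReadWriteMany, such as NFS. Plugins such as AWSElasticBlockStore and GCEPersistentDisk do not support ReadWriteMany, so you must adjust the spec's access mode if you choose one of them.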

Create the persistent volume.

$ kubectl create -f alluxio-journal-volume.yaml
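To confirm that the volume was registered with the ReadWriteMany access mode described above, you can query its access modes with kubectl's JSONPath output. The sketch below is a minimal example, and the function name is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed only if the persistent volume lists ReadWriteMany
# among its access modes.
pv_supports_rwx() {
  local pv="${1:-alluxio-journal-volume}" modes
  # .spec.accessModes[*] prints all access modes, separated by spaces.
  modes=$(kubectl get pv "$pv" -o jsonpath='{.spec.accessModes[*]}') || return 1
  case " $modes " in
    *" ReadWriteMany "*) return 0 ;;
    *) echo "PV $pv access modes: ${modes:-none} (ReadWriteMany not set)" >&2
       return 1 ;;
  esac
}
```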

Configure Alluxio properties

Alluxio containers in Kubernetes use environment variables to set Alluxio properties. Refer to the Docker configuration documentation for the environment variable name that corresponds to each Alluxio property in conf/alluxio-site.properties.
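As an illustration, suppose the convention on the Docker configuration page is to uppercase the property name and replace each dot with an underscore. In that case, a one-line helper can compute the mapping. The function name is hypothetical, so check the Docker configuration page for the authoritative rule.

```shell
#!/usr/bin/env bash
# Hypothetical helper: map an Alluxio property name to an environment variable
# name, ASSUMING the "uppercase, dots -> underscores" convention.
property_to_env() {
  printf '%s\n' "$1" | tr '.' '_' | tr '[:lower:]' '[:upper:]'
}

property_to_env alluxio.user.short.circuit.enabled
# prints ALLUXIO_USER_SHORT_CIRCUIT_ENABLED
```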

Define all environment variables in a single file. Copy the properties template in integration/kubernetes/conf, then modify or add configuration properties as required. When running Alluxio with host networking, make sure the ports assigned to Alluxio services are not already in use on the host.

$ cp conf/alluxio.properties.template conf/alluxio.properties
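Editing conf/alluxio.properties by hand works. When scripting the setup, though, a small helper can set a variable idempotently by replacing an existing line or appending a new one. This sketch assumes the file holds one NAME=VALUE assignment per line, so check the template to confirm. The function name is hypothetical. For example, `set_property conf/alluxio.properties ALLUXIO_USER_SHORT_CIRCUIT_ENABLED false` could disable short-circuit access, assuming that is the environment variable name for alluxio.user.short.circuit.enabled.

```shell
#!/usr/bin/env bash
# Hypothetical helper: set NAME=VALUE in a properties file, replacing any
# existing assignment of NAME or appending one if none exists.
# Assumes one NAME=VALUE pair per line; VALUE is written verbatim.
set_property() {
  local file="$1" name="$2" value="$3" tmp status
  tmp=$(mktemp) || return 1
  # Keep every line except existing assignments of NAME. grep -v exits 1 when
  # nothing is left to print, which is fine here, so its status is ignored.
  grep -v "^${name}=" "$file" > "$tmp" 2>/dev/null
  # Append the new assignment, then copy back so the file keeps its permissions.
  printf '%s=%s\n' "$name" "$value" >> "$tmp"
  cat "$tmp" > "$file"
  status=$?
  rm -f "$tmp"
  return "$status"
}
```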

Create a ConfigMap.

$ kubectl create configmap alluxio-config --from-file=ALLUXIO_CONFIG=conf/alluxio.properties
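`kubectl create configmap` fails if a ConfigMap with that name already exists. If you edit conf/alluxio.properties again, you need to replace the ConfigMap. One approach, sketched below with a hypothetical function name, is to delete the old ConfigMap (ignoring the case where it does not exist) and create it again from the file. If the pods consume the ConfigMap as environment variables, they must be restarted to pick up the change.

```shell
#!/usr/bin/env bash
# Hypothetical helper: (re)create the alluxio-config ConfigMap from a properties file.
recreate_alluxio_config() {
  local file="${1:-conf/alluxio.properties}"
  [ -f "$file" ] || { echo "no such file: $file" >&2; return 1; }
  # Remove any existing ConfigMap; --ignore-not-found avoids an error on first run.
  kubectl delete configmap alluxio-config --ignore-not-found || return 1
  # Same command as above: store the whole file under the key ALLUXIO_CONFIG.
  kubectl create configmap alluxio-config --from-file=ALLUXIO_CONFIG="$file"
}
```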

Deploy

Prepare the Alluxio deployment specs from the templates. Modify parameters as required, such as the location of the Docker image and the CPU and memory requirements for pods. The Alluxio master container uses a Kubernetes secret to store the base64-encoded enterprise license string, so edit the master spec to specify the license.

$ cp alluxio-master.yaml.template alluxio-master.yaml
$ cp alluxio-worker.yaml.template alluxio-worker.yaml
$ base64 < /path/to/license.json | tr -d "\n" # Add output to alluxio-master.yaml as the value for key 'license.json'
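The license is pasted into YAML as a single line, so check that the encoded string decodes back to the original file. Below is a minimal sketch with a hypothetical function name. It assumes a `base64` that supports `-d` for decoding, which GNU coreutils and recent macOS versions do.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the license file as a single-line base64 string,
# after checking that the string decodes back to the original file.
encode_license() {
  local license="$1" encoded
  [ -f "$license" ] || { echo "no such file: $license" >&2; return 1; }
  # Same pipeline as above: base64-encode and strip the line breaks base64 inserts.
  encoded=$(base64 < "$license" | tr -d '\n')
  # Round-trip check: decoding must reproduce the file byte for byte.
  if ! printf '%s' "$encoded" | base64 -d | cmp -s - "$license"; then
    echo "round-trip check failed for $license" >&2
    return 1
  fi
  printf '%s\n' "$encoded"
}
```

For example, `encode_license /path/to/license.json` prints the value to place under the 'license.json' key.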

Once all the prerequisites and configuration have been set up, deploy Alluxio.

$ kubectl create -f alluxio-master.yaml
$ kubectl create -f alluxio-worker.yaml

Verify the status of the Alluxio deployment.

$ kubectl get pods
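Pods can take a while to pull the image and start. The sketch below polls a pod's phase with kubectl's JSONPath output until it reports Running. The function name is hypothetical, and the default timeout is arbitrary. The default pod name, alluxio-master-0, is the one used later in this guide. On Kubernetes 1.11 and later, `kubectl wait --for=condition=Ready` is an alternative.

```shell
#!/usr/bin/env bash
# Hypothetical helper: poll until a pod's phase is Running or a timeout expires.
# Note: phase Running means the containers started, not that they are Ready.
wait_for_pod_running() {
  local pod="${1:-alluxio-master-0}" timeout="${2:-300}" interval="${3:-5}"
  local waited=0 phase
  while [ "$waited" -lt "$timeout" ]; do
    phase=$(kubectl get pod "$pod" -o jsonpath='{.status.phase}' 2>/dev/null)
    if [ "$phase" = "Running" ]; then
      echo "$pod is Running"
      return 0
    fi
    sleep "$interval"
    waited=$((waited + interval))
  done
  echo "timed out waiting for $pod (last phase: ${phase:-unknown})" >&2
  return 1
}
```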

If using a persistent volume for the Alluxio master, the status of the volume should change to Bound.

$ kubectl get pv alluxio-journal-volume
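For scripting, you can make the same check non-interactive by reading only the volume's phase. A bound volume reports the phase Bound, and `.spec.claimRef` names the claim that bound it. Below is a minimal sketch with a hypothetical function name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed only if the journal volume has been bound to a claim.
journal_pv_bound() {
  local pv="${1:-alluxio-journal-volume}" phase claim
  phase=$(kubectl get pv "$pv" -o jsonpath='{.status.phase}') || return 1
  if [ "$phase" != "Bound" ]; then
    echo "$pv phase is ${phase:-unknown}, expected Bound" >&2
    return 1
  fi
  # .spec.claimRef identifies the PersistentVolumeClaim that bound the volume.
  claim=$(kubectl get pv "$pv" -o jsonpath='{.spec.claimRef.namespace}/{.spec.claimRef.name}')
  echo "$pv is Bound to $claim"
}
```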

Verify

Once the pods are running, open a shell in the master pod to access the Alluxio CLI and run basic I/O tests.

$ kubectl exec -ti alluxio-master-0 -- /bin/bash

From the master pod, execute the following:

$ cd /opt/alluxio
$ ./bin/alluxio runTests
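You can also run the same tests without an interactive shell, which is convenient in scripts or CI. The sketch below uses a hypothetical function name. It assumes Alluxio is installed at /opt/alluxio in the image, as above. It also assumes `alluxio runTests` exits non-zero on failure, which is worth confirming for your version.

```shell
#!/usr/bin/env bash
# Hypothetical helper: run Alluxio's built-in I/O tests inside the master pod.
# Assumes Alluxio is installed at /opt/alluxio in the container image and that
# runTests signals failure through a non-zero exit status.
run_alluxio_tests() {
  local pod="${1:-alluxio-master-0}"
  # "--" separates kubectl's own flags from the command run in the container.
  if kubectl exec "$pod" -- /opt/alluxio/bin/alluxio runTests; then
    echo "runTests succeeded in $pod"
  else
    echo "runTests failed in $pod" >&2
    return 1
  fi
}
```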