Install Alluxio on Kubernetes

This documentation shows how to install Alluxio (Dora) on Kubernetes using either Helm, a Kubernetes package manager, or an Operator, a Kubernetes extension for managing applications.

We recommend using the Operator to deploy Alluxio on Kubernetes. However, if some of the permissions it requires are missing, consider using the Helm chart instead.

Prerequisites

  • A Kubernetes cluster with version at least 1.19, with feature gate enabled.
  • A copy of the Alluxio Docker image and the Alluxio Helm chart, with cluster access to the Docker image. If using a private Docker registry, refer to the Kubernetes private image registry documentation.
  • Ensure the cluster’s Kubernetes Network Policy allows for connectivity between applications (Alluxio clients) and the Alluxio Pods on the defined ports.
  • Helm 3, version 3.6.0 or later, is installed on the control plane of the Kubernetes cluster.
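
The Helm version requirement can be checked from a shell before installing. The following is a minimal sketch, assuming `helm` is on the PATH and that GNU `sort -V` is available; `version_ge` is a hypothetical helper introduced here, not part of Helm or Alluxio.

```shell
# Hypothetical helper: succeeds when version $1 is greater than or equal to $2.
# Relies on the -V (version sort) option of GNU sort.
version_ge() {
  [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Compare the installed Helm version with the 3.6.0 minimum, if helm is present.
if command -v helm >/dev/null 2>&1; then
  helm_version="$(helm version --template '{{.Version}}' | sed 's/^v//')"
  if version_ge "$helm_version" "3.6.0"; then
    echo "helm $helm_version is new enough"
  else
    echo "helm $helm_version is older than 3.6.0" >&2
  fi
fi

# The cluster version for the 1.19 requirement can be read from:
#   kubectl version
```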

Helm

Deploy Alluxio

Follow the steps below to deploy Dora on Kubernetes:

1. Download Helm Chart

Download the Helm chart here and enter the Helm chart directory.

2. Configure Persistent Volumes

Configure Persistent Volumes for:

  1. (Optional) Embedded journal. HostPath is also supported for journal storage.
  2. (Optional) Worker page store. HostPath is also supported for worker storage.
  3. (Optional) Worker metastore. Only required if you use RocksDB for storing metadata on workers.

Here is an example of a persistent volume of type hostPath for Alluxio embedded journal:

kind: PersistentVolume
apiVersion: v1
metadata:
  name: alluxio-journal-0
  labels:
    type: local
spec:
  storageClassName: standard
  capacity:
    storage: 1Gi
  accessModes:
    - ReadWriteOnce
  hostPath:
    path: /tmp/alluxio-journal-0

Note:

  • If a hostPath volume is used for the embedded journal, Alluxio runs an init container as root to grant itself RWX permission on the path.
  • Each journal volume must provide at least as much storage as its corresponding persistentVolumeClaim requests. The claim size is set in the configuration file described in step 3.
  • If using a local hostPath persistent volume, make sure the user with UID 1000 and GID 1000 has RWX permission on the path.
    • Alluxio containers run as user alluxio of group alluxio with UID 1000 and GID 1000 by default.
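
The hostPath notes above translate into a few commands on the node that hosts the volume. This is a minimal sketch, assuming the example path `/tmp/alluxio-journal-0` from the manifest and that the manifest is saved as `alluxio-journal-pv.yaml` (a file name chosen here for illustration); `prepare_journal_dir` is a hypothetical helper.

```shell
# Hypothetical helper: create a hostPath directory and give its owner RWX.
# When run as root, also hand ownership to UID 1000 / GID 1000,
# the default user and group of the Alluxio containers.
prepare_journal_dir() {
  dir="$1"
  mkdir -p "$dir"
  chmod u+rwx "$dir"
  if [ "$(id -u)" -eq 0 ]; then
    chown 1000:1000 "$dir"
  fi
}

# JOURNAL_DIR defaults to the path used in the example PersistentVolume.
JOURNAL_DIR="${JOURNAL_DIR:-/tmp/alluxio-journal-0}"
prepare_journal_dir "$JOURNAL_DIR"

# Then register the volume and confirm it exists (requires cluster access):
#   kubectl apply -f alluxio-journal-pv.yaml
#   kubectl get pv alluxio-journal-0
```

When the script is not run as root, the directory stays owned by the current user, so ownership still has to be changed separately for UID 1000 to use it.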

3. Prepare Configuration File

Prepare a configuration file named config.yaml. All configurable properties are listed in the values.yaml file of the chart downloaded in step 1.

You MUST specify your dataset configuration in config.yaml to enable Dora. Specifically, fill in the following section:

## Dataset ##

dataset:
  # The path of the dataset. For example, s3://my-bucket/dataset
  path:
  # Any credentials for Alluxio to access the dataset. For example,
  # credentials:
  #   aws.accessKeyId: XXXXX
  #   aws.secretKey: xxxx
  credentials:
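
For illustration, a filled-in dataset section might look like the following sketch. The bucket path and credential values are the placeholders from the comments above, not real values, and must be replaced before installing.

```shell
# Write a minimal config.yaml containing only the required dataset section.
# The path and credential values are placeholders taken from the comments above.
cat > config.yaml <<'EOF'
dataset:
  path: s3://my-bucket/dataset
  credentials:
    aws.accessKeyId: XXXXX
    aws.secretKey: xxxx
EOF

# Print the file for a quick visual check.
cat config.yaml
```

Adding `--dry-run` to the `helm install` command in the next step renders the chart with this file without creating any resources, which is a cheap way to catch indentation mistakes.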

4. Install Dora Cluster

Install the Dora cluster by running:

$ helm install dora -f config.yaml .

Wait until the cluster is ready. You can check pod status and container readiness by running

$ kubectl get po
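
Instead of polling by hand, the READY column can be checked in a script, and `kubectl wait` can block until pods report Ready. This is a minimal sketch; `not_ready_pods` is a hypothetical helper, and it assumes the default column layout of `kubectl get po --no-headers` (NAME, READY, STATUS, ...).

```shell
# Hypothetical helper: read `kubectl get po --no-headers` output on stdin and
# print the name of every pod whose READY column (for example 1/2) shows
# fewer ready containers than total containers.
not_ready_pods() {
  awk '{ split($2, r, "/"); if (r[1] != r[2]) print $1 }'
}

# Against a live cluster (requires kubectl and cluster access):
#   kubectl get po --no-headers | not_ready_pods
#   kubectl wait --for=condition=Ready pod --all --timeout=300s
```

Pods that have run to completion also show fewer ready containers than total, so they appear in this helper's output even though nothing is wrong with them.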

Uninstall

Uninstall the Dora cluster as follows:

$ helm delete dora

Metrics

See Metrics On Kubernetes for how to configure the available metrics sinks and collect metrics from Alluxio deployed on Kubernetes.