Install Alluxio on Kubernetes
This documentation shows how to install Alluxio (Dora) on Kubernetes via Helm, a Kubernetes package manager, or via the Operator, a Kubernetes extension for managing applications.
We recommend using the Operator to deploy Alluxio on Kubernetes. However, if some of the required permissions are missing, consider using the Helm chart instead.
Prerequisites
- A Kubernetes cluster with version at least 1.19, with feature gate enabled.
- Cluster access to an Alluxio Docker image alluxio/alluxio. If using a private Docker registry, refer to the Kubernetes private image registry documentation.
- Ensure the cluster’s Kubernetes Network Policy allows for connectivity between applications (Alluxio clients) and the Alluxio Pods on the defined ports.
- Helm 3, version 3.6.0 or later, is installed on the control plane of the Kubernetes cluster.
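The version requirements above can be checked before starting. The following is a minimal sketch, not part of the Alluxio tooling: the helper name version_at_least is hypothetical, and it assumes a sort that supports version ordering (-V), as GNU coreutils does.

```shell
# Succeed (exit 0) when version $2 is at least the minimum version $1.
# Assumes `sort -V` (version sort) is available.
version_at_least() {
  local minimum="$1" actual="$2"
  # After a version sort, the minimum must come first (or be equal).
  [ "$(printf '%s\n%s\n' "$minimum" "$actual" | sort -V | head -n 1)" = "$minimum" ]
}

# Example wiring, commented out so the helper can be loaded without Helm installed:
# helm_version="$(helm version --template '{{.Version}}' | sed 's/^v//')"
# version_at_least 3.6.0 "$helm_version" || echo "Helm 3.6.0 or later is required" >&2
```

The same helper can compare the Kubernetes server version against 1.19 once you have extracted it from the output of kubectl version.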
Operator
Extra prerequisites for Operator
You need the following RBAC permissions in the Kubernetes cluster for the Operator to work:
- Permission to create CRDs (Custom Resource Definitions);
- Permission to create a ServiceAccount, ClusterRole, and ClusterRoleBinding for the operator pods;
- Permission to create the namespace that the operator will be in.
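One way to confirm these permissions ahead of time is kubectl auth can-i, which exits with a non-zero status when the current user is not allowed to perform an action. The sketch below is an illustration only: the function name check_operator_rbac and the variable OPERATOR_RESOURCES are hypothetical, and the check runs against your current context and namespace.

```shell
# Resource kinds the operator installation creates, taken from the list above.
OPERATOR_RESOURCES="customresourcedefinitions serviceaccounts clusterroles clusterrolebindings namespaces"

# Print one line per resource and return non-zero if any permission is missing.
check_operator_rbac() {
  local missing=0 resource
  for resource in $OPERATOR_RESOURCES; do
    # `kubectl auth can-i` exits 0 only when the action is allowed.
    if kubectl auth can-i create "$resource" >/dev/null 2>&1; then
      echo "ok      create $resource"
    else
      echo "MISSING create $resource"
      missing=1
    fi
  done
  return "$missing"
}
```

If any line reports MISSING, the Helm installation path described later in this page may be the better option.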
Deploy Alluxio Kubernetes Operator
We use the Helm chart for the Alluxio K8s Operator for deployment. Follow the steps below to deploy the Alluxio Operator:
1. Download Alluxio Kubernetes Operator
Download the Alluxio Kubernetes Operator here and enter the root directory of the project.
2. Install Operator
Install the operator by running:
$ helm install operator ./deploy/charts/alluxio-operator
The Operator will automatically create the namespace alluxio-operator and install all the components there.
3. Run Operator
Make sure the operator is running as expected:
$ kubectl get pods -n alluxio-operator
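Instead of re-running the command above by hand, you can block until the pods are ready with kubectl wait. This is a small sketch; the wrapper name wait_for_pods_ready and the 120s default timeout are assumptions chosen for illustration.

```shell
# Block until every pod in the namespace reports the Ready condition,
# or fail once the timeout elapses.
# Arguments: [NAMESPACE] [TIMEOUT], defaulting to alluxio-operator and 120s.
wait_for_pods_ready() {
  local namespace="${1:-alluxio-operator}" timeout="${2:-120s}"
  kubectl wait --namespace "$namespace" --for=condition=Ready pods --all --timeout="$timeout"
}
```

Note that kubectl wait reports an error if the namespace contains no pods yet, so it may need a short delay or a retry right after helm install.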
Deploy Dataset
1. Create Dataset Configuration
Create a dataset configuration dataset.yaml. Its apiVersion must be k8s-operator.alluxio.com/v1alpha1 and kind must be Dataset. Here is an example:
apiVersion: k8s-operator.alluxio.com/v1alpha1
kind: Dataset
metadata:
  name: my-dataset
spec:
  dataset:
    path: <path of your dataset>
    credentials:
      - <property 1 for accessing your dataset>
      - <property 2 for accessing your dataset>
      - ...
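If you create datasets repeatedly, the example above can be generated from a small template. The following sketch assumes nothing beyond the structure shown in the example: the function name render_dataset_manifest is hypothetical, and credential properties are copied verbatim because their exact format is not specified here.

```shell
# Print a Dataset manifest on stdout.
# Arguments: NAME DATASET_PATH [CREDENTIAL_PROPERTY...]
# Credential properties are emitted verbatim as list items; check their
# expected format against the operator's own documentation.
render_dataset_manifest() {
  local name="$1" data_path="$2" property
  shift 2
  cat <<EOF
apiVersion: k8s-operator.alluxio.com/v1alpha1
kind: Dataset
metadata:
  name: ${name}
spec:
  dataset:
    path: ${data_path}
EOF
  if [ "$#" -gt 0 ]; then
    echo "    credentials:"
    for property in "$@"; do
      echo "      - ${property}"
    done
  fi
}
```

For example, render_dataset_manifest my-dataset s3://my-bucket/dataset > dataset.yaml writes a manifest without credentials that can then be reviewed and deployed.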
2. Deploy Dataset
Deploy your dataset by running
$ kubectl create -f dataset.yaml
3. Check Status of Dataset
Check the status of the dataset by running
$ kubectl get dataset <dataset-name>
Deploy Alluxio
1. Configure Persistent Volumes
Configure Persistent Volumes for:
- (Optional) Embedded journal. HostPath is also supported for embedded journal.
- (Optional) Worker page store. HostPath is also supported for worker storage.
- (Optional) Worker metastore. Only required if you use RocksDB for storing metadata on workers.
Here is an example of a persistent volume of type hostPath for Alluxio embedded journal:
kind: PersistentVolume
apiVersion: v1
metadata:
  name: alluxio-journal-0
  labels:
    type: local
spec:
  storageClassName: standard
  capacity:
    storage: 1Gi
  accessModes:
    - ReadWriteOnce
  hostPath:
    path: /tmp/alluxio-journal-0
Note:
- If using hostPath as the volume for the embedded journal, Alluxio will run an init container as root to grant itself RWX permission on the path.
- Each journal volume should have at least the capacity requested by its corresponding persistentVolumeClaim, which is configurable through the configuration file discussed in step 2.
- If using a local hostPath persistent volume, make sure the user alluxio has RWX permission.
  - Alluxio containers run as user alluxio of group alluxio with UID 1000 and GID 1000 by default.
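For local hostPath volumes, the permission requirement in the notes above can be met by preparing the directory on each node in advance. This is a minimal sketch, to be run on the node itself; the function name prepare_hostpath_dir is hypothetical, and the UID and GID default to the 1000 stated above.

```shell
# Create DIR if needed and give the Alluxio container user read, write, and
# execute access. Arguments: DIR [UID] [GID], with UID and GID defaulting to 1000.
prepare_hostpath_dir() {
  local dir="$1" owner_uid="${2:-1000}" owner_gid="${3:-1000}"
  mkdir -p "$dir" || return 1
  # Changing ownership to another user usually requires root on the node.
  chown "$owner_uid:$owner_gid" "$dir" || return 1
  chmod u+rwx "$dir"
}
```

For the example volume above, this would be invoked as prepare_hostpath_dir /tmp/alluxio-journal-0.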
2. Prepare Resource Configuration File
Prepare a resource configuration file alluxio-config.yaml. Its apiVersion must be k8s-operator.alluxio.com/v1alpha1 and kind must be AlluxioCluster. Here is an example:
apiVersion: k8s-operator.alluxio.com/v1alpha1
kind: AlluxioCluster
metadata:
  name: my-alluxio-cluster
spec:
  <configurations>
The name of your Dataset is required in the spec section. All other configurable properties in the spec section can be found in deploy/charts/alluxio/values.yaml.
3. Deploy Alluxio Cluster
Deploy Alluxio cluster by running:
$ kubectl create -f alluxio-config.yaml
4. Check Status of Alluxio Cluster
Check the status of Alluxio cluster by running:
$ kubectl get alluxiocluster <alluxio-cluster-name>
Uninstall Dataset + Alluxio
Run the following commands to uninstall the Dataset and the Alluxio cluster:
$ kubectl delete dataset <dataset-name>
$ kubectl delete alluxiocluster <alluxio-cluster-name>
[Bonus] Load the data into Alluxio
To load your data into the Alluxio cluster, so that your application can read the data faster, create a resource file load.yaml. Here is an example:
apiVersion: k8s-operator.alluxio.com/v1alpha1
kind: Load
metadata:
  name: my-load
spec:
  dataset: <dataset-name>
  path: /
Then run the following command to start the load:
$ kubectl create -f load.yaml
To check the status of the load:
$ kubectl get load
[Bonus] Example Alluxio cluster configuration for AI/ML use case
apiVersion: k8s-operator.alluxio.com/v1alpha1
kind: AlluxioCluster
metadata:
  name: my-alluxio-cluster
spec:
  worker:
    count: 4
  pagestore:
    type: hostPath
    quota: 512Gi
    hostPath: /mnt/alluxio
  csi:
    enabled: true
Helm
Deploy Alluxio
Follow the steps below to deploy Dora on Kubernetes:
1. Download Helm Chart
Download the Helm chart here and enter the helm chart directory.
2. Configure Persistent Volumes
Configure Persistent Volumes for:
- (Optional) Embedded journal. HostPath is also supported for journal storage.
- (Optional) Worker page store. HostPath is also supported for worker storage.
- (Optional) Worker metastore. Only required if you use RocksDB for storing metadata on workers.
Here is an example of a persistent volume of type hostPath for Alluxio embedded journal:
kind: PersistentVolume
apiVersion: v1
metadata:
  name: alluxio-journal-0
  labels:
    type: local
spec:
  storageClassName: standard
  capacity:
    storage: 1Gi
  accessModes:
    - ReadWriteOnce
  hostPath:
    path: /tmp/alluxio-journal-0
Note:
- If using hostPath as the volume for the embedded journal, Alluxio will run an init container as root to grant itself RWX permission on the path.
- Each journal volume requires at least the storage requested by its corresponding persistentVolumeClaim, which is configurable through the configuration file discussed in step 3.
- If using a local hostPath persistent volume, make sure the user with UID 1000 and GID 1000 has RWX permission.
  - Alluxio containers run as user alluxio of group alluxio with UID 1000 and GID 1000 by default.
3. Prepare Configuration File
Prepare a configuration file config.yaml. All configurable properties can be found in the file values.yaml from the code downloaded in step 1.
You MUST specify your dataset configurations to enable Dora in your config.yaml. More specifically, the following section:
## Dataset ##
dataset:
  # The path of the dataset. For example, s3://my-bucket/dataset
  path:
  # Any credentials for Alluxio to access the dataset. For example,
  # credentials:
  #   aws.accessKeyId: XXXXX
  #   aws.secretKey: xxxx
  credentials:
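To make the shape of a filled-in dataset section concrete, the sketch below prints one from a path and a list of KEY=VALUE pairs, following the commented example in the section above. The function name render_dataset_config is hypothetical, and values are written unquoted, so any value containing YAML special characters would need quoting by hand.

```shell
# Print the dataset section of config.yaml on stdout.
# Arguments: DATASET_PATH [KEY=VALUE...]; each pair becomes one credentials entry.
render_dataset_config() {
  local dataset_path="$1" pair
  shift
  echo "dataset:"
  echo "  path: ${dataset_path}"
  if [ "$#" -gt 0 ]; then
    echo "  credentials:"
    for pair in "$@"; do
      # Split at the first '=' only, so values may themselves contain '='.
      echo "    ${pair%%=*}: ${pair#*=}"
    done
  fi
}
```

For example, render_dataset_config s3://my-bucket/dataset aws.accessKeyId=XXXXX aws.secretKey=xxxx prints a dataset block with two credentials entries that can be pasted into config.yaml.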
4. Install Dora Cluster
Install Dora cluster by running
$ helm install dora -f config.yaml .
Wait until the cluster is ready. You can check pod status and container readiness by running
$ kubectl get po
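Waiting for readiness can be scripted by inspecting the READY column of that output. The following is a sketch under the assumption that every pod in the current namespace belongs to the Dora cluster; the function name all_pods_ready is hypothetical.

```shell
# Read `kubectl get po --no-headers` output on stdin and succeed only when at
# least one pod is listed and every READY column (second field) shows n/n.
all_pods_ready() {
  awk '{ split($2, ready, "/"); if (ready[1] != ready[2]) bad++ }
       END { exit !(NR > 0 && bad == 0) }'
}

# Example polling loop, commented out so the helper can be loaded without a cluster:
# until kubectl get po --no-headers | all_pods_ready; do sleep 5; done
```

Pods that have run to completion show a READY value such as 0/1 and would count as not ready here, so filter them out first if your namespace contains any.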
Uninstall
Uninstall Dora cluster as follows:
$ helm delete dora
Metrics
See Metrics On Kubernetes for information on how to configure metrics sinks and collect metrics from Alluxio deployed on Kubernetes.