Quick Start Guide
This quick start guide goes over how to run Alluxio on a local machine. The guide will cover the following tasks:
- Download and configure Alluxio
- Start Alluxio locally
- Perform basic tasks via Alluxio Shell
- [Bonus] Mount a public Amazon S3 bucket in Alluxio
- [Bonus] Mount HDFS under storage in Alluxio
- Stop Alluxio
This guide contains optional tasks labeled [Bonus] that require an AWS account with an access key ID and secret access key.
Note: This guide is designed to start an Alluxio system with minimal setup on a single machine. If you are trying to speed up SQL analytics, you can try the Presto Alluxio Getting Started tutorial.
Prerequisites
- macOS or Linux
- Java 8
- Enable remote login: see instructions for macOS users
- [Bonus] AWS account and keys
Downloading Alluxio
Download Alluxio from this page. Select the desired release followed by the distribution built for default Hadoop. Unpack the downloaded file with the following commands.
$ tar -xzf alluxio-314-SNAPSHOT-bin.tar.gz
$ cd alluxio-314-SNAPSHOT
This creates a directory alluxio-314-SNAPSHOT with all of the Alluxio source files and Java binaries. Throughout this tutorial, the path of this directory will be referred to as ${ALLUXIO_HOME}.
Configuring Alluxio
In the ${ALLUXIO_HOME}/conf directory, create the conf/alluxio-env.sh configuration file by copying the template file.
$ cp conf/alluxio-env.sh.template conf/alluxio-env.sh
In conf/alluxio-env.sh, add the configuration for JAVA_HOME. For example:
$ echo "JAVA_HOME=/path/to/java/home" >> conf/alluxio-env.sh
In the ${ALLUXIO_HOME}/conf directory, create the conf/alluxio-site.properties configuration file by copying the template file.
$ cp conf/alluxio-site.properties.template conf/alluxio-site.properties
Set alluxio.master.hostname in conf/alluxio-site.properties to localhost.
$ echo "alluxio.master.hostname=localhost" >> conf/alluxio-site.properties
Set additional parameters in conf/alluxio-site.properties:
$ echo "alluxio.dora.client.read.location.policy.enabled=true" >> conf/alluxio-site.properties
$ echo "alluxio.user.short.circuit.enabled=false" >> conf/alluxio-site.properties
$ echo "alluxio.master.worker.register.lease.enabled=false" >> conf/alluxio-site.properties
$ echo "alluxio.worker.block.store.type=PAGE" >> conf/alluxio-site.properties
$ echo "alluxio.worker.page.store.type=LOCAL" >> conf/alluxio-site.properties
$ echo "alluxio.worker.page.store.sizes=1GB" >> conf/alluxio-site.properties
$ echo "alluxio.worker.page.store.page.size=1MB" >> conf/alluxio-site.properties
Set the page store directories to an existing directory to which the current user has read/write permissions. The following uses /mnt/ramdisk as an example.
$ echo "alluxio.worker.page.store.dirs=/mnt/ramdisk" >> conf/alluxio-site.properties
The paging cache storage guide has more information about how to configure the page block store.
Configure the Alluxio under file storage (UFS):
$ echo "alluxio.dora.client.ufs.root=/tmp" >> conf/alluxio-site.properties
The value of alluxio.dora.client.ufs.root should be a full UFS URI. In a single-node deployment it can be set to a local folder (e.g. the default value /tmp) or to a full UFS URI (e.g. hdfs://namenode:port/path/ or s3://bucket/path).
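For reference, after the steps above (using the local /tmp folder and the example page store directory), conf/alluxio-site.properties should contain entries similar to:
alluxio.master.hostname=localhost
alluxio.dora.client.read.location.policy.enabled=true
alluxio.user.short.circuit.enabled=false
alluxio.master.worker.register.lease.enabled=false
alluxio.worker.block.store.type=PAGE
alluxio.worker.page.store.type=LOCAL
alluxio.worker.page.store.sizes=1GB
alluxio.worker.page.store.page.size=1MB
alluxio.worker.page.store.dirs=/mnt/ramdisk
alluxio.dora.client.ufs.root=/tmp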
[Bonus] Configuration for AWS
To configure Alluxio to interact with Amazon S3, add AWS access information to the Alluxio configuration in conf/alluxio-site.properties.
$ echo "alluxio.dora.client.ufs.root=s3://<BUCKET_NAME>/<DIR>" >> conf/alluxio-site.properties
$ echo "s3a.accessKeyId=<AWS_ACCESS_KEY_ID>" >> conf/alluxio-site.properties
$ echo "s3a.secretKey=<AWS_SECRET_ACCESS_KEY>" >> conf/alluxio-site.properties
Replace s3://<BUCKET_NAME>/<DIR>, <AWS_ACCESS_KEY_ID> and <AWS_SECRET_ACCESS_KEY> with a valid AWS S3 address, AWS access key ID, and AWS secret access key respectively.
For more information, please refer to the S3 configuration docs.
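Before starting Alluxio, you can sanity-check the credentials and bucket path outside of Alluxio, assuming the AWS CLI is installed and configured with the same key pair:
$ aws s3 ls s3://<BUCKET_NAME>/<DIR>/   # should list the bucket contents without errors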
[Bonus] Configuration for HDFS
To configure Alluxio to interact with HDFS, provide the path to the HDFS configuration files available locally on each node in conf/alluxio-site.properties.
$ echo "alluxio.dora.client.ufs.root=hdfs://nameservice/<DIR>" >> conf/alluxio-site.properties
$ echo "alluxio.underfs.hdfs.configuration=/path/to/hdfs/conf/core-site.xml:/path/to/hdfs/conf/hdfs-site.xml" >> conf/alluxio-site.properties
Replace nameservice/<DIR> and /path/to/hdfs/conf with the actual values.
For more information, please refer to the HDFS configuration docs.
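Similarly, you can verify that HDFS is reachable before starting Alluxio, assuming the hdfs client is on your PATH and uses the same configuration files:
$ hdfs dfs -ls hdfs://nameservice/<DIR>   # should list the directory without errors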
Starting Alluxio
Alluxio needs to be formatted before its processes are started for the first time. The following command formats the Alluxio journal and worker storage directories.
$ ./bin/alluxio init format
Start the Alluxio services:
$ ./bin/alluxio process start local
Congratulations! Alluxio is now up and running!
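To verify that the services came up, you can check the master web UI, which listens on port 19999 by default (adjust if you changed the web port):
$ curl -I http://localhost:19999   # expect an HTTP 200 response
You can also open http://localhost:19999 in a browser to inspect the cluster status.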
Using the Alluxio Shell
The Alluxio shell provides command line operations for interacting with Alluxio. To see a list of filesystem operations, run
$ ./bin/alluxio fs
List files in Alluxio with the ls command. To list all files in the root directory, use the following command:
$ ./bin/alluxio fs ls /
At this moment, there are no files in Alluxio. Copy a file into Alluxio with the copyFromLocal shell command.
$ ./bin/alluxio fs copyFromLocal ${ALLUXIO_HOME}/LICENSE /LICENSE
Copied file://${ALLUXIO_HOME}/LICENSE to /LICENSE
List the files in Alluxio again to see the LICENSE file.
$ ./bin/alluxio fs ls /
-rw-r--r-- staff staff 27040 02-17-2021 16:21:11:061 0% /LICENSE
The output shows that the file has been successfully written to the Alluxio under storage.
Check the directory set as the value of alluxio.dora.client.ufs.root, which is /tmp by default.
$ ls /tmp
LICENSE
The cat command prints the contents of the file.
$ ./bin/alluxio fs cat /LICENSE
Apache License
Version 2.0, January 2004
http://www.apache.org/licenses/
TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION
...
When the file is read, it will also be cached by Alluxio to speed up future data access.
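To see the effect of caching, list the file again. The percentage column in the ls output reflects how much of the file is cached in Alluxio; after reading the file, it should report 100% (the timestamp and owner will differ on your machine):
$ ./bin/alluxio fs ls /
-rw-r--r-- staff staff 27040 02-17-2021 16:21:11:061 100% /LICENSE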
Stopping Alluxio
Stop Alluxio with the following command:
$ ./bin/alluxio process stop local
Next Steps
Congratulations on getting Alluxio started! This guide covered how to download and install Alluxio locally with examples of basic interactions via the Alluxio shell.
There are several next steps available:
- Learn more about the various features of Alluxio in our documentation, such as Data Caching and Metadata Caching.
- See how you can Install an Alluxio Cluster with High Availability (HA)
- You can also Install Alluxio on Kubernetes with Alluxio K8s Helm Chart or Alluxio K8s Operator
- Connect a compute engine such as Presto, Trino, or Apache Spark
- Connect an under file storage such as Amazon AWS S3, HDFS, or Google Cloud Storage
- Check out our Contribution Guide if you’re interested in becoming a contributor!
FAQ
Why do I keep getting “Operation not permitted” for ssh and alluxio?
For users running macOS 11 (Big Sur) or later, when running the command
$ ./bin/alluxio init format
you might get the error message:
alluxio-314-SNAPSHOT/bin/alluxio: Operation not permitted
This can be caused by security settings introduced in recent versions of macOS.
To fix it, open System Preferences and open Sharing. On the left, check the box next to Remote Login. If there is an Allow full access to remote users option, check the box next to it. In addition, click the + button and add yourself to the list of users allowed for Remote Login if you are not already in it.
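Alternatively, Remote Login can be toggled from a terminal, assuming an administrator account; this is the command-line equivalent of the Sharing panel above:
$ sudo systemsetup -setremotelogin on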
Tuning
Optional Dora Server-side Metadata Cache
By default, Dora worker caches metadata and data.
Set alluxio.dora.client.metadata.cache.enabled to false to disable the metadata cache. If disabled, the client will always fetch metadata directly from the under storage.
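For example, to disable it using the same pattern as the earlier configuration steps:
$ echo "alluxio.dora.client.metadata.cache.enabled=false" >> conf/alluxio-site.properties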
High performance data transmission over Netty
Set alluxio.user.netty.data.transmission.enabled to true to enable data transmission between clients and Dora cache nodes over Netty. This avoids the serialization and deserialization cost of gRPC and consumes fewer resources on the worker side.
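To enable it, append the property to the site configuration as before:
$ echo "alluxio.user.netty.data.transmission.enabled=true" >> conf/alluxio-site.properties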
Known limitations
- Only one UFS is supported by Dora. Nested mounts are not supported yet.
- The Alluxio Master node still needs to be up and running. It is used for Dora worker discovery, cluster configuration updates, as well as handling write I/O operations.