If you are new to Alluxio, this guide is a good place to start. For other installation methods, see the Alluxio Deployment documentation.
We will install Alluxio locally and then run through some basic cluster operations:
- Verify the prerequisites.
- Install Alluxio locally.
- Perform basic tasks via Alluxio Shell.
- Mount a public Amazon S3 bucket in Alluxio.
- Accelerate data access.
- Stop Alluxio.
Alluxio components have specific requirements which you must meet before proceeding.
Install Alluxio Locally
Alluxio creates a distributed file system across one or more machines, which constitute your Alluxio cluster. For this introduction, we’ll install Alluxio locally. All of the Alluxio components will be installed on your one machine, and the file system will be ‘distributed’ across local storage only. Follow the step-by-step instructions in the Alluxio Deployment guide. Once Alluxio is installed, continue with the steps below.
To verify that Alluxio is running, run jps at the terminal. The output should include the AlluxioMaster, AlluxioWorker, AlluxioProxy, AlluxioJobMaster, and AlluxioJobWorker processes.
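If you want to run this check from a script rather than by eye, it can be sketched as a small Bash function. The name check_alluxio_processes is hypothetical and not part of Alluxio; the sketch only assumes that jps prints one "<pid> <ClassName>" pair per line.

```shell
# Hypothetical helper (not part of Alluxio): reads `jps` output on stdin and
# reports which of the expected Alluxio processes are missing.
check_alluxio_processes() {
  local names proc missing=0
  # jps prints "<pid> <MainClassName>"; keep only the class names.
  names="$(awk '{ print $2 }')"
  for proc in AlluxioMaster AlluxioWorker AlluxioProxy AlluxioJobMaster AlluxioJobWorker; do
    # -x: whole-line match, -F: fixed string, so AlluxioWorker != AlluxioJobWorker.
    if ! grep -qxF -- "$proc" <<<"$names"; then
      echo "missing: $proc"
      missing=1
    fi
  done
  # Exit status 0 only if every expected process was found.
  return "$missing"
}
```

For example, jps | check_alluxio_processes prints nothing and exits with status 0 when all five processes are present, and prints one "missing: ..." line per absent process otherwise.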
Using the Alluxio Shell
Now that Alluxio is running, we can examine the Alluxio file system from the command line with the Alluxio shell. In this section, we’ll cover basic file system operations, including how to copy files into Alluxio and persist them to under storage.
- Change to the Alluxio install directory.
- You can invoke the Alluxio shell by running the following command, which lists all of the available command-line operations:
$ ./bin/alluxio fs
- Let’s list all the files in Alluxio.
$ ./bin/alluxio fs ls /
- There are no files in Alluxio yet. We can fix that by copying a file into Alluxio with the copyFromLocal command:
$ ./bin/alluxio fs copyFromLocal conf/alluxio-site.properties.template /alluxio-site.properties.template
Copied conf/alluxio-site.properties.template to /alluxio-site.properties.template
- After copying the file, we should be able to see it in Alluxio. List the files in Alluxio again with ls. The output shows the file that exists in Alluxio, along with other useful information such as the file size, the date it was created, and its in-memory status.
$ ./bin/alluxio fs ls /
-rw-r--r-- <owner> <group> 1229 NOT_PERSISTED 09-27-2017 10:05:07:412 100% /alluxio-site.properties.template
- You can also view the contents of the file with the cat command:
$ ./bin/alluxio fs cat /alluxio-site.properties.template
...
- With the default configuration, Alluxio uses the local file system as its under storage (US). The default path for the US is ./under-storage. We can see what’s in the US as follows:
$ ls ./under-storage/
- The directory is empty. By default, Alluxio writes data only into Alluxio space, not to the US. We can tell Alluxio to persist the file from Alluxio space to the US with the persist shell command:
$ ./bin/alluxio fs persist /alluxio-site.properties.template
persisted file /alluxio-site.properties.template with size 1193
- Now, if we examine the US again, the file should appear.
$ ls ./under-storage
alluxio-site.properties.template
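The copy, persist, and verify steps above can be combined into one Bash function. This is a sketch under stated assumptions: copy_and_persist, ALLUXIO_CLI, and UNDERFS_DIR are hypothetical names, and the defaults assume you run it from the Alluxio install directory with the default ./under-storage under storage backing the Alluxio root.

```shell
# Hypothetical overrides; defaults assume the Alluxio install directory and
# the default local under storage (./under-storage) mounted at the Alluxio root.
ALLUXIO_CLI="${ALLUXIO_CLI:-./bin/alluxio}"
UNDERFS_DIR="${UNDERFS_DIR:-./under-storage}"

# Copy a local file into Alluxio, persist it, and check that it reached the US.
copy_and_persist() {
  local src="$1" dst="$2"
  "$ALLUXIO_CLI" fs copyFromLocal "$src" "$dst" || return
  "$ALLUXIO_CLI" fs persist "$dst" || return
  # An Alluxio path like /foo maps to $UNDERFS_DIR/foo in the root under storage.
  if [ -e "$UNDERFS_DIR/${dst#/}" ]; then
    echo "persisted: $UNDERFS_DIR/${dst#/}"
  else
    echo "not found in under storage: $dst" >&2
    return 1
  fi
}
```

For example, copy_and_persist conf/alluxio-site.properties.template /alluxio-site.properties.template repeats the three steps above and fails early if any of them does.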
Exploring the Web UI
Alluxio has a user-friendly web interface for monitoring and managing the system. The master and each worker serve their own web UI. The default web UI port is 19999 for the master and 30000 for the workers.
If we browse the Alluxio file system in the master’s web UI we can see the file we copied earlier, as well as other useful information. Notice the ‘persistence state’ column shows the file is persisted.
Mount a Storage System
Alluxio unifies access to different storage systems with the unified namespace feature, which enables users to mount different storage systems into the Alluxio namespace and access the files across those systems seamlessly.
- Create a directory in Alluxio to store your mount points.
$ ./bin/alluxio fs mkdir /mnt
Successfully created directory /mnt
NOTE: The rest of this example requires Amazon AWS account credentials.
You will need to provide credentials to access the alluxio-quick-start bucket. Set the Alluxio S3 credential properties in conf/alluxio-site.properties and restart Alluxio.
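The credential step can be sketched as a Bash function. The property names are an assumption that depends on your Alluxio version (aws.accessKeyId and aws.secretKey in Alluxio 1.x, s3a.accessKeyId and s3a.secretKey in 2.x), so check the S3 configuration page for your release and override ACCESS_KEY_PROP and SECRET_KEY_PROP if needed. The function and variable names are hypothetical; the keys are read from the standard AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY environment variables so they don’t appear on the command line.

```shell
# ASSUMPTION: property names vary by Alluxio version (aws.* in 1.x, s3a.* in 2.x).
ACCESS_KEY_PROP="${ACCESS_KEY_PROP:-aws.accessKeyId}"
SECRET_KEY_PROP="${SECRET_KEY_PROP:-aws.secretKey}"

# Hypothetical helper: append S3 credentials to the Alluxio site properties file.
set_s3_credentials() {
  local conf_file="${1:-conf/alluxio-site.properties}"
  # Refuse to write empty values if the environment variables are not set.
  if [ -z "${AWS_ACCESS_KEY_ID:-}" ] || [ -z "${AWS_SECRET_ACCESS_KEY:-}" ]; then
    echo "set AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY first" >&2
    return 1
  fi
  # Append one key=value line per property (existing lines are not de-duplicated).
  {
    printf '%s=%s\n' "$ACCESS_KEY_PROP" "$AWS_ACCESS_KEY_ID"
    printf '%s=%s\n' "$SECRET_KEY_PROP" "$AWS_SECRET_ACCESS_KEY"
  } >> "$conf_file"
}
```

After running set_s3_credentials, restart Alluxio, for example with ./bin/alluxio-stop.sh all followed by the start command from the Alluxio Deployment guide.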
- Mount an existing sample S3 bucket to Alluxio. We have provided a sample S3 bucket for you to use in this guide.
$ ./bin/alluxio fs mount -readonly alluxio://localhost:19998/mnt/s3 s3a://alluxio-quick-start/data
Mounted s3a://alluxio-quick-start/data at alluxio://localhost:19998/mnt/s3
- Now the S3 bucket is mounted into the Alluxio namespace. We can list the files in S3 through the Alluxio namespace using the familiar ls command:
$ ./bin/alluxio fs ls -h /mnt/s3
-r-x------ <owner> <group> 933.21KB PERSISTED 09-27-2017 11:34:20:072 0% /mnt/s3/sample_tweets_1m.csv
-r-x------ <owner> <group> 9.61MB PERSISTED 09-27-2017 11:34:20:076 0% /mnt/s3/sample_tweets_10m.csv
-r-x------ <owner> <group> 87.86KB PERSISTED 09-27-2017 11:34:20:076 0% /mnt/s3/sample_tweets_100k.csv
-r-x------ <owner> <group> 149.77MB PERSISTED 09-27-2017 11:34:20:077 0% /mnt/s3/sample_tweets_150m.csv
- With Alluxio’s unified namespace, you can interact with data from different storage systems seamlessly. For example, the ls shell command with the -R option recursively lists all the files under a directory. The following output shows all the files under the root of the Alluxio file system, from all of the mounted storage systems. The alluxio-site.properties.template file is in your local file system, while the files under /mnt/s3/ are in S3.
$ ./bin/alluxio fs ls -hR /
-rw-r--r-- <owner> <group> 1229B NOT_PERSISTED 09-27-2017 10:05:07:412 100% /alluxio-site.properties.template
dr-x------ <owner> <group> 1 PERSISTED 09-27-2017 11:34:20:072 DIR /mnt
dr-x------ <owner> <group> 4 PERSISTED 09-27-2017 11:34:20:072 DIR /mnt/s3
-r-x------ <owner> <group> 933.21KB PERSISTED 09-27-2017 11:34:20:072 0% /mnt/s3/sample_tweets_1m.csv
-r-x------ <owner> <group> 9.61MB PERSISTED 09-27-2017 11:34:20:076 0% /mnt/s3/sample_tweets_10m.csv
-r-x------ <owner> <group> 87.86KB PERSISTED 09-27-2017 11:34:20:076 0% /mnt/s3/sample_tweets_100k.csv
-r-x------ <owner> <group> 149.77MB PERSISTED 09-27-2017 11:34:20:077 0% /mnt/s3/sample_tweets_150m.csv
- You can see the newly mounted files and directories in the Alluxio web UI as well.
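Because the recursive listing mixes local and S3 files, it can help to reduce it to the columns you care about. The Bash helpers below are hypothetical and assume the nine-column layout shown in the listings above (permissions, owner, group, size, persistence state, date, time, in-memory percentage, path) and paths without spaces; other Alluxio versions may print different columns.

```shell
# Hypothetical helper: reduce `alluxio fs ls` output to "path persistence in-memory".
# Directories are skipped because their in-memory column reads DIR.
summarize_listing() {
  awk 'NF >= 9 && $8 != "DIR" { print $9, $5, $8 }'
}

# Hypothetical helper: print only files that are not fully in Alluxio memory.
uncached_files() {
  awk 'NF >= 9 && $8 != "DIR" && $8 != "100%" { print $9 }'
}
```

For example, ./bin/alluxio fs ls -hR / | uncached_files lists the files that are not yet fully cached in Alluxio memory.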
Accelerating Data Access
Alluxio leverages memory to accelerate data access. This exercise is designed so you can experience this acceleration firsthand.
First, let’s take a look at the status of a file in Alluxio, mounted from S3.
$ ./bin/alluxio fs ls -h /mnt/s3/sample_tweets_150m.csv
-r-x------ <owner> <group> 149.77MB PERSISTED 09-27-2017 11:34:20:077 0% /mnt/s3/sample_tweets_150m.csv
The output shows that the file is 0% in memory. This file is a sample of tweets. Let’s see how many tweets mention the word ‘kitten’.
$ time ./bin/alluxio fs cat /mnt/s3/sample_tweets_150m.csv | grep -c kitten
889

real 0m22.857s
user 0m7.557s
sys 0m1.181s
Now, let’s see how many tweets mention the word ‘puppy’.
$ time ./bin/alluxio fs cat /mnt/s3/sample_tweets_150m.csv | grep -c puppy
1553

real 0m25.998s
user 0m6.828s
sys 0m1.048s
As you can see, each command takes a long time to access the data. Alluxio can accelerate access to this data by storing it in memory. However, the cat shell command does not cache data in Alluxio memory. A separate shell command, load, tells Alluxio to store the data in memory.
$ ./bin/alluxio fs load /mnt/s3/sample_tweets_150m.csv
After loading the file, check its status with the ls command. The output shows that the file is now 100% in memory, so reading it should be much faster.
$ ./bin/alluxio fs ls /mnt/s3/sample_tweets_150m.csv
-r-x------ <owner> <group> 149.77MB PERSISTED 09-27-2017 11:34:20:077 100% /mnt/s3/sample_tweets_150m.csv
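If you script this step, you can read the in-memory column first and only call load when the file is not already at 100%. This Bash sketch assumes the ls column layout shown above, where the eighth field is the in-memory percentage; load_if_needed and ALLUXIO_CLI are hypothetical names.

```shell
# Hypothetical override; the default assumes the Alluxio install directory.
ALLUXIO_CLI="${ALLUXIO_CLI:-./bin/alluxio}"

# Load a file into Alluxio memory only if `ls` does not already report 100%.
load_if_needed() {
  local path="$1" pct
  # ASSUMPTION: field 8 of the ls output is the in-memory percentage.
  pct=$("$ALLUXIO_CLI" fs ls "$path" | awk 'NF >= 9 { print $8; exit }')
  if [ "$pct" = "100%" ]; then
    echo "$path is already in memory"
  else
    echo "$path is ${pct:-unknown} in memory; loading"
    "$ALLUXIO_CLI" fs load "$path"
  fi
}
```

For example, load_if_needed /mnt/s3/sample_tweets_150m.csv runs load the first time and reports that the file is already in memory on later calls.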
Let’s again count the number of tweets with the word ‘puppy’.
$ time ./bin/alluxio fs cat /mnt/s3/sample_tweets_150m.csv | grep -c puppy
1553

real 0m1.917s
user 0m2.306s
sys 0m0.243s
As you can see, reading the file took only about 2 seconds, compared with more than 20 seconds before loading. And since the data is in Alluxio memory, you can read the file again just as quickly. Let’s observe this by counting how many tweets mention the word ‘bunny’.
$ time ./bin/alluxio fs cat /mnt/s3/sample_tweets_150m.csv | grep -c bunny
907

real 0m1.983s
user 0m2.362s
sys 0m0.240s
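To repeat the timed queries for other words, the cat | grep -c pattern can be wrapped in a Bash function. This is a sketch: count_mentions and ALLUXIO_CLI are hypothetical names, the timing uses date +%s and so has whole-second resolution (coarser than time), and grep -c counts matching lines, which equals the number of tweets only if each tweet is on its own line.

```shell
# Hypothetical override; the default assumes the Alluxio install directory.
ALLUXIO_CLI="${ALLUXIO_CLI:-./bin/alluxio}"

# Count lines of an Alluxio file that contain a word, and report elapsed time.
count_mentions() {
  local path="$1" word="$2" start end count
  start=$(date +%s)
  # grep -c counts matching lines; matching is case-sensitive, as in the commands above.
  count=$("$ALLUXIO_CLI" fs cat "$path" | grep -c -- "$word")
  end=$(date +%s)
  # Count on stdout, wall-clock time in whole seconds on stderr.
  echo "$count"
  echo "elapsed: $((end - start))s" >&2
}
```

For example, run count_mentions /mnt/s3/sample_tweets_150m.csv kitten once before and once after load to compare the elapsed times.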
Stop Your Cluster
Alluxio can be stopped and started at the cluster level. Stopping the cluster stops all Alluxio services on all nodes, in this case your local computer. All data will remain available after the cluster is restarted, as long as none of the nodes in the cluster were rebooted in the meantime.
$ ./bin/alluxio-stop.sh all
Congratulations on successfully installing Alluxio on your local computer and performing some basic operations!
There are several next steps available. You can learn more about the various key features of Alluxio. You can also deploy fault-tolerant Alluxio on a cluster, transparently mount storage systems with the Alluxio unified namespace, or configure your applications to work with the Alluxio file system API.