Command Line Interface
Alluxio’s command line interface provides users with basic file system operations. You can invoke the following command line utility to get all the subcommands:
$ ./bin/alluxio fs
Usage: alluxio fs [generic options]
       [cat <path>]
       [checkConsistency [-r] <Alluxio path>]
       ...
For fs subcommands that take Alluxio URIs as arguments (e.g. mkdir), each “path” argument should be either a complete Alluxio URI of the form:
alluxio://<master node address>:<master node port>/<path>
or a path without the header, in which case the default hostname and port (set in the env file) are used.
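The two accepted path forms can be sketched with a small shell helper. This is only an illustration of the string format: to_alluxio_uri is a hypothetical function, not part of Alluxio, and the fallback values localhost and 19998 are assumptions standing in for whatever your configuration defines.

```shell
# to_alluxio_uri is a hypothetical helper (not an Alluxio command).
# Usage: to_alluxio_uri <path-or-uri> [host] [port]
# The fallback host and port below are assumptions for illustration only.
to_alluxio_uri() {
  case $1 in
    alluxio://*)
      # Already a complete URI: pass it through unchanged.
      echo "$1"
      ;;
    /*)
      # Header omitted: prepend the (assumed) default host and port.
      echo "alluxio://${2:-localhost}:${3:-19998}$1"
      ;;
    *)
      echo "unsupported path: $1" >&2
      return 1
      ;;
  esac
}

to_alluxio_uri /data/2014
to_alluxio_uri alluxio://master1:19998/data/2014
```

In practice the CLI performs this resolution itself, so passing either form directly to an fs subcommand is enough.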
Most of the commands which require path components allow wildcard arguments for ease of use. For example:
$ ./bin/alluxio fs rm /data/2014*
The example command would delete anything in the data directory with a prefix of 2014.
Note that some shells will attempt to glob the input paths, causing strange errors such as:
rm takes 1 arguments, not 21
(NOTE: The number 21 could be different and comes from the number of matching files in your local filesystem.)
As a workaround, you can disable globbing (depending on your shell, for example with set -f) or escape the wildcards, for example:
$ ./bin/alluxio fs cat /\\*
Note the double escape: the shell script eventually calls a Java program, which should receive the final escaped parameters.
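The effect of shell globbing on argument counts can be reproduced without Alluxio. The sketch below is an assumption-laden stand-in: count_args is a hypothetical function that only reports how many arguments the shell handed to it, and the scratch files are created just for the demonstration.

```shell
# count_args is a hypothetical stand-in for a CLI: it prints the number of
# arguments the shell actually passed to it.
count_args() { echo "$#"; }

# Scratch directory holding three local files whose names start with "2014".
workdir=$(mktemp -d)
touch "$workdir/2014-01" "$workdir/2014-02" "$workdir/2014-03"
cd "$workdir" || exit 1

expanded=$(count_args 2014*)     # the shell globs first: three arguments
set -f                           # disable globbing in this shell
unexpanded=$(count_args 2014*)   # the pattern is passed through as one argument
set +f                           # restore globbing
escaped=$(count_args 2014\*)     # backslash-escaped pattern: also one argument

echo "expanded=$expanded unexpanded=$unexpanded escaped=$escaped"

cd - >/dev/null
rm -r "$workdir"
```

With local files matching the pattern, the unprotected call receives one argument per matching file, which is the source of errors like the one quoted above.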
List of Operations
|Operation||Syntax||Description|
|cat||cat "path"||Print the content of the file to the console.|
|checksum||checksum "path"||Calculate the md5 checksum for a file.|
|checkConsistency||checkConsistency "path"||Check the metadata consistency between Alluxio and the under storage.|
|chgrp||chgrp "group" "path"||Change the group of the directory or file.|
|chmod||chmod "permission" "path"||Change the permission of the directory or file.|
|chown||chown "owner" "path"||Change the owner of the directory or file.|
|copyFromLocal||copyFromLocal "source path" "remote path"||Copy the file specified by "source path" to the path specified by "remote path". This command will fail if "remote path" already exists.|
|copyToLocal||copyToLocal "remote path" "local path"||Copy the specified file from the path specified by "remote path" to a local destination.|
|count||count "path"||Display the number of folders and files matching the specified prefix in "path".|
|cp||cp "src" "dst"||Copy a file or directory within the Alluxio file system.|
|distributedLoad||distributedLoad "path"||More efficient version of the `load` command. It can load multiple HDFS blocks concurrently to different Alluxio workers. In addition, the user can specify how many replicas to create in Alluxio.|
|distributedMv||distributedMv "src" "dst"||More efficient version of the `mv` command. It can move multiple files concurrently and it can move files across mount points.|
|du||du "path"||Display the size of a file or a directory specified by the input path.|
|fileInfo||fileInfo "path"||Print the information of the blocks of a specified file.|
|free||free "path"||Free a file or all files under a directory from Alluxio. If the file/directory is also in under storage, it will still be available there.|
|getCapacityBytes||getCapacityBytes||Get the capacity of the AlluxioFS.|
|getUsedBytes||getUsedBytes||Get number of bytes used in the AlluxioFS.|
|jobLeader||jobLeader||Prints the current Alluxio leader job master host name.|
|leader||leader||Prints the current Alluxio leader master host name.|
|load||load "path"||Loads a file or directory into Alluxio space.|
|location||location "path"||Display a list of hosts that have the file data.|
|ls||ls "path"||List all the files and directories directly under the given path with information such as size.|
|mkdir||mkdir "path1" ... "pathn"||Create directory(ies) under the given paths, along with any necessary parent directories. Multiple paths are separated by spaces or tabs. This command will fail if any of the given paths already exist.|
|mount||mount "path" "uri"||Mount the underlying file system path "uri" into the Alluxio namespace as "path". The "path" is assumed not to exist and is created by the operation. No data or metadata is loaded from under storage into Alluxio. After a path is mounted, operations on objects under the mounted path are mirrored to the mounted under storage.|
|mv||mv "source" "destination"||Move a file or directory specified by "source" to a new location "destination". This command will fail if "destination" already exists.|
|persist||persist "path1" ... "pathn"||Persist files or directories currently stored only in Alluxio to the underlying file system.|
|pin||pin "path"||Pin the given file to avoid evicting it from memory. If the given path is a directory, it recursively pins all the files contained and any new files created within this directory.|
|rm||rm "path"||Remove a file. This command will fail if the given path is a directory rather than a file.|
|setReplication||setReplication -max "number" -min "number" "path"||Set the minimum and / or maximum replication for an Alluxio file or directory.|
|setTtl||setTtl "path" "time"||Set the TTL (time to live) in milliseconds to a file.|
|tail||tail "path"||Print the last 1KB of the specified file to the console.|
|touch||touch "path"||Create a 0-byte file at the specified location.|
|unmount||unmount "path"||Unmount the underlying file system path mounted in the Alluxio namespace as "path". Alluxio objects under "path" are removed from Alluxio, but they still exist in the previously mounted under storage.|
|unpin||unpin "path"||Unpin the given file to allow Alluxio to evict this file again. If the given path is a directory, it recursively unpins all files contained and any new files created within this directory.|
|unsetTtl||unsetTtl "path"||Remove the TTL (time to live) setting from a file.|
Example Use Cases
The cat command prints the entire contents of a file in Alluxio to the console. This can be
useful for verifying that the file is what the user expects. If you wish to copy the file to your local
file system, copyToLocal should be used.
For example, when trying out a new computation job, cat can be used as a quick way to check the
output:
$ ./bin/alluxio fs cat /output/part-00000
The checkConsistency command compares Alluxio and under storage metadata for a given path. If the
path is a directory, the entire subtree will be compared. The command returns a message listing each
inconsistent file or directory. The system administrator should reconcile the differences of these
files at their discretion. To avoid metadata inconsistencies between Alluxio and under storages,
design your systems to modify files and directories through Alluxio and avoid directly modifying
state in the underlying storage.
NOTE: This command requires a read lock on the subtree being checked, meaning writes and updates to files or directories in the subtree cannot be completed until this command completes.
checkConsistency can be used to periodically validate the integrity of the namespace.
$ ./bin/alluxio fs checkConsistency /
The checksum command outputs the md5 value of a file in Alluxio.
checksum can be used to verify that the content of a file stored in Alluxio
matches the content stored in an UnderFS or local filesystem:
$ ./bin/alluxio fs checksum /alluxio-site.properties.template
md5sum: c548cec3cf4c6034f271c2753dc1daa8
$ md5 conf/alluxio-site.properties.template
MD5 (conf/alluxio-site.properties.template) = c548cec3cf4c6034f271c2753dc1daa8
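The manual comparison above could be scripted. The following is a minimal sketch, assuming the checksum output keeps the "md5sum: <hex>" shape shown in the example and that GNU md5sum is available locally (the example above uses the BSD md5 tool instead). same_md5 is a hypothetical helper, not an Alluxio command.

```shell
# same_md5 is a hypothetical helper. It takes one "md5sum: <hex>" line, as
# printed in the example above, plus a local file path, and succeeds (exit 0)
# only when the two digests are identical.
# Assumption: GNU md5sum is installed; on BSD/macOS use md5 instead.
same_md5() {
  alluxio_line=$1
  local_file=$2
  alluxio_md5=${alluxio_line#md5sum: }                  # strip the label
  local_md5=$(md5sum "$local_file" | cut -d ' ' -f 1)   # keep only the hex digest
  [ "$alluxio_md5" = "$local_md5" ]
}

# Possible usage, with the checksum line captured from the CLI beforehand:
#   line=$(./bin/alluxio fs checksum /alluxio-site.properties.template)
#   same_md5 "$line" conf/alluxio-site.properties.template && echo "match"
```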
The chgrp command changes the group of the file or directory in Alluxio. Alluxio supports file
authorization with POSIX file permissions. A group is an authorizable entity in the POSIX file permission
model. The file owner or super-user can execute this command to change the group of the file or
directory. The -R option also changes the group of child files and directories recursively.
chgrp can be used as a quick way to change the group of a file:
$ ./bin/alluxio fs chgrp alluxio-group-new /input/file1
The chmod command changes the permission of a file or directory in Alluxio. Currently, octal mode
is supported: the numerical format accepts three octal digits which refer to permissions for the
file owner, the group, and other users. Here is the number-permission mapping table:
|Number||Permission||rwx|
|7||read, write and execute||rwx|
|6||read and write||rw-|
|5||read and execute||r-x|
|4||read only||r--|
|3||write and execute||-wx|
|2||write only||-w-|
|1||execute only||--x|
|0||none||---|
The -R option also changes the permission of child files and directories recursively.
chmod can be used as a quick way to change the permission of a file:
$ ./bin/alluxio fs chmod 755 /input/file1
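To see what a three-digit octal mode such as 755 means, the mapping can be worked out bit by bit. This is a small sketch using standard POSIX permission bits; octal_digit_to_rwx and mode_to_rwx are hypothetical helper names and are not part of Alluxio.

```shell
# Hypothetical helpers (not part of Alluxio) that expand an octal mode into
# its rwx form using the standard bit values: read=4, write=2, execute=1.
octal_digit_to_rwx() {
  digit=$1
  r=-
  w=-
  x=-
  if [ $((digit & 4)) -ne 0 ]; then r=r; fi
  if [ $((digit & 2)) -ne 0 ]; then w=w; fi
  if [ $((digit & 1)) -ne 0 ]; then x=x; fi
  printf '%s%s%s' "$r" "$w" "$x"
}

mode_to_rwx() {
  mode=$1
  owner=${mode%??}   # first digit: file owner
  rest=${mode#?}
  group=${rest%?}    # second digit: group
  other=${mode#??}   # third digit: other users
  printf '%s%s%s\n' \
    "$(octal_digit_to_rwx "$owner")" \
    "$(octal_digit_to_rwx "$group")" \
    "$(octal_digit_to_rwx "$other")"
}

mode_to_rwx 755   # prints rwxr-xr-x
```

So chmod 755 gives the owner full access and lets the group and other users read and execute.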
The chown command changes the owner of the file or directory in Alluxio. For security
reasons, the ownership of a file can only be altered by a super-user or its owner.
The -R option also changes the owner of child files and directories recursively.
chown can be used as a quick way to change the owner of a file:
$ ./bin/alluxio fs chown alluxio-user /input/file1
The copyFromLocal command copies the contents of a file in your local file system into Alluxio.
If the node you run the command from has an Alluxio worker, the data will be available on that
worker. Otherwise, the data will be placed on a random remote node running an Alluxio worker. If a
directory is specified, the directory and all its contents will be uploaded recursively.
copyFromLocal can be used as a quick way to inject data into the system:
$ ./bin/alluxio fs copyFromLocal /local/data /input
The copyToLocal command copies the contents of a file in Alluxio to a file in your local file
system. If a directory is specified, the directory and all its contents will be downloaded
recursively.
copyToLocal can be used as a quick way to download output data for additional
investigation or debugging.
$ ./bin/alluxio fs copyToLocal /output/part-00000 part-00000
$ wc -l part-00000
The count command outputs the number of files and folders matching a prefix as well as the
total size of the files. count works recursively and accounts for any nested directories and
files. count is best utilized when the user has some predefined naming conventions for their
files.
For example, if data files are stored by their date, count can be used to determine the number of
data files and their total size for any date, month, or year.
$ ./bin/alluxio fs count /data/2014
The cp command copies a file or directory in the Alluxio filesystem.
If the -R option is used and the source designates a directory, cp copies the entire subtree at
source to the destination.
cp can be used to copy files between under file systems.
$ ./bin/alluxio fs cp /hdfs/file1 /s3/
The distributedLoad command loads a file or directory into Alluxio memory. This behaves similarly to the regular
load command, except in the case where the file’s blocks are distributed in the under storage system. In this case, the blocks will be loaded to Alluxio in parallel, preferring to load blocks to local Alluxio workers.
For example, if an HDFS file has three blocks A, B, and C, and the blocks are on host1, host2, and
host3 respectively, distributedLoad will load A to the Alluxio worker on host1, B to the Alluxio
worker on host2, and C to the Alluxio worker on host3. These loads will happen in parallel.
$ ./bin/alluxio fs distributedLoad /hdfs/file1
The distributedMv command moves a file or directory from one location to another. Unlike the regular
mv command, this move may go across mount points, performing a copy and delete between two systems. When moving a directory across mount points, the file copies will be parallelized across the Alluxio workers.
distributedMv can be used to move a directory from HDFS to S3.
$ ./bin/alluxio fs distributedMv /hdfs/dir /s3
The du command outputs the size of a file. If a directory is specified, it will output the
aggregate size of all files in the directory and its child directories.
For example, if the Alluxio space is unexpectedly overutilized, du can be used to detect
which folders are taking up the most space.
$ ./bin/alluxio fs du /\\*
The free command sends a request to the master to evict all blocks of a file from the Alluxio
workers. If the argument to free is a directory, it will recursively free all files. This
request is not guaranteed to take effect immediately, as readers may be currently using the blocks
of the file. free will return immediately after the request is acknowledged by the master. Note
that files must already be persisted in under storage before being freed, or the free command will fail;
also, pinned files cannot be freed unless the -f option is specified. The free command
does not delete any data from the under storage system; it only removes the blocks of those files from
Alluxio space to reclaim space. In addition, metadata will not be affected by this operation, meaning the freed file
will still show up if an ls command is run.
free can be used to manually manage Alluxio’s data caching.
$ ./bin/alluxio fs free /unused/data
The getCapacityBytes command returns the maximum number of bytes Alluxio is configured to store.
getCapacityBytes can be used to verify that your cluster is set up as expected.
$ ./bin/alluxio fs getCapacityBytes
The getSyncPathList command returns a list of sync points that are currently actively synced between the UFS and the Alluxio namespace.
getSyncPathList can be used to verify that the sync points have been activated successfully.
$ ./bin/alluxio fs getSyncPathList
The getUsedBytes command returns the number of used bytes in Alluxio.
getUsedBytes can be used to monitor the health of your cluster.
$ ./bin/alluxio fs getUsedBytes
The help command prints the help message for a given fs subcommand. If no command is given,
it prints help messages for all supported subcommands.
# Print all subcommands
$ ./bin/alluxio fs help
# Print help message for ls
$ ./bin/alluxio fs help ls
The leader command prints the current Alluxio leader master host name.
$ ./bin/alluxio fs leader
The load command moves data from the under storage system into Alluxio storage. If there is an
Alluxio worker on the machine this command is run from, the data will be loaded to that worker.
Otherwise, a random worker will be selected to store the data. load will be a no-op if the file
data is already present in Alluxio. If load is run on a directory, files in the directory
will be loaded recursively.
load can be used to prefetch data for analytics jobs.
$ ./bin/alluxio fs load /data/today
The location command returns the addresses of all the Alluxio workers which contain blocks
belonging to the given file.
location can be used to debug data locality when running jobs using a compute framework.
$ ./bin/alluxio fs location /data/2015/logs-1.txt
The ls command lists all the immediate children in a directory and displays the file size, last
modification time, and in-memory status of the files. Using ls on a file will only display the
information for that specific file. When using ls to list the contents of a persisted directory,
ls loads the metadata for the immediate children of that directory.
The -R option also recursively lists child directories, displaying the entire subtree starting
from the input path.
The ls command will also load the metadata for any file or directory from the under storage system
into the Alluxio namespace if it does not exist in Alluxio yet. ls queries the under storage system for
any file or directory matching the given path and then creates a mirror of the file in Alluxio backed
by that file. Only the metadata, such as the file name and size, is loaded this way, and no data
is transferred.
The -f option forces loading metadata for immediate children in a directory. By default, ls
loads metadata only the first time a directory is listed.
ls can be used to browse the file system.
$ ./bin/alluxio fs mount /s3/data s3a://bucket/folder
# Loads metadata for all immediate children of /s3/data and lists them.
$ ./bin/alluxio fs ls /s3/data
# Forces loading metadata.
$ aws s3 cp /tmp/somedata s3://bucket/folder/somedata
$ ./bin/alluxio fs ls -f /s3/data
# Files are not removed from Alluxio if they are removed from the UFS (s3 here) only.
$ aws s3 rm s3://bucket/folder/somedata
$ ./bin/alluxio fs ls -f /s3/data
The masterInfo command prints information regarding master fault tolerance, such as the leader address, the list of master addresses, and the configured Zookeeper address. If Alluxio is running in single
master mode, masterInfo will print the master address. If Alluxio is running in fault tolerance mode,
the leader address, list of master addresses, and the configured Zookeeper address will be printed.
masterInfo can be used to print information regarding master fault tolerance.
$ ./bin/alluxio fs masterInfo
The mkdir command creates a new directory in Alluxio space (and its nonexistent parent
directories if needed). Note that the directory will not be created in the under
storage system until a file in the directory is persisted to the underlying storage. Using mkdir
on an invalid or already existing path will fail.
mkdir can be used by an admin to set up work directories for different users.
$ ./bin/alluxio fs mkdir /users
$ ./bin/alluxio fs mkdir /users/Alice
$ ./bin/alluxio fs mkdir /users/Bob
The mount command links an under storage path to an Alluxio path, and files and folders created
in Alluxio space under the path will be backed by a corresponding file or folder in the under
storage path. For more details, see Unified Namespace.
mount can be used to make data in another storage system available in Alluxio.
$ ./bin/alluxio fs mount /s3/data s3a://bucket/folder
The mv command moves a file or directory to another path in Alluxio. The destination path must not
exist or be a directory. If it is a directory, the file or directory will be placed as a child of
that directory. mv is purely a metadata operation and does not affect the data blocks of the file.
mv cannot be done between mount points of different under storage systems.
mv can be used to move older data into a non-working directory.
$ ./bin/alluxio fs mv /data/2014 /data/archives/2014
The persist command persists data in Alluxio storage into the under storage system. This is a data
operation and will take time depending on how large the file is. After persist is complete, the file
in Alluxio will be backed by the file in the under storage, making it still valid if the Alluxio
blocks are evicted or otherwise lost.
persist can be used after filtering a series of temporary files for the ones
containing useful data.
$ ./bin/alluxio fs persist /tmp/experimental-logs-2.txt
The pin command marks a file or folder as pinned in Alluxio. This is a metadata operation and will
not cause any data to be loaded into Alluxio. If a file is pinned, any blocks belonging to the file
will never be evicted from an Alluxio worker. If there are too many pinned files, Alluxio workers may
run low on storage space, preventing other files from being cached.
pin can be used to manually ensure performance if the administrator understands the workloads well.
$ ./bin/alluxio fs pin /data/today
The rm command removes a file from Alluxio space and the under storage system. The file will be
unavailable immediately after this command returns, but the actual data may be deleted a while
later. The -R option will delete all contents of the directory and then the directory itself.
rm can be used to remove temporary files which are no longer needed.
$ ./bin/alluxio fs rm /tmp/unused-file
The setReplication command sets the max and/or min replication level of a file or all files under a directory recursively. This is a metadata operation: it does not cause any replicas to be created or removed immediately, and the replication level of the target file or directory will be adjusted automatically. This command takes an argument of
-min to specify the minimal replication level and -max for the maximal replication level. Specify
-1 as the argument of the -max option to indicate no limit on the maximum number of replicas. If ‘path’ is a directory and ‘-R’ is specified, it will recursively set all files in this directory.
setReplication can be used to ensure that a file has at least one copy and at most three copies in Alluxio:
$ ./bin/alluxio fs setReplication -max 3 -min 1 /foo
The setTtl command sets the time-to-live of a file or a directory, in milliseconds. If a TTL is set
on a directory, it also applies to all the children inside that directory, so when a directory’s TTL expires,
all the children inside that directory will also expire. The action parameter indicates the action
to perform once the current time is greater than the TTL + creation time of the file. The action
delete (default) will delete the file or directory from both Alluxio and the under storage system, whereas
the action free will just free the file from Alluxio, even if the file is pinned.
setTtl with action delete can be used to clean up files the administrator knows are
unnecessary after a period of time, or with action free to just remove the contents from Alluxio to
free up space in Alluxio.
$ ./bin/alluxio fs setTtl /data/good-for-one-day 86400000
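Because the TTL is given in milliseconds, it can help to compute the value rather than type it by hand. A minimal sketch follows; days_to_millis is a hypothetical helper, and the final line only prints the command instead of running it.

```shell
# days_to_millis is a hypothetical helper (not part of Alluxio) that converts
# a whole number of days into milliseconds.
days_to_millis() {
  echo $(( $1 * 24 * 60 * 60 * 1000 ))
}

ttl_ms=$(days_to_millis 1)
echo "$ttl_ms"   # prints 86400000, the value used in the example above

# Print (rather than execute) the resulting command line.
echo "./bin/alluxio fs setTtl /data/good-for-one-day $ttl_ms"
```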
The startSync command starts the active UFS sync process for a designated path in the Alluxio namespace. This adds a sync point to the list of active sync points, and a background process will periodically poll the UFS for any changes under the sync point directory. If there are changes, the same process will synchronize the content between the UFS and the Alluxio namespace.
Currently, this feature is only supported when the UFS is HDFS 2.6 or above, since it relies on the Inotify feature in HDFS, which is only available in 2.6 and later.
For example, the following adds syncedDir as a sync point. Any changes in the
syncedDir directory in the UFS will be actively synced to the Alluxio namespace.
$ ./bin/alluxio fs startSync /syncedDir
The details of the syncing process can be controlled by setting a number of configuration parameters.
- alluxio.master.activesync.interval controls the time interval of the background sync process.
- alluxio.master.activesync.maxage controls the max number of intervals we will wait before syncing a particular changed file. The total staleness of the file should not exceed the product of these two configuration parameters.
- alluxio.master.activesync.maxactivity is the maximum number of activities a sync point can have before we deem it too busy to be synced. While Alluxio is syncing in the background, it tries to start syncing the directories when the sync point is no longer actively written to. A sync point will still be synced if any files have reached the max age.
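The staleness bound described above is simply the product of the sync interval and the max age. As a worked example with purely illustrative numbers (these are assumptions, not Alluxio defaults, and the interval is assumed to be expressed in milliseconds):

```shell
# Illustrative values only; check your own configuration for the real ones.
interval_ms=30000   # assumed value for alluxio.master.activesync.interval
max_age=10          # assumed value for alluxio.master.activesync.maxage

# Upper bound on staleness = interval * number of intervals waited.
max_staleness_ms=$(( interval_ms * max_age ))
echo "worst-case staleness: ${max_staleness_ms} ms"
```

With these assumed values, a changed file would be synced within 300000 ms (five minutes) even if its sync point stays busy.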
The stopSync command stops the active UFS sync process for a designated path in the Alluxio namespace. The directories may be out of sync after the command is finished.
The stat command dumps the FileInfo representation of a file or a directory to the console. It is primarily intended to assist power users in debugging their system. Generally, viewing the file info in the UI will be much easier to understand.
One can specify -f <arg> to display info in the given format:
- “%N”: name of the file;
- “%z”: size of file in bytes;
- “%u”: owner;
- “%g”: group name of owner;
- “%y” or “%Y”: modification time; %y shows ‘yyyy-MM-dd HH:mm:ss’ (the UTC date), and %Y shows milliseconds since January 1, 1970 UTC;
- “%b”: number of blocks allocated for the file.
stat can be used to debug the block locations of a file. This is useful when trying to achieve locality for compute workloads.
# Displays file's stat
$ ./bin/alluxio fs stat /data/2015/logs-1.txt
# Displays directory's stat
$ ./bin/alluxio fs stat /data/2015
# Displays the size of file
$ ./bin/alluxio fs stat -f %z /data/2015/logs-1.txt
The tail command outputs the last 1 KB of data in a file to the console.
tail can be used to verify that the output of a job is in the expected format or contains the expected values.
$ ./bin/alluxio fs tail /output/part-00000
The test command tests a property of a path, returning 0 if the property is true, or 1
otherwise.
- The -d option tests whether the path is a directory.
- The -e option tests whether the path exists.
- The -f option tests whether the path is a file.
- The -s option tests whether the directory is not empty.
- The -z option tests whether the file is zero length.
$ ./bin/alluxio fs test -d /someDir
$ echo $?
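Since the result is reported through the exit status, the command can drive an if statement directly instead of inspecting $? by hand. The sketch below is one possible wrapper; alluxio_is_dir, describe_path, and the ALLUXIO_BIN override are hypothetical names introduced here for illustration.

```shell
# Hypothetical wrappers around the test subcommand. ALLUXIO_BIN is an assumed
# override for the CLI location; it defaults to ./bin/alluxio.
alluxio_is_dir() {
  "${ALLUXIO_BIN:-./bin/alluxio}" fs test -d "$1"
}

describe_path() {
  # The if statement branches on the exit status: 0 means the test held.
  # Any non-zero status, including a failure to run the CLI at all,
  # takes the else branch.
  if alluxio_is_dir "$1"; then
    echo "$1 is a directory"
  else
    echo "$1 is not a directory"
  fi
}

# Possible usage:
#   describe_path /someDir
```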
The touch command creates a 0-byte file. Files created with touch cannot be overwritten and are
mostly useful as flags.
touch can be used to create a file signifying the completion of analysis on a directory.
$ ./bin/alluxio fs touch /data/yesterday/_DONE_
The unmount command disassociates an Alluxio path from an under storage directory. Alluxio metadata
for the mount point will be removed along with any data blocks, but the under storage system will
retain all metadata and data. See Unified Namespace for more details.
unmount can be used to remove an under storage system when the users no longer need
data from that system.
$ ./bin/alluxio fs unmount /s3/data
The unpin command unmarks a file or directory in Alluxio as pinned. This is a metadata operation
and will not evict or delete any data blocks. Once a file is unpinned, its data blocks can be
evicted from the various Alluxio workers containing the block.
unpin can be used when the administrator knows there is a change in the data access pattern.
$ ./bin/alluxio fs unpin /data/yesterday/join-table
The unsetTtl command removes the TTL of a file in Alluxio. This is a metadata operation and
will not evict or store blocks in Alluxio. The TTL of a file can later be reset with setTtl.
unsetTtl can be used if a regularly managed file requires manual management due to
some special case.
$ ./bin/alluxio fs unsetTtl /data/yesterday/data-not-yet-analyzed