Command Line Interface
Alluxio’s command line interface provides users with basic file system operations. You can invoke the command line utility using:
./bin/alluxio fs
All “path” variables in fs commands should start with
alluxio://<master node address>:<master node port>/<path>
Alternatively, if the alluxio://<master node address>:<master node port> prefix is omitted, the default hostname and port (set in the env file) are used:
/<path>
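As a minimal sketch of the fully qualified form, the helper below assembles such a path from its parts. The function name alluxio_uri and the host and port values are placeholders chosen for illustration; substitute the address of your own master.

```shell
# Hypothetical helper: build alluxio://<host>:<port>/<path> from its parts.
alluxio_uri() {
  host=$1
  port=$2
  path=${3#/}   # drop one leading slash so the result contains exactly one
  printf 'alluxio://%s:%s/%s\n' "$host" "$port" "$path"
}

# Example (placeholder host and port):
#   ./bin/alluxio fs ls "$(alluxio_uri master-host 19998 /data)"
```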
Wildcard Input
Most of the commands which require path components allow wildcard arguments for ease of use. For example:
./bin/alluxio fs rm /data/2014*
The example command would delete anything in the data directory with a prefix of 2014.
Note that some shells will attempt to glob the input paths against your local filesystem before Alluxio sees them, causing confusing errors such as the one below (the number 21 may differ; it is the number of matching files in your local filesystem):
rm takes 1 arguments, not 21
As a workaround, you can disable globbing (how depends on your shell, for example set -f) or escape the wildcards, for example:
./bin/alluxio fs cat /\\*
Note the double escape: the shell script eventually calls a Java program, which should receive the final escaped parameter (cat /\*).
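To see what each form of escaping hands to a command, a small stand-in that prints its arguments can help. This sketch uses a hypothetical show_args function in place of the Alluxio script; it illustrates only the first layer of shell processing, not what the Alluxio launcher does afterwards.

```shell
# Hypothetical stand-in for a command: print each argument it receives.
show_args() {
  printf '<%s>\n' "$@"
}

# Single quotes pass the text through untouched, so the command
# receives the backslash and the star exactly as written.
show_args '/\*'                      # prints </\*>

# With globbing disabled in a subshell, an unescaped wildcard is passed
# through literally instead of being expanded against local files.
( set -f; show_args /data/2014* )    # prints </data/2014*>
```

Quoting the whole argument is often the least surprising choice, because the result does not depend on which files happen to exist locally.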
List of Operations
Operation | Syntax | Description |
---|---|---|
cat | cat "path" | |
checkConsistency | checkConsistency "path" | |
checksum | checksum "path" | |
chgrp | chgrp "group" "path" | |
chmod | chmod "permission" "path" | |
chown | chown "owner" "path" | |
copyFromLocal | copyFromLocal "source path" "remote path" | |
copyToLocal | copyToLocal "remote path" "local path" | |
count | count "path" | |
cp | cp "src" "dst" | |
du | du "path" | |
fileInfo | fileInfo "path" | |
free | free "path" | |
getCapacityBytes | getCapacityBytes | |
getUsedBytes | getUsedBytes | |
leader | leader | |
load | load "path" | |
loadMetadata | loadMetadata "path" | |
location | location "path" | |
ls | ls "path" | |
mkdir | mkdir "path1" ... "pathn" | |
mount | mount "path" "uri" | |
mv | mv "source" "destination" | |
persist | persist "path1" ... "pathn" | |
pin | pin "path" | |
report | report "path" | |
rm | rm "path" | |
setTtl | setTtl "path" "time" | |
tail | tail "path" | |
touch | touch "path" | |
unmount | unmount "path" | |
unpin | unpin "path" | |
unsetTtl | unsetTtl "path" | |
Example Use Cases
cat
The cat
command prints the entire contents of a file in Alluxio to the console. This can be
useful for verifying the file is what the user expects. If you wish to copy the file to your local
file system, copyToLocal
should be used.
For example, when trying out a new computation job, cat
can be used as a quick way to check the
output:
./bin/alluxio fs cat /output/part-00000
checkConsistency
The checkConsistency
command compares Alluxio and under storage metadata for a given path. If the
path is a directory, the entire subtree will be compared. The command returns a message listing each
inconsistent file or directory. The system administrator should reconcile the differences of these
files at their discretion. To avoid metadata inconsistencies between Alluxio and under storages,
design your systems to modify files and directories through Alluxio and avoid directly modifying
state in the underlying storage.
NOTE: This command requires a read lock on the subtree being checked, meaning writes and updates to files or directories in the subtree cannot be completed until this command completes.
For example, checkConsistency
can be used to periodically validate the integrity of the namespace.
./bin/alluxio fs checkConsistency /
checksum
The checksum
command outputs the MD5 checksum of a file in Alluxio.
For example, checksum
can be used to verify the content of a file stored in Alluxio
matches the content stored in an UnderFS or local filesystem:
./bin/alluxio fs checksum /LICENSE
md5sum: bf0513403ff54711966f39b058e059a3
md5 LICENSE
MD5 (LICENSE) = bf0513403ff54711966f39b058e059a3
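The comparison above can be scripted. The sketch below assumes the two output formats shown in the example (md5sum: <hash> from Alluxio and MD5 (<file>) = <hash> from a local md5 tool); the helper names extract_md5 and same_md5 are hypothetical.

```shell
# Hypothetical helper: extract the first 32-character hex digest from a line.
extract_md5() {
  printf '%s\n' "$1" | grep -oE '[0-9a-f]{32}' | head -n 1
}

# Hypothetical helper: succeed only when both lines carry the same digest.
same_md5() {
  [ -n "$(extract_md5 "$1")" ] && [ "$(extract_md5 "$1")" = "$(extract_md5 "$2")" ]
}

# Example, run from a directory that contains a local LICENSE file:
#   same_md5 "$(./bin/alluxio fs checksum /LICENSE)" "$(md5 LICENSE)" && echo "match"
```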
chgrp
The chgrp
command changes the group of a file or directory in Alluxio. Alluxio supports file
authorization with POSIX file permissions. A group is an authorizable entity in the POSIX file permission
model. The file owner or a super-user can execute this command to change the group of the file or
directory.
Adding the -R option also changes the group of child files and directories recursively.
For example, chgrp
can be used as a quick way to change the group of a file:
./bin/alluxio fs chgrp alluxio-group-new /input/file1
chmod
The chmod
command changes the permission of a file or directory in Alluxio. Currently, octal mode
is supported: the numerical format accepts three octal digits, which refer to permissions for the
file owner, the group, and other users.
Adding the -R option also changes the permission of child files and directories recursively.
Here is the number-to-permission mapping table:
Number | Permission | rwx |
---|---|---|
7 | read, write and execute | rwx |
6 | read and write | rw- |
5 | read and execute | r-x |
4 | read only | r-- |
3 | write and execute | -wx |
2 | write only | -w- |
1 | execute only | --x |
0 | none | --- |
For example, chmod
can be used as a quick way to change the permission of a file:
./bin/alluxio fs chmod 755 /input/file1
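To check what an octal mode means before applying it, the mapping table can be encoded directly. This is an illustrative sketch only; mode_to_rwx is a hypothetical helper, not part of the Alluxio CLI.

```shell
# Hypothetical helper: translate an octal mode such as 755 into rwx notation
# using the number-to-permission mapping from the table.
mode_to_rwx() {
  mode=$1
  out=""
  while [ -n "$mode" ]; do
    digit=${mode%"${mode#?}"}   # first remaining character
    mode=${mode#?}              # drop it from the front
    case $digit in
      7) out="${out}rwx" ;;
      6) out="${out}rw-" ;;
      5) out="${out}r-x" ;;
      4) out="${out}r--" ;;
      3) out="${out}-wx" ;;
      2) out="${out}-w-" ;;
      1) out="${out}--x" ;;
      0) out="${out}---" ;;
      *) echo "not an octal digit: $digit" >&2; return 1 ;;
    esac
  done
  printf '%s\n' "$out"
}

mode_to_rwx 755   # prints rwxr-xr-x
```

So the mode 755 in the example grants the owner full access, while the group and other users may read and execute but not write.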
chown
The chown
command changes the owner of the file or directory in Alluxio. For obvious security
reasons, the ownership of a file can only be altered by a super-user.
Adding the -R option also changes the owner of child files and directories recursively.
For example, chown
can be used as a quick way to change the owner of a file:
./bin/alluxio fs chown alluxio-user /input/file1
copyFromLocal
The copyFromLocal
command copies the contents of a file in your local file system into Alluxio.
If the node you run the command from has an Alluxio worker, the data will be available on that
worker. Otherwise, the data will be copied to a random remote node running an Alluxio worker. If a
directory is specified, the directory and all its contents will be copied recursively.
For example, copyFromLocal
can be used as a quick way to inject data into the system for
processing:
./bin/alluxio fs copyFromLocal /local/data /input
copyToLocal
The copyToLocal
command copies the contents of a file in Alluxio to a file in your local file
system. If a directory is specified, the directory and all its contents will be downloaded
recursively.
For example, copyToLocal
can be used as a quick way to download output data for additional
investigation or debugging.
./bin/alluxio fs copyToLocal /output/part-00000 part-00000
wc -l part-00000
count
The count
command outputs the number of files and folders matching a prefix as well as the
total size of the files. count
works recursively and accounts for any nested directories and
files. count
is best utilized when the user has some predefined naming conventions for their
files.
For example, if data files are stored by their date, count
can be used to determine the number of
data files and their total size for any date, month, or year.
./bin/alluxio fs count /data/2014
cp
The cp
command copies a file or directory in the Alluxio filesystem.
If the -R
option is used and the source designates a directory, cp copies the entire subtree at
source to the destination.
For example, cp
can be used to copy files between under storage systems.
./bin/alluxio fs cp /hdfs/file1 /s3/
du
The du
command outputs the size of a file. If a directory is specified, it will output the
aggregate size of all files in the directory and its child directories.
For example, if the Alluxio space is unexpectedly overutilized, du
can be used to detect which folders are taking up the most space.
./bin/alluxio fs du /\\*
fileInfo
The fileInfo
command dumps the FileInfo representation of a file to the console. It is primarily
intended to assist power users in debugging their system. Viewing the file info in the UI
is generally much easier to understand.
For example, fileInfo
can be used to debug the block locations of a file. This is useful when
trying to achieve locality for compute workloads.
./bin/alluxio fs fileInfo /data/2015/logs-1.txt
free
The free
command sends a request to the master to evict all blocks of a file from the Alluxio
workers. If the argument to free
is a directory, it will recursively free
all files. This
request is not guaranteed to take effect immediately, as readers may be currently using the blocks
of the file. Free
will return immediately after the request is acknowledged by the master. Note
that free
does not delete any data from the under storage system, and only affects data stored in
Alluxio space. In addition, metadata will not be affected by this operation, meaning the freed file
will still show up if an ls
command is run.
For example, free
can be used to manually manage Alluxio’s data caching.
./bin/alluxio fs free /unused/data
getCapacityBytes
The getCapacityBytes
command returns the maximum number of bytes Alluxio is configured to store.
For example, getCapacityBytes
can be used to verify if your cluster is set up as expected.
./bin/alluxio fs getCapacityBytes
getUsedBytes
The getUsedBytes
command returns the number of used bytes in Alluxio.
For example, getUsedBytes
can be used to monitor the health of your cluster.
./bin/alluxio fs getUsedBytes
leader
The leader
command prints the current Alluxio leader master host name.
./bin/alluxio fs leader
load
The load
command moves data from the under storage system into Alluxio storage. If there is an
Alluxio worker on the machine this command is run from, the data will be loaded to that worker.
Otherwise, a random worker will be selected to serve the data. The load command is a no-op if the file is
already in Alluxio memory level storage. If load
is run on a directory, files in the directory
will be recursively loaded.
For example, load
can be used to prefetch data for analytics jobs.
./bin/alluxio fs load /data/today
loadMetadata
The loadMetadata
command has been deprecated since Alluxio version 1.1.
Please use the alluxio fs ls <path>
command instead.
The loadMetadata
command queries the under storage system for any file or directory matching the
given path and then creates a mirror of the file in Alluxio backed by that file. Only the metadata,
such as the file name and size, is loaded this way; no data transfer occurs.
For example, loadMetadata
can be used when other systems output to the under storage directly
(bypassing Alluxio), and the application running on Alluxio needs to use the output of those
systems.
./bin/alluxio fs loadMetadata /hdfs/data/2015/logs-1.txt
location
The location
command returns the addresses of all the Alluxio workers which contain blocks
belonging to the given file.
For example, location
can be used to debug data locality when running jobs using a compute
framework.
./bin/alluxio fs location /data/2015/logs-1.txt
ls
The ls
command lists all the immediate children in a directory and displays the file size, last
modification time, and in-memory status of the files. Using ls
on a file will only display the
information for that specific file.
Adding the -R option also recursively lists child directories, displaying the entire subtree starting
from the input path.
The ls
command will also load the metadata for any file or directory from the under storage system into the Alluxio namespace if
it does not exist in Alluxio yet. ls
queries the under storage system for any file or directory matching the given path and
then creates a mirror of the file in Alluxio backed by that file. Only the metadata, such as the file name and size, is
loaded this way; no data transfer occurs.
Adding the -f option forces loading metadata for the immediate children of a directory. By default, metadata is loaded only
the first time a directory is listed.
For example, ls
can be used to browse the file system.
./bin/alluxio fs mount /s3/data s3n://data-bucket/
# Loads metadata for all immediate children of /s3/data and lists them.
./bin/alluxio fs ls /s3/data/
#
# Forces loading metadata.
aws s3 cp /tmp/somedata s3://data-bucket/somedata
./bin/alluxio fs ls -f /s3/data
#
# Files are not removed from Alluxio if they are removed from the UFS (s3 here) only.
aws s3 rm s3://data-bucket/somedata
./bin/alluxio fs ls -f /s3/data
ls
loads the metadata for the immediate children of a directory.
mkdir
The mkdir
command creates a new directory in Alluxio space. It is recursive and will create any
nonexistent parent directories. Note that the created directory will not be created in the under
storage system until a file in the directory is persisted to the underlying storage. Using mkdir
on an invalid or already existing path will fail.
For example, mkdir
can be used by an admin to set up the basic folder structures.
./bin/alluxio fs mkdir /users
./bin/alluxio fs mkdir /users/Alice
./bin/alluxio fs mkdir /users/Bob
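When many directories follow the same layout, the calls above can be wrapped in a loop. The sketch below is one way to do it, assuming the launcher lives at ./bin/alluxio; the ALLUXIO_BIN variable and the make_user_dirs helper are hypothetical names introduced for illustration.

```shell
# Path to the Alluxio launcher; override ALLUXIO_BIN if it lives elsewhere.
ALLUXIO_BIN=${ALLUXIO_BIN:-./bin/alluxio}

# Hypothetical helper: create one directory per user under a base path.
# mkdir creates missing parents, so the base path need not exist yet.
make_user_dirs() {
  base=$1
  shift
  for user in "$@"; do
    "$ALLUXIO_BIN" fs mkdir "$base/$user" || return 1
  done
}

# Example:
#   make_user_dirs /users Alice Bob
```

Because mkdir fails on an already existing path, the loop stops at the first directory that cannot be created.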
mount
The mount
command links an under storage path to an Alluxio path, and files and folders created
in Alluxio space under the path will be backed by a corresponding file or folder in the under
storage path. For more details, see Unified Namespace.
For example, mount
can be used to make data in another storage system available in Alluxio.
./bin/alluxio fs mount /s3/data s3n://data-bucket/
mv
The mv
command moves a file or directory to another path in Alluxio. The destination path must either not
exist or be an existing directory. If it is a directory, the file or directory will be placed as a child of
the directory. mv
is purely a metadata operation and does not affect the data blocks of the file.
mv
cannot be done between mount points of different under storage systems.
For example, mv
can be used to move older data into a non-working directory.
./bin/alluxio fs mv /data/2014 /data/archives/2014
persist
The persist
command persists data in Alluxio storage into the under storage system. This is a data
operation and will take time depending on how large the file is. After persist is complete, the file
in Alluxio will be backed by the file in the under storage, so it remains valid even if the Alluxio
blocks are evicted or otherwise lost.
For example, persist
can be used after filtering a series of temporary files for the ones
containing useful data.
./bin/alluxio fs persist /tmp/experimental-logs-2.txt
pin
The pin
command marks a file or folder as pinned in Alluxio. This is a metadata operation and will
not cause any data to be loaded into Alluxio. If a file is pinned, any blocks belonging to the file
will never be evicted from an Alluxio worker. If there are too many pinned files, Alluxio workers may
run low on storage space, preventing other files from being cached.
For example, pin
can be used to manually ensure performance if the administrator understands the
workloads well.
./bin/alluxio fs pin /data/today
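Because pin is a metadata operation and does not load anything, it is often paired with load when the goal is to have the data both present and protected from eviction. A minimal sketch, assuming the launcher lives at ./bin/alluxio (ALLUXIO_BIN and prefetch_and_pin are hypothetical names):

```shell
# Path to the Alluxio launcher; override ALLUXIO_BIN if it lives elsewhere.
ALLUXIO_BIN=${ALLUXIO_BIN:-./bin/alluxio}

# Hypothetical helper: load a path into Alluxio storage, then pin it so
# its blocks are not evicted. The pin is skipped if the load fails.
prefetch_and_pin() {
  "$ALLUXIO_BIN" fs load "$1" && "$ALLUXIO_BIN" fs pin "$1"
}

# Example:
#   prefetch_and_pin /data/today
```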
report
The report
command marks a file as lost to the Alluxio master. This command should only be used
with files created using the Lineage API. Marking a file as lost will cause the
master to schedule a recomputation job to regenerate the file.
For example, report
can be used to force recomputation of a file.
./bin/alluxio fs report /tmp/lineage-file
rm
The rm
command removes a file from Alluxio space and the under storage system. The file will be
unavailable immediately after this command returns, but the actual data may be deleted a while
later.
Adding the -R option deletes all contents of a directory and then the directory itself.
For example, rm
can be used to remove temporary files which are no longer needed.
./bin/alluxio fs rm /tmp/unused-file
setTtl
The setTtl
command sets the time-to-live (TTL) of a file, in milliseconds. The -action parameter indicates the action to perform once the current time exceeds the file's creation time plus its TTL. The delete action (default) deletes the file from both Alluxio and the under storage system, whereas the free action only frees the file from Alluxio.
For example, setTtl
with action delete can be used to clean up files the administrator knows are
unnecessary after a period of time; with action free, it instead just removes the contents from Alluxio to make
room for other data.
./bin/alluxio fs setTtl -action free /data/good-for-one-day 86400000
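Since the TTL is given in milliseconds, it is easy to be off by a factor of 1000. The conversion can be made explicit with small helpers; hours_to_ms and days_to_ms are hypothetical names used only for this sketch.

```shell
# Hypothetical helpers: convert a duration to the milliseconds setTtl expects.
hours_to_ms() { echo $(( $1 * 60 * 60 * 1000 )); }
days_to_ms()  { echo $(( $1 * 24 * 60 * 60 * 1000 )); }

days_to_ms 1   # prints 86400000, the one-day value used in the example

# Example:
#   ./bin/alluxio fs setTtl -action free /data/good-for-one-day "$(days_to_ms 1)"
```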
tail
The tail
command outputs the last 1 KB of data in a file to the console.
For example, tail
can be used to verify the output of a job is in the expected format or contains
expected values.
./bin/alluxio fs tail /output/part-00000
touch
The touch
command creates a 0-byte file. Files created with touch
cannot be overwritten and are
mostly useful as flags.
For example, touch
can be used to create a file signifying the completion of analysis on a
directory.
./bin/alluxio fs touch /data/yesterday/_DONE_
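A flag file is only meaningful if it is written after the work has actually succeeded. One way to enforce that ordering is sketched below, assuming the launcher lives at ./bin/alluxio; ALLUXIO_BIN, run_and_mark_done, and the analysis script name are hypothetical.

```shell
# Path to the Alluxio launcher; override ALLUXIO_BIN if it lives elsewhere.
ALLUXIO_BIN=${ALLUXIO_BIN:-./bin/alluxio}

# Hypothetical helper: run a command, then flag the directory as done
# only if that command exited successfully.
run_and_mark_done() {
  dir=$1
  shift
  "$@" && "$ALLUXIO_BIN" fs touch "$dir/_DONE_"
}

# Example (my-analysis.sh is a placeholder for your own job):
#   run_and_mark_done /data/yesterday ./my-analysis.sh /data/yesterday
```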
unmount
The unmount
command disassociates an Alluxio path from an under storage directory. Alluxio metadata
for the mount point will be removed along with any data blocks, but the under storage system will
retain all metadata and data. See Unified Namespace for
more details.
For example, unmount
can be used to remove an under storage system when the users no longer need
data from that system.
./bin/alluxio fs unmount /s3/data
unpin
The unpin
command unmarks a file or directory in Alluxio as pinned. This is a metadata operation
and will not evict or delete any data blocks. Once a file is unpinned, its data blocks can be
evicted from the various Alluxio workers containing the block.
For example, unpin
can be used when the administrator knows there is a change in the data access
pattern.
./bin/alluxio fs unpin /data/yesterday/join-table
unsetTtl
The unsetTtl
command will remove the TTL of a file in Alluxio. This is a metadata operation and
will not evict or store blocks in Alluxio. The TTL of a file can later be reset with setTtl.
For example, unsetTtl
can be used if a regularly managed file requires manual management due to
some special case.
./bin/alluxio fs unsetTtl /data/yesterday/data-not-yet-analyzed