Alluxio Namespace and Under File System Namespaces
Introduction
We use the term “Under File System (UFS)” for a storage system managed and cached by Alluxio. Alluxio is built on top of the storage layer, providing cache speed-up and various other data management functionalities. Therefore, those storage systems are “under” the Alluxio layer.
Each UFS possesses its own namespace. For example, a file (object) in AWS S3
s3://data-bucket/images/img-0001
is in the namespace of the AWS S3 storage.
As a data management and caching layer on top of the UFSes, Alluxio composes its namespace from the namespaces of all the independent UFSes.
The Alluxio Mount Table
Alluxio manages independent UFS namespaces by the Alluxio Mount Table. The mount table defines the mappings from Alluxio paths to different UFSes.
An example mount table looks like this:
/s3-images s3://my-bucket/data/images
/hive hdfs://hdfs-cluster.company.com/user/hive
/presto hdfs://hdfs-cluster.company.com/user/presto
The mount table in the example above consists of two columns and three entries. The first column holds the paths of the mount points in the Alluxio namespace, and the second column holds the corresponding UFS paths mounted at those points.
The first mount entry defines a mapping from the S3 path s3://my-bucket/data/images to the Alluxio path /s3-images. Therefore, any object with the S3 prefix s3://my-bucket/data/images will be available under the Alluxio directory /s3-images. For example, s3://my-bucket/data/images/picture.png can be found at the Alluxio path /s3-images/picture.png.
The second and third entries map the Alluxio paths /hive and /presto to two directories in the same HDFS, hdfs://hdfs-cluster.company.com/user/hive and hdfs://hdfs-cluster.company.com/user/presto, respectively. Similarly, files and directories under the two HDFS directory trees will be available at their corresponding Alluxio paths. For example, hdfs://hdfs-cluster.company.com/user/hive/schema/table/part1.parquet becomes /hive/schema/table/part1.parquet in the Alluxio namespace.
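The path translation above can be sketched as a simple prefix substitution. The following is an illustrative sketch, not Alluxio's actual implementation; the dictionary and function names are hypothetical:

```python
# Illustrative sketch of how a mount table maps UFS paths into the
# Alluxio namespace. MOUNT_TABLE and ufs_to_alluxio are hypothetical
# names, not part of Alluxio's API.
MOUNT_TABLE = {
    "/s3-images": "s3://my-bucket/data/images",
    "/hive": "hdfs://hdfs-cluster.company.com/user/hive",
    "/presto": "hdfs://hdfs-cluster.company.com/user/presto",
}

def ufs_to_alluxio(ufs_path: str) -> str:
    """Translate a UFS path to its Alluxio path via the mount table."""
    for alluxio_root, ufs_root in MOUNT_TABLE.items():
        # Mount points cannot nest, so at most one entry can match.
        if ufs_path == ufs_root or ufs_path.startswith(ufs_root + "/"):
            return alluxio_root + ufs_path[len(ufs_root):]
    raise ValueError(f"{ufs_path} is not under any mount point")

print(ufs_to_alluxio("s3://my-bucket/data/images/picture.png"))
# -> /s3-images/picture.png
print(ufs_to_alluxio("hdfs://hdfs-cluster.company.com/user/hive/schema/table/part1.parquet"))
# -> /hive/schema/table/part1.parquet
```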
Mount Table Rules
A mount table entry consists of two parts: a mount point in Alluxio, and the UFS URI mounted at that point. Mount table entries must follow a few rules.
Rule 1. Mount directly under root path /
A mount point in Alluxio MUST be a direct child of the root path /. For example, /s3-images, /hive and /presto are valid mount points.
The root path / is just a virtual node in the Alluxio namespace. It does NOT map to any UFS path.
# This is invalid, you cannot mount to the root path directly
/ s3://my-bucket/
# This is invalid, a mount point can only be directly under /
/s3-images/dataset1 s3://my-bucket/data/images/dataset1
# This is valid
/s3-images s3://my-bucket/data/images/dataset1
There is one exception to this rule, which is explained in a later section.
Rule 2. No nested mount points
Mount points cannot be nested: the Alluxio path of one mount point cannot be under the Alluxio path of another, and likewise the UFS path of one mount point cannot be under the UFS path of another.
# Suppose we have this mount point
/data s3://bucket/data
# This new mount point is invalid -- the Alluxio path is under an existing mount point
/data/hdfs hdfs://host:port/data
# This is also invalid -- the UFS path is under an existing mount point
/images s3://bucket/data/images
This configuration is invalid, as s3://bucket/data is a prefix of s3://bucket/data/images. If this were allowed, a file s3://bucket/data/images/picture.png would have two valid locations in Alluxio: /data/images/picture.png and /images/picture.png.
The two rules above ensure that all mount points in the Alluxio namespace are directly under the root /. This keeps mount points independent of each other, both in the Alluxio namespace and in the UFS namespaces, which makes it easy for admins to add and remove mount points.
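The two rules can be checked mechanically before accepting a new entry. Below is a minimal, illustrative validator; the function name and dictionary representation are assumptions for the sketch, not Alluxio code:

```python
# Illustrative validator for the two mount table rules above.
# validate_new_mount is a hypothetical name, not Alluxio's API.
def validate_new_mount(table: dict, alluxio_path: str, ufs_uri: str) -> None:
    # Rule 1: the mount point must be a direct child of the root "/".
    if alluxio_path == "/" or alluxio_path.count("/") != 1:
        raise ValueError(f"{alluxio_path} is not directly under /")
    # Rule 2: neither the Alluxio path nor the UFS URI may nest
    # under (or contain) an existing mount point.
    for existing_path, existing_ufs in table.items():
        for a, b in ((alluxio_path, existing_path), (ufs_uri, existing_ufs)):
            if a == b or a.startswith(b + "/") or b.startswith(a + "/"):
                raise ValueError(f"{a} conflicts with existing mount {b}")
    table[alluxio_path] = ufs_uri

table = {"/data": "s3://bucket/data"}
validate_new_mount(table, "/hdfs", "hdfs://host:port/data")   # OK, added
try:
    validate_new_mount(table, "/images", "s3://bucket/data/images")  # Rule 2 violation
except ValueError as e:
    print(e)  # s3://bucket/data/images conflicts with existing mount s3://bucket/data
```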
Configure the Mount Table
Alluxio supports loading mount table entries from different persistent backends. The options currently supported are:
- An etcd database (ETCD mode)
- A static configuration file (STATIC_FILE mode)
- Not using a mount table (NONE mode)
ETCD mode
Alluxio supports using an etcd database to store the mount table information.
By storing the mount table in etcd, all Alluxio processes (clients, workers, fuse, etc.)
will read etcd for the mount table information. The mount points are stored under path prefix /mounts
in etcd.
To use etcd as the mount table backend, add the following configuration to alluxio-site.properties:
alluxio.mount.table.source=ETCD
alluxio.etcd.endpoints=<connection URI of etcd cluster>
Set alluxio.etcd.endpoints to the list of endpoints in the etcd cluster, e.g.
# Typically an etcd cluster has at least 3 nodes, for high availability
alluxio.etcd.endpoints=http://etcd-node0:2379,http://etcd-node1:2379,http://etcd-node2:2379
Alluxio processes connect to etcd when they start, and require etcd to be up at that point. They then poll etcd regularly for updates to the mount table. The poll interval is specified by the following property in alluxio-site.properties:
# By default a poll happens every 3s
alluxio.mount.table.etcd.polling.interval.ms=3s
In a large cluster with thousands of Alluxio clients and hundreds of Alluxio workers, you may want to use a larger interval to reduce the pressure on etcd. If your mount table is seldom updated, feel free to use a much larger interval.
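To see why the interval matters, a rough back-of-envelope sketch helps; the cluster sizes below are illustrative assumptions, not measured numbers:

```python
# Rough, illustrative estimate of etcd read load from mount table polling.
# Each Alluxio process issues roughly one mount table read per interval.
clients, workers = 2000, 200          # assumed process counts
default_interval_s = 3                # default poll interval
print(f"{(clients + workers) / default_interval_s:.0f} reads/s at a 3s interval")
# -> 733 reads/s at a 3s interval

relaxed_interval_s = 60               # larger interval for a rarely-updated table
print(f"{(clients + workers) / relaxed_interval_s:.0f} reads/s at a 60s interval")
# -> 37 reads/s at a 60s interval
```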
When using etcd for mount table storage, you can add/remove mount points at runtime. Refer to section Update the Mount Table for more details. Note that the update on the mount table takes at most one poll interval to take effect on an Alluxio process (client, worker, fuse, etc.).
STATIC_FILE mode
Alluxio also supports using a static configuration file for mount table information. The configuration file is a simple text file that looks like:
# lines starting with "#" are comments
/s3_bucket s3://bucket/dir
/hdfs hdfs://namenode/user/data
Each line defines a mount entry, with the mount point in Alluxio namespace in the first column, and the UFS URI in the second. The columns are separated by one or more whitespaces.
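Parsing this format is straightforward; the following is a minimal sketch of the rules just described (comment lines, blank lines, whitespace-separated columns). The parse_mount_table function is hypothetical, not Alluxio code:

```python
# Illustrative parser for the static mount table file format:
# lines starting with "#" are comments, blank lines are skipped,
# and the two columns are separated by one or more whitespaces.
def parse_mount_table(text: str) -> dict:
    table = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        alluxio_path, ufs_uri = line.split(None, 1)
        table[alluxio_path] = ufs_uri.strip()
    return table

conf = """\
# lines starting with "#" are comments
/s3_bucket s3://bucket/dir
/hdfs    hdfs://namenode/user/data
"""
print(parse_mount_table(conf))
# -> {'/s3_bucket': 's3://bucket/dir', '/hdfs': 'hdfs://namenode/user/data'}
```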
The configuration file should be accessible and readable by the Alluxio processes. It’s easiest to put it inside the Alluxio configuration directory along with the other critical configuration files.
To enable the static mount table, add the following configuration to alluxio-site.properties:
alluxio.mount.table.source=STATIC_FILE
alluxio.mount.table.static.conf.file=${alluxio.conf.dir}/mount_table
It’s important to note that all Alluxio processes need to see the same mount table. The same configuration file should be present on each node running those Alluxio processes.
A mount table based on a static configuration file cannot be modified at runtime. To change it, edit the configuration file, propagate the changes to all Alluxio nodes, and restart the Alluxio processes.
This mode is best used in a testing environment to quickly configure the mount table and bootstrap the Alluxio cluster. However, due to the operational overhead of maintaining the same mount table file across the cluster, it is not the best mode for a production environment, unless the mount points in your environment almost never change.
NONE mode
Alluxio also supports mounting a UFS to the Alluxio root path / directly.
This is essentially NOT having a mount “table”, because there is only one entry.
Since only the Alluxio root path needs to be configured, no etcd database or separate file is required. Just add the following to alluxio-site.properties:
alluxio.mount.table.source=NONE
alluxio.dora.client.ufs.root=hdfs://host:port/data
This is the default mode in Alluxio, for an easy test deployment with minimal configuration.
If you need to change the UFS path that the Alluxio root / maps to, you will need to update the configuration file, propagate it to all nodes and Alluxio processes, and restart the Alluxio services. As with STATIC_FILE mode, this is not the best mode for a production environment, unless the UFS path in your environment never changes.
This mode provides the same behavior as Alluxio Community Edition.
Configure for UFS
After the mount points are specified, when Alluxio processes talk to the corresponding UFS, they also need UFS-specific configurations, like security credentials.
Currently, Alluxio only supports reading configurations for each UFS from the configuration file and/or environment variables, and these configurations are shared by all mount points. For example:
# Configure the S3 credentials for all mount points
s3a.accessKeyId=<S3 ACCESS KEY>
s3a.secretKey=<S3 SECRET KEY>
# Configure HDFS configurations for all mount points
alluxio.underfs.hdfs.configuration=/path/to/hdfs/conf/core-site.xml:/path/to/hdfs/conf/hdfs-site.xml
In other words, Alluxio does not support using different configurations for different mount points. This is a known limitation and will change in future versions.
Manage the Mount Table
List the mount table
You can list the current mount table using Alluxio command line:
$ bin/alluxio mount list
Update the mount table in ETCD mode
In ETCD mode, the admin can utilize Alluxio command line to add/remove mount points.
# Add a mount point to an S3 bucket
$ bin/alluxio mount add --path /s3 --ufs-uri s3://data/
# Add a mount point to an HDFS path
$ bin/alluxio mount add --path /hdfs --ufs-uri hdfs://host:port/data/
# Add a mount point to a local path for testing
$ bin/alluxio mount add --path /local --ufs-uri file:///Users/bob/data
# Remove a mount point by its Alluxio path
$ bin/alluxio mount remove --path /s3
Update the mount table in STATIC_FILE mode
To add entries to or remove entries from the mount table, make the appropriate changes to the configuration file specified by alluxio.mount.table.static.conf.file. The updated configuration file must then be made accessible to all Alluxio processes, and those processes must be restarted to reload the file and observe the new configuration.
Update the mount table in NONE mode
Update the configuration property alluxio.dora.client.ufs.root, make sure the new configuration is accessible to all Alluxio processes, and restart those processes for the change to take effect.