Transparent URI
- Hadoop Compatible Client Configuration
- Alluxio Master Transparent URI Handling
- Native Client By-Pass
- Auto-mounting
Alluxio users often have existing applications that access existing storage systems. Alluxio can be added to such an ecosystem, but doing so ordinarily requires changing the URIs that the applications use. Transparent URI allows users to access their existing storage systems through Alluxio without changing URIs at the application level.
Transparent URI requires configuration on both the Hadoop-compatible application client and the Alluxio master so that foreign URI access can be re-routed. A fundamental requirement is that the storage systems being accessed are already mounted in the Alluxio namespace (see Auto-mounting below).
Hadoop Compatible Client Configuration
For Alluxio to accept non-Alluxio URI schemes, applications must be configured with a new Hadoop-compatible file system client implementation. This new ShimFileSystem replaces the existing FileSystem whenever the client is configured to receive foreign URIs.
Hadoop-compatible compute frameworks define the mapping from a FileSystem scheme to a FileSystem implementation. ShimFileSystem is one such implementation, and it can be associated with arbitrary URI schemes.
To configure ShimFileSystem, make sure the fs.<scheme>.impl or fs.AbstractFileSystem.<scheme>.impl properties are configured with it.
See the following example of configuring ShimFileSystem for the s3 or s3a URI schemes.
<property>
<name>fs.s3.impl</name>
<value>alluxio.hadoop.ShimFileSystem</value>
</property>
<property>
<name>fs.AbstractFileSystem.s3.impl</name>
<value>alluxio.hadoop.AlluxioShimFileSystem</value>
</property>
Also make sure the Alluxio client jar is on the JVM classpath of all nodes of the compute framework.
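Since the text mentions both the s3 and s3a schemes, the same pair of properties may be needed more than once. The following is a minimal sketch that generates the property elements for a list of schemes; the class names come from the example above, while the helper names shim_properties and render are hypothetical and only serve this illustration.

```python
import xml.etree.ElementTree as ET

# Class names as they appear in the configuration example above.
SHIM_FS = "alluxio.hadoop.ShimFileSystem"
SHIM_ABSTRACT_FS = "alluxio.hadoop.AlluxioShimFileSystem"


def shim_properties(schemes):
    """Build Hadoop <property> elements mapping each scheme to the shim classes."""
    props = []
    for scheme in schemes:
        for name, value in (
            (f"fs.{scheme}.impl", SHIM_FS),
            (f"fs.AbstractFileSystem.{scheme}.impl", SHIM_ABSTRACT_FS),
        ):
            prop = ET.Element("property")
            ET.SubElement(prop, "name").text = name
            ET.SubElement(prop, "value").text = value
            props.append(prop)
    return props


def render(schemes):
    """Render the properties as XML text, one <property> element per line."""
    return "\n".join(
        ET.tostring(p, encoding="unicode") for p in shim_properties(schemes)
    )


if __name__ == "__main__":
    # Hypothetical usage: emit the properties for both S3 schemes.
    print(render(["s3", "s3a"]))
```

The output can be pasted into the Hadoop configuration file that the compute framework already reads.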
Alluxio Master Transparent URI Handling
Once ShimFileSystem is configured, the master needs to route URIs that are native to an external storage system to the Alluxio namespace. This requires the storage system to have been mounted in the Alluxio namespace.
For example:
- An application, with ShimFileSystem configured for fs.s3.impl, accesses s3://some-bucket/foo/bar. ShimFileSystem relays the request to Alluxio with the original URI.
- Alluxio detects that the some-bucket S3 bucket is already mounted at /s3-buckets/some-bucket.
- The given URI is transparently translated to alluxio://s3-buckets/some-bucket/foo/bar and the request is served.
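The translation step in this example can be sketched as a lookup against a mount table. This is only an illustration of the idea, not Alluxio's implementation: MOUNT_TABLE and translate are hypothetical names, and the choice to prefer the deepest matching mount is an assumption. The function returns the path inside the Alluxio namespace rather than a full alluxio:// URI.

```python
from urllib.parse import urlsplit

# Hypothetical mount table: Alluxio mount point -> mounted foreign URI.
MOUNT_TABLE = {
    "/s3-buckets/some-bucket": "s3://some-bucket",
}


def translate(foreign_uri, mount_table=MOUNT_TABLE):
    """Return the Alluxio path for foreign_uri, or None if no mount covers it."""
    target = urlsplit(foreign_uri)
    target_path = target.path.rstrip("/")
    best = None
    for alluxio_path, ufs_uri in mount_table.items():
        ufs = urlsplit(ufs_uri)
        # Scheme and authority (for S3, the bucket) must match exactly.
        if (ufs.scheme, ufs.netloc) != (target.scheme, target.netloc):
            continue
        ufs_path = ufs.path.rstrip("/")
        # Match on whole path components so /foo does not match /foobar.
        if target_path == ufs_path or target_path.startswith(ufs_path + "/"):
            # Assumption: the deepest (most specific) mount wins.
            if best is None or len(ufs_path) > len(best[1]):
                best = (alluxio_path, ufs_path)
    if best is None:
        return None
    alluxio_path, ufs_path = best
    # Append the part of the foreign path that lies below the mounted location.
    return alluxio_path.rstrip("/") + target_path[len(ufs_path):]


if __name__ == "__main__":
    print(translate("s3://some-bucket/foo/bar"))
```

A URI whose bucket is not in the table yields None, which corresponds to the case handled by auto-mounting.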
Native Client By-Pass
The Hadoop-compatible client, ShimFileSystem, can be configured to forward certain paths to a native file system client. The native file system client for the scheme that ShimFileSystem is configured for is looked up on the classpath of the client application. With the by-pass configuration, ShimFileSystem is instructed to use this native client for certain paths.
For this capability, the client-side property alluxio.user.shimfs.bypass.prefix.list can be set to a comma-separated list of path prefixes. Each prefix should be an absolute path with scheme, authority and path fields.
Example:
- alluxio.user.shimfs.bypass.prefix.list is configured to “s3://foo/staging,s3://foo/tmp”.
- The Transparent URI client is used to create a file at “s3://foo/staging/data1.dat”.
- Native client by-pass detects that the path is by-passed, so the native file system is used to create the file.
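The by-pass decision in this example can be sketched as a prefix check against the configured list. The helper names parse_prefix_list and is_bypassed are hypothetical, and matching on whole path components (so that s3://foo/staging does not cover s3://foo/staging2) is an assumption; the actual client may compare prefixes differently.

```python
from urllib.parse import urlsplit


def parse_prefix_list(value):
    """Split the comma-separated property value into a list of prefixes."""
    return [p.strip() for p in value.split(",") if p.strip()]


def is_bypassed(uri, prefixes):
    """Return True if uri falls under any of the by-pass prefixes."""
    target = urlsplit(uri)
    for prefix in prefixes:
        p = urlsplit(prefix)
        # Scheme and authority must match exactly.
        if (p.scheme, p.netloc) != (target.scheme, target.netloc):
            continue
        base = p.path.rstrip("/")
        # Assumption: compare on path-component boundaries.
        if target.path == base or target.path.startswith(base + "/"):
            return True
    return False


if __name__ == "__main__":
    prefixes = parse_prefix_list("s3://foo/staging,s3://foo/tmp")
    print(is_bypassed("s3://foo/staging/data1.dat", prefixes))
```

Paths that do not match any prefix would continue to be relayed to Alluxio as described in the previous section.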
Auto-mounting
When an incoming foreign URI is not an inner path of any existing Alluxio mount, Alluxio can automatically mount the target storage system without requiring an external admin operation.
Auto-mounting is disabled by default. To enable it, set alluxio.master.shimfs.auto.mount.enabled=true in the Alluxio master configuration.
When enabled, if a foreign URI can’t be found among the Alluxio mounts, Alluxio will try to mount the storage system for that URI to a designated folder in the Alluxio namespace.
Credentials
Auto-mounting doesn’t yet support passing credentials for newly discovered storage. The user that Alluxio runs as should be able to mount the target storage system with its existing credentials provider chain.
For example, on an EC2 instance with an attached IAM role that grants S3 full access, users can mount their own S3 buckets without providing S3 security credentials.
Mount procedure
When mounting the storage system, the foreign URI is used to determine the exact storage system location to mount. Alluxio attempts to mount the highest component of the URI first and moves down the path until a mount succeeds.
All auto-mounted storage systems are mounted under a configured folder in the Alluxio namespace. Under this folder, auto-mounting nests the mount for the target storage system under a folder for the scheme, followed by a folder for the mount authority.
Example:
- Path s3://foo/bar/baz.txt is received at the master.
- No existing mount point is found, so auto-mounting is triggered.
- Alluxio attempts to mount s3://foo at /auto-mount/s3/foo.
- The auto-mount fails due to insufficient access.
- Alluxio attempts to mount s3://foo/bar at /auto-mount/s3/foo/bar.
- The auto-mount succeeds.
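The procedure in this example can be sketched as a loop over candidate mount points, from the highest component downward. This is an illustration under stated assumptions, not the master's actual code: mount_candidates, auto_mount, and the try_mount callback are hypothetical names, and the sketch assumes the last path component is the file being accessed and is therefore never tried as a mount point.

```python
from urllib.parse import urlsplit


def mount_candidates(foreign_uri, mount_root="/auto-mount"):
    """Yield (ufs_uri, alluxio_path) pairs from the shallowest to the deepest."""
    parts = urlsplit(foreign_uri)
    base_ufs = f"{parts.scheme}://{parts.netloc}"
    # Mounts are nested under <root>/<scheme>/<authority>.
    base_alluxio = f"{mount_root.rstrip('/')}/{parts.scheme}/{parts.netloc}"
    yield base_ufs, base_alluxio
    components = [c for c in parts.path.split("/") if c]
    # Assumption: the final component is the accessed file, so it is skipped.
    for i in range(1, len(components)):
        suffix = "/" + "/".join(components[:i])
        yield base_ufs + suffix, base_alluxio + suffix


def auto_mount(foreign_uri, try_mount, mount_root="/auto-mount"):
    """Return the first candidate that try_mount accepts, or None if all fail."""
    for ufs_uri, alluxio_path in mount_candidates(foreign_uri, mount_root):
        if try_mount(ufs_uri, alluxio_path):
            return ufs_uri, alluxio_path
    return None


if __name__ == "__main__":
    # Hypothetical stand-in for a mount call: only s3://foo/bar is accessible.
    def only_bar(ufs_uri, alluxio_path):
        return ufs_uri == "s3://foo/bar"

    print(auto_mount("s3://foo/bar/baz.txt", only_bar))
```

Passing the mount operation in as a callback keeps the path logic separate from whatever mechanism actually performs the mount.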
Advanced settings
- alluxio.master.shimfs.auto.mount.root: The Alluxio folder under which auto-mounted storage systems are placed. Default: /auto-mount.
- alluxio.master.shimfs.auto.mount.readonly: Whether auto-mounted storage systems are read-only. Default: true.
- alluxio.master.shimfs.auto.mount.shared: Whether auto-mounted storage systems are shared, that is, visible to other users. Default: false.