- Hadoop Compatible Client Configuration
- Alluxio Master Transparent URI Handling
- Native Client By-Pass
Alluxio users often have existing applications that access their existing storage systems. Alluxio can be added to such ecosystems, but the URIs used by the applications normally have to change. Transparent URI allows users to access their existing storage systems through Alluxio without changing URIs at the application level.
Transparent URI requires configuration on both the Hadoop-compatible application client and the Alluxio master so that accesses to foreign URIs can be re-routed. A fundamental requirement is that the storage systems being accessed are already mounted in the Alluxio namespace (see Auto-mounting below).
Hadoop Compatible Client Configuration
For Alluxio to accept non-Alluxio URI schemes, applications must be configured with a new Hadoop-compatible file system client implementation.
ShimFileSystem replaces the existing FileSystem implementation whenever the client is configured to receive foreign URIs.
Hadoop-compatible compute frameworks define the mapping from a URI scheme to a FileSystem implementation.
ShimFileSystem is such an implementation, and it can be associated with arbitrary URI schemes.
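To configure ShimFileSystem, set the fs.<scheme>.impl property to ShimFileSystem and the fs.AbstractFileSystem.<scheme>.impl property to AlluxioShimFileSystem. Hadoop's FileSystem API reads the former and its FileContext API reads the latter.
The following example configures the s3 URI scheme.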
<!-- Hadoop client configuration, for example core-site.xml -->
<property>
  <name>fs.s3.impl</name>
  <value>alluxio.hadoop.ShimFileSystem</value>
</property>
<property>
  <name>fs.AbstractFileSystem.s3.impl</name>
  <value>alluxio.hadoop.AlluxioShimFileSystem</value>
</property>
Also make sure the Alluxio client jar is on the JVM classpath of every node in the compute framework.
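To make the scheme-to-implementation mapping concrete, the sketch below models, with plain Java collections, how a framework could look up the two properties from the example for a given URI. SchemeResolver and its methods are hypothetical names used only for illustration; real frameworks resolve and load these classes through Hadoop's own configuration code, which this sketch does not reproduce.

```java
import java.net.URI;
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Hypothetical helper: models how a Hadoop-compatible framework maps a URI
// scheme to configured implementation class names. It only reads strings;
// it does not load classes and is not Hadoop's own resolution code.
final class SchemeResolver {
    private final Map<String, String> conf;

    SchemeResolver(Map<String, String> conf) {
        this.conf = new HashMap<>(conf);
    }

    // Class name from fs.{scheme}.impl (read by the FileSystem API).
    Optional<String> fileSystemImpl(URI uri) {
        return lookup(uri, "fs.", ".impl");
    }

    // Class name from fs.AbstractFileSystem.{scheme}.impl (read by FileContext).
    Optional<String> abstractFileSystemImpl(URI uri) {
        return lookup(uri, "fs.AbstractFileSystem.", ".impl");
    }

    private Optional<String> lookup(URI uri, String prefix, String suffix) {
        String scheme = uri.getScheme();
        if (scheme == null) {
            return Optional.empty(); // relative URIs carry no scheme to map
        }
        return Optional.ofNullable(conf.get(prefix + scheme + suffix));
    }
}
```

With the two s3 properties loaded, any s3:// URI resolves to the shim classes, while schemes that were not configured (for example hdfs://) resolve to nothing and keep their usual client.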
Alluxio Master Transparent URI Handling
Once ShimFileSystem is configured, the Alluxio master needs to route URIs that are native to an external storage system to the Alluxio namespace.
This requires the storage system to already be mounted in the Alluxio namespace.
For example:
- An application, with ShimFileSystem configured for the s3 scheme, accesses s3://some-bucket/foo/bar.
- ShimFileSystem relays the request to Alluxio with the original URI.
- Alluxio detects that the some-bucket S3 bucket is already mounted at /s3-buckets/some-bucket.
- The given URI is transparently translated to alluxio:///s3-buckets/some-bucket/foo/bar and the request is served.
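The routing step amounts to finding the mount that covers the foreign URI and replacing the mounted prefix with its Alluxio mount point. The following is a minimal sketch under that assumption. MountTableSketch and toAlluxioPath are hypothetical names, not Alluxio's API, and both the component-boundary match and the preference for the longest covering mount are design choices of the sketch rather than documented behavior.

```java
import java.net.URI;
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Hypothetical model of master-side routing: maps a mounted storage URI
// prefix (e.g. s3://some-bucket) to its Alluxio mount point
// (e.g. /s3-buckets/some-bucket). Not Alluxio's mount table API.
final class MountTableSketch {
    private final Map<String, String> ufsToAlluxio = new HashMap<>();

    void mount(String ufsPrefix, String alluxioMountPoint) {
        ufsToAlluxio.put(stripTrailingSlash(ufsPrefix), stripTrailingSlash(alluxioMountPoint));
    }

    // Returns the Alluxio path for a foreign URI, or empty if no mount covers it.
    Optional<String> toAlluxioPath(URI foreign) {
        String uri = stripTrailingSlash(foreign.toString());
        String best = null;
        for (String ufs : ufsToAlluxio.keySet()) {
            // A mount covers the URI only at a path-component boundary.
            boolean covers = uri.equals(ufs) || uri.startsWith(ufs + "/");
            // Prefer the longest (most specific) covering mount.
            if (covers && (best == null || ufs.length() > best.length())) {
                best = ufs;
            }
        }
        if (best == null) {
            return Optional.empty(); // no mount covers this URI
        }
        String rest = uri.substring(best.length()); // "" or "/..."
        String mountPoint = ufsToAlluxio.get(best);
        if (mountPoint.equals("/")) {
            return Optional.of(rest.isEmpty() ? "/" : rest);
        }
        return Optional.of(mountPoint + rest);
    }

    private static String stripTrailingSlash(String s) {
        return s.length() > 1 && s.endsWith("/") ? s.substring(0, s.length() - 1) : s;
    }
}
```

With s3://some-bucket mounted at /s3-buckets/some-bucket, s3://some-bucket/foo/bar maps to /s3-buckets/some-bucket/foo/bar, while a bucket whose name merely starts with the same characters (s3://some-bucket-2) is not treated as covered.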
Native Client By-Pass
The Hadoop-compatible client, ShimFileSystem, can be configured to forward certain paths to a native file system client.
The native file system client for the scheme that ShimFileSystem is configured for is looked up on the classpath of the client application.
The by-pass configuration instructs ShimFileSystem to use this native client for the matching paths.
To enable this, set the client-side property alluxio.user.shimfs.bypass.prefix.list to a comma-separated list of prefix paths.
Each prefix must be an absolute path that includes the scheme, authority, and path components.
For example:
- alluxio.user.shimfs.bypass.prefix.list is configured to “s3://foo/staging,s3://foo/tmp”.
- The Transparent URI client is used to create a file at “s3://foo/staging/data1.dat”.
- Native client by-pass detects that the path is by-passed, so the native file system is used to create the file.
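A minimal sketch of how a by-pass list like the one in this example could be parsed, validated, and checked. BypassPrefixes is a hypothetical class, not part of the Alluxio client. The rule that a prefix matches only whole path components (so s3://foo/staging does not cover s3://foo/staging2) is an assumption this documentation does not state; Alluxio may compare plain string prefixes instead.

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of client-side by-pass matching. It parses a value
// shaped like alluxio.user.shimfs.bypass.prefix.list and decides whether a
// path should go to the native client. Not Alluxio's implementation.
final class BypassPrefixes {
    private final List<URI> prefixes = new ArrayList<>();

    BypassPrefixes(String commaSeparated) {
        for (String raw : commaSeparated.split(",")) {
            String trimmed = raw.trim();
            if (trimmed.isEmpty()) {
                continue;
            }
            URI prefix = URI.create(trimmed);
            // Prefixes must be absolute, with scheme, authority and path.
            if (prefix.getScheme() == null || prefix.getAuthority() == null
                    || prefix.getPath() == null || !prefix.getPath().startsWith("/")) {
                throw new IllegalArgumentException("Not a full prefix URI: " + trimmed);
            }
            prefixes.add(prefix);
        }
    }

    // True if the path should be served by the native file system client.
    boolean isBypassed(URI path) {
        for (URI prefix : prefixes) {
            if (!prefix.getScheme().equals(path.getScheme())
                    || !prefix.getAuthority().equals(path.getAuthority())) {
                continue;
            }
            String p = prefix.getPath();
            String candidate = path.getPath();
            // Assumption: match only on whole path components.
            if (candidate.equals(p) || candidate.startsWith(p.endsWith("/") ? p : p + "/")) {
                return true;
            }
        }
        return false;
    }
}
```

Under these assumptions, s3://foo/staging/data1.dat is by-passed, s3://foo/data/data1.dat still goes through Alluxio, and a prefix without a scheme such as /foo/staging is rejected when the list is parsed.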
Auto-mounting
When an incoming foreign URI is not under any existing Alluxio mount, Alluxio can automatically mount the target storage system without requiring an administrator to mount it.
Auto-mounting is disabled by default. To enable it, set
alluxio.master.shimfs.auto.mount.enabled=true in the Alluxio master configuration.
When it is enabled and a foreign URI cannot be found among the Alluxio mounts,
Alluxio tries to mount the storage system for that URI under a designated folder in the Alluxio namespace.
Auto-mounting does not yet support passing credentials for newly discovered storage. The user that Alluxio runs as must be able to mount the target storage system using its existing credentials provider chain.
For example, on an EC2 instance with an attached IAM role that grants full S3 access, users can mount their own S3 buckets without providing S3 security credentials.
When mounting the storage system, Alluxio uses the foreign URI to determine the exact storage location to mount. It first attempts to mount the highest-level component of the URI and moves down the path one component at a time until a mount succeeds.
All auto-mounted storage systems are mounted under a configured folder in the Alluxio namespace.
Under this folder, auto-mounting nests the mount for the target storage system under a folder for the scheme, followed by mount
For example:
- URI s3://foo/bar/baz.txt is received at the master.
- No existing mount point is found, so auto-mounting is triggered.
- Alluxio attempts to mount s3://foo under the auto-mount folder.
- The auto-mount fails due to insufficient access.
- Alluxio attempts to mount s3://foo/bar under the auto-mount folder.
- The auto-mount succeeds.
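The walk in this example can be sketched as generating candidate URIs from the highest component downward and stopping at the first successful mount. AutoMountWalk is a hypothetical class and tryMount stands in for the actual mount operation, which this sketch does not perform; whether Alluxio ever tries the final component, which is often a file, is an assumption.

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;
import java.util.function.Predicate;

// Hypothetical sketch of the auto-mount walk: try the highest component of
// the foreign URI first, then move down one path component at a time until
// a mount attempt succeeds. tryMount stands in for the real mount call.
final class AutoMountWalk {
    // s3://foo/bar/baz.txt -> [s3://foo, s3://foo/bar, s3://foo/bar/baz.txt]
    static List<URI> candidates(URI foreign) {
        if (foreign.getScheme() == null || foreign.getRawAuthority() == null) {
            throw new IllegalArgumentException("Need scheme and authority: " + foreign);
        }
        String base = foreign.getScheme() + "://" + foreign.getRawAuthority();
        List<URI> result = new ArrayList<>();
        result.add(URI.create(base));
        String rawPath = foreign.getRawPath() == null ? "" : foreign.getRawPath();
        StringBuilder path = new StringBuilder();
        for (String component : rawPath.split("/")) {
            if (component.isEmpty()) {
                continue; // skip the leading "" and any doubled slashes
            }
            path.append('/').append(component);
            result.add(URI.create(base + path));
        }
        return result;
    }

    // Returns the first candidate that mounted successfully, if any.
    static Optional<URI> autoMount(URI foreign, Predicate<URI> tryMount) {
        for (URI candidate : candidates(foreign)) {
            if (tryMount.test(candidate)) {
                return Optional.of(candidate);
            }
        }
        return Optional.empty();
    }
}
```

With a tryMount that rejects s3://foo (standing in for insufficient access), the walk for s3://foo/bar/baz.txt stops at s3://foo/bar, matching the steps listed above.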
The following master properties control auto-mounting:
alluxio.master.shimfs.auto.mount.root: The Alluxio folder under which storage systems are auto-mounted. Default:
alluxio.master.shimfs.auto.mount.readonly: Whether auto-mounted storage systems are read-only. Default:
alluxio.master.shimfs.auto.mount.shared: Whether auto-mounted storage systems are shared, that is, visible to other users. Default: