Transparent URI

Alluxio users often have existing applications that access their storage systems directly. Alluxio can be added to such an ecosystem, but doing so normally requires changing the URIs the applications use. Transparent URI lets applications access their existing storage systems through Alluxio without changing URIs at the application level.

Transparent URI requires configuration on both the Hadoop-compatible application client and the Alluxio master so that accesses to foreign URIs can be re-routed. A fundamental requirement is that the storage systems being accessed are already mounted in the Alluxio namespace (see Auto-mounting below).

Hadoop Compatible Client Configuration

For Alluxio to accept non-Alluxio URI schemes, applications must be configured with a new Hadoop-compatible file system client implementation, ShimFileSystem. ShimFileSystem replaces the existing FileSystem for every URI scheme that the client is configured to route through Alluxio.

Hadoop-compatible compute frameworks define the mapping from a FileSystem scheme to a FileSystem implementation. ShimFileSystem is an implementation that can be associated with arbitrary URI schemes. To configure it, make sure the fs.<scheme>.impl property (used by Hadoop's FileSystem API) or the fs.AbstractFileSystem.<scheme>.impl property (used by the FileContext API), or both, point to the ShimFileSystem classes.

The following example configures ShimFileSystem for the s3 URI scheme. For s3a, use the same properties with s3a in place of s3.

<property>
  <name>fs.s3.impl</name>
  <value>alluxio.hadoop.ShimFileSystem</value>
</property>
<property>
  <name>fs.AbstractFileSystem.s3.impl</name>
  <value>alluxio.hadoop.AlluxioShimFileSystem</value>
</property>
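Hadoop-compatible clients choose the class for a URI by looking up the property for that URI's scheme. The Python sketch below models only that lookup, with a plain dict standing in for core-site.xml; the helper name resolve_impl is hypothetical, and real Hadoop can also fall back to implementations registered through the Java service loader, which this sketch ignores.

```python
from typing import Dict, Optional
from urllib.parse import urlparse

# Plain-dict stand-in for the core-site.xml properties shown above.
CORE_SITE = {
    "fs.s3.impl": "alluxio.hadoop.ShimFileSystem",
    "fs.AbstractFileSystem.s3.impl": "alluxio.hadoop.AlluxioShimFileSystem",
}


def resolve_impl(uri: str, conf: Dict[str, str], abstract: bool = False) -> Optional[str]:
    """Return the class configured for the URI's scheme, or None if unset.

    Hypothetical helper: models only the fs.<scheme>.impl and
    fs.AbstractFileSystem.<scheme>.impl lookups.
    """
    scheme = urlparse(uri).scheme
    if abstract:
        key = f"fs.AbstractFileSystem.{scheme}.impl"
    else:
        key = f"fs.{scheme}.impl"
    return conf.get(key)


print(resolve_impl("s3://some-bucket/foo/bar", CORE_SITE))   # alluxio.hadoop.ShimFileSystem
print(resolve_impl("s3a://some-bucket/foo/bar", CORE_SITE))  # None
```

With only the s3 entries configured, s3a URIs are not routed to ShimFileSystem; they would be once matching fs.s3a.impl and fs.AbstractFileSystem.s3a.impl entries are added.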

Also make sure the Alluxio client jar is on the JVM classpath of all nodes running the compute framework.

Alluxio Master Transparent URI Handling

Once ShimFileSystem is configured, the Alluxio master needs to route URIs that are native to an external storage system to the Alluxio namespace. This requires that the storage system is already mounted in the Alluxio namespace.

For example:

  • An application with ShimFileSystem configured for fs.s3.impl accesses s3://some-bucket/foo/bar.
  • ShimFileSystem relays the request to Alluxio with the original URI.
  • Alluxio detects that the s3 bucket some-bucket is already mounted at /s3-buckets/some-bucket.
  • The URI is transparently translated to alluxio:///s3-buckets/some-bucket/foo/bar and the request is served.
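The translation step can be pictured as a lookup in the master's mount table. The sketch below assumes the mount table is a simple dict from under-storage URI to Alluxio mount point and picks the longest mount whose URI is a whole-component prefix of the incoming URI; both the representation and the function name to_alluxio_path are illustrative assumptions, not Alluxio's internal API.

```python
from typing import Dict, Optional

# Hypothetical mount table: under-storage URI -> Alluxio mount point.
MOUNTS = {
    "s3://some-bucket": "/s3-buckets/some-bucket",
}


def to_alluxio_path(uri: str, mounts: Dict[str, str]) -> Optional[str]:
    """Translate a foreign URI into an Alluxio path, or None if not mounted."""
    best_key, best_base = None, ""
    for ufs_uri in mounts:
        base = ufs_uri.rstrip("/")
        # Match whole path components only: s3://a must not capture s3://ab/x.
        covered = uri == base or uri.startswith(base + "/")
        # Prefer the most specific (longest) matching mount.
        if covered and len(base) > len(best_base):
            best_key, best_base = ufs_uri, base
    if best_key is None:
        return None
    return mounts[best_key].rstrip("/") + uri[len(best_base):]


print(to_alluxio_path("s3://some-bucket/foo/bar", MOUNTS))  # /s3-buckets/some-bucket/foo/bar
print(to_alluxio_path("s3://other-bucket/foo", MOUNTS))     # None
```

A None result corresponds to the case where no existing mount covers the URI, which is where auto-mounting (described below) can take over.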

Native Client By-Pass

ShimFileSystem, the Hadoop-compatible client, can be configured to forward certain paths to a native file system client. The native client for the scheme that ShimFileSystem is configured for is looked up on the client application's classpath. The by-pass configuration tells ShimFileSystem which paths should use this native client.

To enable this, set the client-side property alluxio.user.shimfs.bypass.prefix.list to a comma-separated list of path prefixes. Each prefix must be an absolute URI with scheme, authority, and path components.

Example:

  • alluxio.user.shimfs.bypass.prefix.list is set to “s3://foo/staging,s3://foo/tmp”.
  • The Transparent URI client is used to create a file at “s3://foo/staging/data1.dat”.
  • ShimFileSystem detects that the path falls under a by-passed prefix, so the native file system client creates the file.
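The by-pass decision in this example amounts to a prefix test against the configured list. The sketch below assumes prefixes are matched on whole path components, so s3://foo/tmp does not capture s3://foo/tmp2/x; that matching rule and the helper names parse_prefix_list and is_bypassed are assumptions for illustration, not taken from ShimFileSystem's source.

```python
from typing import List


def parse_prefix_list(value: str) -> List[str]:
    """Split the comma-separated property value, dropping blanks and trailing '/'."""
    return [p.strip().rstrip("/") for p in value.split(",") if p.strip()]


def is_bypassed(uri: str, prefixes: List[str]) -> bool:
    """True if the URI equals a prefix or lies beneath one (component-aware)."""
    return any(uri == p or uri.startswith(p + "/") for p in prefixes)


prefixes = parse_prefix_list("s3://foo/staging,s3://foo/tmp")
print(is_bypassed("s3://foo/staging/data1.dat", prefixes))  # True: native client
print(is_bypassed("s3://foo/data/data2.dat", prefixes))     # False: routed to Alluxio
```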

Auto-mounting

When an incoming foreign URI does not fall under any existing Alluxio mount point, Alluxio can automatically mount the target storage system without requiring an administrator to do so.

Auto-mounting is disabled by default. To enable it, set alluxio.master.shimfs.auto.mount.enabled=true in the Alluxio master configuration. When enabled, if a foreign URI cannot be found among the Alluxio mounts, Alluxio tries to mount the storage system for that URI under a designated folder in the Alluxio namespace.

Credentials

Auto-mounting does not yet support passing credentials for newly discovered storage. The user Alluxio runs as must be able to mount the target storage system with its existing credentials provider chain.

For example, on an EC2 instance with an attached IAM role that grants full S3 access, users can mount their own S3 buckets without providing S3 security credentials.

Mount procedure

When mounting, Alluxio uses the foreign URI to determine the exact storage location to mount. It first attempts to mount the highest-level component of the URI and then moves down the path one component at a time until a mount succeeds.

All auto-mounted storage systems are mounted under a configured folder in the Alluxio namespace. Under that folder, each mount is nested under a folder named after the URI scheme, followed by the URI authority and any path components included in the mount.

Example:

  • The master receives the path s3://foo/bar/baz.txt.
  • No existing mount point covers it, so auto-mounting is triggered.
  • Alluxio attempts to mount s3://foo at /auto-mount/s3/foo.
    • The mount fails due to insufficient access.
  • Alluxio attempts to mount s3://foo/bar at /auto-mount/s3/foo/bar.
    • The mount succeeds.
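The probing order in this example can be sketched as follows. The try_mount callback stands in for the real mount operation and is hypothetical, as are the function names; the sketch also assumes that candidates run from the authority (bucket) down to the parent directory of the requested file, which matches the example above but is an assumption about the exact stopping point.

```python
from typing import Callable, List, Optional, Tuple
from urllib.parse import urlparse


def mount_candidates(uri: str, root: str = "/auto-mount") -> List[Tuple[str, str]]:
    """List (under-storage URI, Alluxio mount point) pairs, highest level first."""
    parsed = urlparse(uri)
    parts = [c for c in parsed.path.split("/") if c]
    if not parsed.path.endswith("/"):
        parts = parts[:-1]  # drop the file name; only directories are candidates
    base_ufs = f"{parsed.scheme}://{parsed.netloc}"
    base_target = f"{root.rstrip('/')}/{parsed.scheme}/{parsed.netloc}"
    candidates = []
    for depth in range(len(parts) + 1):
        suffix = "/" + "/".join(parts[:depth]) if depth else ""
        candidates.append((base_ufs + suffix, base_target + suffix))
    return candidates


def auto_mount(uri: str, try_mount: Callable[[str, str], bool],
               root: str = "/auto-mount") -> Optional[Tuple[str, str]]:
    """Try each candidate in order and return the first one that mounts."""
    for ufs, target in mount_candidates(uri, root):
        if try_mount(ufs, target):
            return ufs, target
    return None


# Simulate the example: mounting s3://foo is denied, s3://foo/bar succeeds.
result = auto_mount("s3://foo/bar/baz.txt", lambda ufs, target: ufs != "s3://foo")
print(result)  # ('s3://foo/bar', '/auto-mount/s3/foo/bar')
```

Trying the broadest location first means a single mount can serve later requests for sibling paths, while the fallback to deeper paths covers users who can only access part of a bucket.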

Advanced settings

  • alluxio.master.shimfs.auto.mount.root: the Alluxio folder under which storage systems are auto-mounted. Default: /auto-mount.
  • alluxio.master.shimfs.auto.mount.readonly: whether auto-mounted storage systems are read-only. Default: true.
  • alluxio.master.shimfs.auto.mount.shared: whether auto-mounted storage systems are shared, that is, visible to other users. Default: false.
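These settings, together with alluxio.master.shimfs.auto.mount.enabled, belong to the master configuration, typically alluxio-site.properties. The sketch below overlays simple key=value lines on the defaults listed on this page to show the effective values; it handles only plain key=value lines rather than full Java properties syntax, and the names SHIMFS_DEFAULTS and load_shimfs_settings are hypothetical.

```python
from typing import Dict

# Defaults as documented on this page.
SHIMFS_DEFAULTS = {
    "alluxio.master.shimfs.auto.mount.enabled": "false",
    "alluxio.master.shimfs.auto.mount.root": "/auto-mount",
    "alluxio.master.shimfs.auto.mount.readonly": "true",
    "alluxio.master.shimfs.auto.mount.shared": "false",
}


def load_shimfs_settings(text: str) -> Dict[str, str]:
    """Overlay key=value lines from properties-style text on the defaults."""
    settings = dict(SHIMFS_DEFAULTS)
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith(("#", "!")) or "=" not in line:
            continue  # skip blanks, comments, and lines this sketch cannot parse
        key, value = line.split("=", 1)
        key = key.strip()
        if key in settings:  # ignore unrelated properties
            settings[key] = value.strip()
    return settings


site = """
# Enable auto-mounting and move the auto-mount root.
alluxio.master.shimfs.auto.mount.enabled=true
alluxio.master.shimfs.auto.mount.root=/mnt/auto
"""
print(load_shimfs_settings(site))
```

Values stay as strings here; converting readonly and shared to booleans is left to the caller.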