Fast Durable Write


This guide describes how users can leverage the ASYNC_THROUGH write type provided by Alluxio to speed up writes without trading off durability.

When an application runs on the same nodes where Alluxio is deployed, it achieves the fastest write performance by storing data directly in Alluxio. However, because memory is volatile, any data in a node's memory is lost when that Alluxio node goes down or restarts. To prevent data loss, Alluxio can synchronously write the data to the persistent under-store in addition to Alluxio memory. The downside is that data can typically be written to the under-store much more slowly than to memory.

With fast durable write, an application first synchronously writes a file to Alluxio with N copies (configurable through the property alluxio.user.file.replication.durable), and the file is then persisted to the under-store asynchronously. Specifically, the application writes each block of the file to the top tier of the local worker and of N-1 other workers. At this point, the file shows up in the Alluxio filesystem, and other applications can read, copy, rename, or delete it, or update its attributes. In the background, Alluxio automatically persists the file to the under-store, transparently to users and applications.
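To make the placement rule concrete, here is a hypothetical, simplified model of which workers receive the synchronous copies of a block: the writer's local worker first, then N-1 other workers. The class and method names are illustrative only, and picking the "other" workers in list order is an assumption for the sketch; Alluxio's real block placement policy is not shown here.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified model of where the N synchronous copies of a block land.
// This is NOT Alluxio's actual placement policy; "other" workers are taken in list order.
public class DurableReplicaPlacement {

    static List<String> chooseWorkers(String localWorker, List<String> allWorkers, int replicationDurable) {
        if (replicationDurable < 1 || replicationDurable > allWorkers.size()) {
            throw new IllegalArgumentException("need 1 <= copies <= number of workers");
        }
        if (!allWorkers.contains(localWorker)) {
            throw new IllegalArgumentException("local worker must be in the worker list");
        }
        List<String> chosen = new ArrayList<>();
        chosen.add(localWorker);                 // copy 1: the writer's local worker
        for (String worker : allWorkers) {       // copies 2..N: other workers
            if (chosen.size() == replicationDurable) {
                break;
            }
            if (!worker.equals(localWorker)) {
                chosen.add(worker);
            }
        }
        return chosen;
    }

    public static void main(String[] args) {
        List<String> workers = List.of("worker-a", "worker-b", "worker-c", "worker-d");
        // A writer on worker-c with replicationDurable = 3
        System.out.println(chooseWorkers("worker-c", workers, 3)); // [worker-c, worker-a, worker-b]
    }
}
```

With N = 1 the model degenerates to "local worker only", which is the memory-speed, non-durable case described below.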

If alluxio.user.file.replication.durable is set to 1, writing a file to Alluxio with ASYNC_THROUGH can complete at memory speed, but the data can be lost if a node crashes or restarts before the data is persisted. Setting alluxio.user.file.replication.durable to N > 1 replicates the data on N different Alluxio nodes, so it survives up to N-1 node failures before it is persisted, at the cost of temporarily storing N times as much data in Alluxio. Note that until the file is persisted, Alluxio handles the failure of any of these N workers by re-replicating the under-replicated blocks so that sufficient copies of the file remain; also, the free command does not work on the file before it is persisted. Once the file is persisted to the under-store, its in-Alluxio copies can become evictable if other configuration allows it.
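The trade-off above is simple arithmetic: with N copies, a file temporarily occupies N times its size in Alluxio and tolerates N-1 worker failures before it is persisted. The sketch below just computes those two numbers; the class and method names are hypothetical, and the failure count assumes the copies sit on distinct workers and that the failures happen before Alluxio re-replicates the lost blocks.

```java
// Back-of-the-envelope numbers for an ASYNC_THROUGH file before it is persisted.
// Hypothetical helper; assumes the N copies are on N distinct workers.
public class DurableWriteCost {

    // Total bytes held in Alluxio while the file waits to be persisted.
    static long bytesHeldInAlluxio(long fileBytes, int replicationDurable) {
        return fileBytes * replicationDurable;
    }

    // Worker failures the file can survive before persistence (ignoring re-replication).
    static int tolerableWorkerFailures(int replicationDurable) {
        return replicationDurable - 1;
    }

    public static void main(String[] args) {
        long oneGiB = 1L << 30;
        int n = 3;
        System.out.println("In-Alluxio bytes: " + bytesHeldInAlluxio(oneGiB, n));      // 3221225472 (3 GiB)
        System.out.println("Worker failures survivable: " + tolerableWorkerFailures(n)); // 2
    }
}
```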

Optionally, when using fast durable write with alluxio.user.file.ufs.tier.enabled set to true (it defaults to false), writes can bypass Alluxio and store data directly in the UFS when Alluxio workers are full and no space is left for the in-Alluxio copies. The fallback is initiated and handled by Alluxio servers and is completely transparent to applications, so users see no exceptions or data loss caused by it, although writes may slow down after falling back to the UFS.
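The fallback rule can be pictured as a small decision function. This is a hypothetical model, not Alluxio server code: it assumes a block "fits" when at least N workers each have enough free space for one copy, and NO_SPACE is only a placeholder for "the copies cannot be placed in Alluxio" when the UFS tier is disabled.

```java
import java.util.List;

// Hypothetical model of the ufs.tier fallback decision; the real logic lives in Alluxio servers.
public class UfsFallbackModel {

    enum Destination { ALLUXIO, UFS, NO_SPACE }

    static Destination chooseDestination(long blockBytes, List<Long> freeBytesPerWorker,
                                         int replicationDurable, boolean ufsTierEnabled) {
        // Assumption: each of the N copies needs its own worker with room for the whole block.
        long workersWithRoom = freeBytesPerWorker.stream().filter(free -> free >= blockBytes).count();
        if (workersWithRoom >= replicationDurable) {
            return Destination.ALLUXIO;
        }
        // Not enough room for the in-Alluxio copies: fall back to the UFS only if enabled.
        return ufsTierEnabled ? Destination.UFS : Destination.NO_SPACE;
    }

    public static void main(String[] args) {
        List<Long> free = List.of(64L << 20, 0L, 0L); // only one worker has 64 MiB free
        System.out.println(chooseDestination(32L << 20, free, 3, true));  // UFS
        System.out.println(chooseDestination(32L << 20, free, 1, true));  // ALLUXIO
        System.out.println(chooseDestination(32L << 20, free, 3, false)); // NO_SPACE
    }
}
```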

How to Use Fast Durable Write

Alluxio System Configuration

Setting the following properties in alluxio-site.properties makes applications use fast durable writes with three initial copies:

alluxio.user.file.writetype.default=ASYNC_THROUGH
alluxio.user.file.replication.durable=3
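If you generate or audit alluxio-site.properties from tooling, a check like the following can confirm that a file enables fast durable writes. It is a hypothetical helper built on java.util.Properties, not part of Alluxio; it deliberately treats a missing replication.durable value as "not configured" rather than guessing Alluxio's default.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.Properties;

// Hypothetical helper: does this alluxio-site.properties content enable fast durable writes?
public class FastDurableConfigCheck {

    static boolean enablesFastDurableWrite(Properties props, int minCopies) {
        String writeType = props.getProperty("alluxio.user.file.writetype.default", "");
        String copies = props.getProperty("alluxio.user.file.replication.durable");
        if (!"ASYNC_THROUGH".equals(writeType.trim()) || copies == null) {
            return false; // wrong write type, or replication.durable not set explicitly
        }
        try {
            return Integer.parseInt(copies.trim()) >= minCopies;
        } catch (NumberFormatException e) {
            return false; // malformed value
        }
    }

    public static void main(String[] args) throws IOException {
        Properties props = new Properties();
        props.load(new StringReader(
            "alluxio.user.file.writetype.default=ASYNC_THROUGH\n"
            + "alluxio.user.file.replication.durable=3\n"));
        System.out.println(enablesFastDurableWrite(props, 2)); // true
    }
}
```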

Alluxio FileSystem Java API

Java clients can use the Alluxio FileSystem API to write fast, durable files:

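// Create /foo with ASYNC_THROUGH, keeping 3 in-Alluxio copies until it is persisted
FileOutStream os = fileSystem.createFile(new AlluxioURI("/foo"),
    CreateFileOptions.defaults().setWriteType(WriteType.ASYNC_THROUGH).setReplicationDurable(3));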

Alluxio FileSystem REST API

It is also possible to use the REST API to create such files:

POST: http://host:port/v1/api/paths/<path>/create-file/
BODY:
{
  "writeType": "ASYNC_THROUGH",
  "replicationDurable": 3
}
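The same request can be built from Java with the JDK's java.net.http client (Java 11+). This sketch reuses the endpoint pattern and JSON body shown above; the class name, the host:port value (localhost:39999), and the way an Alluxio path such as /foo is substituted for <path> are assumptions for illustration. It only builds the request; sending it requires a running Alluxio REST endpoint.

```java
import java.net.URI;
import java.net.http.HttpRequest;

// Hypothetical builder for the create-file REST call described above (request is not sent here).
public class CreateFileRequest {

    static final String BODY = "{\"writeType\": \"ASYNC_THROUGH\", \"replicationDurable\": 3}";

    // Assumption: alluxioPath starts with "/" and is appended directly after "paths".
    static HttpRequest build(String hostPort, String alluxioPath) {
        URI uri = URI.create("http://" + hostPort + "/v1/api/paths" + alluxioPath + "/create-file/");
        return HttpRequest.newBuilder(uri)
            .header("Content-Type", "application/json")
            .POST(HttpRequest.BodyPublishers.ofString(BODY))
            .build();
    }

    public static void main(String[] args) {
        HttpRequest request = build("localhost:39999", "/foo");
        System.out.println(request.method() + " " + request.uri());
        // To send: java.net.http.HttpClient.newHttpClient()
        //     .send(request, java.net.http.HttpResponse.BodyHandlers.ofString());
    }
}
```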

Command-line Interface

The command-line interface also supports fast durable writes. For example, the fs copyFromLocal command can copy data from the local file system to Alluxio using fast durable writes:

$ ./bin/alluxio fs -Dalluxio.user.file.writetype.default=ASYNC_THROUGH \
  -Dalluxio.user.file.replication.durable=3 copyFromLocal <localPath> <alluxioPath>
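When launching the shell from a Java program, each -D option must be passed as its own argument (the shell's line-continuation backslash is not part of the command). The sketch below only assembles the argument list for ProcessBuilder using the flags shown above; the class name and the example paths (/opt/alluxio, /tmp/data.csv, /data/data.csv) are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper that builds the copyFromLocal command line shown above for ProcessBuilder.
public class CopyFromLocalCommand {

    static List<String> argv(String alluxioHome, String localPath, String alluxioPath, int replicationDurable) {
        List<String> cmd = new ArrayList<>();
        cmd.add(alluxioHome + "/bin/alluxio");
        cmd.add("fs");
        cmd.add("-Dalluxio.user.file.writetype.default=ASYNC_THROUGH");
        cmd.add("-Dalluxio.user.file.replication.durable=" + replicationDurable);
        cmd.add("copyFromLocal");
        cmd.add(localPath);
        cmd.add(alluxioPath);
        return cmd;
    }

    public static void main(String[] args) {
        // Hypothetical paths; replace them before running.
        List<String> cmd = argv("/opt/alluxio", "/tmp/data.csv", "/data/data.csv", 3);
        System.out.println(String.join(" ", cmd));
        // To run it: new ProcessBuilder(cmd).inheritIO().start().waitFor();
    }
}
```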