Active Replication

This guide describes how to manage active data replication in Alluxio.

Like many distributed file systems, each file in Alluxio consists of one or more blocks stored across the cluster. By default, Alluxio adjusts the replication level of each block dynamically and automatically based on the workload and storage capacity. For example, Alluxio may create more replicas of a block when more clients read it with read type CACHE or CACHE_PROMOTE. Alluxio may also remove existing replicas that are used less often, reclaiming the space for data that is accessed more often (see Evictor in Alluxio Storage). As a result, different blocks of the same file can have different numbers of replicas, depending on how much demand each block sees.

By default, these replication and eviction decisions, and the corresponding data transfers, are completely transparent to users and applications accessing Alluxio data.

Active Replication

In addition to this dynamic replication adjustment, Alluxio provides APIs and command-line interfaces that let users explicitly maintain a target range of replication levels for a file. In particular, users can configure the following two properties for a file in Alluxio (see Configuration Settings for how to change them).

  1. alluxio.user.file.replication.min is the minimum number of replicas of this file. Its default value is 0, so by default Alluxio may completely evict this file from Alluxio-managed space once the file becomes cold. When this property is set to a positive integer, Alluxio periodically checks the replication level of every block in this file. If some blocks become under-replicated, Alluxio stops evicting those blocks and actively creates more replicas to restore the replication level.

  2. alluxio.user.file.replication.max is the maximum number of replicas. Once this property is set to a positive integer for a file, Alluxio checks the replication level and removes excess replicas. Set this property to -1 for no upper limit (the default), or to 0 to prevent any data of this file from being stored in Alluxio. Note that, unless it is -1, the value of alluxio.user.file.replication.max must be no less than alluxio.user.file.replication.min.
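The rules of the two properties can be summarized in a small model. The sketch below is hypothetical: ReplicationRange and its adjustment method are illustrative names, not part of Alluxio's API, and Alluxio applies these rules internally during its periodic replication checks. It validates a (min, max) pair and reports how many replicas a single block would need to gain or lose.

```java
/**
 * Hypothetical model of a file's target replication range; not Alluxio code.
 * min defaults to 0, max = -1 means no upper limit, and max = 0 keeps the
 * file's data out of Alluxio storage.
 */
class ReplicationRange {
  static final int UNLIMITED = -1;

  final int min;
  final int max;

  ReplicationRange(int min, int max) {
    if (min < 0) {
      throw new IllegalArgumentException("min must be >= 0");
    }
    // max must be -1 (unlimited) or no less than min.
    if (max != UNLIMITED && max < min) {
      throw new IllegalArgumentException("max must be -1 or >= min");
    }
    this.min = min;
    this.max = max;
  }

  /**
   * Replicas a block should gain (positive) or lose (negative) to fall inside
   * the range; 0 means the block is already within range.
   */
  int adjustment(int currentReplicas) {
    if (currentReplicas < min) {
      return min - currentReplicas; // under-replicated: create replicas
    }
    if (max != UNLIMITED && currentReplicas > max) {
      return max - currentReplicas; // over-replicated: remove excess replicas
    }
    return 0;
  }

  public static void main(String[] args) {
    ReplicationRange range = new ReplicationRange(3, 5);
    // Blocks of the same file can hold different numbers of replicas.
    for (int replicas : new int[] {1, 4, 7}) {
      System.out.println(replicas + " replicas -> adjust by " + range.adjustment(replicas));
    }
  }
}
```

Because the check is per block, one file can have some blocks gaining replicas while others lose them in the same pass.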

CLI Example

Copy a local file /path/to/file to Alluxio with at least two replicas initially:

$ ./bin/alluxio fs -Dalluxio.user.file.replication.min=2 copyFromLocal /path/to/file /file

Set the replication level range of /file to between 3 and 5. Note that this command returns as soon as the new range has been set; Alluxio then reaches the target replication level asynchronously in a background process.

$ ./bin/alluxio fs setReplication -min 3 -max 5 /file
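Because the command returns before the target is reached, replica counts read right after it may still be the old ones. The toy model below illustrates only that asynchronous behavior using a background executor; AsyncReplicationDemo and its methods are hypothetical and do not reflect how Alluxio actually schedules replication work.

```java
import java.util.Arrays;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

/** Hypothetical toy model of asynchronous replication enforcement; not Alluxio code. */
class AsyncReplicationDemo {
  private final int[] blockReplicas; // current replica count of each block
  private final ExecutorService background = Executors.newSingleThreadExecutor(r -> {
    Thread t = new Thread(r, "replication-checker");
    t.setDaemon(true); // do not keep the JVM alive
    return t;
  });

  AsyncReplicationDemo(int... blockReplicas) {
    this.blockReplicas = blockReplicas.clone();
  }

  /** Submits the repair to a background thread and returns immediately. */
  Future<?> setReplication(int min, int max) {
    return background.submit(() -> {
      synchronized (blockReplicas) {
        for (int i = 0; i < blockReplicas.length; i++) {
          if (blockReplicas[i] < min) {
            blockReplicas[i] = min; // add replicas to under-replicated blocks
          } else if (max != -1 && blockReplicas[i] > max) {
            blockReplicas[i] = max; // remove excess replicas
          }
        }
      }
    });
  }

  /** Current replica counts, possibly observed before the background repair finishes. */
  int[] snapshot() {
    synchronized (blockReplicas) {
      return blockReplicas.clone();
    }
  }

  void shutdown() {
    background.shutdown();
  }

  public static void main(String[] args) throws Exception {
    AsyncReplicationDemo file = new AsyncReplicationDemo(1, 4, 7);
    Future<?> pending = file.setReplication(3, 5);
    // setReplication has already returned; counts read now may still be the old ones.
    System.out.println("right away: " + Arrays.toString(file.snapshot()));
    pending.get(); // wait for the background repair to finish
    System.out.println("after repair: " + Arrays.toString(file.snapshot()));
    file.shutdown();
  }
}
```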

Set alluxio.user.file.replication.max of /file to unlimited:

$ ./bin/alluxio fs setReplication -max -1 /file

Recursively set the replication level of all files inside a directory /dir (including its sub-directories) using -R:

$ ./bin/alluxio fs setReplication -min 3 -max 5 -R /dir
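With -R, the range applies to every file below /dir at any depth, but not to a different directory such as /dir2 that merely shares the name prefix. The hypothetical sketch below models the namespace as a plain list of paths (RecursiveSetReplication and filesUnder are illustrative names, not Alluxio APIs) to show which files would be selected.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical in-memory namespace showing which files -R would cover; not Alluxio code. */
class RecursiveSetReplication {
  /** Returns every file located under dir, at any depth of sub-directories. */
  static List<String> filesUnder(List<String> allFiles, String dir) {
    // Compare against "dir/" so that a sibling such as /dir2 is not matched.
    String prefix = dir.endsWith("/") ? dir : dir + "/";
    List<String> matched = new ArrayList<>();
    for (String path : allFiles) {
      if (path.startsWith(prefix)) {
        matched.add(path);
      }
    }
    return matched;
  }

  public static void main(String[] args) {
    List<String> files = List.of("/dir/a", "/dir/sub/b", "/dir2/c", "/other/d");
    // Hypothetical per-file target range: {min, max}.
    Map<String, int[]> targetRanges = new TreeMap<>();
    for (String path : filesUnder(files, "/dir")) {
      targetRanges.put(path, new int[] {3, 5});
    }
    System.out.println(targetRanges.keySet()); // [/dir/a, /dir/sub/b]
  }
}
```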

To check the target replication level of a file, run

$ ./bin/alluxio fs stat /file

and look for the replicationMin and replicationMax fields in the output.

Programming Example

Create a new file /file with at least 2 replicas in Alluxio:

CreateFileOptions options = CreateFileOptions.defaults().setReplicationMin(2);
FileOutStream os = fileSystem.createFile(new AlluxioURI("/file"), options);
// ... write data to os ...
os.close();

Change the target replication level of the existing file /file to at least 3 and at most 5 replicas:

SetAttributeOptions options =
    SetAttributeOptions.defaults().setReplicationMin(3).setReplicationMax(5);
// setAttribute returns no stream; it only updates the file's attributes
fileSystem.setAttribute(new AlluxioURI("/file"), options);

Check the target replication levels of a given file:

URIStatus status = fileSystem.getStatus(new AlluxioURI("/file"));
int replicationMin = status.getReplicationMin();
int replicationMax = status.getReplicationMax();