This guide describes how to manage active data replication in Alluxio.
Like many distributed file systems, each file in Alluxio consists of one or multiple blocks stored across the cluster. By default, Alluxio adjusts the replication level of different blocks dynamically and automatically based on the workload and storage capacity. For example, Alluxio may create more replicas of a particular block when more clients read this block with the read type CACHE_PROMOTE; Alluxio may also remove existing replicas that are used less often, reclaiming space for data that is accessed more often (see Evictor in Alluxio Storage). As a result, different blocks of the same file may have different numbers of replicas, depending on the demand for each block.
By default, these replication and eviction decisions, and the corresponding data transfers, are completely transparent to users and applications accessing Alluxio data.
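To make the idea concrete, here is a toy Java model of per-block replication. It is an illustration only, not Alluxio's replication or eviction policy: the rules "a promoting read adds one replica, capped at one replica per worker" and "a block not read since the last pass loses one replica" are simplifying assumptions, and all class and method names are hypothetical. It shows how blocks of the same file can end up with different replica counts.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model only: NOT Alluxio's actual replication or eviction policy.
class ToyBlockReplication {
    static final class Block {
        final long id;
        int replicas;
        int readsSinceLastPass;

        Block(long id, int replicas) {
            this.id = id;
            this.replicas = replicas;
        }
    }

    // Assumed rule: a CACHE_PROMOTE-style read adds one replica,
    // up to one replica per worker.
    static void promotingRead(Block block, int numWorkers) {
        block.readsSinceLastPass++;
        if (block.replicas < numWorkers) {
            block.replicas++;
        }
    }

    // Assumed rule: each block that was not read since the previous pass
    // loses one replica, freeing space for hotter data.
    static void evictionPass(List<Block> blocks) {
        for (Block block : blocks) {
            if (block.readsSinceLastPass == 0 && block.replicas > 0) {
                block.replicas--;
            }
            block.readsSinceLastPass = 0;
        }
    }

    public static void main(String[] args) {
        List<Block> file = new ArrayList<>();
        for (long id = 0; id < 3; id++) {
            file.add(new Block(id, 1)); // every block starts with one replica
        }
        promotingRead(file.get(0), 4); // block 0 is hot
        promotingRead(file.get(0), 4);
        evictionPass(file);            // blocks 1 and 2 were not read
        for (Block block : file) {
            System.out.println("block " + block.id + ": " + block.replicas + " replica(s)");
        }
    }
}
```

Running `main` reports 3 replicas for block 0 and 0 for blocks 1 and 2: one file, three different replication levels.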
In addition to this dynamic adjustment, Alluxio provides APIs and command-line interfaces that let users explicitly maintain a target range of replication levels for a file. In particular, users can configure the following two properties for a file in Alluxio (see Configuration Settings for how to change properties).
alluxio.user.file.replication.min is the minimum number of replicas of this file. Its default value is 0, so by default Alluxio may completely evict this file from Alluxio-managed space once the file becomes cold. When this property is set to a positive integer, Alluxio periodically checks the replication levels of all blocks in this file. When some blocks become under-replicated, Alluxio stops evicting these blocks and actively creates more replicas to restore the replication level.
alluxio.user.file.replication.max is the maximum number of replicas. Once this property is set to a positive integer for a file, Alluxio checks the file's replication level and removes excess replicas. Set this property to -1 for no upper limit (the default), or to 0 to prevent storing any data of this file in Alluxio. Note that the value of alluxio.user.file.replication.max must be no less than alluxio.user.file.replication.min.
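The rules for the two properties can be sketched as a small, hypothetical Java helper. This is not Alluxio's replication checker: the class and method names are invented for illustration, the helper assumes the per-block replica counts are already known, and it treats a negative replication.min or a replication.max below -1 as invalid, which is an assumption beyond what this guide states.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper (not part of the Alluxio API) applying the rules for
// alluxio.user.file.replication.min and alluxio.user.file.replication.max.
class ReplicationRange {
    static final int UNLIMITED = -1; // replication.max = -1 means no upper limit

    // Rejects ranges that violate the constraints described above.
    static void validate(int min, int max) {
        if (min < 0) {
            throw new IllegalArgumentException("replication.min must be >= 0, got " + min);
        }
        if (max < UNLIMITED) {
            throw new IllegalArgumentException("replication.max must be -1 or >= 0, got " + max);
        }
        if (max != UNLIMITED && max < min) {
            throw new IllegalArgumentException(
                "replication.max (" + max + ") < replication.min (" + min + ")");
        }
    }

    // For each under-replicated block, how many replicas must be added to reach min.
    // With min = 0 (the default) no block is ever under-replicated.
    static Map<Long, Integer> replicasToAdd(Map<Long, Integer> blockReplicas, int min) {
        Map<Long, Integer> plan = new HashMap<>();
        for (Map.Entry<Long, Integer> e : blockReplicas.entrySet()) {
            int missing = min - e.getValue();
            if (missing > 0) {
                plan.put(e.getKey(), missing);
            }
        }
        return plan;
    }

    // Whether evicting one more replica of a block keeps it at or above min.
    static boolean canEvictOneReplica(int current, int min) {
        return current > min;
    }

    // How many replicas of a block exceed max (0 when max is unlimited).
    static int excessReplicas(int current, int max) {
        return max == UNLIMITED ? 0 : Math.max(0, current - max);
    }

    public static void main(String[] args) {
        int min = 2;
        int max = 3;
        validate(min, max);
        Map<Long, Integer> blocks = new HashMap<>();
        blocks.put(1L, 1); // under-replicated
        blocks.put(2L, 2); // within range
        blocks.put(3L, 5); // over-replicated
        System.out.println("replicas to add: " + replicasToAdd(blocks, min));
        for (Map.Entry<Long, Integer> e : blocks.entrySet()) {
            System.out.println("block " + e.getKey()
                + ": excess=" + excessReplicas(e.getValue(), max)
                + ", evictable=" + canEvictOneReplica(e.getValue(), min));
        }
    }
}
```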
Copy a local file /path/to/file to Alluxio with at least two replicas initially:
$ ./bin/alluxio fs -Dalluxio.user.file.replication.min=2 copyFromLocal /path/to/file /file
Set the target replication range of /file to a minimum of 3 and a maximum of 5 replicas. Note that this command returns right after recording the new range; a background process then reaches the target asynchronously.
$ ./bin/alluxio fs setReplication -min 3 -max 5 /file
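Because setReplication returns before the new range is enforced, a program that must wait for the target can poll. The Java sketch below is hedged: the `IntSupplier` stands in for whatever mechanism reports the lowest replica count among the file's blocks (this guide does not describe one), and the helper name, timeout, and poll interval are hypothetical.

```java
import java.util.function.IntSupplier;

// Hypothetical helper: polls until a file's lowest per-block replica count
// reaches targetMin, or gives up after timeoutMillis.
class AwaitReplication {
    static boolean awaitMinReplicas(IntSupplier lowestBlockReplicas, int targetMin,
                                    long timeoutMillis, long pollMillis) {
        long deadline = System.nanoTime() + timeoutMillis * 1_000_000L;
        while (true) {
            if (lowestBlockReplicas.getAsInt() >= targetMin) {
                return true;
            }
            if (System.nanoTime() - deadline >= 0) {
                return false; // timed out before the target was reached
            }
            try {
                Thread.sleep(pollMillis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve the interrupt for callers
                return false;
            }
        }
    }

    public static void main(String[] args) {
        // Stand-in for a real replica-count query: the count grows by one per poll.
        int[] replicas = {1};
        boolean reached = awaitMinReplicas(() -> replicas[0]++, 3, 1_000, 10);
        System.out.println("target reached: " + reached);
    }
}
```

Returning `false` on timeout, rather than throwing, leaves the caller free to decide whether a slow background replication is an error.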
Reset alluxio.user.file.replication.max of /file to unlimited:
$ ./bin/alluxio fs setReplication -max -1 /file
Recursively set the replication level of all files inside a directory /dir (including files in its subdirectories) with -R:
$ ./bin/alluxio fs setReplication -min 3 -max 5 -R /dir
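As a rough illustration of what a recursive update covers, the Java sketch below collects every file at any depth under a directory from a toy in-memory tree; each of those files would receive the same range. The tree representation and the method name are hypothetical, and this is not how the Alluxio CLI traverses its namespace.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Deque;
import java.util.List;
import java.util.Map;

// Toy illustration of the scope of a recursive update; the in-memory
// directory map and the method name are hypothetical.
class RecursiveScope {
    // children maps each directory path to its direct children; any path
    // that is not a key in the map is treated as a file.
    static List<String> filesUnder(String root, Map<String, List<String>> children) {
        List<String> files = new ArrayList<>();
        Deque<String> pending = new ArrayDeque<>();
        pending.push(root);
        while (!pending.isEmpty()) {
            String path = pending.pop();
            List<String> kids = children.get(path);
            if (kids == null) {
                files.add(path); // a file: it receives the new range
            } else {
                for (String kid : kids) {
                    pending.push(kid); // a directory: descend into it
                }
            }
        }
        Collections.sort(files);
        return files;
    }

    public static void main(String[] args) {
        Map<String, List<String>> tree = Map.of(
            "/dir", List.of("/dir/a", "/dir/sub"),
            "/dir/sub", List.of("/dir/sub/b"));
        for (String file : filesUnder("/dir", tree)) {
            System.out.println("would set min=3, max=5 on " + file);
        }
    }
}
```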
To check the target replication level of a file, run
$ ./bin/alluxio fs stat /foo
and look for the replicationMin and replicationMax fields in the output.
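When checking many files from a tool, one option is to extract the two fields from the stat text. The Java sketch below assumes the output contains them as replicationMin=<n> and replicationMax=<n>; that format, the sample string, and the helper name are assumptions, so confirm them against the stat output of your Alluxio version. Java programs can instead read the values from the URIStatus returned by getStatus.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: extracts an integer field written as name=<n>
// from the text printed by `alluxio fs stat` (format assumed, not verified).
class StatFields {
    static Integer intField(String statOutput, String name) {
        Pattern p = Pattern.compile("\\b" + Pattern.quote(name) + "=(-?\\d+)");
        Matcher m = p.matcher(statOutput);
        return m.find() ? Integer.valueOf(m.group(1)) : null; // null if the field is absent
    }

    public static void main(String[] args) {
        // Made-up sample line; real stat output may be formatted differently.
        String sample = "FileInfo{fileId=33554431, replicationMax=5, replicationMin=3}";
        System.out.println("min=" + intField(sample, "replicationMin")
            + ", max=" + intField(sample, "replicationMax"));
    }
}
```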
Create a new file /file with at least 2 replicas in Alluxio:
// Ask Alluxio to keep at least two replicas of every block of the new file
CreateFileOptions options = CreateFileOptions.defaults().setReplicationMin(2);
FileOutStream os = fileSystem.createFile(new AlluxioURI("/file"), options);
Change the target replication level of the existing file /file to at least 3 and at most 5 replicas:
// Set both bounds of the range; setAttribute does not return a stream
SetAttributeOptions options = SetAttributeOptions.defaults().setReplicationMin(3).setReplicationMax(5);
fileSystem.setAttribute(new AlluxioURI("/file"), options);
Check the target replication levels of a given file:
URIStatus status = fileSystem.getStatus(new AlluxioURI("/file"));
int replicationMin = status.getReplicationMin();
int replicationMax = status.getReplicationMax(); // -1 means no upper limit