Cross Cluster Synchronization
This page describes the cross cluster synchronization feature of Alluxio.
The cross cluster synchronization feature has been available since version 2.9.0.
Overview
Cross cluster synchronization allows multiple Alluxio clusters to mount the same (or intersecting) UFS path and keep files synchronized across the clusters.
For example, assume there are two Alluxio clusters called `C1` and `C2`. Both clusters mount the S3 bucket `s3://my-bucket/` to the folder `/mnt` in Alluxio. Now when cluster `C1` modifies a file on the path `/mnt`, the modification will also be visible on cluster `C2`.
Instead of polling the UFS for changes on a user-defined interval (see ufs metadata sync), when cross cluster synchronization is enabled on a mount point and a file is modified through Alluxio, the cluster directly notifies the other Alluxio clusters that the file needs synchronization. The next time the file is accessed on one of these clusters, a metadata synchronization with the UFS is performed. Note that only path information is communicated between clusters; any data or metadata synchronization happens between the Alluxio clusters and the UFS. As a result, any metadata that is stored only in Alluxio will not be synchronized (for example, ACL data in certain configurations). Internally, the metadata synchronization uses the same mechanisms as the ufs metadata sync.
By tracking modifications directly between clusters, synchronization with the UFS only happens when necessary. This should reduce the load on the UFS and decrease the time for an update from one cluster to be visible on another cluster.
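The invalidate-then-lazy-sync flow described above can be sketched with a toy model. This is illustrative Python only, not Alluxio code; the `Cluster` class, its method names, and the dict standing in for the UFS are all invented for the sketch:

```python
class Cluster:
    """Toy model of a cluster's metadata cache for a cross cluster mount.

    Not Alluxio code: it only illustrates that writes go through to the UFS,
    peers receive only the *path* of the change, and the actual sync with the
    UFS happens lazily on the next access.
    """
    def __init__(self, name, ufs):
        self.name = name
        self.ufs = ufs            # shared dict standing in for the UFS
        self.metadata = {}        # locally cached metadata per path
        self.needs_sync = set()   # paths flagged by peer invalidations
        self.peers = []

    def write(self, path, content):
        # Writes go THROUGH to the UFS; peers are told only the path,
        # so no file data travels between clusters.
        self.ufs[path] = content
        self.metadata[path] = content
        for peer in self.peers:
            peer.invalidate(path)

    def invalidate(self, path):
        # The receiving cluster just marks the path; no UFS traffic yet.
        self.needs_sync.add(path)

    def read(self, path):
        # On access, a flagged path is first synced with the UFS.
        if path in self.needs_sync:
            self.metadata[path] = self.ufs.get(path)
            self.needs_sync.discard(path)
        return self.metadata.get(path)

ufs = {}
c1 = Cluster("C1", ufs)
c2 = Cluster("C2", ufs)
c1.peers.append(c2)
c2.peers.append(c1)

c1.write("/mnt/file", "v1")      # C1 writes through Alluxio
print(c2.read("/mnt/file"))      # C2 syncs lazily on access -> v1
```

Note how `C2` performs no UFS access at invalidation time; the sync cost is paid only when the path is actually read.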
Requirements
For the feature to work correctly, the following properties must be ensured:
- To ensure consistency across clusters, writes to cross cluster enabled UFS mount points must happen through Alluxio (see consistency).
- Writes to these mount points must use the `THROUGH`, `CACHE_THROUGH`, or `ASYNC_THROUGH` write type option.
- The UFS being mounted must ensure strong consistency, in the sense that after a successful response is received for an operation, the result of that operation is globally visible. UFSs such as HDFS, S3, and GCP cloud storage ensure this level of consistency for most operations.
- The masters of each Alluxio cluster must be able to communicate with the masters of every other Alluxio cluster through their master RPC port (19998 by default, set by `alluxio.master.rpc.port`).
- There is exactly one cross cluster master process running. It may be stopped and restarted at any time, but for consistency to be ensured it must eventually be running.
Components
Cross cluster synchronization is broken up into two main components: a naming service called the Cross Cluster Master, used for cluster and mount discovery, and a publish/subscribe service that runs on the Alluxio masters to notify other clusters of modifications to the UFS.
Alluxio Masters
To use cross cluster synchronization, the Alluxio clusters must set the following properties in `alluxio-site.properties` at the master nodes:
- `alluxio.master.cross.cluster.enabled=true` - cross cluster synchronization must be enabled.
- `alluxio.master.cross.cluster.id={some-unique-id}` - each cluster must be given a unique ID, which can be any string.
- `alluxio.master.cross.cluster.rpc.addresses={cross-cluster-master-address-port}` - the `host:port` combination of the cross cluster master process must be set (see cross cluster master).
- `alluxio.master.mount.table.root.cross.cluster=true` - must be set if cross cluster synchronization should be enabled on the root mount.
Now when mounting a UFS path to Alluxio, the `--crosscluster` flag must be included if files on that mount are to be synchronized across clusters. For example:
bin/alluxio fs mount --crosscluster /mnt/hdfs hdfs://localhost:9000
For cross cluster synchronization to work, the masters of the different clusters must be able to communicate with each other using their master RPC ports (19998 by default, set by `alluxio.master.rpc.port`).
The addresses used for communication are computed in the same way as at the clients (see Optional Configuration).
If the cluster should be reachable at different addresses externally, then `alluxio.master.cross.cluster.external.rpc.addresses` can be set to a list of comma-separated `host:port` RPC addresses. Note that this does not change how anything on the local cluster runs; external clusters will simply try to connect to this cluster using these addresses.
Cross Cluster Master
The cross cluster master is a single global standalone Java process that must be reachable by each Alluxio cluster's masters. This process is responsible for cluster and mount discovery between the different Alluxio clusters. It can be started on any node where Alluxio is installed using the command:
bin/alluxio-start.sh cross_cluster_master
By default, this process listens on port 20009; this can be changed using the property key `alluxio.cross.cluster.master.rpc.port`.
For Alluxio clusters to be able to connect to the cross cluster master, they must set the property key `alluxio.master.cross.cluster.rpc.addresses` to the `host:port` RPC address of the cross cluster master, for example `alluxio.master.cross.cluster.rpc.addresses=cross-cluster-master-hostname:20009`.
If a cluster is not able to connect to this process, then it will not be able to add, remove, or update cross cluster enabled mount points. Paths already mounted will continue to be synchronized even if clusters are unable to communicate with this process, at least until a master failover occurs.
Logs for the cross cluster master can be found in `logs/cross_cluster_master.log`.
See basic logging for more details about logging in Alluxio.
Command line options:
To see the currently registered clusters at the cross cluster master run the following command from any Alluxio cluster:
bin/alluxio fsadmin crossCluster listClusters
This will display the addresses that the different Alluxio clusters will use to communicate with each other as well as their existing cross cluster synchronized mount points.
To remove a registered cluster at the cross cluster master use the following command from any Alluxio cluster:
bin/alluxio fsadmin crossCluster removeCluster {cluster-id}
Where `{cluster-id}` is the value set by the property key `alluxio.master.cross.cluster.id` at the cluster to be removed.
Removing a cluster that is still running will simply result in that cluster re-registering with the cross cluster master.
Note that shutting down a cluster will not automatically remove it or its cross cluster enabled mount points from the cross cluster master, so this command can be used to remove a cluster that will no longer be used with cross cluster synchronization.
Consistency
If the requirements are ensured, the state of the files across clusters will be eventually consistent, meaning that eventually the latest version of every file will be visible on every cluster (see limitations below for cases that may cause inconsistencies).
The time it takes for an update from one cluster to be visible on another cluster will depend on the latency between the clusters and the UFS, the latency between the clusters themselves, as well as the current load on the clusters.
If a write operation happens outside Alluxio and the user wishes for this path to be synced, then there are two options:
- Perform a metadata sync manually by setting the `-Dalluxio.user.file.metadata.sync.interval=0` flag when performing an operation. Note that for cross cluster enabled mounts the `alluxio.user.file.metadata.sync.interval` value set in `alluxio-site.properties` is ignored, but if it is set for an individual command, or using path configurations (see the pathConf command), the given value will be used.
- Mark a path as needing synchronization by running the command `bin/alluxio fs needsSync {path}`. Following this, the next time the path is accessed by a file system operation, its metadata will be synced (see the needsSync command). Note that this is a lightweight operation that does not sync files immediately; individual files/directories are synced only as they are accessed.
Performance tuning
There are several options that can be used for tuning the performance of cross cluster synchronization:
- `alluxio.master.ufs.path.cache.capacity` - When checking whether files need synchronization with the UFS, a cache is consulted whose size is given by this property key. The cache tracks synchronization and invalidation times at the path level, using an LRU eviction scheme when full. When a path is evicted, it may mark its parent as needing synchronization, so increasing the size of this cache may reduce the number of spurious accesses to the UFS. The default size of this cache is `100000`; larger values may result in increased memory usage on the masters. The size of a cache entry includes the string representation of the path as well as approximately 100 bytes of timing/object metadata.
- `alluxio.master.cross.cluster.invalidation.queue.size` - When a cluster modifies a file, it sends an invalidation message to the other clusters that have also mounted this file. This property sets the maximum number of invalidation messages that can be queued or in transit at the same time for each external mount. The default value is `10000`; larger values may result in increased memory usage on the masters.
- `alluxio.master.cross.cluster.invalidation.queue.wait` - If the invalidation message queue from the previous property is full, this property sets the amount of time a thread will wait for room in the queue for its invalidation message. If this time is exceeded, messages are dropped, resulting in extra synchronization operations later to ensure the clusters remain consistent. This can be thought of as a backpressure mechanism that slows down the publisher in case of instabilities in the network or at the receiver. If the publisher should never be slowed down, this value can be set to `0`, at the cost of the subscriber cluster performing additional synchronizations later. The default value is `1` second.
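As a rough illustration of the eviction behaviour described for `alluxio.master.ufs.path.cache.capacity`, here is a toy Python model. It is not Alluxio's implementation (the class and method names are invented); it only shows why evicting a path entry can conservatively flag its parent as needing synchronization:

```python
from collections import OrderedDict

class PathSyncCache:
    """Toy LRU cache of per-path sync state (not Alluxio's implementation)."""
    def __init__(self, capacity):
        self.capacity = capacity
        self.entries = OrderedDict()   # path -> last-sync timestamp
        self.needs_sync = set()

    def record_sync(self, path, timestamp):
        self.entries[path] = timestamp
        self.entries.move_to_end(path)           # mark as most recently used
        if len(self.entries) > self.capacity:
            evicted, _ = self.entries.popitem(last=False)  # LRU eviction
            parent = evicted.rsplit("/", 1)[0] or "/"
            # Timing info for the evicted path is lost, so its parent must
            # conservatively be re-synced, possibly touching the UFS again.
            self.needs_sync.add(parent)

cache = PathSyncCache(capacity=2)
cache.record_sync("/mnt/a/x", 1)
cache.record_sync("/mnt/a/y", 2)
cache.record_sync("/mnt/b/z", 3)   # evicts /mnt/a/x, flags /mnt/a
print(cache.needs_sync)            # {'/mnt/a'}
```

A larger capacity means fewer evictions and therefore fewer of these conservative parent re-syncs, at the cost of master memory.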
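The backpressure behaviour of `alluxio.master.cross.cluster.invalidation.queue.size` and `alluxio.master.cross.cluster.invalidation.queue.wait` can likewise be modelled with a bounded queue. This is an illustrative sketch, not Alluxio code; the `InvalidationPublisher` class is invented:

```python
import queue

class InvalidationPublisher:
    """Toy model of the bounded invalidation queue (not Alluxio code).

    A full queue makes the publisher wait up to wait_secs for room
    (backpressure); on timeout the message is dropped, and the subscriber
    cluster must make up for it with extra synchronizations later.
    """
    def __init__(self, size, wait_secs):
        self.q = queue.Queue(maxsize=size)
        self.wait_secs = wait_secs
        self.dropped = 0

    def publish(self, path):
        try:
            # Block up to wait_secs for room in the queue.
            self.q.put(path, timeout=self.wait_secs)
        except queue.Full:
            self.dropped += 1   # dropped -> subscriber re-syncs later

pub = InvalidationPublisher(size=2, wait_secs=0.01)
for p in ["/a", "/b", "/c"]:
    pub.publish(p)              # no consumer: the third message is dropped
print(pub.q.qsize(), pub.dropped)   # 2 1
```

Setting `wait_secs` to `0` in this model (and the wait property in Alluxio) trades publisher latency for more dropped messages, and hence more catch-up synchronizations on the subscriber side.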
Limitations
- Concurrent writes to the same file may result in the file being out of sync across clusters.
- If a client encounters an error while writing a file and leaves the file in a partially complete state, the file may not be correctly synchronized across clusters.
- Performing a mount, update mount, or unmount operation during periods of high cross cluster synchronization load may cause instability for the cross cluster operations. This is due to the mount operations blocking the cross cluster subscriptions while in progress.
Example configurations
Below are some example configurations using cross cluster sync. For all of these configurations it is assumed that there is a node with address `master-hostname1` where the cross cluster master is running, i.e. on this node the following command has been run:
bin/alluxio-start.sh cross_cluster_master
Here is an example of a configuration with two Alluxio clusters and a single S3 bucket to be synchronized across clusters as the root mount in Alluxio.
At one Alluxio cluster we have the following in the `alluxio-site.properties` file at all masters:
alluxio.master.cross.cluster.id=C1
alluxio.master.cross.cluster.rpc.addresses=master-hostname1:20009
alluxio.master.cross.cluster.enabled=true
alluxio.user.file.writetype.default=CACHE_THROUGH
alluxio.master.mount.table.root.cross.cluster=true
alluxio.master.mount.table.root.ufs=s3://my-bucket
At another Alluxio cluster we have the following in the `alluxio-site.properties` file at all masters:
alluxio.master.cross.cluster.id=C2
alluxio.master.cross.cluster.rpc.addresses=master-hostname1:20009
alluxio.master.cross.cluster.enabled=true
alluxio.user.file.writetype.default=CACHE_THROUGH
alluxio.master.mount.table.root.cross.cluster=true
alluxio.master.mount.table.root.ufs=s3://my-bucket
The clusters are now started by running the following command at each cluster:
bin/alluxio-start.sh all
Now any modification done through Alluxio to the S3 bucket `s3://my-bucket` will be synchronized between the clusters under their respective root mounts.
Here is an example of a configuration with two Alluxio clusters and a single S3 bucket to be synchronized across clusters as a nested mount in Alluxio.
At one Alluxio cluster we have the following in the `alluxio-site.properties` file at all masters:
alluxio.master.cross.cluster.id=C1
alluxio.master.cross.cluster.rpc.addresses=master-hostname1:20009
alluxio.master.cross.cluster.enabled=true
alluxio.user.file.writetype.default=CACHE_THROUGH
At another Alluxio cluster we have the following in the `alluxio-site.properties` file at all masters:
alluxio.master.cross.cluster.id=C2
alluxio.master.cross.cluster.rpc.addresses=master-hostname1:20009
alluxio.master.cross.cluster.enabled=true
alluxio.user.file.writetype.default=CACHE_THROUGH
The clusters are now started by running the following command at each cluster:
bin/alluxio-start.sh all
Now if we want to keep the S3 bucket `s3://my-bucket` synchronized between the clusters under the path `/mnt`, we run the following command at both clusters:
bin/alluxio fs mount --crosscluster /mnt/ s3://my-bucket
Here is an example of a configuration with two Alluxio clusters and a single S3 bucket to be synchronized across clusters as the root mount in Alluxio, and an HDFS cluster mounted as a nested mount.
At one Alluxio cluster we have the following in the `alluxio-site.properties` file at all masters:
alluxio.master.cross.cluster.id=C1
alluxio.master.cross.cluster.rpc.addresses=master-hostname1:20009
alluxio.master.cross.cluster.enabled=true
alluxio.user.file.writetype.default=CACHE_THROUGH
alluxio.master.mount.table.root.cross.cluster=true
alluxio.master.mount.table.root.ufs=s3://my-bucket
At another Alluxio cluster we have the following in the `alluxio-site.properties` file at all masters:
alluxio.master.cross.cluster.id=C2
alluxio.master.cross.cluster.rpc.addresses=master-hostname1:20009
alluxio.master.cross.cluster.enabled=true
alluxio.user.file.writetype.default=CACHE_THROUGH
alluxio.master.mount.table.root.cross.cluster=true
alluxio.master.mount.table.root.ufs=s3://my-bucket
The clusters are now started by running the following command at each cluster:
bin/alluxio-start.sh all
Now any modification done through Alluxio to the S3 bucket `s3://my-bucket` will be synchronized between the clusters under their respective root mounts.
Additionally, the clusters wish to mount the HDFS UFS located at `hdfs://hdfs-hostname:9000` at the local Alluxio path `/mnt/hdfs` and have updates synchronized across clusters. To do this, the following command is run at each cluster:
bin/alluxio fs mount --crosscluster /mnt/hdfs hdfs://hdfs-hostname:9000
Here is an example of a configuration with two Alluxio clusters and a single S3 bucket to be synchronized across clusters as the root mount in Alluxio. In this case the first cluster wants to mount the full bucket `s3://my-bucket`, while the second cluster only wants to mount the section of the bucket with the prefix `s3://my-bucket/my-prefix`.
At the first Alluxio cluster we have the following in the `alluxio-site.properties` file at all masters:
alluxio.master.cross.cluster.id=C1
alluxio.master.cross.cluster.rpc.addresses=master-hostname1:20009
alluxio.master.cross.cluster.enabled=true
alluxio.user.file.writetype.default=CACHE_THROUGH
alluxio.master.mount.table.root.cross.cluster=true
alluxio.master.mount.table.root.ufs=s3://my-bucket
At the second Alluxio cluster we have the following in the `alluxio-site.properties` file at all masters:
alluxio.master.cross.cluster.id=C2
alluxio.master.cross.cluster.rpc.addresses=master-hostname1:20009
alluxio.master.cross.cluster.enabled=true
alluxio.user.file.writetype.default=CACHE_THROUGH
alluxio.master.mount.table.root.cross.cluster=true
alluxio.master.mount.table.root.ufs=s3://my-bucket/my-prefix
The clusters are now started by running the following command at each cluster:
bin/alluxio-start.sh all
Now any modification done through Alluxio to the S3 bucket `s3://my-bucket` will be visible on the first cluster, and any modification under the prefix `s3://my-bucket/my-prefix` will be visible on the second cluster.
Here is an example of a configuration with two Alluxio clusters and a single S3 bucket to be synchronized across clusters as the root mount in Alluxio. In this case the clusters are located in different regions and must use different hostnames for communicating across regions.
Each cluster is running a high availability setup with three masters.
At one Alluxio cluster we have the following in the `alluxio-site.properties` file at all masters:
alluxio.master.cross.cluster.id=C1
alluxio.master.cross.cluster.rpc.addresses=master-hostname1:20009
alluxio.master.cross.cluster.enabled=true
alluxio.user.file.writetype.default=CACHE_THROUGH
alluxio.master.mount.table.root.cross.cluster=true
alluxio.master.mount.table.root.ufs=s3://my-bucket
alluxio.master.rpc.addresses=C1-master1:19998,C1-master2:19998,C1-master3:19998
alluxio.master.cross.cluster.external.rpc.addresses=C1-master1-public:19998,C1-master2-public:19998,C1-master3-public:19998
At another Alluxio cluster we have the following in the `alluxio-site.properties` file at all masters:
alluxio.master.cross.cluster.id=C2
alluxio.master.cross.cluster.rpc.addresses=master-hostname1:20009
alluxio.master.cross.cluster.enabled=true
alluxio.user.file.writetype.default=CACHE_THROUGH
alluxio.master.mount.table.root.cross.cluster=true
alluxio.master.mount.table.root.ufs=s3://my-bucket
alluxio.master.rpc.addresses=C2-master1:19998,C2-master2:19998,C2-master3:19998
alluxio.master.cross.cluster.external.rpc.addresses=C2-master1-public:19998,C2-master2-public:19998,C2-master3-public:19998
The clusters are now started by running the following command at each cluster:
bin/alluxio-start.sh all
Now any modification done through Alluxio to the S3 bucket `s3://my-bucket` will be synchronized between the clusters under their respective root mounts. Clusters will communicate with each other using their external hostnames (i.e. those with the suffix `-public` in this example).
Assume we are in the case of the previous example (a two cluster setup with a synchronized S3 root mount), where the S3 bucket `s3://my-bucket` is mounted as the root UFS mount at both clusters with cross cluster synchronization enabled.
Now assume some external workload has updated a set of file paths under the prefix `s3://my-bucket/external-updated`, and these updates should be visible on both Alluxio clusters. On one of the clusters we can run the command:
bin/alluxio fs needsSync /external-updated
Now the next time any file under the path `/external-updated` is accessed on either Alluxio cluster, the file's metadata will be synced with the UFS.
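The lazy nature of `needsSync` can be illustrated with a small model. This is invented Python, not Alluxio code: marking a prefix is cheap and touches nothing, and each file is only re-read from the UFS when it is next accessed:

```python
class LazySyncNamespace:
    """Toy model of needsSync semantics (not Alluxio's implementation).

    Simplified: the marked flag is kept per prefix rather than being
    cleared per file after its first sync.
    """
    def __init__(self, ufs):
        self.ufs = ufs          # dict standing in for the S3 bucket
        self.cache = {}         # cached metadata per path
        self.marked = set()     # prefixes marked by needsSync
        self.ufs_reads = 0      # counts actual UFS accesses

    def needs_sync(self, prefix):
        self.marked.add(prefix)  # cheap: nothing is fetched yet

    def read(self, path):
        if any(path.startswith(p) for p in self.marked):
            self.cache[path] = self.ufs.get(path)  # sync on access only
            self.ufs_reads += 1
        return self.cache.get(path)

# The UFS was updated externally; the Alluxio-side cache is stale.
ufs = {"/external-updated/f1": "new1", "/external-updated/f2": "new2"}
ns = LazySyncNamespace(ufs)
ns.cache = {"/external-updated/f1": "old1", "/external-updated/f2": "old2"}

ns.needs_sync("/external-updated")
print(ns.ufs_reads)                      # 0: marking touched no files
print(ns.read("/external-updated/f1"))   # new1: synced on access
print(ns.ufs_reads)                      # 1: only the accessed file synced
```

Files that are never accessed after the mark never generate UFS traffic, which is what makes `needsSync` lightweight even for large subtrees.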