Cross Cluster Synchronization


This page describes the cross cluster synchronization feature of Alluxio.

The cross cluster synchronization feature has been available since version 2.9.0.

Overview

Cross cluster synchronization allows multiple Alluxio clusters to mount the same (or intersecting) UFS path and keep files synchronized across the clusters.

For example, assume there are two Alluxio clusters called C1 and C2. Both clusters mount the S3 bucket s3://my-bucket/ to the folder /mnt in Alluxio. Now when cluster C1 modifies a file under the path /mnt, the modification will also become visible on cluster C2.

Instead of polling the UFS to check for changes based on a user interval (see ufs metadata sync), when cross cluster synchronization is enabled on a mount point and a file is modified through Alluxio, the cluster will directly notify the other Alluxio clusters that the file needs synchronization. The next time the file is accessed on one of these clusters, a metadata synchronization will be performed with the UFS. Note that only path information is communicated between clusters; any data or metadata synchronization happens between the Alluxio clusters and the UFS. As a result, any metadata that is stored only in Alluxio will not be synchronized (for example, ACL data in certain configurations). Internally, the metadata synchronization uses the same mechanisms as the ufs metadata sync.

By tracking modifications directly between clusters, synchronization with the UFS only happens when necessary. This should reduce the load on the UFS and decrease the time for an update from one cluster to be visible on another cluster.
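The invalidate-then-lazily-sync protocol described above can be illustrated with a small conceptual sketch. This is not Alluxio's actual implementation; the `Cluster` class, the shared dictionary standing in for the UFS, and all names are illustrative assumptions:

```python
# Conceptual sketch (not Alluxio source) of cross cluster invalidation:
# a write goes through to the UFS, peers are told only *which path* changed,
# and each peer syncs metadata with the UFS lazily on its next access.

class Cluster:
    def __init__(self, name, ufs):
        self.name = name
        self.ufs = ufs                 # shared "UFS": path -> content
        self.metadata = {}             # locally cached metadata
        self.needs_sync = set()        # paths invalidated by other clusters
        self.peers = []

    def write(self, path, content):
        # A THROUGH-style write: persist to the UFS, then notify peers
        # with path information only (no data is shipped between clusters).
        self.ufs[path] = content
        self.metadata[path] = content
        for peer in self.peers:
            peer.needs_sync.add(path)

    def read(self, path):
        # Sync with the UFS only when an invalidated path is accessed.
        if path in self.needs_sync or path not in self.metadata:
            self.metadata[path] = self.ufs.get(path)
            self.needs_sync.discard(path)
        return self.metadata[path]

ufs = {}
c1, c2 = Cluster("C1", ufs), Cluster("C2", ufs)
c1.peers, c2.peers = [c2], [c1]

c1.write("/mnt/file", "v1")
print(c2.read("/mnt/file"))  # prints "v1": C2 syncs with the UFS on access
```

Note how the UFS remains the single source of truth: the inter-cluster message only tells C2 that /mnt/file is stale, and the content itself is fetched from the UFS.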

Requirements

For the feature to work correctly, the following properties must be ensured:

  • To ensure consistency across clusters, writes to cross cluster enabled UFS mount points must happen through Alluxio (see consistency).
  • Writes to these mount points must use the THROUGH, CACHE_THROUGH, or ASYNC_THROUGH write type option.
  • The UFS being mounted must ensure strong consistency, in the sense that after a successful response is received from an operation, the result of that operation must be globally visible. UFSs such as HDFS, S3, and Google Cloud Storage ensure this level of consistency for most operations.
  • The masters of each Alluxio cluster must be able to communicate with the masters of each other Alluxio cluster through their master RPC port (19998 by default, set by alluxio.master.rpc.port).
  • There is exactly one cross cluster master process running. It may be stopped and restarted at any time, but for consistency to be ensured it must eventually be running.

Components

Cross cluster synchronization is broken up into two main components: a naming service, called the Cross Cluster Master, used for cluster and mount discovery; and a publish/subscribe service that runs on the Alluxio masters to notify other clusters of modifications to the UFS.

Alluxio Masters

To use cross cluster synchronization, the Alluxio clusters must set the following properties in alluxio-site.properties at the master nodes:

  • alluxio.master.cross.cluster.enabled=true - cross cluster synchronization must be enabled.
  • alluxio.master.cross.cluster.id={some-unique-id} - each cluster must be given a unique ID which can be any string.
  • alluxio.master.cross.cluster.rpc.addresses={cross-cluster-master-address-port} - the host:port combination of the cross cluster master process must be set (see cross cluster master).
  • alluxio.master.mount.table.root.cross.cluster=true - must be set if cross cluster synchronization should be enabled on the root mount.

Now when mounting a UFS path to Alluxio, the --cross-cluster flag must be included if files on that mount are to be synchronized across clusters. For example:

bin/alluxio fs mount --cross-cluster /mnt/hdfs hdfs://localhost:9000

For cross cluster synchronization to work, the masters of the different clusters must be able to communicate with each other using their master RPC ports (19998 by default, set by alluxio.master.rpc.port). The addresses used for communication are calculated in the same way as at the clients (see Optional Configuration). If the cluster should be visible under different addresses, then alluxio.master.cross.cluster.external.rpc.addresses can be set to a comma-separated list of host:port RPC addresses (note that this will not change how anything on the local cluster is run, just that external clusters will try to connect to this cluster using these addresses).

Cross Cluster Master

The cross cluster master is a single, global, standalone Java process that must be reachable by each Alluxio cluster’s masters. This process is responsible for cluster and mount discovery between the different Alluxio clusters. The process can be started on any node where Alluxio is installed using the command:

bin/alluxio-start.sh cross_cluster_master

By default, this process listens on port 20009; this can be changed using the property key alluxio.cross.cluster.master.rpc.port.

For Alluxio clusters to be able to connect to the cross cluster master, they must set the property key alluxio.master.cross.cluster.rpc.addresses to the host:port RPC address of the cross cluster master, for example alluxio.master.cross.cluster.rpc.addresses=cross-cluster-master-hostname:20009.

If a cluster is not able to connect to this process, it will not be able to add, remove, or update cross cluster enabled mount points. Paths that are already mounted will continue to be synchronized even while clusters are unable to communicate with this process, at least until a master failover occurs.

Logs for the cross cluster master can be found in logs/cross_cluster_master.log. See basic logging for more details about logging in Alluxio.

Command line options:

To see the currently registered clusters at the cross cluster master run the following command from any Alluxio cluster:

bin/alluxio fsadmin crossCluster listClusters

This will display the addresses that the different Alluxio clusters will use to communicate with each other as well as their existing cross cluster synchronized mount points.

To remove a registered cluster at the cross cluster master use the following command from any Alluxio cluster:

bin/alluxio fsadmin crossCluster removeCluster {cluster-id}

Where {cluster-id} is the value set by property key alluxio.master.cross.cluster.id at the cluster to be removed. Removing a running cluster will result in the cluster re-registering with the cross cluster master.

Note that shutting down a cluster will not automatically remove it from the cross cluster master's list of registered clusters, so this command can be used to remove a cluster that will no longer be used with cross cluster synchronization.

Consistency

If the requirements are ensured, the state of the files in the clusters will be eventually consistent, meaning that eventually the latest version of every file will be visible on every cluster. (See the limitations below, which may cause inconsistencies to occur.)

The time it takes for an update from one cluster to be visible on another cluster will depend on the latency between the clusters and the UFS, the latency between the clusters themselves, as well as the current load on the clusters.

If a write operation happens outside Alluxio and the user wishes for this path to be synced, then there are two options:

  1. Perform a metadata sync manually by setting the -Dalluxio.user.file.metadata.sync.interval=0 flag when performing an operation. Note that for cross cluster enabled mounts, the alluxio.user.file.metadata.sync.interval value set in alluxio-site.properties is ignored, but if it is set for an individual command, or using path configurations (see the pathConf command), the included value will be used.
  2. Mark a path as needing synchronization by running the command bin/alluxio fs needsSync {path}. Following this, the next time this path is accessed by a file system operation, the metadata for this path will be synced (see the needsSync command). Note that this is a lightweight operation that will not sync the files immediately; only individual files/directories will be synced as they are accessed.
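For example, for a cross cluster mounted path that was updated outside Alluxio (the path /mnt/s3 below is a hypothetical example), either of the two options above could look like:

```shell
# Option 1: force a metadata sync for this single ls invocation
bin/alluxio fs ls -Dalluxio.user.file.metadata.sync.interval=0 /mnt/s3

# Option 2: mark the path as needing sync; metadata is then synced
# lazily as individual files/directories under it are accessed
bin/alluxio fs needsSync /mnt/s3
```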

Performance tuning

There are several options that can be used for tuning the performance of the cross cluster synchronization:

  • alluxio.master.ufs.path.cache.capacity - When a file is checked to determine whether it needs synchronization with the UFS, a cache whose size is given by this property key is consulted. This cache tracks synchronization and invalidation times at the path level, using an LRU eviction scheme when full. When a path is evicted, it may mark its parent as needing synchronization, so increasing the size of this cache may reduce the number of spurious accesses to the UFS. The default size of this cache is 100000; larger values may result in increased memory usage on the masters. The size of a cache entry includes the string representation of the path as well as approximately 100 bytes of timing/object metadata.
  • alluxio.master.cross.cluster.invalidation.queue.size - When a cluster modifies a file, it sends an invalidation message to other clusters that have also mounted this file. This property sets the maximum number of invalidation messages that can be queued or in transit at the same time for each external mount. The default value is 10000, larger values may result in increased memory usage on the masters.
  • alluxio.master.cross.cluster.invalidation.queue.wait - If the size of the invalidation message queue from the above property is exceeded, then the value set here is the amount of time a thread will wait for there to be room in the queue for its invalidation message. If this time is exceeded, messages will be dropped, which will result in extra synchronization operations to ensure the clusters are consistent. This can be thought of as a backpressure mechanism, which slows down the publisher in case of instabilities in the network or at the receiver. If the publisher should absolutely not be slowed down, this value can be set to 0, but this may result in the subscriber cluster performing additional synchronizations later. The default value is 1 second.
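The interaction between the path cache's LRU eviction and parent invalidation can be sketched as follows. This is an assumed conceptual model, not Alluxio's implementation; the class and method names are illustrative:

```python
# Sketch (assumed behavior, not Alluxio source) of an LRU path cache
# where evicting an entry marks the evicted path's parent as needing
# synchronization, since per-path timing information is lost.
from collections import OrderedDict
import posixpath

class PathSyncCache:
    def __init__(self, capacity):
        self.capacity = capacity
        self.entries = OrderedDict()   # path -> last sync/invalidation time
        self.needs_sync = set()        # paths that must be re-synced with the UFS

    def record(self, path, timestamp):
        self.entries[path] = timestamp
        self.entries.move_to_end(path)           # LRU: most recent at the end
        while len(self.entries) > self.capacity:
            evicted, _ = self.entries.popitem(last=False)
            # Without the evicted entry's timing data, the parent can no
            # longer prove its children are up to date, so mark it.
            self.needs_sync.add(posixpath.dirname(evicted))

cache = PathSyncCache(capacity=2)
cache.record("/mnt/a", 1)
cache.record("/mnt/b", 2)
cache.record("/mnt/c", 3)    # evicts /mnt/a, marking /mnt as needing sync
print(cache.needs_sync)      # prints {'/mnt'}
```

This illustrates why a larger alluxio.master.ufs.path.cache.capacity reduces spurious UFS accesses: fewer evictions means fewer parents conservatively marked for re-synchronization.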

Limitations

  • Concurrent writes to the same file may result in the file being out of sync across clusters.
  • If a client encounters an error while writing a file and leaves the file in a partially complete state, the file may not be correctly synchronized across clusters.
  • Performing a mount, update mount, or unmount operation during periods of high cross cluster synchronization load may cause instability for the cross cluster operations. This is due to the mount operations blocking the cross cluster subscriptions while in progress.

Example configurations

Below are some example configurations using cross cluster sync.

For all of these configurations it is assumed that there is a node with address master-hostname1 where the cross cluster master is running, i.e. on this node the following command has been run:

bin/alluxio-start.sh cross_cluster_master

Here is an example of a configuration with two Alluxio clusters and a single S3 bucket to be synchronized across clusters as the root mount in Alluxio.

At one Alluxio cluster we have the following in the alluxio-site.properties file at all masters:

alluxio.master.cross.cluster.id=C1
alluxio.master.cross.cluster.rpc.addresses=master-hostname1:20009
alluxio.master.cross.cluster.enabled=true
alluxio.user.file.writetype.default=CACHE_THROUGH
alluxio.master.mount.table.root.cross.cluster=true
alluxio.master.mount.table.root.ufs=s3://my-bucket

At another Alluxio cluster we have the following in the alluxio-site.properties file at all masters:

alluxio.master.cross.cluster.id=C2
alluxio.master.cross.cluster.rpc.addresses=master-hostname1:20009
alluxio.master.cross.cluster.enabled=true
alluxio.user.file.writetype.default=CACHE_THROUGH
alluxio.master.mount.table.root.cross.cluster=true
alluxio.master.mount.table.root.ufs=s3://my-bucket

The clusters are now started by running the following command at each cluster.

bin/alluxio-start.sh all

Now any modification done through Alluxio to the S3 bucket s3://my-bucket will be synchronized between the clusters under their respective root mounts.

Here is an example of a configuration with two Alluxio clusters and a single S3 bucket to be synchronized across clusters as a nested mount in Alluxio.

At one Alluxio cluster we have the following in the alluxio-site.properties file at all masters:

alluxio.master.cross.cluster.id=C1
alluxio.master.cross.cluster.rpc.addresses=master-hostname1:20009
alluxio.master.cross.cluster.enabled=true
alluxio.user.file.writetype.default=CACHE_THROUGH

At another Alluxio cluster we have the following in the alluxio-site.properties file at all masters:

alluxio.master.cross.cluster.id=C2
alluxio.master.cross.cluster.rpc.addresses=master-hostname1:20009
alluxio.master.cross.cluster.enabled=true
alluxio.user.file.writetype.default=CACHE_THROUGH

The clusters are now started by running the following command at each cluster:

bin/alluxio-start.sh all

Now if we want to keep the S3 bucket s3://my-bucket synchronized between the clusters under the path /mnt, we run the command at both clusters:

bin/alluxio fs mount --cross-cluster /mnt/ s3://my-bucket

Here is an example of a configuration with two Alluxio clusters and a single S3 bucket to be synchronized across clusters as the root mount in Alluxio and an HDFS cluster mounted as a nested mount.

At one Alluxio cluster we have the following in the alluxio-site.properties file at all masters:

alluxio.master.cross.cluster.id=C1
alluxio.master.cross.cluster.rpc.addresses=master-hostname1:20009
alluxio.master.cross.cluster.enabled=true
alluxio.user.file.writetype.default=CACHE_THROUGH
alluxio.master.mount.table.root.cross.cluster=true
alluxio.master.mount.table.root.ufs=s3://my-bucket

At another Alluxio cluster we have the following in the alluxio-site.properties file at all masters:

alluxio.master.cross.cluster.id=C2
alluxio.master.cross.cluster.rpc.addresses=master-hostname1:20009
alluxio.master.cross.cluster.enabled=true
alluxio.user.file.writetype.default=CACHE_THROUGH
alluxio.master.mount.table.root.cross.cluster=true
alluxio.master.mount.table.root.ufs=s3://my-bucket

The clusters are now started by running the following command at each cluster.

bin/alluxio-start.sh all

Now any modification done through Alluxio to the S3 bucket s3://my-bucket will be synchronized between the clusters under their respective root mounts. Additionally, the clusters wish to mount the HDFS UFS located at hdfs://hdfs-hostname:9000 at the local Alluxio path /mnt/hdfs and have updates synchronized across clusters. To do this, the following command is run at each cluster:

bin/alluxio fs mount --cross-cluster /mnt/hdfs hdfs://hdfs-hostname:9000

Here is an example of a configuration with two Alluxio clusters and a single S3 bucket to be synchronized across clusters as the root mount in Alluxio. In this case the first cluster wants to mount the full bucket s3://my-bucket while the second cluster only wants to mount the section of the bucket with the prefix s3://my-bucket/my-prefix.

At the first Alluxio cluster we have the following in the alluxio-site.properties file at all masters:

alluxio.master.cross.cluster.id=C1
alluxio.master.cross.cluster.rpc.addresses=master-hostname1:20009
alluxio.master.cross.cluster.enabled=true
alluxio.user.file.writetype.default=CACHE_THROUGH
alluxio.master.mount.table.root.cross.cluster=true
alluxio.master.mount.table.root.ufs=s3://my-bucket

At the second Alluxio cluster we have the following in the alluxio-site.properties file at all masters:

alluxio.master.cross.cluster.id=C2
alluxio.master.cross.cluster.rpc.addresses=master-hostname1:20009
alluxio.master.cross.cluster.enabled=true
alluxio.user.file.writetype.default=CACHE_THROUGH
alluxio.master.mount.table.root.cross.cluster=true
alluxio.master.mount.table.root.ufs=s3://my-bucket/my-prefix

The clusters are now started by running the following command at each cluster.

bin/alluxio-start.sh all

Now any modification done through Alluxio to the S3 bucket s3://my-bucket will be visible on the first cluster, and any modification under the prefix s3://my-bucket/my-prefix will be visible on the second cluster.

Here is an example of a configuration with two Alluxio clusters and a single S3 bucket to be synchronized across clusters as the root mount in Alluxio. In this case the clusters are located in different regions and must use different hostnames for communicating across regions.

Each cluster is running a high availability setup with three masters.

At one Alluxio cluster we have the following in the alluxio-site.properties file at all masters:

alluxio.master.cross.cluster.id=C1
alluxio.master.cross.cluster.rpc.addresses=master-hostname1:20009
alluxio.master.cross.cluster.enabled=true
alluxio.user.file.writetype.default=CACHE_THROUGH
alluxio.master.mount.table.root.cross.cluster=true
alluxio.master.mount.table.root.ufs=s3://my-bucket
alluxio.master.rpc.addresses=C1-master1:19998,C1-master2:19998,C1-master3:19998
alluxio.master.cross.cluster.external.rpc.addresses=C1-master1-public:19998,C1-master2-public:19998,C1-master3-public:19998

At another Alluxio cluster we have the following in the alluxio-site.properties file at all masters:

alluxio.master.cross.cluster.id=C2
alluxio.master.cross.cluster.rpc.addresses=master-hostname1:20009
alluxio.master.cross.cluster.enabled=true
alluxio.user.file.writetype.default=CACHE_THROUGH
alluxio.master.mount.table.root.cross.cluster=true
alluxio.master.mount.table.root.ufs=s3://my-bucket
alluxio.master.rpc.addresses=C2-master1:19998,C2-master2:19998,C2-master3:19998
alluxio.master.cross.cluster.external.rpc.addresses=C2-master1-public:19998,C2-master2-public:19998,C2-master3-public:19998

The clusters are now started by running the following command at each cluster.

bin/alluxio-start.sh all

Now any modification done through Alluxio to the S3 bucket s3://my-bucket will be synchronized between the clusters under their respective root mounts. Clusters will communicate with each other using their external hostnames (i.e. those with the suffix -public in this example).

Assume we are in the case of the first example above (a two cluster setup with a synchronized S3 root mount), where the S3 bucket s3://my-bucket is mounted as the root UFS mount at both clusters with cross cluster synchronization enabled.

Now assume some external workload has updated a set of file paths under the prefix s3://my-bucket/external-updated and these updates should be visible on both of the Alluxio clusters. On one of the clusters we can run the command:

bin/alluxio fs needsSync /external-updated

Now the next time any file under the path /external-updated is accessed on either Alluxio cluster, the file metadata will be synced with the UFS.