Release Notes


2.7.0-1.0

We are excited to announce the release of Alluxio Enterprise 2.7.0-1.0! This is the first release on the Alluxio Enterprise 2.7.X line.

Alluxio 2.7 further enhances the functionality and performance of machine learning and AI training workloads, serving as a key component in data pre-processing and training pipelines. Alluxio 2.7 also improves the scalability of the job service by batching data management jobs, and enhances master RPCs to enable launching clusters with a massive number of workers. Fault tolerance is improved in Alluxio 2.7 as well.

Highlights

Deeper Integration with Machine Learning Workloads

Alluxio 2.7 emphasizes integration with machine learning and AI training workloads. It has been optimized specifically to support a large number of small files and highly concurrent data access, and it improves POSIX compatibility. Training-related jobs such as data preloading now load data faster and scale better.

To keep this integration solid, this release adds multiple testing frameworks, including the Alluxio embedded FUSE StressBench, to prevent regressions in functionality and performance for training workloads (documentation).

Alluxio 2.7 also contains improvements in Helm charts, CSI, and Fluid integration, which help deploy Alluxio in Kubernetes clusters co-located with training jobs. More metrics and detailed docs have been added to help users monitor, debug, and investigate issues with the Alluxio POSIX API.

Improved Job Service Scalability

Job Service has become an essential service in many deployments and data processing pipelines involving Alluxio. Alluxio 2.7 introduces Batched Job to reduce the management overhead in the job master, allowing the job service to handle an order of magnitude more total and concurrent jobs (see configuration and documentation). This is necessary as users increasingly use Job Service to perform tasks like distributedLoad and distributedCopy on very large numbers of files.
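The batching behavior is configurable on the client side. As a sketch (the property name and default shown here are assumptions; verify against the linked configuration docs):

```properties
# Assumed property name; check the Alluxio 2.7 configuration docs.
# Number of job requests the client groups into one batched submission
# before sending it to the job master.
alluxio.job.request.batch.size=20
```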

In addition, on the client side, more efficient polling is implemented so that pinning and setReplication commands on large directories place less stress on the job service. This allows users to pin a larger set of files.
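The commands affected by the more efficient polling are the standard CLI ones; for example (paths illustrative):

```shell
# Pin a directory so its data is kept in Alluxio storage.
bin/alluxio fs pin /training/dataset

# Request between 2 and 5 replicas for every block under the path.
bin/alluxio fs setReplication -min 2 -max 5 -R /training/dataset
```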

The load command now uses the new worker API, avoiding an extra data copy through the client.

Better Master RPC Scalability

Alluxio 2.7 introduces better cluster scalability by improving the efficiency of worker registration with the master upon startup (documentation). Previously, a large number of workers registering with the master at the same time could overwhelm the master by overconsuming memory, leaving it unresponsive.

Alluxio 2.7 introduces a master-side flow control mechanism called the “worker registration lease”. The master can now control how many workers may register at the same time, forcing additional workers to wait their turn. This ensures stable resource consumption on the master process and avoids catastrophic memory exhaustion.

Alluxio 2.7 also introduces a new option for the worker to break the registration request into a stream of smaller messages. Streaming further smooths master-side memory consumption when registering workers, allows higher concurrency, and is more GC-friendly.
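Both mechanisms are driven by configuration. As a hedged sketch (the property names below are assumptions; verify against the linked documentation):

```properties
# Assumed property names; check the Alluxio 2.7 configuration docs.
# Master side: require a registration lease and cap concurrent registrations.
alluxio.master.worker.register.lease.enabled=true
alluxio.master.worker.register.lease.count=20
# Worker side: send the register request as a stream of smaller messages.
alluxio.worker.register.stream.enabled=true
```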

Increased Fault Tolerance

Sudden network partitions and node crashes within an Alluxio HA cluster can be fatal to running applications. With intelligent retry support, Alluxio 2.7 now efficiently tracks each RPC in the journal, giving applications improved fault tolerance.

Support Transferring Alluxio Leadership at Runtime

When deploying a High Availability cluster using the Embedded Journal, users can now manually specify the leading master. This is useful when users want to debug or perform maintenance on a server without killing a running master process. This new functionality gracefully transfers quorum leadership to the specified master.

The docs show how to use this new feature.
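For example, leadership can be inspected and transferred with the fsadmin CLI (the address below is illustrative):

```shell
# Show the current embedded-journal quorum and which master is leading.
bin/alluxio fsadmin journal quorum info -domain MASTER

# Gracefully transfer leadership to the designated master.
bin/alluxio fsadmin journal quorum elect -address master-2:19200
```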

Add Alluxio Stress Test Framework

Alluxio StressBench is a built-in tool to benchmark the performance of an Alluxio deployment without any extra services. Alluxio supports the following benchmark suites:

Master RPC throughput (430e4a)

  • bin/alluxio runClass alluxio.stress.cli.RegisterWorkerBench
  • bin/alluxio runClass alluxio.stress.cli.WorkerHeartbeatBench
  • bin/alluxio runClass alluxio.stress.cli.GetPinnedFileIdsBench

Alluxio POSIX API read throughput (634bd32)

  • bin/alluxio runClass alluxio.stress.cli.fuse.FuseIOBench

Job Service throughput (0cae910)

  • bin/alluxio runClass alluxio.stress.cli.StressJobServiceBench
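Each suite is a standalone main class run via runClass. Assuming the suites accept a --help flag (an assumption to verify), their parameters can be listed before running:

```shell
# Print the supported flags for the FUSE I/O benchmark (flag support assumed).
bin/alluxio runClass alluxio.stress.cli.fuse.FuseIOBench --help
```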

Enterprise Edition Highlights

Embedded Journal and RocksDB as default options

In Alluxio 2.0, embedded journal and RocksDB were introduced as alternative methods for Alluxio to journal metadata and store file system metadata, respectively. These were initiatives to ensure Alluxio could handle larger scales and decouple its dependencies on third-party systems. After many cycles of testing, debugging, and fixing critical issues, these two options are now recommended as the defaults for their respective responsibilities in Alluxio 2.7.

To use the previous journal type, set alluxio.master.journal.type to UFS. To use the previous metastore type, set alluxio.master.metastore to HEAP. Without explicitly setting these property values, Alluxio 2.7 will use EMBEDDED and ROCKS for these two configuration properties.
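In alluxio-site.properties, reverting to the pre-2.7 behavior therefore looks like:

```properties
# Optional: revert to the pre-2.7 defaults described above.
alluxio.master.journal.type=UFS
alluxio.master.metastore=HEAP
```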

Other Improvements

Docker, CSI, and K8s

  • Add toggle to Helm chart to enable or disable master & worker resources (17e0542)
  • Add ImagePullSecrets to Helm chart (c775c70)
  • Fix URL address of alive workers when workers start in Kubernetes (3239740)
  • Add probe improvements to Helm chart (d7c252d)
  • Fix load command failing to cache blocks in container environments (a25f0a1)
  • Add fallback logic for when the CPU count in a container is reported as 1 (fe0420c)
  • Support CSI in both docker images (575d205)
  • Create new dockerfile with centos7 as base image (768d45c)
  • Improve docker image to decrease size (71f62c36)

Metrics

  • Fix master lost file count metric always being 0 (91fb695)
  • Fix issue where workers cannot report cluster aggregated metrics (aa823e4)
  • Add worker and client metrics size to log info (cc50847)
  • Add worker threadPool and task metrics (fed20cd)
  • Add active job size metric (23f3a5d)
  • Fix Worker.BlocksEvicted always being zero (492647b)
  • Add metric of audit log (a06d374)
  • Add metric of raft journal index (3df0ef4)
  • Add lock pool metrics (fb1f1b8)
  • Expose Prometheus metrics from all servers (1a6054ad)
  • Add metrics of Alluxio logging (0fba8bb)
  • Print web metrics servlet page as human-readable format (b1db0716)
  • Support exporting Ratis metrics (a684440)
  • Add master LostFile and lost blocks metric (2238a64b)
  • Add metric of jvm pause monitor (ef4aaab2)
  • Add metric of Operating System (67e568ff4)
  • Add metadata cache metric (e2ee953)
  • Register journal sequence number metric (449d1ae9)
  • Support total block replica count metric (13ec038b)
  • Add metrics to track master RPC throughput (b2a40192)
  • Add read time metrics for client local cache (c4bd024)
  • Improve the metric of shadow cache for relatively small working sets (3422cb6)
  • Add Fuse data structure size metrics and improve metrics docs (d588684c)
  • Add user info to ufs block read metrics (fceea2a)
  • Support exporting JVM metrics (cd6fc34)
  • Add cluster_cache_hit_rate metric (0d8d17b)
  • Add CacheContext to URIStatus to enable per-read metrics (3ce5298)
  • Implement Shadow Cache (d135ce1)

S3 API

  • Fix S3 bucket region (e76258f)
  • Implement S3 API features required to use basic Spark operations (23be470)
  • Support ListObjectsV2 in the S3 REST API (23f5b85)
  • Fix S3 REST API ListObjects delimiter processing error (0892405)
  • Set S3 endpoint region and support OCI object storage (36a4d5c)
  • Fix non-standard response of the S3 ListBuckets REST API (bf11633)
  • Add application/octet-stream to consumed content types in S3 Handler (0901eb6)

General improvements

  • Web UI pages and displayed metrics (33b8a24) (b20ef39) (73bf2ce) (4087a38) (11ae654) (af54be3)
  • Intellij support for Alluxio processes (71165a5) (2458871) (0a5669c) (69273af)
  • Implement configurable ExecutorService creation for Alluxio processes (0eb8e89)
  • Bump rocksdb version to 6.25.3 (b41b519)
  • Convert worker registration to a stream (7c020f6)
  • Add master-side flow control for worker registration with RegisterLease (e883136)
  • Replace ForkJoinPool with FixedThreadPool (0cd3b7e)
  • Add Batched job for job Service (dc95b3b)
  • Reset priorities after transfer of leadership (6a18760)
  • Implement Resource Leak Detection (ab5e505)
  • Add ReconfigurableRegistry and Reconfigurable (5e5b3e6)
  • Remove fall-back logic from journal appending (4f870de)
  • Add retry support for initializing AlluxioFileOutStream (2134192)
  • Allow higher keep-alive frequency for worker data server (bd7a958)
  • Enable more settings on job-master’s RPC server (b14f6e6)
  • Increase allowed keep-alive rate for master RPC server (c061b93)
  • Improve exception message in block list merger function (62a9958)
  • Implement FS operation tracking for improving partition tolerance (a9f121c)
  • Enable additional keep-alive configuration on gRPC servers (3f94a5b)
  • Introduce client-side keep-alive monitoring for channels of RPC network (e6cded4)
  • Support logical name as HA address (0526fae)
  • Support Huawei Cloud (PFS && OBS) (0cf5b91)
  • Introduce deadlines for sync RPC stubs (70112cb)
  • Use jobMasterRpcAddresses to judge ServiceType (b0eba95)
  • Support shuffling master addresses in the RPC client (665960e)
  • Avoid redundant query for conf address (6012721)
  • Add container host information on worker page (e5e53e08)
  • Release workerInfoList when a job completes (a0c3c6a4)
  • Support web server for fuse process (83c16f67c)
  • Make FUSE unmount properly (2df83726)
  • Provide entry points for providing java-based TLS security to gRPC Channels (ea49f3b31)
  • Count the number of successful and failed jobs in distributed job commands (2c792f987)
  • Allow probes to be configured in the Helm chart (4991e84)
  • Support listing jobs with a specific status (fdf9d4f4)
  • Add doc on Presto and Iceberg (2a56d12)
  • Reduce the risk of sensitive information leak in rpc debug/error log (ea00090)
  • Add configuration of min and max election timeout (b26d200ca)
  • Support Fuse on Worker process in Kubernetes helm yaml files (d2e947243)
  • Create smaller alpine and centos development docker image (22ecb2c2)
  • Add property to skip listing broken symlinks on local UFS (b5f318e7a)
  • Update LRU evictor reference when getting a page in LocalCacheManager (c9e396a3)
  • Close gRPC input stream when finished reading to speed up data loading in ML/DL workloads (4f7a8877)
  • Add soft kill option to all processes (df5dcab)
  • Invalidate cache after write operation (183552b)
  • Add Container Storage Interface (CSI) (3ec7ef9)
  • Improve server-side logging (1bd9345)
  • Disable RATIS’s RPC level safety (efbec94)
  • Support Range header for the S3 GetObject API (1d0c089)
  • Support using a regex to stand for schemeAuthority in UDB mount options (62265d1)

Bugs

  • Resolve lock leak during recursive delete (ae281b5)
  • Fix startMasters() from receiving unchecked exceptions (8fce673c)
  • Close UFS in UfsJournal close (b2c256e3)
  • Add filtering to MergeJournalContext (f8b2cd1)
  • Fix distributedLoad after file is already loaded (924199f)
  • Save transfer of leadership from unchecked exceptions (146bbce)
  • Fix null pointer exception for AuditLog (6d2ce19)
  • Return directly when an exception is thrown during transfer leader (bddb70e)
  • Fix SnapshotLastIndex update error (fa99e3873)
  • Fix RetryHandlingJournalMasterClient (4de3acd)
  • Fix copyFromLocal command (1662dcf)
  • Fix the semantic ambiguity of LoadMetadataCommand (b7e67d5)
  • Fix bug where recursive paths could not be created when initiating a multipart upload (28ee226c)
  • Catch errors during explicit cache cleanup (5339349)
  • Fix leader master priority always being 0 (01347b2)
  • Fix process local read write client side logics and add unit tests (87e08e2)
  • Stop leaking state-lock when journal is closed (ef2d38f6)
  • Fix ArrayIndexOutOfBoundsException when using shared.caching.reader (f1f49e5ea)
  • Fix the job server or job worker failing to start (3f5b76da)
  • Fix job completion logging (80cf7ca)
  • Fix block count metrics (edb5169)
  • Fix last snapshot index in delegated backup (15c0838a)
  • Make quorum info command more expressive (8704ea1)
  • Handle some exceptional cases to prevent leaks (bd2f945e3)
  • Remove ramfs from size-checking condition (257da58)
  • Make the stopwatch thread-safe in readInternal (8e03d6d1c)
  • Fix the job service hanging when setting an unprivileged path (6a0c01d)
  • Fix IndexOutOfBoundsException on async cache (6ad1e4f)
  • Fix client pool leak (9682207)
  • Fix async cache error (7d70714)
  • Fix NPE during swap-restore plan generation (030ce84)
  • Fix integer overflow in usedPercentage (3f73cf9)
  • Fix exception propagation for RPC connection retries (4705d65)