Release Notes
November 16, 2022
This is the first release on the Alluxio 2.9.X line. This release introduces a feature for fine-grained caching of data, metrics for monitoring the health of the master process, and changes to the default configuration to better handle master failover and journal backups. Multiple improvements and fixes were also made to the S3 API, Helm charts, and POSIX API.
Highlights
Paging Storage on Workers
Alluxio workers now support fine-grained page-level caching, typically with 1 MB pages, as an alternative to the existing block-based tiered caching, whose block size defaults to 64 MB. Caching at page granularity improves caching performance by reducing read amplification, that is, the amount of data fetched and cached beyond what applications actually read. See the documentation for more details.
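To make the amplification argument concrete, here is a small worked calculation. It is a simplified model, not Alluxio's actual caching logic: it assumes that on a cache miss every caching unit (block or page) touched by a read is fetched and cached whole. The helper `bytes_fetched` and the example read are hypothetical.

```python
MB = 1024 * 1024
KB = 1024


def bytes_fetched(offset: int, length: int, unit_size: int) -> int:
    """Bytes pulled into the cache to serve one read of `length` bytes at `offset`.

    Simplifying assumption: every caching unit (block or page) that the read
    touches is fetched and cached in full.
    """
    first_unit = offset // unit_size
    last_unit = (offset + length - 1) // unit_size
    return (last_unit - first_unit + 1) * unit_size


# A hypothetical 100 KB random read that lands inside a single caching unit.
read_offset, read_len = 10 * MB + 200 * KB, 100 * KB

block_bytes = bytes_fetched(read_offset, read_len, 64 * MB)  # 64 MB blocks
page_bytes = bytes_fetched(read_offset, read_len, 1 * MB)    # 1 MB pages

print(f"64 MB blocks: {block_bytes // MB} MB fetched, {block_bytes / read_len:.0f}x amplification")
print(f"1 MB pages:   {page_bytes // MB} MB fetched, {page_bytes / read_len:.0f}x amplification")
```

Under this model the same small read pulls in a full 64 MB block but only a single 1 MB page, which is the reduction in amplification that page-level caching targets. Reads that straddle a page boundary touch two pages, so the gain depends on the access pattern.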
Master Monitoring Metrics
The Alluxio master periodically checks its resource usage, including CPU and memory usage, and several internal data structures that are performance critical. From snapshots of these resource utilization metrics, the master infers the state of the system, which can be retrieved through the `master.system.status` metric. The possible statuses are:
- IDLE
- ACTIVE
- STRESSED
- OVERLOADED
These monitoring indicators describe the system status heuristically, giving a basic understanding of the master's load. See the documentation for more information about monitoring.
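One way to act on the status value is to treat it as an ordered severity level and alert above a threshold. The sketch below assumes you already have the `master.system.status` value as a string from whatever metrics pipeline you use (fetching it is out of scope here). The severity ordering is inferred from the status names, and the `MasterStatus` enum, the `needs_attention` helper, and the default STRESSED threshold are all hypothetical choices for illustration.

```python
from enum import IntEnum


class MasterStatus(IntEnum):
    """Possible values of master.system.status, ordered by increasing load.

    The ordering is an assumption based on the status names.
    """
    IDLE = 0
    ACTIVE = 1
    STRESSED = 2
    OVERLOADED = 3


def needs_attention(status: str, threshold: MasterStatus = MasterStatus.STRESSED) -> bool:
    """Return True if the reported status is at or above the alert threshold.

    Raises KeyError for a value that is not one of the known statuses.
    """
    return MasterStatus[status.strip().upper()] >= threshold


for s in ["IDLE", "ACTIVE", "STRESSED", "OVERLOADED"]:
    print(s, needs_attention(s))
```

Because the indicators are heuristic, a single STRESSED reading is better treated as a signal to look closer than as a hard failure; requiring the status to persist across several samples before paging someone is a reasonable refinement.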
Journal and Failover Stability
As of 2.9.0, the default configuration skips the block integrity check on master startup and failover (a493b69e2d). This speeds up failovers considerably and minimizes system downtime during master leadership transfers. Instead, block integrity checks are performed periodically in the background so they do not interfere with normal master operations.
Another default configuration change delegates the journal backup operation to a standby master (e3ed7b674f) so that the leading master’s operations are not blocked for an extended period of time. Use the `--allow-leader` flag to allow the leading master to also take a backup, or force the leader to take the backup with the `--bypass-delegation` flag. See the documentation for additional information about backup delegation.
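The three backup modes described above map onto the two flags as follows. This is a sketch that only builds the command line; it assumes the backup is triggered with the `alluxio fsadmin backup` command, and the `backup_command` helper and the `/alluxio_backups` directory are hypothetical.

```python
from typing import List, Optional


def backup_command(directory: Optional[str] = None,
                   allow_leader: bool = False,
                   bypass_delegation: bool = False) -> List[str]:
    """Build the argv for a journal backup (assumes `alluxio fsadmin backup`).

    With no flags, the 2.9.0 default applies: the backup is delegated to a
    standby master.
    """
    cmd = ["alluxio", "fsadmin", "backup"]
    if directory is not None:
        cmd.append(directory)
    if allow_leader:
        # Allow the leading master to be chosen for the backup as well.
        cmd.append("--allow-leader")
    if bypass_delegation:
        # Force the leading master to take the backup itself.
        cmd.append("--bypass-delegation")
    return cmd


print(" ".join(backup_command()))
print(" ".join(backup_command("/alluxio_backups", bypass_delegation=True)))
```

On a host where the Alluxio CLI is on the `PATH`, the returned list could be passed to `subprocess.run(cmd, check=True)`. Keeping the default (no flags) preserves the point of the change: the leader keeps serving requests while a standby does the work.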
Improvements and Bugfixes Since 2.8.1
Notable Configuration Property Changes
| Property Key | Old 2.8.1 value | New 2.9.0 value |
|---|---|---|
| `alluxio.master.metrics.heap.enabled` | true | false |
| `alluxio.master.periodic.block.integrity.check.repair` | false | true |
| `alluxio.master.startup.block.integrity.check.enabled` | true | false |
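If you need to keep a 2.8.1 default after upgrading, the property must be set explicitly. The sketch below encodes the three changes from the table and renders `key=value` lines suitable for pasting into `alluxio-site.properties`. The `CHANGED_DEFAULTS` mapping and the `restore_281_defaults` helper are hypothetical; only the keys and values come from the table.

```python
# Keys and values taken from the 2.8.1 -> 2.9.0 configuration changes table.
CHANGED_DEFAULTS = {
    # key: (old 2.8.1 value, new 2.9.0 value)
    "alluxio.master.metrics.heap.enabled": ("true", "false"),
    "alluxio.master.periodic.block.integrity.check.repair": ("false", "true"),
    "alluxio.master.startup.block.integrity.check.enabled": ("true", "false"),
}


def restore_281_defaults(keys=None) -> str:
    """Render key=value lines pinning the selected properties to their 2.8.1 values.

    With keys=None, all changed properties are included.
    """
    selected = CHANGED_DEFAULTS if keys is None else {k: CHANGED_DEFAULTS[k] for k in keys}
    return "\n".join(f"{key}={old}" for key, (old, _new) in selected.items())


print(restore_281_defaults(["alluxio.master.startup.block.integrity.check.enabled"]))
```

Re-enabling the startup integrity check brings back the slower failovers that the new default avoids, so it is worth pinning only the properties you actually depend on.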
Metadata and Journal
- Add CLI for marking a path as needing sync with UFS (1c781f7de1)
- Make metadata sync work with merge inode journal feature flag (7ff9df2789)
- Make journal context thread safe (f5b2a5f438)
- Fix error in snapshot-taking when using large group ids (2803dc4603)
- Mark root as needing sync on backup restore (0fe867ac72)
- Make `MountTable.State` thread safe (45ce753499)
- Avoid canceling duplicate metadata sync prefetch job (aacee53fbc)
- Support multithread checkpointing with compression/decompression (ae065e34b9)
- Avoid and dedup concurrent metadata sync (025ca19d09)
- Refine add `mPendingPaths` in `InodeSyncStream` new type (749f70cd1a)
- Fix single master embedded journal checkpoint (a115fa3d5b)
- Create inode before updating the `MountTable` (56d1c6a9bc)
- Allow root sync path to use child sync time (4d6bce3d5f)
- Add client operation for partial listing (bc6e63f7f8)
- Determine the primary master address by calling the `GetNodeState` endpoint (c63ec951e4)
- Upgrade Apache Ratis from 2.0.0 to 2.3.0 (dc0a21daed)
- Merge journals & flush journals before lock release (8dafc272be)
- Add partial listing of files in `listStatus` (5f50dd8ab3)
- Improve error handling and naming on journal threads (e69adee025)
- Add partial listing of files in `listStatus` (ec0ce2a656)
- Fix journal shutdown deadlock (155a370fbe)
- Allow writing to read-only file when creating (fe27139f6c)
- Avoid checking file permissions in the `getFileInfo` method (54494af052)
- Fix master going down when it changes to leader (1ada2ac8c7)
Cache and Storage
- Implement Byte array pool (d071d5ef7c)
- Add block size for paged blocks (fd865e24a1)
- Make worker init tiers parallel (2af80f6e19)
- Fix early release of buffer (5e8d62c8a7)
- Separate page store configuration from client cache (f6cce53631)
- Optimize `getFileBlockLocations` performance (e5692d2261)
- Ignore parent path `NoSuchFileException` when `localPageStore` deletes `pageId` (6a64a178b2)
- Support size encoding for clock cuckoo filter in shadowcache (7395bed318)
- Do not add the worker to failed list for client exception (951f3568a2)
- Fix the leak of block lock (b97677e78e)
- Fix failures in mem page store when zero copy enabled (cbc2008b19)
- Fix load `sessionId` (d763b4077e)
- Fix paged block reader transfer offset (463506e178)
- Implement locking and pinning for paged block store (9c67f34682)
- Allocate buffer in load api (b93535cb6a)
- Fix worker stream register forgetting to release lease (2df21b9aee)
- Fix paged block store tier name (92c0a4fb73)
- Fix potential deadlock in tier store and refine the code (d1efe52f44)
- Fix `MaxFreeAllocator.allocateBlock` (21be94c923)
- Disable passive cache for pinned files (0e53f84b7c)
- Fix out of bound read in `PagedBlockReader` (63318bb3bb)
S3 API and Proxy
- Fix out of bound error in parsing s3 Authorization header (5f95931723)
- Implement S3 headbucket API (0236480f58)
- Fix double checked locking in S3 uploader (cdbfa096b2)
- Fix special char support (1ea5a840ac)
- Fix aws s3 cp with source object having special characters in it (7eba438a6f)
- Respect prefix param to avoid recursive `ls` on root dir (619089a7ea)
- Fix S3 API file mode bits to prevent unauthorized reads (87602c7a71)
- Add empty string check for delimiter (de0bebfeed)
- Extract Authentication and common logging into the specific filters (e5f9b8d7a0)
- Fix access control issues with S3 API metadata directory (7ab414ed06)
- Update `ListMultipartUploads` to prevent leaking other users’ upload IDs (9364fd0190)
- Require valid “Authorization” header in S3 API Proxy (2f3c42cf8e)
- Fix S3 API object writes failing with `BucketNotFound 404` (72f68208d4)
- Add S3 REST service audit log (fdcba75c7e)
Kubernetes and Docker
- Improve Helm Chart (71260634f8)
- Fix generating proxy templates in helm-chart (4bcde90623)
- Enable java 11 in Alluxio base image (c1c40bf442)
- Remove CSI client in Helm chart (a65627debb)
- Change Dockerfile to use CentOS instead of Alpine (298df80a47)
- Remove alluxio-fuse-client (fa34430b5e)
FUSE/POSIX API
- Modify Libfuse version configuration (496e91069b)
- Fix fuse mount options and refactor path cache loader (9cfba09aa0)
- Make configuration source of truth in AlluxioFuse (3a26baeabc)
- Fix alluxio-fuse unmount to get the right pid when no options (49022a5443)
- Support setting sleep time for alluxio-fuse mount (ca8db364cf)
- Avoid `chown` if the file already has the correct owner and group (a4a84ee0bf)
- Fix fuse check file name length method name (d3dee12ce4)
UFS
- Add support for Azure Data Lake Gen2 MSI (5dfa1789c6)
- Add support for OFS schema name (cecaa37744)
- Add configuration for Kerberos authentication for UFS HDFS (72f6763f13)
- Delete temporary files when uploading files to OBS fails (65a8084709)
- Fix Ozone mount failure (4cc34dff2b)
CLI
- Support ignore delete mount point directory by ttl action (2603b609ed)
- Support strict version match option for mount (2ed18f54a3)
- Support `getMountTable` without invoking UFS (86e2f8210f)
- Support recording audit log for `getMountTable` op (82aa6633ff)
Error and Exception Handling
- Make worker error propagate to client (9cee334eb2)
- Fix worker swallow OOM (aae5a02a5f)
- Support failover worker while reading (5de0314361)
- Filter exceptions that need to be retried in `ObjectUnderFileSystem` (d5ea085afc)
- Force metadata sync when data read fails due to out-of-range error (badca18e3f)
- Catch runtime exception in rpc (4b3fac5dd1)
- Update worker exception (6982d6c759)
Metrics and Monitoring
- Fix gauges when creating a new rpc server (7a4e35240f)
- Add an overloaded check according to the JVM pause time (d064cbff66)
- Add direct mem used metrics (fea89c61e1)
- Initialize `AuditLog` writer in `WebServer` for proxy (1c24dc3425)
- Add some metrics of threads and docs of worker `CacheManager` threads (e07e17d510)
StressBench and MicroBench
- Fix misuse of variable in `computeMaxThroughput` (5e544aa421)
- Add POSIX API to `StressClientIOBench` (f6345919a1)
- Fix `clientIO` stressbench throughput calculation (7422dfb209)
- Make `clientIO` a multi-node test (94f3703c7f)
- Support multiple files random and sequential read in `StressWorkerBench` (8e1a25df2c)
- Implement Alluxio POSIX API master stressbench test (b4af5969ae)
- Add microbenchmarks for multiple implementations of `BlockStore` (5828d56a82)
Deprecations
- Clean up ignored table unit/integration tests and maven (8875c66ed4)
- Remove ufs extension (9fb093c10f)
- Remove conf & doc for tiered locality (11c1c7c5bf)
- Remove Configuration and CLI of Alluxio table (c0571e72a7)
Miscellaneous
- Add `LANG` to `alluxio-env.sh` (e842df2719)
- Allow dot in `chown` username or group (9b04c8040f)
- Support Long type config values (d61aee0992)
- Ensure the HadoopFS default port is the same across all Hadoop fs (25e4301a7b)
- Bump up maven frontend plugin for m1 arm support (b69bf4df59)
Acknowledgements
We want to thank the community for their valuable contributions to the Alluxio 2.9.0 release. Especially, we would like to thank:
Bob Bai (bobbai00), Haoning Sun (Haoning-Sun), Jie Fu (DamonFool), Li Simian (LDawns), LiuJiahao0001, Shuai Wuyue (shuaiwuyue), XiChen (xichen01), Xinli Shang (shangxinli), XuanlinGuan, adol001, bigxiaochu, dangxiaodong (smdxdxd), Tianbao Ding (flaming-archer), Baolong Mao (maobaolong), Lei Qian (qian0817), Zhaoqun Deng (secfree), Yanbin Zhang (singer-bin), Xinyu Deng (voddle), Yangchen Ye (YangchenYe323), and Zhigang Huang (zerorclover)
Enjoy the new release! We look forward to hearing your feedback on our Community Slack Channel.