We are extremely excited to announce the release of Alluxio Enterprise Edition 2.1.0-1.0!
This release contains a variety of improvements, ranging from user experience enhancements and bug fixes to major performance improvements. A list of changes can be found below.
Alluxio Structured Data Service
This release includes a new Alluxio subsystem for managing and transforming structured data. Structured data comes in the form of databases, tables, and partitions. It is the backbone of many companies’ analytics systems.
Alluxio is entering this space to provide performance improvements beyond raw I/O. With this release, a simple command can transform data in raw formats (such as CSV) into Parquet files, a more compact and performant file format that is better suited to running queries on systems such as Presto. With Alluxio Enterprise Edition, data in tables can also be sorted by columns to achieve even higher performance for interactive queries.
Additionally, the release artifacts now include a Presto connector which can be used to connect your Presto cluster to Alluxio.
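One common reason sorted data speeds up interactive queries is that columnar formats such as Parquet keep min/max statistics per row group, so a query engine can skip groups that cannot match a range filter. The following is a minimal sketch of that idea in plain Python. It is not Alluxio or Presto code, and the names `row_group_stats` and `groups_to_scan` are hypothetical helpers introduced only for illustration.

```python
import random


def row_group_stats(values, group_size):
    """Split values into consecutive groups and record each group's (min, max)."""
    groups = [values[i:i + group_size] for i in range(0, len(values), group_size)]
    return [(min(g), max(g)) for g in groups]


def groups_to_scan(stats, low, high):
    """Count groups whose [min, max] range overlaps the filter low <= x <= high."""
    return sum(1 for g_min, g_max in stats if g_max >= low and g_min <= high)


# Illustrative data: the same 1000 values, once sorted and once shuffled.
values = list(range(1000))
shuffled = values[:]
random.Random(0).shuffle(shuffled)

# A narrow range filter, similar to "WHERE col BETWEEN 250 AND 260".
sorted_scan = groups_to_scan(row_group_stats(values, 100), 250, 260)
unsorted_scan = groups_to_scan(row_group_stats(shuffled, 100), 250, 260)

print("groups scanned (sorted):  ", sorted_scan)
print("groups scanned (unsorted):", unsorted_scan)
```

With sorted input, the filter touches a single group of 100 values; with shuffled input, matching values are typically spread across many groups, so more of them have to be read.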
Kubernetes Helm Chart Deployment
A more robust set of Kubernetes templates and a Helm chart for Kubernetes deployment are now included with Alluxio, thanks to community contributions and, in particular, the Alibaba Cloud Kubernetes Team. Thanks to the team and the Alluxio maintainers for this great improvement! Read more about how to use the Helm chart to deploy Alluxio in a Kubernetes environment.
Access Time Based Policy (Enterprise Edition Only)
Alluxio now supports defining data migration policies using access-time-based conditions. Policies defined with this type of condition are triggered when the last access time of a path is older or newer than a specified period, allowing data management based on usage. Read more about how to set up access time based policies in the documentation.
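The condition described above can be sketched as a simple comparison between a path's age since last access and a configured period. This is an illustrative model only: the function name `access_time_condition` and its parameters are assumptions, not the Alluxio policy syntax, which is covered in the documentation.

```python
from datetime import datetime, timedelta


def access_time_condition(last_access, now, period, older_than=True):
    """Return True when the policy condition holds for a path.

    older_than=True  -> triggered if the path was last accessed more than `period` ago.
    older_than=False -> triggered if the path was last accessed less than `period` ago.
    """
    age = now - last_access
    return age > period if older_than else age < period


# Hypothetical example: a path last read 30 days ago, checked against a 7-day period.
now = datetime(2020, 1, 31)
last_access = datetime(2020, 1, 1)
week = timedelta(days=7)

cold = access_time_condition(last_access, now, week, older_than=True)
hot = access_time_condition(last_access, now, week, older_than=False)
print("older than 7 days:", cold)
print("newer than 7 days:", hot)
```

An "older than" condition suits moving cold data to cheaper storage, while a "newer than" condition suits keeping recently used data close to compute.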
Transparent URI (Enterprise Edition Only)
Alluxio users often have existing applications accessing their existing storage systems. Previously, users had to change the URIs in their applications to access the data through Alluxio. In this release, we introduce Transparent URI, which allows users to access their existing storage systems through Alluxio without changing URIs at the application level. Read more about setting up Transparent URI in the documentation.
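Conceptually, leaving application URIs unchanged means something has to translate a storage URI into a location in the Alluxio namespace. The sketch below models that translation as a longest-prefix lookup in a table of mounts. It is an assumption-laden illustration, not Alluxio's implementation: the `resolve` function, the mount table, and the bucket, host, and path names are all hypothetical.

```python
def resolve(uri, mounts):
    """Map a storage URI to an Alluxio path using the longest matching mount prefix.

    `mounts` maps a storage URI prefix to the Alluxio path it is mounted at.
    Returns None when no mount covers the URI.
    """
    best = None
    for ufs_prefix, alluxio_path in mounts.items():
        prefix = ufs_prefix.rstrip("/")
        # Match whole path components only, so "s3://bucket/data2" does not
        # match a mount for "s3://bucket/data".
        if uri == prefix or uri.startswith(prefix + "/"):
            if best is None or len(prefix) > len(best[0]):
                best = (prefix, alluxio_path.rstrip("/"))
    if best is None:
        return None
    return (best[1] + uri[len(best[0]):]) or "/"


# Hypothetical mount table; every name here is made up for illustration.
mounts = {
    "s3://bucket/data": "/mnt/s3",
    "hdfs://nn:8020/": "/mnt/hdfs",
}

print(resolve("s3://bucket/data/a.csv", mounts))
print(resolve("hdfs://nn:8020/x/y", mounts))
print(resolve("s3://other/x", mounts))
```

The application keeps using its original `s3://` or `hdfs://` URI, and the lookup decides which Alluxio path, if any, serves the request.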
GCS Version 2 (Enterprise Edition Only)
GCS version 2 is a newer integration between Google Cloud Storage and Alluxio that uses the Google Cloud API. It supports authentication with Google application credentials instead of an interoperability access/secret key pair, and it has better read/write performance. Read more about setting up GCS version 2 integration in the documentation.
Support for Google Dataproc
A public Google Dataproc init action is now available for users to deploy Alluxio with Google Cloud. Read more about how to deploy Alluxio with Google Dataproc in the documentation.
Reduction of Default Block Size
In this release, the default block size in Alluxio has been reduced from 512 MB to 64 MB. With smaller blocks, workers evict less-used data at a finer granularity. This can improve use cases where the block size is large relative to the size of the files.
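To make the granularity change concrete, the arithmetic below counts how many blocks a file occupies under each default. It is a back-of-the-envelope sketch; the helper `block_count` and the example file sizes are illustrative choices, not part of Alluxio.

```python
MB = 1024 * 1024


def block_count(file_size, block_size):
    """Number of blocks needed to store file_size bytes (ceiling division)."""
    return -(-file_size // block_size)


OLD_BLOCK = 512 * MB
NEW_BLOCK = 64 * MB

for size_mb in (100, 1024):
    size = size_mb * MB
    print(
        f"{size_mb} MB file: "
        f"{block_count(size, OLD_BLOCK)} block(s) at 512 MB, "
        f"{block_count(size, NEW_BLOCK)} block(s) at 64 MB"
    )
```

A 1024 MB file spans 2 blocks at the old default and 16 at the new one, so an eviction can free space in 64 MB steps rather than 512 MB steps.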
Trial License (Enterprise Edition Only)
Starting from this release, a 30-day trial license is included in the downloaded distribution. The trial license allows users to evaluate Alluxio Enterprise Edition in a test environment during the trial period without having to manually acquire a license.
- The embedded journal quorum now utilizes a gRPC-based transport for its RPCs (4095a1c11c)
- Process launching can now be done in the foreground (603c6fc291)
- Add CLI options for listing last access time (d1d1adffab)
- Remove deprecated FaultTolerantFileSystem or alluxio-ft:// (3d7d18dbe2)
- Reduce the default block size to 64 MB from 512 MB (9d7338cb58)
- Docker containers now handle signals properly (8238b1e2b6)
- Hadoop 3.2 is now supported as a UFS (dcfd9cfc1f)
- Parallelism option for distributedLoad (f5b70fd71f)
- Support for Google Dataproc (ae33402852)
- Support for true owner and group with FUSE (a3987c6527)
- SCM revision option as a part of alluxio version command (2e67f3be2b)
- Removal of MapR support (86e8061ee2)
- Blacklist paths for files written with ASYNC_THROUGH (a31ee0a10a)
- Upgrade OSS UFS client dependency to 3.6.0 (76aa706215)
- Show progress when taking and applying backups (5d61a3deb5)
- Add a command to report policy status
- Add a command option to specify a name when adding a policy
- Add a configuration property to limit the size of policy action schedule queue
- Add a configuration property to turn off policy engine
- Automatically sync UFS metadata when performing a policy scan
- Improve UFS migration policy nested directory removal efficiency
- Add ACL support in Ranger plugin
- Remove Kerberos service name property and infer it from the server principal
- NPE when UFS journal shutdown fails (f7c8c2e316)
- Remove query parameters when browsing in UI (9a6cc464ed)
- NPE on Embedded journal shutdown (cf14aefd6f)
- Properly close journal when stopping (fd9b6f942c)
- Avoid exception when audit log is used with NOSASL authentication (0e3a152f96)
- Sync cache should consider syncing ancestors (b880cef284)
- Properly handle interrupt in DynamicResourcePool (ad5c15d6a5)
- Properly handle interrupts on various heartbeats (8d2a6ec179)
- Properly handle recovering from a journal UFS error (be4e9cb1f4)
- Remove duplicate entries in BlockInfo tab of worker web UI (bd00866095)
- Prevent data loss when writing with ASYNC_THROUGH (b69e73de1e)
- Fix union UFS status check APIs
- Fix alluxio.underfs.security.authorization.plugin.name property check