Frequently Asked Questions
- What is Alluxio?
- What platforms and Java versions can Alluxio run on?
- What license is Alluxio under?
- Why is my analytics job not running faster after deploying Alluxio?
- Should I deploy Alluxio as a stand-alone system or through an orchestration framework?
- Which programming language does Alluxio support?
- What happens if my data set does not fit in memory?
- Does Alluxio support a high availability mode?
- Will Alluxio rebalance cached blocks to the newly added nodes in order to balance memory space utilization?
- Does Alluxio require HDFS?
- How can I learn more about Alluxio?
What is Alluxio?
Alluxio, formerly Tachyon, is an open source, memory-speed, virtual distributed storage system. It enables applications to interact with data from many different storage systems at memory speed. Read more about Alluxio here.
What platforms and Java versions can Alluxio run on?
Alluxio requires JDK 1.8 or JDK 11 and runs on various Linux distributions and on macOS.
What license is Alluxio under?
Alluxio is open sourced under the Apache 2.0 license.
Why is my analytics job not running faster after deploying Alluxio?
Some possible reasons to consider:
- The job is compute-bound and does not spend significant time reading or writing data. Because I/O is not the bottleneck, faster I/O through Alluxio brings little benefit.
- The persistent storage is co-located with compute (e.g. Alluxio is connected to a local HDFS) and the job's input data is already in the OS buffer cache, so reads are already served from memory and Alluxio has little left to speed up.
- Due to misconfiguration, clients cannot identify their corresponding local Alluxio worker. They therefore read from remote Alluxio workers over the network, losing data locality.
- The input data has not been loaded into Alluxio yet or has already been evicted, so the job reads from the under storage instead of the Alluxio cache.
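The first reason can be quantified with Amdahl's law: if only the I/O share of a job gets faster, the compute share caps the overall gain. The sketch below is a back-of-the-envelope estimator, not an Alluxio tool; the I/O fractions and the 10x I/O speedup in `main` are hypothetical numbers chosen for illustration.

```java
/** Estimates end-to-end job speedup when only the I/O portion gets faster (Amdahl's law). */
class IoSpeedupEstimate {

    /**
     * @param ioFraction fraction of the job's original runtime spent on I/O (0.0 to 1.0)
     * @param ioSpeedup  factor by which the I/O portion is accelerated (e.g. 10.0 = 10x faster)
     * @return overall job speedup
     */
    static double overallSpeedup(double ioFraction, double ioSpeedup) {
        if (ioFraction < 0.0 || ioFraction > 1.0 || ioSpeedup <= 0.0) {
            throw new IllegalArgumentException("ioFraction must be in [0, 1] and ioSpeedup > 0");
        }
        // Compute time is unchanged; only the I/O share of the runtime shrinks.
        return 1.0 / ((1.0 - ioFraction) + ioFraction / ioSpeedup);
    }

    public static void main(String[] args) {
        // Hypothetical workloads for illustration only.
        System.out.printf("I/O-bound job (80%% I/O, 10x faster I/O): %.2fx%n",
                overallSpeedup(0.8, 10.0));
        System.out.printf("Compute-bound job (10%% I/O, 10x faster I/O): %.2fx%n",
                overallSpeedup(0.1, 10.0));
    }
}
```

With these numbers the I/O-bound job speeds up by about 3.57x, while the compute-bound job gains only about 1.10x even though its I/O became ten times faster.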
Should I deploy Alluxio as a stand-alone system or through an orchestration framework?
It is recommended to deploy Alluxio as a stand-alone system. Orchestration frameworks supported include:
Which programming language does Alluxio support?
Alluxio exposes Java and Hadoop-compatible file system APIs. It can also be run as a FUSE mount exposing a POSIX API, which lets any program that normally accesses a local file system read data from Alluxio without modification. This is a common way for applications written in non-Java languages or using non-Hadoop APIs to access Alluxio data without rewriting the application.
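To illustrate the FUSE approach, the sketch below uses only standard `java.nio` file I/O and no Alluxio client library; the same code would work against any local path. The mount point `/mnt/alluxio-fuse` and the file `data/input.txt` are hypothetical and depend on how the FUSE mount is configured in a given deployment.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

/** Counts lines in a file using plain Java file I/O; no Alluxio-specific API is involved. */
class FuseLineCount {

    static long countLines(Path file) throws IOException {
        try (BufferedReader reader = Files.newBufferedReader(file, StandardCharsets.UTF_8)) {
            return reader.lines().count();
        }
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical FUSE mount point and file path; pass a real path as the first argument.
        Path file = Paths.get(args.length > 0 ? args[0] : "/mnt/alluxio-fuse/data/input.txt");
        System.out.println(file + ": " + countLines(file) + " lines");
    }
}
```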
What happens if my data set does not fit in memory?
Applications do not require the input data set to fit in Alluxio storage space. Alluxio transparently loads data on demand from the under storage. To fit more data, configure Alluxio to use other storage resources such as SSDs and HDDs in addition to memory, extending its storage capacity. Read more about Alluxio storage setup here.
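The on-demand behavior can be pictured with a toy read-through cache whose capacity is smaller than the data set: every read still succeeds, misses are loaded from a simulated under storage, and the least recently used block is evicted to make room. This is a simplified model under stated assumptions, not Alluxio's worker implementation; the class name, the LRU policy, and the `underStorage` loader are all hypothetical stand-ins.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;

/** Toy read-through LRU cache; illustrative only, not how Alluxio workers are implemented. */
class ReadThroughCache<K, V> {
    private final Function<K, V> underStorage; // hypothetical loader standing in for the under storage
    private final LinkedHashMap<K, V> blocks;
    private int hits = 0;
    private int misses = 0;

    ReadThroughCache(int capacity, Function<K, V> underStorage) {
        this.underStorage = underStorage;
        // accessOrder = true keeps entries ordered from least to most recently accessed.
        this.blocks = new LinkedHashMap<K, V>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
                return size() > capacity; // evict the least recently used block when full
            }
        };
    }

    V read(K key) {
        V value = blocks.get(key); // a hit also refreshes the block's recency
        if (value != null) {
            hits++;
            return value;
        }
        misses++;
        value = underStorage.apply(key); // load on demand from the (simulated) under storage
        blocks.put(key, value);
        return value;
    }

    int hits() { return hits; }
    int misses() { return misses; }
    int cachedBlocks() { return blocks.size(); }

    public static void main(String[] args) {
        // Data set of 5 blocks, cache capacity of only 2 blocks.
        ReadThroughCache<Integer, String> cache = new ReadThroughCache<>(2, id -> "block-" + id);
        for (int id = 0; id < 5; id++) {
            cache.read(id); // all misses; afterwards only blocks 3 and 4 remain cached
        }
        cache.read(3); // hit
        cache.read(4); // hit
        cache.read(0); // miss: block 0 was evicted and is reloaded from under storage
        System.out.println("hits=" + cache.hits() + " misses=" + cache.misses()
                + " cached=" + cache.cachedBlocks());
    }
}
```

Running `main` prints `hits=2 misses=6 cached=2`: the data set never fits, yet every read returns the right block. Adding SSD or HDD tiers corresponds to raising the effective capacity so that fewer reads fall through to the under storage.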
Does Alluxio support a high availability mode?
Yes. See the instructions in Deploy Alluxio on a Cluster with HA.
Will Alluxio rebalance cached blocks to the newly added nodes in order to balance memory space utilization?
No, rebalancing of data blocks in Alluxio is not currently supported.
Does Alluxio require HDFS?
No. In addition to HDFS, Alluxio can run on many other under storage systems, such as Amazon S3 and OpenStack Swift.
How can I learn more about Alluxio?
Join the Alluxio community Slack Channel to chat with users and developers.
Read the Alluxio book to learn Alluxio comprehensively.
Read the recent blogs and presentations.
Join the meetup group for Alluxio at http://www.meetup.com/Alluxio/. Other Alluxio events can be found here.