Running Presto with Alluxio

Slack Docker Pulls GitHub edit source

This guide describes how to run Presto with Alluxio, so that you can easily use Presto to query Hive tables stored in Alluxio’s tiered storage.


The prerequisite for this part is that you have Java. And the Java version must use Java 8 Update 60 or higher (8u60+), 64-bit. Alluxio cluster should also be set up in accordance to these guides for either Local Mode or Cluster Mode.

Please Download Presto(this doc uses presto-0.170). Also, please complete Hive setup using Hive On Alluxio


Presto gets the database and table metadata information from Hive Metastore. At the same time, the file system location of table data is obtained from the table’s metadata entries. So you need to configure Presto on HDFS. In order to access HDFS, you need to add the Hadoop conf files (core-site.xml,hdfs-site.xml), and use hive.config.resources in file /<PATH_TO_PRESTO>/etc/catalog/ to point to the file’s location for every Presto worker.

Configure core-site.xml

You need to add the following configuration items to the core-site.xml configured in

  <description>The Alluxio AbstractFileSystem (Hadoop 2.x)</description>

If your Alluxio is HA, another configuration need to added:

  <description>The Alluxio FileSystem (Hadoop 1.x and 2.x) with fault tolerant support</description>

Configure additional Alluxio properties

Similar to above, add additional Alluxio properties to core-site.xml of Hadoop configuration in Hadoop directory on each node. For example, change alluxio.user.file.writetype.default from default MUST_CACHE to CACHE_THROUGH:


Alternatively, you can also append the path to to Presto’s JVM config at etc/jvm.config under Presto folder. The advantage of this approach is to have all the Alluxio properties set within the same file of


Also, it’s recommended to increase to a bigger value (e.g. 10 mins) to avoid the timeout failure when reading large files from remote worker.

Increase hive.max-split-size

Presto’s Hive integration uses the config hive.max-split-size to control the parallelism of the query. It’s recommended to set this size no less than Alluxio’s block size to avoid the read contention within the same block.

Distribute the Alluxio Client Jar

Distribute the Alluxio client jar to all worker nodes in Presto:

  • You must put Alluxio client jar /<PATH_TO_ALLUXIO>/client/presto/alluxio-1.5.0-presto-client.jar into Presto cluster’s worker directory $PRESTO_HOME/plugin/hive-hadoop2/ (For different versions of Hadoop, put the appropriate folder), And restart the process of coordinator and worker.

Alternatively, advanced users can choose to compile this client jar from the source code. Follow the instructs here and use the generated jar at /<PATH_TO_ALLUXIO>/core/client/runtime/target/alluxio-core-client-runtime-1.5.0-jar-with-dependencies.jar for the rest of this guide.

Presto cli examples

Create a table in Hive and load a file in local path into Hive:

You can download the data file from

hive> CREATE TABLE u_user (
userid INT,
age INT,
gender CHAR(1),
occupation STRING,
zipcode STRING)

hive> LOAD DATA LOCAL INPATH '<path_to_ml-100k>/u.user'

View Alluxio WebUI at http://master_hostname:19999 and you can see the directory and file Hive creates:


Using a single query:

/home/path/presto/presto-cli-0.170-executable.jar --server masterIp:prestoPort --execute "use default;select * from u_user limit 10;" --user username --debug

And you can see the query results from console:


Presto Server log: