Deploy Alluxio Edge for Trino
This document describes how to deploy Alluxio Edge for Trino with AWS S3 as the under file system (UFS). The key steps are to place the Alluxio JAR files in the Trino directories and update the configuration files. You then deploy Trino as usual.
Prerequisites
- Your AWS S3 credentials
- Storage for the Alluxio Edge cache: after identifying the storage to mount for the Alluxio Edge cache, note the following:
  - The size of the local storage to provision for Alluxio Edge (property alluxio.user.client.cache.size)
  - The path where the storage is mounted (property alluxio.user.client.cache.dirs)
- An existing Trino installation, set up as described in the Trino deployment documentation. Note the Trino installation directory. This document refers to it as ${TRINO_HOME}; replace ${TRINO_HOME} with your actual path in the commands and configuration files that follow.
- A running etcd cluster. Its endpoint URLs are a required configuration setting.
Request a trial version of Alluxio Edge
Contact your Alluxio account representative at sales@alluxio.com to request a trial version of Alluxio Edge.
Follow their instructions to download the installation tar file into a working directory. The extraction and copy commands below use paths relative to that directory.
The tar file follows the naming convention alluxio-enterprise-edge-*.tar.gz. For example, if the tarball is named alluxio-enterprise-edge-1.1-6.0.0.tar.gz, the Alluxio version is edge-1.1-6.0.0.
Package Alluxio Edge with Trino
Remove any old Alluxio client jar files from the Trino directories
A command like the following usually works. Quote the pattern so that the shell does not expand it before find runs:
$ find "${TRINO_HOME}" -name 'alluxio*shaded*' -exec rm {} \;
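If you want to see what will be deleted before removing anything, the same search can be split into a dry run and a delete. This is a minimal sketch: remove_old_alluxio_jars and its --dry-run flag are hypothetical helpers, not part of Alluxio or Trino, and it assumes the old client JARs are regular files whose names match alluxio*shaded*.

```shell
# Hypothetical helper: preview or delete old Alluxio shaded client JARs.
# Usage: remove_old_alluxio_jars <trino_home> [--dry-run]
remove_old_alluxio_jars() {
  local trino_home="$1" mode="${2:-}"
  if [ "$mode" = "--dry-run" ]; then
    # Only print the files that match; nothing is deleted
    find "$trino_home" -type f -name 'alluxio*shaded*' -print
  else
    # Delete every match; '+' batches paths into as few rm calls as possible
    find "$trino_home" -type f -name 'alluxio*shaded*' -exec rm -f {} +
  fi
}
```

Run it first as remove_old_alluxio_jars "${TRINO_HOME}" --dry-run, review the list, then run it again without the flag.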
Extract the Alluxio Edge JARs and Place Them in the Trino Directories
Three Alluxio Edge Java JAR files must be installed on each Trino node.
First, extract two of the JAR files from the tarball using these commands. GNU tar needs the --wildcards option to match member names against patterns:
$ tar --wildcards -xf alluxio-enterprise-edge-*.tar.gz 'alluxio-enterprise-edge-*/client/alluxio-*-client.jar'
$ tar --wildcards -xf alluxio-enterprise-edge-*.tar.gz 'alluxio-enterprise-edge-*/assembly/alluxio-prod-*.jar'
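To check the member paths before extracting, you can list the archive. The sketch below uses a hypothetical helper, list_edge_jars, that filters the listing down to the three JARs this guide installs: the client JAR, the alluxio-prod assembly JAR, and the S3 UFS JAR. The regular expression mirrors the glob patterns used in the commands and is an assumption about the tarball layout.

```shell
# Hypothetical helper: list the three Alluxio Edge JARs inside the tarball
# without extracting anything.
list_edge_jars() {
  local tarball="$1"
  # -t lists members, -z reads the gzip-compressed archive, -f names it
  tar -tzf "$tarball" |
    grep -E '/(client/alluxio-.*-client|assembly/alluxio-prod-.*|lib/alluxio-underfs-s3a-.*)\.jar$'
}
```

For example, list_edge_jars alluxio-enterprise-edge-*.tar.gz should print three paths; fewer means the layout differs from what the commands expect.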
Depending on your Trino version, copy the JARs to a different destination:
- For Trino 434 or later, copy them to ${TRINO_HOME}/plugin/<pluginName>/hdfs
- For Trino versions older than 434, copy them to ${TRINO_HOME}/plugin/<pluginName>
The following example shows the copy commands for Trino 434 and later, using the plugin directories for Hive, Hudi, Delta Lake, and Iceberg.
$ cp alluxio-enterprise-edge-*/*/alluxio-*-client.jar ${TRINO_HOME}/plugin/hive/hdfs
$ cp alluxio-enterprise-edge-*/*/alluxio-prod-*.jar ${TRINO_HOME}/plugin/hive/hdfs
$ cp alluxio-enterprise-edge-*/*/alluxio-*-client.jar ${TRINO_HOME}/plugin/hudi/hdfs
$ cp alluxio-enterprise-edge-*/*/alluxio-prod-*.jar ${TRINO_HOME}/plugin/hudi/hdfs
$ cp alluxio-enterprise-edge-*/*/alluxio-*-client.jar ${TRINO_HOME}/plugin/delta-lake/hdfs
$ cp alluxio-enterprise-edge-*/*/alluxio-prod-*.jar ${TRINO_HOME}/plugin/delta-lake/hdfs
$ cp alluxio-enterprise-edge-*/*/alluxio-*-client.jar ${TRINO_HOME}/plugin/iceberg/hdfs
$ cp alluxio-enterprise-edge-*/*/alluxio-prod-*.jar ${TRINO_HOME}/plugin/iceberg/hdfs
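On a node with several connectors, the eight commands can be folded into one loop that also picks the destination from the Trino version. This is a sketch only: install_edge_jars is a hypothetical helper, the version is passed as a plain integer such as 435, and destination directories that do not exist are skipped rather than created.

```shell
# Hypothetical helper: copy the client and alluxio-prod JARs into each
# connector's plugin directory, choosing the layout from the Trino version.
# Usage: install_edge_jars <trino_home> <trino_version> <extract_dir>
install_edge_jars() {
  local trino_home="$1" trino_version="$2" extract_dir="$3"
  local plugin dest
  # Edit this list to match the connectors you use
  for plugin in hive hudi delta-lake iceberg; do
    if [ "$trino_version" -ge 434 ]; then
      dest="$trino_home/plugin/$plugin/hdfs"   # Trino 434 and later
    else
      dest="$trino_home/plugin/$plugin"        # Trino older than 434
    fi
    # Skip connectors whose destination directory does not exist
    if [ ! -d "$dest" ]; then
      echo "skip: $dest not found" >&2
      continue
    fi
    cp "$extract_dir"/alluxio-enterprise-edge-*/*/alluxio-*-client.jar "$dest"/
    cp "$extract_dir"/alluxio-enterprise-edge-*/*/alluxio-prod-*.jar "$dest"/
  done
}
```

For example, install_edge_jars "${TRINO_HOME}" 435 "$PWD" when run from the directory where the tarball was extracted.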
Then, extract the Alluxio Edge S3 under file system (UFS) integration JAR and copy it to ${TRINO_HOME}/lib/ using these commands:
$ tar --wildcards -xf alluxio-enterprise-edge-*.tar.gz 'alluxio-enterprise-edge-*/lib/alluxio-underfs-s3a-*.jar'
$ cp alluxio-enterprise-edge-*/*/alluxio-underfs-s3a-*.jar ${TRINO_HOME}/lib/.
Download the Prometheus JMX Exporter JAR
Prometheus is the recommended database for observing the metrics that Alluxio Edge emits.
If the JMX exporter for Prometheus is not already set up in your environment, you can download the Java agent JAR from https://github.com/prometheus/jmx_exporter/releases. The file is named similarly to jmx_prometheus_javaagent-0.20.0.jar. Place it in the ${TRINO_HOME}/lib/ directory:
$ cp jmx_prometheus_javaagent-0.20.0.jar ${TRINO_HOME}/lib/.
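The agent is also published to Maven Central under the io.prometheus.jmx group. The helpers below, jmx_agent_url and download_jmx_agent (both hypothetical names), build that URL for a given version and download the JAR with curl. The Maven Central path is an assumption based on the standard repository layout, so confirm it for the version you need.

```shell
# Hypothetical helper: Maven Central URL of the JMX exporter Java agent.
jmx_agent_url() {
  local version="$1"
  echo "https://repo1.maven.org/maven2/io/prometheus/jmx/jmx_prometheus_javaagent/${version}/jmx_prometheus_javaagent-${version}.jar"
}

# Hypothetical helper: download the agent into the given lib directory.
download_jmx_agent() {
  local version="$1" lib_dir="$2"
  # -f fails on HTTP errors, -L follows redirects, -o names the output file
  curl -fL -o "${lib_dir}/jmx_prometheus_javaagent-${version}.jar" "$(jmx_agent_url "$version")"
}
```

For example, download_jmx_agent 0.20.0 "${TRINO_HOME}/lib".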
Update Configurations
Update Alluxio Configurations in Trino JVM Config
You can set Alluxio configuration properties in the Trino jvm.config file, which is usually in ${TRINO_HOME}/etc/, using the following format:
-Dalluxio.<property name>=<value>
Define the Alluxio Edge conf directory
For example, to use ${TRINO_HOME}/etc/alluxio/ as the Alluxio Edge configuration directory, add the following:
# Reference the Alluxio property file
-Dalluxio.conf.dir=${TRINO_HOME}/etc/alluxio/
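To let the Alluxio client JAR work with Java 17, as used by Trino 390 and later, add the following line. The option takes two leading dashes, and the = form keeps it a single jvm.config entry:
--add-opens=java.management/sun.management=ALL-UNNAMED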
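The jvm.config entries in this section can also be appended by a script. This sketch assumes jvm.config does not expand environment variables, so it writes the resolved path instead of the literal ${TRINO_HOME}. add_alluxio_jvm_options is a hypothetical helper; it skips the append if an alluxio.conf.dir entry is already present, so it can be rerun safely.

```shell
# Hypothetical helper: append the Alluxio Edge JVM options to jvm.config.
add_alluxio_jvm_options() {
  local trino_home="$1"
  local jvm_config="$trino_home/etc/jvm.config"
  # Do nothing if an earlier run already added the options
  if grep -q '^-Dalluxio.conf.dir=' "$jvm_config" 2>/dev/null; then
    return 0
  fi
  # Unquoted EOF: ${trino_home} is expanded now, when the file is written
  cat >> "$jvm_config" <<EOF
-Dalluxio.conf.dir=${trino_home}/etc/alluxio/
--add-opens=java.management/sun.management=ALL-UNNAMED
EOF
}
```

For example, add_alluxio_jvm_options "${TRINO_HOME}".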
Configure Alluxio Edge metrics for Prometheus integration
Create jmx_export_config.yaml in the configuration directory ${TRINO_HOME}/etc/alluxio/ with the following sample content:
---
startDelaySeconds: 0
ssl: false
global:
  scrape_interval: 15s
  evaluation_interval: 15s
rules:
  - pattern: ".*"
In the jvm.config file, add the following:
# Setup Alluxio Edge cache metrics
-Dalluxio.metrics.conf.file=${TRINO_HOME}/etc/alluxio/alluxio-metrics.properties
-javaagent:${TRINO_HOME}/lib/jmx_prometheus_javaagent-0.20.0.jar=9696:${TRINO_HOME}/etc/alluxio/jmx_export_config.yaml
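A common mistake is a -javaagent line that points at a JAR or YAML file that does not exist. The hypothetical helper check_javaagent_line below reads the first jmx_prometheus_javaagent entry from jvm.config, splits it into the <jar>=<port>:<config> parts used above, checks both files, and prints the port. It assumes the port-only form shown here and does not handle a host-qualified port.

```shell
# Hypothetical helper: validate the JMX exporter -javaagent entry.
check_javaagent_line() {
  local jvm_config="$1" line spec jar rest port yaml
  line=$(grep -m 1 '^-javaagent:.*jmx_prometheus_javaagent' "$jvm_config") || {
    echo "no jmx_prometheus_javaagent entry in $jvm_config" >&2; return 1; }
  spec=${line#-javaagent:}   # <jar>=<port>:<yaml>
  jar=${spec%%=*}            # text before the first '='
  rest=${spec#*=}            # <port>:<yaml>
  port=${rest%%:*}
  yaml=${rest#*:}
  [ -f "$jar" ] || { echo "missing agent JAR: $jar" >&2; return 1; }
  [ -f "$yaml" ] || { echo "missing exporter config: $yaml" >&2; return 1; }
  echo "$port"
}
```

After Trino starts, the exporter should serve metrics on that port, for example at http://localhost:9696/metrics.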
Create Alluxio Edge Properties File
Create the alluxio-site.properties file and place it in the configuration directory set in the previous step; in this example, ${TRINO_HOME}/etc/alluxio/.
alluxio.license and alluxio.etcd.endpoints are required properties.
# FILE: alluxio-site.properties
#
# DESC: This is the main Alluxio Edge properties file and should
# be placed in the Alluxio Edge config directory, for example ${TRINO_HOME}/etc/alluxio/
#
alluxio.license=<YOUR LICENSE STRING>
# ex. alluxio.etcd.endpoints=http://trino-edge-etcd1:2379,http://trino-edge-etcd2:2379,http://trino-edge-etcd3:2379
alluxio.etcd.endpoints=<YOUR_ETCD_ENDPOINTS>
#
# Insert additional configuration properties here
#
# end of file
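Because the placeholders are syntactically valid values, it can help to check the file before starting Trino. This sketch, using the hypothetical helper check_required_alluxio_props, reports each required property that is missing or still set to a <...> placeholder. It assumes plain key=value lines without spaces around the =, as in the template.

```shell
# Hypothetical helper: verify the required Alluxio Edge properties are set.
check_required_alluxio_props() {
  local props="$1" key value status=0
  for key in alluxio.license alluxio.etcd.endpoints; do
    # Value of the last "key=value" line for this key; comment lines never match
    value=$(awk -F= -v k="$key" '$1 == k { v = substr($0, length(k) + 2) } END { print v }' "$props")
    case "$value" in
      "")      echo "missing: $key" >&2; status=1 ;;
      "<"*">") echo "placeholder not replaced: $key" >&2; status=1 ;;
    esac
  done
  return "$status"
}
```

For example, check_required_alluxio_props "${TRINO_HOME}/etc/alluxio/alluxio-site.properties".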
Please refer to configuration settings for details.
Enable S3AFileSystem
In alluxio-site.properties, specify the S3 bucket and access credentials.
# ex. alluxio.dora.client.ufs.root=s3a://myBucket
alluxio.dora.client.ufs.root=<YOUR_S3_URL>
s3a.accessKeyId=<MY_KEY_ID>
s3a.secretKey=<MY_SECRET_KEY>
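Rather than pasting the keys into the file by hand, the settings can be appended from the standard AWS environment variables AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY. This is a minimal sketch with the hypothetical helper add_s3_settings. It also restricts the file to its owner with chmod 600, which assumes Trino runs as the same user that owns the file.

```shell
# Hypothetical helper: append S3 settings from the AWS environment variables.
# Usage: add_s3_settings <alluxio-site.properties> <s3a://bucket>
add_s3_settings() {
  local props="$1" ufs_root="$2"
  if [ -z "${AWS_ACCESS_KEY_ID:-}" ] || [ -z "${AWS_SECRET_ACCESS_KEY:-}" ]; then
    echo "AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY must be set" >&2
    return 1
  fi
  {
    echo "alluxio.dora.client.ufs.root=${ufs_root}"
    echo "s3a.accessKeyId=${AWS_ACCESS_KEY_ID}"
    echo "s3a.secretKey=${AWS_SECRET_ACCESS_KEY}"
  } >> "$props"
  # The file now holds a secret key: owner read/write only
  chmod 600 "$props"
}
```

For example, add_s3_settings "${TRINO_HOME}/etc/alluxio/alluxio-site.properties" s3a://myBucket.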
Add the following to ${TRINO_HOME}/etc/alluxio/alluxio-core-site.xml. The fs.s3.impl and fs.s3a.impl properties map the s3:// and s3a:// schemes to alluxio.hadoop.FileSystem, so that when a Hive table LOCATION uses one of these schemes, Trino reads it through Alluxio Edge.
<!-- Enable the Alluxio Edge Cache Integration for s3 URIs -->
<property>
<name>fs.s3.impl</name>
<value>alluxio.hadoop.FileSystem</value>
</property>
<!-- Enable the Alluxio Edge Cache Integration for s3a URIs -->
<property>
<name>fs.s3a.impl</name>
<value>alluxio.hadoop.FileSystem</value>
</property>
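The XML above is a fragment. If alluxio-core-site.xml does not exist yet, a complete file wraps the properties in a <configuration> root element, as Hadoop-style site files do. The sketch below, using the hypothetical helper write_alluxio_core_site, writes such a file. Note that it overwrites any existing file at that path.

```shell
# Hypothetical helper: write a complete alluxio-core-site.xml.
write_alluxio_core_site() {
  local conf_dir="$1"
  mkdir -p "$conf_dir"
  # Quoted 'EOF' keeps the XML literal; no shell expansion happens
  cat > "$conf_dir/alluxio-core-site.xml" <<'EOF'
<?xml version="1.0"?>
<configuration>
  <!-- Enable the Alluxio Edge Cache Integration for s3 URIs -->
  <property>
    <name>fs.s3.impl</name>
    <value>alluxio.hadoop.FileSystem</value>
  </property>
  <!-- Enable the Alluxio Edge Cache Integration for s3a URIs -->
  <property>
    <name>fs.s3a.impl</name>
    <value>alluxio.hadoop.FileSystem</value>
  </property>
</configuration>
EOF
}
```

For example, write_alluxio_core_site "${TRINO_HOME}/etc/alluxio".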
Update Catalogs
Configure each Trino catalog (such as the Hive and Delta Lake catalogs) to include the Alluxio-enabled core-site file, alluxio-core-site.xml, in its configuration resources. The catalog files are usually in ${TRINO_HOME}/etc/catalog/.
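For connectors that read Hadoop configuration files through the hive.config.resources property, as Trino's Hive connector does, the catalog can reference the Alluxio file as sketched below. The helper add_config_resources and the catalog names in the usage line are hypothetical. It appends the property only when the catalog does not already set it, and leaves existing values for you to merge by hand.

```shell
# Hypothetical helper: point catalogs at alluxio-core-site.xml.
# Usage: add_config_resources <trino_home> <catalog>...
add_config_resources() {
  local trino_home="$1"; shift
  local core_site="$trino_home/etc/alluxio/alluxio-core-site.xml"
  local catalog file
  for catalog in "$@"; do
    file="$trino_home/etc/catalog/$catalog.properties"
    if [ ! -f "$file" ]; then
      echo "skip: $file not found" >&2
      continue
    fi
    if grep -q '^hive.config.resources=' "$file"; then
      # Existing value left alone; merge the path in by hand if needed
      echo "skip: $file already sets hive.config.resources" >&2
      continue
    fi
    # Start a new line first if the file does not end with one
    if [ -n "$(tail -c 1 "$file")" ]; then echo >> "$file"; fi
    echo "hive.config.resources=$core_site" >> "$file"
  done
}
```

For example, add_config_resources "${TRINO_HOME}" hive delta updates hive.properties and delta.properties in ${TRINO_HOME}/etc/catalog/.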
Optionally, you can also set the following catalog properties:
hive.non-managed-table-writes-enabled=true
hive.s3select-pushdown.enabled=true
hive.storage-format=PARQUET
hive.allow-drop-table=true
Deploy Trino
Now relaunch Trino on each node to run it with Alluxio Edge.
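Before relaunching, a quick pre-flight check can catch a missed step on a node. This is a sketch under stated assumptions: preflight_alluxio_edge is a hypothetical helper, it checks only the Hive plugin using the Trino 434+ layout and the example ${TRINO_HOME}/etc/alluxio/ directory, and it requires bash for compgen.

```shell
# Hypothetical helper: confirm the files placed by this guide exist.
preflight_alluxio_edge() {
  local trino_home="$1" missing=0 path pattern
  for path in \
      "$trino_home/etc/alluxio/alluxio-site.properties" \
      "$trino_home/etc/alluxio/alluxio-core-site.xml" \
      "$trino_home/etc/alluxio/jmx_export_config.yaml"; do
    [ -f "$path" ] || { echo "missing: $path" >&2; missing=1; }
  done
  for pattern in \
      "$trino_home/plugin/hive/hdfs/alluxio-*-client.jar" \
      "$trino_home/plugin/hive/hdfs/alluxio-prod-*.jar" \
      "$trino_home/lib/alluxio-underfs-s3a-*.jar"; do
    # compgen -G succeeds only if the glob matches at least one file
    compgen -G "$pattern" > /dev/null || { echo "missing: $pattern" >&2; missing=1; }
  done
  return "$missing"
}
```

For example, preflight_alluxio_edge "${TRINO_HOME}" && "${TRINO_HOME}/bin/launcher" restart restarts Trino only when every check passes.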