Configuration Settings

This page explains Alluxio's configuration system and provides recommendations on how to customize Alluxio's configuration in different contexts.

Configuration in Alluxio

The Alluxio runtime respects three sources of configuration settings:

  1. Application settings. Configuration set this way is application-specific and must be supplied every time an application instance (e.g., a Spark job) runs.
  2. Environment variables. This is an easy and fast way to set the basic properties for managing Alluxio servers and running Alluxio shell commands. Note that configuration set through environment variables may not be picked up by applications.
  3. Property files. This is a general approach to customizing any supported Alluxio configuration property. Settings in these files can be respected by Alluxio servers as well as applications.

When a property is set in more than one source, the value from the source with the highest priority is used. From highest to lowest, the priority is: application settings (if any), environment variables, property files, and the defaults.
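
The lookup order can be modeled with a small shell function. This is a hypothetical sketch, not Alluxio code: resolve_property and its four arguments (the value from an application setting, an environment variable, a property file, and the default, with an empty string meaning "not set in this source") are assumptions made for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical model of the lookup order (not part of Alluxio).
# Arguments: application value, environment value, property-file value, default.
# An empty string stands for "not set in this source".
resolve_property() {
  if [ -n "$1" ]; then
    printf '%s\n' "$1"   # application setting (-Dkey=value) wins
  elif [ -n "$2" ]; then
    printf '%s\n' "$2"   # then the environment variable
  elif [ -n "$3" ]; then
    printf '%s\n' "$3"   # then alluxio-site.properties
  else
    printf '%s\n' "$4"   # finally the default
  fi
}

# alluxio.master.hostname set both via ALLUXIO_MASTER_HOSTNAME and in the site file:
resolve_property "" "master-env" "master-file" "localhost"   # prints master-env
```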

Application settings

Alluxio shell users can pass -Dkey=value on the command line to specify an Alluxio configuration value. For example,

bin/alluxio fs -Dalluxio.user.file.writetype.default=MUST_CACHE touch /foo

Spark users can add "-Dkey=value" to ${SPARK_DAEMON_JAVA_OPTS} in conf/spark-env.sh, or add it to spark.executor.extraJavaOptions (for Spark executors) and spark.driver.extraJavaOptions (for Spark drivers).

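Hadoop MapReduce users can set "-Dkey=value" on the hadoop jar command line to pass it down to Alluxio. The option goes after the jar (and the main class, if any), and it is only honored when the job parses generic options, for example through ToolRunner:

hadoop jar foo.jar -Dalluxio.user.file.writetype.default=MUST_CACHE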

Note that setting Alluxio configuration this way is application-specific and must be repeated for each job or command.
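
For Spark, one way to apply the same Alluxio setting to both the driver and the executors of a single job is to pass the two extraJavaOptions properties through spark-submit --conf. The sketch below assumes bash. The class name com.example.MyApp and the jar my-app.jar are hypothetical placeholders, and the command is only printed (remove echo to actually submit).

```shell
#!/usr/bin/env bash
# Alluxio properties for this job only; separate several -D flags with spaces.
ALLUXIO_OPTS="-Dalluxio.user.file.writetype.default=MUST_CACHE"

# An array keeps each --conf value as a single argument despite embedded spaces.
SPARK_ALLUXIO_CONF=(
  --conf "spark.driver.extraJavaOptions=${ALLUXIO_OPTS}"
  --conf "spark.executor.extraJavaOptions=${ALLUXIO_OPTS}"
)

# Placeholder class and jar (hypothetical); drop "echo" to submit for real.
echo spark-submit "${SPARK_ALLUXIO_CONF[@]}" --class com.example.MyApp my-app.jar
```

Because the options travel with the spark-submit invocation, they affect only this job, which matches the application-specific nature of this approach.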

Environment variables

When you want to start Alluxio server processes, or use Alluxio command line interfaces with your specific configuration tuning, it is often fast and easy to set environment variables to customize basic Alluxio configuration. However, these environment variables will not affect application processes like Spark or MapReduce that use Alluxio as a client.

Alluxio supports a few basic and very frequently used configuration properties via the environment variables in conf/alluxio-env.sh, including:

Environment Variable    Meaning
ALLUXIO_MASTER_HOSTNAME hostname of the Alluxio master; defaults to localhost.
ALLUXIO_MASTER_ADDRESS deprecated by ALLUXIO_MASTER_HOSTNAME since version 1.1 and will be removed in version 2.0.
ALLUXIO_UNDERFS_ADDRESS under storage system address; defaults to ${ALLUXIO_HOME}/underFSStorage, which is on the local file system.
ALLUXIO_RAM_FOLDER the directory where a worker stores in-memory data; defaults to /mnt/ramdisk.
ALLUXIO_JAVA_OPTS Java VM options for the Master, Worker, and Alluxio Shell. Note that, by default, ALLUXIO_JAVA_OPTS is included in ALLUXIO_MASTER_JAVA_OPTS, ALLUXIO_WORKER_JAVA_OPTS, and ALLUXIO_USER_JAVA_OPTS.
ALLUXIO_MASTER_JAVA_OPTS additional Java VM options for the Master.
ALLUXIO_WORKER_JAVA_OPTS additional Java VM options for the Worker.
ALLUXIO_USER_JAVA_OPTS additional Java VM options for the Alluxio Shell.

For example, to set up an Alluxio master at localhost that talks to an HDFS cluster whose namenode also runs at localhost, and to enable Java remote debugging on port 7001, run the following before starting the master process:

export ALLUXIO_MASTER_HOSTNAME="localhost"
export ALLUXIO_UNDERFS_ADDRESS="hdfs://localhost:9000"
export ALLUXIO_MASTER_JAVA_OPTS="$ALLUXIO_JAVA_OPTS -agentlib:jdwp=transport=dt_socket,server=y,suspend=n,address=7001"

You can set these variables either in the shell or in conf/alluxio-env.sh. If this file does not exist yet, Alluxio can bootstrap conf/alluxio-env.sh for you when you run

./bin/alluxio bootstrapConf <ALLUXIO_MASTER_HOSTNAME> [local|hdfs|s3|gcs|glusterfs|swift]

Alternatively, you can create one from the template provided in the source code:

cp conf/alluxio-env.sh.template conf/alluxio-env.sh

Property files

The Alluxio site property file, alluxio-site.properties, can override Alluxio configuration regardless of whether the JVM is an Alluxio server process or a job using the Alluxio client. For the site property file to be loaded, either the parent directory of the file must be on the classpath of the target JVM process, or the file must be in one of the predefined paths.

Using Alluxio-supported environment variables has two limitations: first, it only provides basic Alluxio settings, and second, it does not affect non-Alluxio JVMs like Spark or MapReduce. To address these limitations, Alluxio uses the site property file alluxio-site.properties so that users can customize all supported configuration properties, regardless of the JVM process. On startup, the Alluxio runtime checks whether the site property file exists and, if so, uses its contents to override the default configuration. Specifically, it searches for alluxio-site.properties in ${HOME}/.alluxio/, /etc/alluxio/, and the classpath of the relevant Java VM process, in that order, and skips the remaining paths once a file is found. The first two directories are the default value of alluxio.site.conf.dir and can be customized by changing that property.
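
The search order can be sketched as a shell function. This is a hypothetical helper, not part of Alluxio: find_site_properties takes the directory list in the comma-separated form of alluxio.site.conf.dir plus a colon-separated classpath and, as a simplification, only checks classpath entries that are directories (a JVM classpath lookup can in general also find the file inside a jar).

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of Alluxio) mimicking the search order above.
# $1: comma-separated directories, like alluxio.site.conf.dir
# $2: colon-separated classpath; only directory entries are checked here
find_site_properties() {
  local dirs="$1" classpath="$2" dir
  local IFS=','               # split the directory list on commas
  for dir in $dirs; do
    if [ -f "${dir%/}/alluxio-site.properties" ]; then
      printf '%s\n' "${dir%/}/alluxio-site.properties"
      return 0                # first match wins; remaining paths are skipped
    fi
  done
  IFS=':'                     # split the classpath on colons
  for dir in $classpath; do
    if [ -f "${dir%/}/alluxio-site.properties" ]; then
      printf '%s\n' "${dir%/}/alluxio-site.properties"
      return 0
    fi
  done
  return 1                    # no site file found: defaults apply
}

find_site_properties "$HOME/.alluxio/,/etc/alluxio/" "${ALLUXIO_HOME:-.}/conf" \
  || echo "no alluxio-site.properties found"
```

Declaring IFS as local keeps the changed field separator scoped to the function, so callers are unaffected.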

For example, ${ALLUXIO_HOME}/conf/ is on the classpath of the Alluxio master, worker, and shell JVM processes by default, so you can create ${ALLUXIO_HOME}/conf/alluxio-site.properties with:

cp conf/alluxio-site.properties.template conf/alluxio-site.properties

Then customize it to fit your configuration tuning needs before starting Alluxio servers or using Alluxio shell commands.
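
As an illustration, the snippet below appends a few settings to the site file with a shell here-document. The values (a master on localhost, HDFS at localhost:9000, CACHE_THROUGH as the default write type) are examples only, and the snippet assumes it runs from ${ALLUXIO_HOME} when that variable is unset.

```shell
#!/usr/bin/env bash
# Fall back to the current directory, matching the relative conf/ paths above.
SITE_FILE="${ALLUXIO_HOME:-.}/conf/alluxio-site.properties"
mkdir -p "$(dirname "$SITE_FILE")"

# Example values only; each line is a plain key=value property.
cat >> "$SITE_FILE" <<'EOF'
alluxio.master.hostname=localhost
alluxio.underfs.address=hdfs://localhost:9000
alluxio.user.file.writetype.default=CACHE_THROUGH
EOF
```

Assuming the file is read with standard java.util.Properties semantics, a key that appears twice takes its last value, so appended lines override earlier ones.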

For applications like Spark or MapReduce to use Alluxio property files, add the directory containing your site property file to the application classpath. For example:

export SPARK_CLASSPATH=${ALLUXIO_HOME}/conf:${SPARK_CLASSPATH} # for Spark jobs
export HADOOP_CLASSPATH=${ALLUXIO_HOME}/conf:${HADOOP_CLASSPATH} # for Hadoop jobs

Alternatively, if you have write access to /etc/, you can copy the site property file to /etc/alluxio/. This configuration is then shared across processes, regardless of whether the JVM is an Alluxio server or a job using the Alluxio client.

Appendix

All Alluxio configuration properties fall into one of the six categories: Common (shared by Master and Worker), Master specific, Worker specific, User specific, Cluster specific (used for running Alluxio with cluster managers like Mesos and YARN), and Security specific (shared by Master, Worker, and User).
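
In the tables below, the category of a property can usually be read from its name prefix. The helper here is hypothetical (inferred from the table contents, not an Alluxio command) and only shows that naming pattern:

```shell
#!/usr/bin/env bash
# Hypothetical helper: map a property name to its appendix category by prefix.
property_category() {
  case "$1" in
    alluxio.master.*)      echo "Master" ;;
    alluxio.worker.*)      echo "Worker" ;;
    alluxio.user.*)        echo "User" ;;
    alluxio.integration.*) echo "Cluster Management" ;;
    alluxio.security.*)    echo "Security" ;;
    alluxio.*)             echo "Common" ;;
    *)                     echo "not an Alluxio property" ;;
  esac
}

property_category alluxio.worker.memory.size   # prints Worker
property_category alluxio.zookeeper.address    # prints Common
```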

Common Configuration

The common configuration contains constants shared by different components.

Property Name    Default
alluxio.conf.dir ${alluxio.home}/conf
alluxio.debug false
alluxio.home /mnt/alluxio_default_home
alluxio.logs.dir ${alluxio.home}/logs
alluxio.keyvalue.enabled false
alluxio.keyvalue.partition.size.bytes.max 512MB
alluxio.metrics.conf.file ${alluxio.conf.dir}/metrics.properties
alluxio.network.host.resolution.timeout.ms 5000
alluxio.network.thrift.frame.size.bytes.max 16MB
alluxio.site.conf.dir ${user.home}/.alluxio/,/etc/alluxio/
alluxio.test.mode false
alluxio.underfs.address ${alluxio.work.dir}/underFSStorage
alluxio.underfs.gcs.owner.id.to.username.mapping No default
alluxio.underfs.glusterfs.impl org.apache.hadoop.fs.glusterfs.GlusterFileSystem
alluxio.underfs.glusterfs.mapred.system.dir glusterfs:///mapred/system
alluxio.underfs.hdfs.configuration ${alluxio.conf.dir}/core-site.xml
alluxio.underfs.hdfs.impl org.apache.hadoop.hdfs.DistributedFileSystem
alluxio.underfs.hdfs.prefixes hdfs://,glusterfs:///,maprfs:///
alluxio.underfs.hdfs.remote false
alluxio.underfs.listing.length 1000
alluxio.underfs.object.store.mount.shared.publicly false
alluxio.underfs.s3.owner.id.to.username.mapping No default
alluxio.underfs.s3.endpoint No default
alluxio.underfs.s3.proxy.host No default
alluxio.underfs.s3.proxy.https.only true
alluxio.underfs.s3.proxy.port No default
alluxio.underfs.s3.threads.max 40
alluxio.underfs.s3.admin.threads.max 20
alluxio.underfs.s3.upload.threads.max 20
alluxio.underfs.s3.disable.dns.buckets false
alluxio.underfs.s3a.consistency.timeout.ms 60000
alluxio.underfs.s3a.request.timeout.ms 60000
alluxio.underfs.s3a.secure.http.enabled false
alluxio.underfs.s3a.server.side.encryption.enabled false
alluxio.underfs.s3a.socket.timeout.ms 50000
alluxio.underfs.s3a.inherit_acl true
alluxio.web.resources ${alluxio.home}/core/server/src/main/webapp
alluxio.web.threads 1
alluxio.work.dir ${alluxio.home}
alluxio.zookeeper.address No default
alluxio.zookeeper.election.path /election
alluxio.zookeeper.enabled false
alluxio.zookeeper.leader.path /leader
alluxio.zookeeper.leader.inquiry.retry 10
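
Several defaults above refer to other properties with ${...}, and the references can chain: alluxio.underfs.address depends on alluxio.work.dir, which depends on alluxio.home. The bash 4+ sketch below expands such references over a few defaults copied from this table. The expand_default function and DEFAULTS array are hypothetical illustrations, not Alluxio's implementation, and the sketch does not detect reference cycles.

```shell
#!/usr/bin/env bash
# Requires bash 4+ for associative arrays.
# A few defaults copied from the table above; single quotes keep ${...} literal.
declare -A DEFAULTS=(
  [alluxio.home]='/mnt/alluxio_default_home'
  [alluxio.work.dir]='${alluxio.home}'
  [alluxio.conf.dir]='${alluxio.home}/conf'
  [alluxio.underfs.address]='${alluxio.work.dir}/underFSStorage'
  [alluxio.metrics.conf.file]='${alluxio.conf.dir}/metrics.properties'
)

# Hypothetical helper: expand ${key} references recursively, one per iteration.
# Unknown keys expand to an empty string; cycles are not detected.
expand_default() {
  local value="${DEFAULTS[$1]}" re='^(.*)[$][{]([^}]+)[}](.*)$'
  local prefix ref suffix
  while [[ $value =~ $re ]]; do
    prefix="${BASH_REMATCH[1]}"
    ref="${BASH_REMATCH[2]}"
    suffix="${BASH_REMATCH[3]}"
    value="${prefix}$(expand_default "$ref")${suffix}"
  done
  printf '%s\n' "$value"
}

expand_default alluxio.underfs.address   # prints /mnt/alluxio_default_home/underFSStorage
```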

Master Configuration

The master configuration specifies information regarding the master node, such as the address and the port number.

Property Name    Default
alluxio.master.bind.host 0.0.0.0
alluxio.master.heartbeat.interval.ms 1000
alluxio.master.hostname localhost
alluxio.master.file.async.persist.handler alluxio.master.file.async.DefaultAsyncPersistHandler
alluxio.master.format.file_prefix "_format_"
alluxio.master.journal.flush.batch.time.ms 5
alluxio.master.journal.flush.timeout.ms 300000
alluxio.master.journal.folder ${alluxio.work.dir}/journal
alluxio.master.journal.formatter.class alluxio.master.journal.ProtoBufJournalFormatter
alluxio.master.journal.log.size.bytes.max 10MB
alluxio.master.journal.tailer.shutdown.quiet.wait.time.ms 5000
alluxio.master.journal.tailer.sleep.time.ms 1000
alluxio.master.lineage.checkpoint.interval.ms 600000
alluxio.master.lineage.checkpoint.class alluxio.master.lineage.checkpoint.CheckpointLatestScheduler
alluxio.master.lineage.recompute.interval.ms 600000
alluxio.master.lineage.recompute.log.path ${alluxio.logs.dir}/recompute.log
alluxio.master.port 19998
alluxio.master.retry 29
alluxio.master.startup.consistency.check.enabled true
alluxio.master.ttl.checker.interval.ms 3600000
alluxio.master.web.bind.host 0.0.0.0
alluxio.master.web.hostname localhost
alluxio.master.web.port 19999
alluxio.master.whitelist /
alluxio.master.worker.threads.max 2048
alluxio.master.worker.threads.min 512
alluxio.master.worker.timeout.ms 300000
alluxio.master.tieredstore.global.levels 3
alluxio.master.tieredstore.global.level0.alias MEM
alluxio.master.tieredstore.global.level1.alias SSD
alluxio.master.tieredstore.global.level2.alias HDD
alluxio.master.keytab.file
alluxio.master.principal

Worker Configuration

The worker configuration specifies information regarding the worker nodes, such as the address and the port number.

Property Name    Default
alluxio.worker.allocator.class alluxio.worker.block.allocator.MaxFreeAllocator
alluxio.worker.bind.host 0.0.0.0
alluxio.worker.block.heartbeat.interval.ms 1000
alluxio.worker.block.heartbeat.timeout.ms 60000
alluxio.worker.block.threads.max 2048
alluxio.worker.block.threads.min 256
alluxio.worker.data.bind.host 0.0.0.0
alluxio.worker.data.folder /alluxioworker/
alluxio.worker.data.port 29999
alluxio.worker.data.server.class alluxio.worker.netty.NettyDataServer
alluxio.worker.evictor.class alluxio.worker.block.evictor.LRUEvictor
alluxio.worker.evictor.lrfu.attenuation.factor 2.0
alluxio.worker.evictor.lrfu.step.factor 0.25
alluxio.worker.file.persist.pool.size 64
alluxio.worker.filesystem.heartbeat.interval.ms 1000
alluxio.worker.hostname localhost
alluxio.worker.memory.size 128 MB
alluxio.worker.network.netty.boss.threads 1
alluxio.worker.network.netty.file.transfer MAPPED
alluxio.worker.network.netty.shutdown.quiet.period 2
alluxio.worker.network.netty.shutdown.timeout 15
alluxio.worker.network.netty.watermark.high 32768
alluxio.worker.network.netty.watermark.low 8192
alluxio.worker.network.netty.worker.threads 0
alluxio.worker.port 29998
alluxio.worker.session.timeout.ms 60000
alluxio.worker.tieredstore.block.lock.readers 1000
alluxio.worker.tieredstore.block.locks 1000
alluxio.worker.tieredstore.levels 1
alluxio.worker.tieredstore.level0.alias MEM
alluxio.worker.tieredstore.level0.dirs.path /mnt/ramdisk/
alluxio.worker.tieredstore.level0.dirs.quota ${alluxio.worker.memory.size}
alluxio.worker.tieredstore.level0.reserved.ratio 0.1
alluxio.worker.tieredstore.reserver.enabled false
alluxio.worker.tieredstore.reserver.interval.ms 1000
alluxio.worker.tieredstore.retry 3
alluxio.worker.web.bind.host 0.0.0.0
alluxio.worker.web.hostname localhost
alluxio.worker.web.port 30000
alluxio.worker.keytab.file
alluxio.worker.principal

User Configuration

The user configuration specifies values regarding file system access.

Property Name    Default
alluxio.user.block.master.client.threads 10
alluxio.user.block.worker.client.threads 10
alluxio.user.block.remote.read.buffer.size.bytes 8 MB
alluxio.user.block.remote.reader.class alluxio.client.netty.NettyRemoteBlockReader
alluxio.user.block.remote.writer.class alluxio.client.netty.NettyRemoteBlockWriter
alluxio.user.block.size.bytes.default 512MB
alluxio.user.failed.space.request.limits 3
alluxio.user.file.buffer.bytes 1 MB
alluxio.user.file.cache.partially.read.block true
alluxio.user.file.master.client.threads 10
alluxio.user.file.waitcompleted.poll.ms 1000
alluxio.user.file.worker.client.threads 10
alluxio.user.file.write.location.policy.class alluxio.client.file.policy.LocalFirstPolicy
alluxio.user.file.readtype.default CACHE_PROMOTE
alluxio.user.file.writetype.default MUST_CACHE
alluxio.user.file.write.tier.default 0
alluxio.user.heartbeat.interval.ms 1000
alluxio.user.lineage.enabled false
alluxio.user.lineage.master.client.threads 10
alluxio.user.network.netty.timeout.ms 3000
alluxio.user.network.netty.worker.threads 0
alluxio.user.ufs.delegation.enabled true
alluxio.user.ufs.delegation.read.buffer.size.bytes 8MB
alluxio.user.ufs.delegation.write.buffer.size.bytes 2MB
alluxio.user.ufs.file.reader.class alluxio.client.netty.NettyUnderFileSystemFileReader
alluxio.user.ufs.file.writer.class alluxio.client.netty.NettyUnderFileSystemFileWriter
alluxio.user.packet.streaming.enabled false

Cluster Management

When running Alluxio with cluster managers like Mesos and YARN, Alluxio has additional configuration options.

Property Name    Default
alluxio.integration.master.resource.cpu 1
alluxio.integration.master.resource.mem 1024 MB
alluxio.integration.mesos.executor.dependency.path http://downloads.alluxio.org/downloads/files/${alluxio.version}/alluxio-${alluxio.version}-bin.tar.gz
alluxio.integration.mesos.jdk.path jdk1.7.0_79
alluxio.integration.mesos.jdk.url https://alluxio-mesos.s3.amazonaws.com/jdk-7u79-linux-x64.tar.gz
alluxio.integration.mesos.master.name AlluxioMaster
alluxio.integration.mesos.master.node.count 1
alluxio.integration.mesos.principal alluxio
alluxio.integration.mesos.role *
alluxio.integration.mesos.secret
alluxio.integration.mesos.user
alluxio.integration.mesos.worker.name AlluxioWorker
alluxio.integration.worker.resource.cpu 1
alluxio.integration.worker.resource.mem 1024 MB
alluxio.integration.yarn.workers.per.host.max 1

Security Configuration

The security configuration specifies information regarding the security features, such as authentication and file permission. Properties for authentication take effect for master, worker, and user. Properties for file permission only take effect for master. See Security for more information about security features.

Property Name    Default
alluxio.security.authentication.type SIMPLE
alluxio.security.authentication.socket.timeout.ms 600000
alluxio.security.authentication.custom.provider.class
alluxio.security.login.username
alluxio.security.authorization.permission.enabled true
alluxio.security.authorization.permission.umask 022
alluxio.security.authorization.permission.supergroup supergroup
alluxio.security.group.mapping.class alluxio.security.group.provider.ShellBasedUnixGroupsMapping
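
With the default alluxio.security.authorization.permission.umask of 022, the effect on new files and directories can be checked with shell arithmetic. This assumes POSIX-style semantics in which new files start from mode 0666 and new directories from 0777 before the umask bits are cleared; the base modes are an assumption of the sketch, not taken from the table.

```shell
#!/usr/bin/env bash
# Assumed POSIX-style base modes; the leading 0 makes these octal in $(( )).
umask_value=0022
file_mode=$(( 0666 & ~umask_value ))   # clear the umask bits from the file base mode
dir_mode=$(( 0777 & ~umask_value ))    # clear the umask bits from the directory base mode
printf 'file: %o\n' "$file_mode"       # file: 644
printf 'dir:  %o\n' "$dir_mode"        # dir:  755
```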

Configure multihomed networks

Alluxio configuration provides a way to take advantage of multihomed networks. If a node has more than one NIC and you want the Alluxio master to listen on all of them, set alluxio.master.bind.host to 0.0.0.0 (its default value). Alluxio clients can then reach the master by connecting to any of its NICs. The same applies to the other properties ending in bind.host.
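
As a sketch, the hypothetical helper below writes every bind.host property listed in the tables above (alluxio.master.bind.host, alluxio.master.web.bind.host, alluxio.worker.bind.host, alluxio.worker.data.bind.host, alluxio.worker.web.bind.host) to alluxio-site.properties with one address. Use 0.0.0.0 to listen on all NICs (already the default), or a single NIC's address to listen only on that interface. The file location assumes the current directory is ${ALLUXIO_HOME} when that variable is unset.

```shell
#!/usr/bin/env bash
# Hypothetical helper: set every bind.host property from the tables to one address.
set_bind_hosts() {
  local address="$1" prop
  local site_file="${ALLUXIO_HOME:-.}/conf/alluxio-site.properties"
  mkdir -p "$(dirname "$site_file")"
  for prop in alluxio.master.bind.host alluxio.master.web.bind.host \
              alluxio.worker.bind.host alluxio.worker.data.bind.host \
              alluxio.worker.web.bind.host; do
    printf '%s=%s\n' "$prop" "$address" >> "$site_file"
  done
}

set_bind_hosts 0.0.0.0   # listen on every NIC (the documented default)
```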