Security
This document describes the following security-related features in Alluxio.
- User Authentication: Alluxio supports SIMPLE, NOSASL, and KERBEROS as authentication mechanisms. The Alluxio filesystem differentiates users accessing the service when the authentication mode is SIMPLE or KERBEROS. One of these modes is required for authorization.
- User Authorization: The Alluxio filesystem supports Ranger-based user authorization when alluxio.security.authorization.permission.enabled=true and alluxio.security.authorization.plugins.enabled=true. Note that authentication cannot be NOSASL, as authorization requires user information. A security server must be started on the node where the Alluxio master is running to enable authorization.
- Encryption: Alluxio supports TLS for network communication.
See Security specific configuration for different security properties.
Authentication
The authentication protocol is determined by the configuration property alluxio.security.authentication.type, with a default value of SIMPLE.
Alluxio can also integrate with a third-party OIDC provider to handle third-party authentication.
SIMPLE
Authentication is enabled when the authentication type is SIMPLE.
A client must identify itself with a username to the Alluxio service.
If the property alluxio.security.login.username is set on the Alluxio client, its value is used as the login user; otherwise, the login user is inferred from the operating system user executing the client process.
The provided user information is attached to the corresponding metadata when the client creates directories or files.
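As a minimal sketch, a client-side alluxio-site.properties for SIMPLE mode might look like the following. The username alice is a hypothetical placeholder, not a value taken from this document.

```properties
# Use SIMPLE authentication (this is also the default value)
alluxio.security.authentication.type=SIMPLE
# Optional: override the login user. "alice" is a hypothetical example.
# If this property is omitted, the OS user running the client process is used.
alluxio.security.login.username=alice
```

With this setting, files and directories created by the client would carry alice as the user in their metadata.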
NOSASL
Authentication is disabled when the authentication type is NOSASL.
The Alluxio service will ignore the user of the client, and no user information will be attached to the corresponding metadata when the client creates directories or files.
CUSTOM
Authentication is enabled when the authentication type is CUSTOM.
Alluxio clients retrieve user information via the class provided by the alluxio.security.authentication.custom.provider.class property.
The specified class must implement the interface alluxio.security.authentication.AuthenticationProvider.
This mode is currently experimental and should only be used in tests.
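For illustration only, a test configuration for CUSTOM mode could be sketched as follows. The class name com.example.MyAuthenticationProvider is hypothetical; substitute your own implementation.

```properties
alluxio.security.authentication.type=CUSTOM
# Hypothetical class name. The class must implement
# alluxio.security.authentication.AuthenticationProvider.
alluxio.security.authentication.custom.provider.class=com.example.MyAuthenticationProvider
```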
KERBEROS
Authentication is enabled and enforced via Kerberos. Kerberos is an authentication protocol that provides strong, mutual authentication between clients and servers.
The typical Kerberos principal format used for services is "primary/instance@REALM.COM".
Kerberos principals must be prepared as Alluxio service principal names (SPNs).
It is recommended to use a hostname-associated instance name for Alluxio service principals, such as <alluxio-service-name>/<hostname>@REALM.COM. This way, each Alluxio server node has a unique service principal. Note that the <hostname> in each principal must match the server (either master or worker) hostname.
Alternatively, Alluxio supports a cluster-wide unified instance name, such as <alluxio-service-name>/<alluxio-cluster-name>@REALM.COM, so that all the Alluxio servers share the same principal. To use this feature, set alluxio.security.kerberos.unified.instance.name=<alluxio-cluster-name>.
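A minimal sketch of the unified instance name option described above, assuming a hypothetical cluster name mycluster:

```properties
alluxio.security.authentication.type=KERBEROS
# All servers then share one principal of the form
# <alluxio-service-name>/mycluster@REALM.COM.
# "mycluster" is a hypothetical cluster name.
alluxio.security.kerberos.unified.instance.name=mycluster
```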
Kerberos authentication
Alluxio Enterprise Edition supports Java Kerberos for Kerberos authentication.
Please refer to the Kerberos security setup instructions to set up Alluxio with Java Kerberos enabled.
Alluxio system administrators are responsible for specifying the Alluxio servers' Kerberos credentials, with a principal name and keytab file. Alluxio clients need valid Kerberos credentials (either keytab files or a local ticket cache) to access a Kerberos-enabled Alluxio cluster.
When KERBEROS authentication is enabled, the login user for the different components is obtained as follows:
- For Alluxio servers, the login user is represented by the server-side Kerberos principal in alluxio.security.kerberos.server.principal. A corresponding keytab file must be specified in alluxio.security.kerberos.server.keytab.file. The Alluxio user shown in the Alluxio namespace is the short name of the Kerberos principal, which excludes the hostname and realm parts.
- For Alluxio clients, the login user is represented by the client-side Kerberos principal in alluxio.security.kerberos.client.principal. There are two ways for Alluxio clients to log in via Kerberos. One is to specify a keytab file in alluxio.security.kerberos.client.keytab.file. The other is to perform a kinit Kerberos login for the alluxio.security.kerberos.client.principal name on the client machine. The Alluxio client first checks whether alluxio.security.kerberos.client.keytab.file points to a valid keytab file. If no valid keytab file allows a successful login, the client falls back to the login information in the ticket cache. If neither credential exists, the client reports a login failure and asks the user to provide a keytab file or log in via kinit.
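The server and client settings described in the list above can be sketched as follows. The principals, hostname, and keytab paths are hypothetical placeholders.

```properties
# Server side (masters and workers). Principal and keytab path are hypothetical.
alluxio.security.kerberos.server.principal=alluxio/host1.example.com@REALM.COM
alluxio.security.kerberos.server.keytab.file=/etc/security/keytabs/alluxio.keytab

# Client side. The keytab line may be omitted if the user has already run
# kinit for this principal and a valid ticket is in the ticket cache.
alluxio.security.kerberos.client.principal=alice@REALM.COM
alluxio.security.kerberos.client.keytab.file=/etc/security/keytabs/alice.keytab
```

Under these assumptions, the server user shown in the Alluxio namespace would be the short name alluxio, since the hostname and realm parts are excluded.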
Connecting with Secure-HDFS
Note that in Alluxio Enterprise Edition, the way to configure Alluxio with secure HDFS is different from that in Alluxio Community Edition. The properties alluxio.master.keytab.file, alluxio.master.principal, alluxio.worker.keytab.file, and alluxio.worker.principal are not used. Please use alluxio.security.underfs.hdfs.kerberos.client.keytab.file and alluxio.security.underfs.hdfs.kerberos.client.principal instead.
Note that an Alluxio server can authenticate to its clients using a (server) principal different from the (client) principal it uses to access secure HDFS.
If alluxio.security.underfs.hdfs.kerberos.client.principal is not specified, Alluxio falls back to using alluxio.security.kerberos.server.principal.
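As a rough sketch, the secure HDFS credentials could be set as follows. The principal and keytab path are hypothetical.

```properties
# Credentials Alluxio servers use when acting as a client of secure HDFS.
# Principal and keytab path are hypothetical placeholders.
alluxio.security.underfs.hdfs.kerberos.client.principal=hdfs-client@REALM.COM
alluxio.security.underfs.hdfs.kerberos.client.keytab.file=/etc/security/keytabs/hdfs-client.keytab
```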
If you want to set up a Kerberos-enabled Alluxio cluster on top of Kerberos-enabled HDFS, please refer to “Kerberos-enabled Alluxio integration with Secure-HDFS” in the Kerberos setup guide for more details.
Auth-to-local configuration
Alluxio supports configurable translation from Kerberos principal name to operating system user, via MIT Kerberos auth_to_local configuration. To make it easier to configure together with HDFS, the syntax is the same as Hadoop auth_to_local.
Please use the alluxio.security.kerberos.auth.to.local configuration property to set it up.
By default, the value is DEFAULT.
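Because the syntax follows Hadoop auth_to_local, a rule could be sketched as follows. This example assumes the realm REALM.COM and assumes that several rules can be given on one line separated by whitespace, as Hadoop allows; verify the behavior against your Alluxio version.

```properties
# Map any single-component principal in REALM.COM (for example alice@REALM.COM)
# to its short name by stripping the realm, then fall back to DEFAULT.
alluxio.security.kerberos.auth.to.local=RULE:[1:$1@$0](.*@REALM.COM)s/@.*// DEFAULT
```

In this rule, [1:$1@$0] selects principals with one component and rebuilds them as name@realm, the parenthesized pattern filters on the realm, and the s/@.*// substitution removes everything from the @ onward.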
Authorization
Ranger enables administrators to centralize permission management for various resources. Alluxio supports using Ranger to manage and enforce access to directories and files.
Enable Authorization
To enable user authorization, please configure the following properties:
# enables authorization
alluxio.security.authorization.plugins.enabled=true
alluxio.security.authorization.permission.enabled=true
In addition, the Ranger plugin needs to be configured separately. Details can be found in Set up Ranger for Authorization, but the following demonstrates a set of example configurations:
# enables ranger plugin
alluxio.security.authorization.plugin.name=ranger-2.1
alluxio.security.authorization.plugin.paths=<path>
alluxio.master.mount.table.root.option.alluxio.underfs.security.authorization.plugin.name=ranger-2.1
alluxio.master.mount.table.root.option.alluxio.underfs.security.authorization.plugin.paths=<path>
Start Security Server
An Alluxio security server is required for Ranger-based authorization. A security server exposes a gRPC port and an HTTP port. Alluxio workers make RPCs to the security server to check user permissions before an actual read/write operation is conducted. The HTTP port is used for certificate and public key distribution.
Start the security server on the node that runs the Alluxio master using the following command:
$ ./bin/alluxio process start security_server
If HA masters are used, a security server should be collocated with each master.
By default, the security server starts the gRPC server at port 19995 and the HTTP server at port 19994. The following properties can be set to change the ports:
alluxio.security_server.web.port=19994
alluxio.security.server.rpc.port=19995
On Alluxio workers, the following configuration needs to be added so that workers are able to fetch public keys from the security server:
alluxio.security_server.jwks.address=http://{security_server_host_name}:19994/security/jwks.json
Set up Ranger for Authorization
Ranger enables administrators to centralize permission management for various resources. Alluxio supports using Ranger to manage and enforce access to directories and files.
There are two ways to use Ranger with Alluxio. Users can use Ranger to directly manage Alluxio file system permissions, or configure Alluxio to enforce existing Ranger policies for HDFS under file systems. While it is possible to use Ranger to manage permissions for both Alluxio and under file systems, we don’t recommend enabling both at the same time because it can be confusing to reason about permissions over multiple sources of truth.
Managing Alluxio permissions with Ranger
First, make sure the HDFS plugin is enabled in the Ranger configuration. Follow the instructions on this page to set up a new HDFS repository for Alluxio. In the name node URL field, enter the Alluxio service URI.
Copy core-site.xml, hdfs-site.xml, ranger-hdfs-security.xml, ranger-hdfs-audit.xml, and ranger-policymgr-ssl.xml from /etc/hadoop/conf/ on the HDFS name node to a directory on the Alluxio master nodes. Update the configuration settings in ranger-hdfs-security.xml to use the new HDFS repository. Specifically:
- Set ranger.plugin.hdfs.policy.cache.dir to a valid directory on the Alluxio master nodes where you want to store the policy cache.
- Set ranger.plugin.hdfs.policy.rest.ssl.config.file to point to the path of the ranger-policymgr-ssl.xml file on the Alluxio master node.
- Set ranger.plugin.hdfs.service.name to the new HDFS repository name.
- Verify that ranger.plugin.hdfs.policy.rest.url points to the correct Ranger service URL.
- Set xasecure.add-hadoop-authorization to true if you want Ranger to fall back to the Alluxio default permission checker when a path is not managed by a Ranger policy.
Configure Alluxio masters to use the Ranger plugin for authorization. In alluxio-site.properties, add the following properties:
alluxio.security.authorization.plugins.enabled=true
alluxio.security.authorization.plugin.name=<plugin_name>
alluxio.security.authorization.plugin.paths=<your_ranger_plugin_configuration_files_location>
alluxio.security.authorization.plugin.name should be either ranger-hdp-2.5 or ranger-hdp-2.6, depending on your HDP cluster version.
alluxio.security.authorization.plugin.paths should be the local directory path on the Alluxio master where you put the Ranger configuration files.
Restart all Alluxio masters to apply the new configuration. You can now add policies to the Alluxio repository in Ranger and verify that they take effect in Alluxio.
If Hive is used to work with data on Alluxio, please add the Alluxio scheme to the value of the property ranger.plugin.hive.urlauth.filesystem.schemes in the Hive configuration:
ranger.plugin.hive.urlauth.filesystem.schemes=hdfs:,file:,wasb:,adl:,alluxio:
Enforcing existing Ranger policies for HDFS under file system
Alluxio can be configured to enforce existing Ranger policies on HDFS under file systems. First, copy core-site.xml, hdfs-site.xml, ranger-hdfs-security.xml, ranger-hdfs-audit.xml, and ranger-policymgr-ssl.xml from /etc/hadoop/conf/ on the HDFS name node to the Alluxio master nodes. Update the configuration settings in ranger-hdfs-security.xml to use the new HDFS repository. Specifically:
- Set ranger.plugin.hdfs.policy.cache.dir to a valid directory on the Alluxio master nodes where you want to store the policy cache for this under file system.
- Set ranger.plugin.hdfs.policy.rest.ssl.config.file to point to the path of the ranger-policymgr-ssl.xml file on the Alluxio master node.
- Verify that ranger.plugin.hdfs.policy.rest.url points to the correct Ranger service URL.
Configure Alluxio masters to use the Ranger plugin for authorization. In alluxio-site.properties, add the following properties:
alluxio.security.authorization.plugins.enabled=true
If the HDFS file system is mounted as the root under file system, add the following properties in alluxio-site.properties:
alluxio.master.mount.table.root.option.alluxio.underfs.security.authorization.plugin.name=<plugin_name>
alluxio.master.mount.table.root.option.alluxio.underfs.security.authorization.plugin.paths=<your_ranger_plugin_configuration_files_location>
alluxio.underfs.security.authorization.plugin.name should be either ranger-hdp-2.5 or ranger-hdp-2.6, depending on your HDP cluster version for the Ranger service managing the under file system.
alluxio.underfs.security.authorization.plugin.paths should be the local directory path on the Alluxio master where you put the Ranger configuration files for the corresponding under file system.
Please note that Alluxio masters need to be reformatted and then restarted for this change to take effect.
If the HDFS file system is to be mounted as a nested under file system using the alluxio fs mount command, please add the following parameters to your mount command:
--option alluxio.underfs.security.authorization.plugin.name=<plugin_name>
--option alluxio.underfs.security.authorization.plugin.paths=<your_ranger_plugin_configuration_files_location>
Encryption
Alluxio supports encryption of the network communication between services with TLS.
TLS Encryption for Network Communication
TLS is a cryptographic protocol that provides end-to-end security of data sent between applications over the Internet. It ensures the secure delivery of data over the Internet, avoiding possible eavesdropping and/or alteration of the content. For Alluxio network communication (RPCs, data transfers), Alluxio supports TLS encryption. In order to configure Alluxio to use TLS encryption, keystores and truststores must be created for Alluxio. A keystore is used by the server side of the TLS connection, and the truststore is used by the client side of the TLS connection.
Keystore
Alluxio servers (masters and workers) require a keystore in order to enable TLS. The keystore typically stores the key and certificate for the server. This keystore file must be readable by the OS user which launches the Alluxio server processes.
An example self-signed keystore can be created as follows:
$ keytool -genkeypair -alias key -keyalg RSA -keysize 2048 -dname "cn=localhost, ou=Department, o=Company, l=City, st=State, c=US" -keystore /alluxio/keystore.jks -keypass keypass -storepass storepass
This will generate a keystore file at /alluxio/keystore.jks, with a key password of keypass and a keystore password of storepass.
Truststore
All clients of a TLS connection must have access to a truststore to trust all the certificates of the servers. Clients include Alluxio clients, as well as Alluxio workers (since Alluxio workers create client connections to the Alluxio master). The truststore stores the trusted certificates, and must be readable by the process initiating the client connection (clients, workers).
An example truststore (based on the previous keystore) can be created as follows:
$ keytool -export -alias key -keystore /alluxio/keystore.jks -storepass storepass -rfc -file selfsigned.cer
$ keytool -import -alias key -noprompt -file selfsigned.cer -keystore /alluxio/truststore.jks -storepass trustpass
The first command extracts the certificate from the previously created keystore (using the keystore password storepass). The second command then creates a truststore file using that extracted certificate, and saves the truststore to /alluxio/truststore.jks, with a truststore password of trustpass.
Configuring Alluxio servers and clients
Once the keystores and truststores are created for all the machines involved, Alluxio needs to be configured to understand how to access those files.
On Alluxio servers (masters and workers), you must add these properties to alluxio-site.properties:
# enables TLS
alluxio.network.tls.enabled=true
alluxio.network.tls.ssl.context.provider.classname=alluxio.emon.util.network.tls.EnterpriseSslContextProvider
# keystore properties for the server side of connections
alluxio.network.tls.keystore.path=/alluxio/keystore.jks
alluxio.network.tls.keystore.password=storepass
alluxio.network.tls.keystore.key.password=keypass
# truststore properties for the client side of connections (worker to master, or master to master for embedded journal)
alluxio.network.tls.truststore.path=/alluxio/truststore.jks
alluxio.network.tls.truststore.password=trustpass
The Alluxio servers can explicitly specify which TLS protocols to use with the parameter alluxio.network.tls.server.protocols.
This can be set to a comma-separated list of TLS protocol names, for example: alluxio.network.tls.server.protocols=TLSv1.1,TLSv1.2.
This is useful for restricting the servers from enabling certain TLS protocols, since by default, Java and Netty enable all supported protocols.
The Alluxio servers will use the secret key in the keystore, but sometimes keystores contain multiple keys.
If there are multiple keys in the keystore, the key to use must be specified by providing the alias name via alluxio.network.tls.keystore.alias.
For example, if you want the servers to use the key with the alias name serverkey, then the configuration can be set as alluxio.network.tls.keystore.alias=serverkey.
For the embedded journal, only a single certificate can be loaded from the truststore. If the truststore has only a single certificate, it will work. However, if the truststore contains multiple certificates, then the alias must be specified with alluxio.network.tls.truststore.alias. For example, if the alias name you want to use is cacert, the parameter should be set as alluxio.network.tls.truststore.alias=cacert.
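Taken together, the optional server-side settings discussed above could be sketched as follows. The alias names are the ones used in the examples above, and the choice of TLSv1.2 alone is only an assumption about what a deployment might want to allow.

```properties
# Restrict servers to TLSv1.2 only (an illustrative choice)
alluxio.network.tls.server.protocols=TLSv1.2
# Select the server key when the keystore holds several keys
alluxio.network.tls.keystore.alias=serverkey
# Select the certificate to load for the embedded journal when the
# truststore holds several certificates
alluxio.network.tls.truststore.alias=cacert
```

These lines would sit alongside the keystore and truststore properties in alluxio-site.properties on the servers.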
Once the servers are configured, additional Alluxio clients need to be configured with the client side properties:
# enables TLS
alluxio.network.tls.enabled=true
alluxio.network.tls.ssl.context.provider.classname=alluxio.emon.util.network.tls.EnterpriseSslContextProvider
# truststore properties for the client side of connections (worker to master)
alluxio.network.tls.truststore.path=/alluxio/truststore.jks
alluxio.network.tls.truststore.password=trustpass
How these configuration properties are set depends on the specific application or computation framework you are using.
Once the servers and clients are configured, all network communication will be encrypted with TLS.
TLS Encryption on Kubernetes
To enable TLS encryption for Alluxio network traffic on Kubernetes, see TLS Encryption on Alluxio in Kubernetes.
Configuring Spark with Alluxio TLS enabled client
Spark users can use JVM system properties to pass Alluxio properties to Spark jobs by adding "-Dproperty=value" to spark.executor.extraJavaOptions for Spark executors and spark.driver.extraJavaOptions for Spark drivers. To enable the TLS connection for the Alluxio client in Spark, you can set the client-side properties in spark-defaults.conf as below:
spark.driver.extraJavaOptions -Dalluxio.network.tls.enabled=true -Dalluxio.network.tls.truststore.path=<TRUSTSTORE_PATH> -Dalluxio.network.tls.truststore.password=<TRUSTSTORE_PASSWORD>
spark.executor.extraJavaOptions -Dalluxio.network.tls.enabled=true -Dalluxio.network.tls.truststore.path=<TRUSTSTORE_PATH> -Dalluxio.network.tls.truststore.password=<TRUSTSTORE_PASSWORD>
Deployment
Alluxio masters and workers must be started by the same operating system user.
If there is a user mismatch, the standby master health check, the command alluxio-start.sh all, and certain file operations may fail because of permission checks. Also make sure the alluxio.security_server.jwks.address property is configured properly on workers; otherwise, workers cannot communicate with the security server correctly.
Security servers need to be started on master nodes if authorization is enabled.
Note that the command alluxio-start.sh all does not start security servers.
You need to run alluxio-start.sh security_server manually on each master node.