Security

This document describes the following security-related features in Alluxio:

  1. User Authentication: Alluxio supports SIMPLE, NOSASL, CUSTOM, and KERBEROS as authentication mechanisms. The Alluxio filesystem differentiates users accessing the service when the authentication mode is SIMPLE or KERBEROS; one of these modes is required for authorization.
  2. User Authorization: The Alluxio filesystem supports Ranger-based user authorization when alluxio.security.authorization.permission.enabled=true and alluxio.security.authorization.plugins.enabled=true. Authentication cannot be NOSASL, because authorization requires user information. To enable authorization, a security server must also be started on the node where the Alluxio master is running.
  3. Encryption: Alluxio supports TLS for network communication.

See Security specific configuration for different security properties.

Authentication

The authentication protocol is determined by the configuration property alluxio.security.authentication.type, with a default value of SIMPLE.

Alluxio can also integrate with a third-party OIDC provider to handle third-party authentication.

SIMPLE

Authentication is enabled when the authentication type is SIMPLE.

A client must identify itself to the Alluxio service with a username. If the property alluxio.security.login.username is set on the Alluxio client, its value is used as the login user; otherwise, the login user is inferred from the operating system user executing the client process. The provided user information is attached to the corresponding metadata when the client creates directories or files.
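
The lookup order described above can be sketched in shell. This is only an illustration of the documented behavior, not Alluxio's implementation; the function name resolve_login_user and the user name alice are hypothetical.

```shell
# Illustration only: mirrors the documented lookup order for the login user.
# $1 is the value of alluxio.security.login.username (may be empty).
resolve_login_user() {
  configured="$1"
  if [ -n "$configured" ]; then
    # The property is set, so its value is the login user.
    printf '%s\n' "$configured"
  else
    # Otherwise fall back to the OS user running this process.
    id -un
  fi
}

resolve_login_user "alice"   # prints: alice
resolve_login_user ""        # prints the current OS user
```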

NOSASL

Authentication is disabled when the authentication type is NOSASL.

The Alluxio service will ignore the user of the client and no user information will be attached to the corresponding metadata when the client creates directories or files.

CUSTOM

Authentication is enabled when the authentication type is CUSTOM.

Alluxio clients retrieve user information via the class specified by the alluxio.security.authentication.custom.provider.class property. The specified class must implement the interface alluxio.security.authentication.AuthenticationProvider.

This mode is currently experimental and should only be used in tests.

KERBEROS

Authentication is enabled and enforced via Kerberos. Kerberos is an authentication protocol that provides strong and mutual authentication between clients and servers.

The typical Kerberos principal format used for services is "primary/instance@REALM.COM". Kerberos principals must be prepared as Alluxio service principal names (SPNs).

It is recommended to use a hostname-associated instance name for Alluxio service principals, such as <alluxio-service-name>/<hostname>@REALM.COM. This way, each Alluxio server node has a unique service principal. Note that the <hostname> in each principal must match the hostname of the server (either master or worker).

Alternatively, Alluxio supports a cluster-wide unified instance name, such as <alluxio-service-name>/<alluxio-cluster-name>@REALM.COM, so that all Alluxio servers share the same principal. To use this feature, set alluxio.security.kerberos.unified.instance.name=<alluxio-cluster-name>.
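
As a small sketch of the two naming schemes, the helper below assembles a principal from its parts. The function name build_spn, the service name alluxio, the hostname master1.example.com, and the cluster name mycluster are all hypothetical placeholders.

```shell
# build_spn SERVICE INSTANCE REALM -> SERVICE/INSTANCE@REALM
build_spn() {
  printf '%s/%s@%s\n' "$1" "$2" "$3"
}

# Per-host principal: the instance is the server's hostname.
build_spn "alluxio" "master1.example.com" "REALM.COM"

# Cluster-wide principal: the instance is the value of
# alluxio.security.kerberos.unified.instance.name.
build_spn "alluxio" "mycluster" "REALM.COM"
```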

Kerberos authentication

Alluxio Enterprise Edition supports Java Kerberos for Kerberos authentication.

Please refer to the Kerberos security setup instructions to set up Alluxio with Java Kerberos enabled.

Alluxio system administrators are responsible for specifying the Kerberos credentials of the Alluxio servers, with a principal name and a keytab file. Alluxio clients need valid Kerberos credentials (either keytab files or a local ticket cache) to access a Kerberos-enabled Alluxio cluster.

When KERBEROS authentication is enabled, the login user for each component is obtained as follows:

  1. For Alluxio servers, the login user is represented by the server-side Kerberos principal in alluxio.security.kerberos.server.principal. A corresponding keytab file must be specified in alluxio.security.kerberos.server.keytab.file. The Alluxio user shown in the Alluxio namespace is the short name of the Kerberos principal, which excludes the hostname and realm parts.

  2. For Alluxio clients, the login user is represented by the client-side Kerberos principal in alluxio.security.kerberos.client.principal. There are two ways for Alluxio clients to log in via Kerberos. One is to specify a keytab file in alluxio.security.kerberos.client.keytab.file. The other is to run kinit on the client machine to log in as the alluxio.security.kerberos.client.principal name. The Alluxio client first checks whether there is a valid alluxio.security.kerberos.client.keytab.file. If no keytab file allows a successful login, the Alluxio client falls back to the login information in the ticket cache. If neither credential exists, the Alluxio client reports a login failure and asks the user to provide a keytab file or to log in via kinit.
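
To illustrate what "short name" means here, the sketch below strips the instance and realm from a principal. It only mirrors the default mapping described above and is not Alluxio's code; auth-to-local rules can change the result. The function name principal_short_name and the example principals are hypothetical.

```shell
# Print the primary component of a Kerberos principal by dropping
# everything from the first "/" or "@" onward.
principal_short_name() {
  printf '%s\n' "$1" | sed 's,[/@].*$,,'
}

principal_short_name "alluxio/master1.example.com@REALM.COM"   # prints: alluxio
principal_short_name "alice@REALM.COM"                         # prints: alice
```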

Connecting with Secure-HDFS

Note that in Alluxio Enterprise Edition the way to configure Alluxio with secure-HDFS is different from that in Alluxio Community Edition. alluxio.master.keytab.file, alluxio.master.principal, alluxio.worker.keytab.file and alluxio.worker.principal are not used. Please use alluxio.security.underfs.hdfs.kerberos.client.keytab.file and alluxio.security.underfs.hdfs.kerberos.client.principal instead. Note that an Alluxio server can authenticate to its clients using a (server) principal different from the (client) principal used by an Alluxio server to access secure HDFS. If alluxio.security.underfs.hdfs.kerberos.client.principal is not specified, then Alluxio falls back to using alluxio.security.kerberos.server.principal.
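
As a sketch, the function below prints a properties fragment in which the server principal and the HDFS client principal differ. The property names come from this page; the principal names and keytab paths are hypothetical placeholders to adapt before adding the lines to alluxio-site.properties.

```shell
# Print an example fragment for alluxio-site.properties.
# All values are placeholders, not defaults.
hdfs_kerberos_fragment() {
  cat <<'EOF'
# Identity the Alluxio server presents to its own clients
alluxio.security.kerberos.server.principal=alluxio/master1.example.com@REALM.COM
alluxio.security.kerberos.server.keytab.file=/etc/security/keytabs/alluxio.keytab
# Identity the Alluxio server uses as a client of secure HDFS
alluxio.security.underfs.hdfs.kerberos.client.principal=hdfs-client@REALM.COM
alluxio.security.underfs.hdfs.kerberos.client.keytab.file=/etc/security/keytabs/hdfs-client.keytab
EOF
}

hdfs_kerberos_fragment
```

If the two underfs properties are omitted, the fallback described above applies and the server principal is used for HDFS access as well.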

If you want to set up a Kerberos-enabled Alluxio cluster on top of Kerberos-enabled HDFS, please refer to “Kerberos-enabled Alluxio integration with Secure-HDFS” in the Kerberos setup guide for more details.

Auth-to-local configuration

Alluxio supports configurable translation from Kerberos principal names to operating system users, following the MIT Kerberos auth_to_local configuration. To make it easier to configure together with HDFS, the syntax is the same as Hadoop auth_to_local. Use the alluxio.security.kerberos.auth.to.local configuration property to set it up. The default value is DEFAULT.
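
The following is a hedged sketch of what such a value might look like, written in Hadoop's auth_to_local rule syntax. The realm REALM.COM and the target user alluxio are placeholders, and whether several rules can be separated by spaces on one line in Alluxio is an assumption to verify against your deployment.

```shell
# Print an example auth_to_local setting for alluxio-site.properties.
# Rule 1 maps the two-component principal alluxio/<host>@REALM.COM to "alluxio".
# Rule 2 strips the realm from one-component principals in REALM.COM.
# DEFAULT handles everything else.
auth_to_local_fragment() {
  cat <<'EOF'
alluxio.security.kerberos.auth.to.local=RULE:[2:$1@$0](alluxio@REALM.COM)s/.*/alluxio/ RULE:[1:$1@$0](.*@REALM.COM)s/@.*// DEFAULT
EOF
}

auth_to_local_fragment
```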

Authorization

Ranger enables administrators to centralize permission management for various resources. Alluxio supports using Ranger to manage and enforce access to directories and files.

Enable Authorization

To enable user authorization, configure the following properties:

# enables authorization
alluxio.security.authorization.plugins.enabled=true
alluxio.security.authorization.permission.enabled=true

In addition, the Ranger plugin needs to be configured separately. Details can be found in Set up Ranger for Authorization; the following is an example set of configurations:

# enables ranger plugin
alluxio.security.authorization.plugin.name=ranger-2.1
alluxio.security.authorization.plugin.paths=<path>
alluxio.master.mount.table.root.option.alluxio.underfs.security.authorization.plugin.name=ranger-2.1
alluxio.master.mount.table.root.option.alluxio.underfs.security.authorization.plugin.paths=<path>

Start Security Server

An Alluxio security server is required for Ranger-based authorization. A security server exposes a gRPC port and an HTTP port. Alluxio workers make RPCs to the security server to check user permissions before an actual read/write operation is performed. The HTTP port is used for certificate and public key distribution.

Start the security server on the node that runs the Alluxio master using the following command:

$ ./bin/alluxio process start security_server

If HA masters are used, a security server should be collocated with each master.

By default, the security server starts the gRPC server on port 19995 and the HTTP server on port 19994. The following properties can be set to change the ports:

alluxio.security_server.web.port=19994
alluxio.security.server.rpc.port=19995

On Alluxio workers, the following configuration needs to be added so that workers are able to fetch public keys from the security server:

alluxio.security_server.jwks.address=http://{security_server_host_name}:19994/security/jwks.json
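
As a minimal sketch, the helper below builds the JWKS address from a security server hostname and an optional web port, defaulting to 19994 as stated above. The function name jwks_address and the hostname master1.example.com are hypothetical.

```shell
# jwks_address HOST [PORT] -> URL of the security server's JWKS endpoint
jwks_address() {
  host="$1"
  port="${2:-19994}"   # default HTTP port of the security server
  printf 'http://%s:%s/security/jwks.json\n' "$host" "$port"
}

# Emit the worker-side property line for a given master host.
printf 'alluxio.security_server.jwks.address=%s\n' "$(jwks_address master1.example.com)"
```

If the web port is changed on the security server, pass the same port as the second argument so that the worker setting stays in sync.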

Set up Ranger for Authorization

Ranger enables administrators to centralize permission management for various resources. Alluxio supports using Ranger to manage and enforce access to directories and files.

There are two ways to use Ranger with Alluxio. Users can use Ranger to directly manage Alluxio file system permissions, or configure Alluxio to enforce existing Ranger policies for HDFS under file systems. While it is possible to use Ranger to manage permissions for both Alluxio and under file systems, we do not recommend enabling both at the same time, because permissions that come from multiple sources of truth are hard to reason about.

Managing Alluxio permissions with Ranger

First, make sure the HDFS plugin is enabled in the Ranger configuration. Follow the instructions on this page to set up a new HDFS repository for Alluxio. In the name node URL field, enter the Alluxio service URI.

Copy core-site.xml, hdfs-site.xml, ranger-hdfs-security.xml, ranger-hdfs-audit.xml and ranger-policymgr-ssl.xml from /etc/hadoop/conf/ on the HDFS name node to a directory on the Alluxio master nodes. Update the configuration settings in ranger-hdfs-security.xml to use the new HDFS repository. Specifically:

  • Set ranger.plugin.hdfs.policy.cache.dir to a valid directory on Alluxio master nodes where you want to store the policy cache.
  • Set ranger.plugin.hdfs.policy.rest.ssl.config.file to point to the path of the ranger-policymgr-ssl.xml file on Alluxio master node.
  • Set ranger.plugin.hdfs.service.name to be the new HDFS repository name.
  • Verify that ranger.plugin.hdfs.policy.rest.url is pointing to the correct Ranger service URL.
  • Set xasecure.add-hadoop-authorization to true if you want Ranger to fall back to the Alluxio default permission checker when a path is not managed by a Ranger policy.
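
Before pointing Alluxio at the copied files, it can help to confirm that all five are present and readable on each master. The check below is a sketch; the function name check_ranger_conf_dir and the directory in the usage comment are hypothetical.

```shell
# Return 0 if DIR contains all five configuration files copied from the
# HDFS name node; otherwise report what is missing and return 1.
check_ranger_conf_dir() {
  dir="$1"
  missing=0
  for f in core-site.xml hdfs-site.xml ranger-hdfs-security.xml \
           ranger-hdfs-audit.xml ranger-policymgr-ssl.xml; do
    if [ ! -r "$dir/$f" ]; then
      echo "missing or unreadable: $dir/$f" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Usage (hypothetical path):
#   check_ranger_conf_dir /opt/alluxio/conf/ranger && echo "all files present"
```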

Configure Alluxio masters to use the Ranger plugin for authorization. In alluxio-site.properties, add the following properties:

alluxio.security.authorization.plugins.enabled=true
alluxio.security.authorization.plugin.name=<plugin_name>
alluxio.security.authorization.plugin.paths=<your_ranger_plugin_configuration_files_location>

alluxio.security.authorization.plugin.name should be either ranger-hdp-2.5 or ranger-hdp-2.6 depending on your HDP cluster version. alluxio.security.authorization.plugin.paths should be the local directory path on Alluxio master where you put the Ranger configuration files.

Restart all Alluxio masters to apply the new configuration. You can now add policies to the Alluxio repository in Ranger and verify that they take effect in Alluxio.

If Hive is used to work with data on Alluxio, please add the Alluxio scheme to the value of property ranger.plugin.hive.urlauth.filesystem.schemes in Hive configuration:

ranger.plugin.hive.urlauth.filesystem.schemes=hdfs:,file:,wasb:,adl:,alluxio:

Enforcing existing Ranger policies for HDFS under file system

Alluxio can be configured to enforce existing Ranger policies on HDFS under file systems. First, copy core-site.xml, hdfs-site.xml, ranger-hdfs-security.xml, ranger-hdfs-audit.xml and ranger-policymgr-ssl.xml from /etc/hadoop/conf/ on the HDFS name node to the Alluxio master nodes. Update the configuration settings in ranger-hdfs-security.xml to use the new HDFS repository. Specifically:

  • Set ranger.plugin.hdfs.policy.cache.dir to a valid directory on Alluxio master nodes where you want to store the policy cache for this under file system.
  • Set ranger.plugin.hdfs.policy.rest.ssl.config.file to point to the path of the ranger-policymgr-ssl.xml file on Alluxio master node.
  • Verify that ranger.plugin.hdfs.policy.rest.url is pointing to the correct Ranger service URL.

Configure Alluxio masters to use the Ranger plugin for authorization. In alluxio-site.properties, add the following property:

alluxio.security.authorization.plugins.enabled=true

If the HDFS file system is mounted as the root under file system, add the following properties to alluxio-site.properties:

alluxio.master.mount.table.root.option.alluxio.underfs.security.authorization.plugin.name=<plugin_name>
alluxio.master.mount.table.root.option.alluxio.underfs.security.authorization.plugin.paths=<your_ranger_plugin_configuration_files_location>

alluxio.underfs.security.authorization.plugin.name should be either ranger-hdp-2.5 or ranger-hdp-2.6 depending on your HDP cluster version for Ranger service managing the under file system. alluxio.underfs.security.authorization.plugin.paths should be the local directory path on Alluxio master where you put the Ranger configuration files for the corresponding under file system.

Please note that Alluxio masters need to be reformatted and then restarted for this change to take effect.

If the HDFS file system is supposed to be mounted as a nested under filesystem using the alluxio fs mount command, please add the following parameters to your mount command:

--option alluxio.underfs.security.authorization.plugin.name=<plugin_name>
--option alluxio.underfs.security.authorization.plugin.paths=<your_ranger_plugin_configuration_files_location>
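
As a sketch, the helper below assembles a complete mount command from these options. It assumes the usual argument order of alluxio fs mount (options first, then the Alluxio path, then the under file system URI) and only prints the command so that it can be reviewed before being run. The function name ranger_mount_command and all argument values are hypothetical.

```shell
# ranger_mount_command ALLUXIO_PATH UFS_URI PLUGIN_NAME PLUGIN_PATHS
# Prints (does not execute) a mount command carrying the Ranger options.
ranger_mount_command() {
  printf 'alluxio fs mount --option %s=%s --option %s=%s %s %s\n' \
    "alluxio.underfs.security.authorization.plugin.name" "$3" \
    "alluxio.underfs.security.authorization.plugin.paths" "$4" \
    "$1" "$2"
}

ranger_mount_command /mnt/hdfs hdfs://namenode:8020/data \
  ranger-hdp-2.6 /opt/alluxio/conf/ranger-ufs
```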

Encryption

Alluxio supports encryption of the network communication between services with TLS.

TLS Encryption for Network Communication

TLS is a cryptographic protocol that provides end-to-end security of data sent between applications over the Internet. It ensures the secure delivery of data over the Internet, avoiding possible eavesdropping and/or alteration of the content. For Alluxio network communication (RPCs, data transfers), Alluxio supports TLS encryption. In order to configure Alluxio to use TLS encryption, keystores and truststores must be created for Alluxio. A keystore is used by the server side of the TLS connection, and the truststore is used by the client side of the TLS connection.

Keystore

Alluxio servers (masters and workers) require a keystore in order to enable TLS. The keystore typically stores the key and certificate for the server. This keystore file must be readable by the OS user which launches the Alluxio server processes.

An example self-signed keystore can be created as follows:

$ keytool -genkeypair -alias key -keyalg RSA -keysize 2048 -dname "cn=localhost, ou=Department, o=Company, l=City, st=State, c=US" -keystore /alluxio/keystore.jks -keypass keypass -storepass storepass

This generates a keystore file at /alluxio/keystore.jks, with keypass as the key password and storepass as the keystore password.

Truststore

All clients of a TLS connection must have access to a truststore to trust all the certificates of the servers. Clients include Alluxio clients, as well as Alluxio workers (since Alluxio workers create client connections to the Alluxio master). The truststore stores the trusted certificates, and must be readable by the process initiating the client connection (clients, workers).

An example truststore (based on the previous keystore) can be created as follows:

$ keytool -export -alias key -keystore /alluxio/keystore.jks -storepass storepass -rfc -file selfsigned.cer
$ keytool -import -alias key -noprompt -file selfsigned.cer -keystore /alluxio/truststore.jks -storepass trustpass

The first command extracts the certificate from the previously created keystore (using the keystore password storepass). Then, the second command creates a truststore file using that extracted certificate, and saves the truststore to /alluxio/truststore.jks, with a truststore password of trustpass.

Configuring Alluxio servers and clients

Once the keystores and truststores are created for all the machines involved, Alluxio needs to be configured to understand how to access those files.

On Alluxio servers (masters and workers), you must add these properties to alluxio-site.properties:

# enables TLS
alluxio.network.tls.enabled=true
alluxio.network.tls.ssl.context.provider.classname=alluxio.emon.util.network.tls.EnterpriseSslContextProvider
# keystore properties for the server side of connections
alluxio.network.tls.keystore.path=/alluxio/keystore.jks
alluxio.network.tls.keystore.password=storepass
alluxio.network.tls.keystore.key.password=keypass
# truststore properties for the client side of connections (worker to master, or master to master for embedded journal)
alluxio.network.tls.truststore.path=/alluxio/truststore.jks
alluxio.network.tls.truststore.password=trustpass

The Alluxio servers can explicitly specify which TLS protocols to use with the parameter alluxio.network.tls.server.protocols. This can be set to a comma-separated list of TLS protocol names, for example: alluxio.network.tls.server.protocols=TLSv1.1,TLSv1.2. This is useful for restricting the servers from enabling certain TLS protocols, since by default, Java and Netty enable all supported protocols.

The Alluxio servers will use the secret key in the keystore, but sometimes, keystores contain multiple keys. If there are multiple keys in the keystore, the key to use must be specified by providing the alias name via alluxio.network.tls.keystore.alias. For example, if you want the servers to use the key with alias name serverkey, then the configuration can be set like alluxio.network.tls.keystore.alias=serverkey.

For the embedded journal, only a single certificate can be loaded from the truststore. If the truststore contains only a single certificate, it works without further configuration. However, if the truststore contains multiple certificates, the alias must be specified with alluxio.network.tls.truststore.alias. For example, if the alias name you want to use is cacert, set alluxio.network.tls.truststore.alias=cacert.
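
To find out whether a keystore or truststore holds more than one entry, and therefore whether an alias property is needed, the entries can be listed with keytool. The wrapper below is a small sketch; the function name list_store_aliases is hypothetical, and the paths and passwords in the comments are the example values used earlier on this page.

```shell
# list_store_aliases STORE_PATH STORE_PASSWORD
# Lists the entries (aliases) in a Java keystore or truststore.
list_store_aliases() {
  keytool -list -keystore "$1" -storepass "$2"
}

# Usage with the example stores created above:
#   list_store_aliases /alluxio/keystore.jks storepass
#   list_store_aliases /alluxio/truststore.jks trustpass
```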

Once the servers are configured, additional Alluxio clients need to be configured with the client side properties:

# enables TLS
alluxio.network.tls.enabled=true
alluxio.network.tls.ssl.context.provider.classname=alluxio.emon.util.network.tls.EnterpriseSslContextProvider
# truststore properties for the client side of connections (worker to master)
alluxio.network.tls.truststore.path=/alluxio/truststore.jks
alluxio.network.tls.truststore.password=trustpass

How these configuration properties are set depends on the specific application or computation framework you are using.

Once the servers and clients are configured, all network communication will be encrypted with TLS.

TLS Encryption on Kubernetes

To enable TLS encryption for Alluxio network traffic on Kubernetes, see TLS Encryption on Alluxio in Kubernetes.

Configuring Spark with Alluxio TLS enabled client

Spark users can pass Alluxio properties to Spark jobs as JVM system properties by adding "-Dproperty=value" to spark.executor.extraJavaOptions for Spark executors and spark.driver.extraJavaOptions for Spark drivers. To enable TLS for the Alluxio client in Spark, set the client-side properties in spark-defaults.conf as shown below:

spark.driver.extraJavaOptions -Dalluxio.network.tls.enabled=true -Dalluxio.network.tls.truststore.path=<TRUSTSTORE_PATH> -Dalluxio.network.tls.truststore.password=<TRUSTSTORE_PASSWORD>
spark.executor.extraJavaOptions -Dalluxio.network.tls.enabled=true -Dalluxio.network.tls.truststore.path=<TRUSTSTORE_PATH> -Dalluxio.network.tls.truststore.password=<TRUSTSTORE_PASSWORD>
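
Because the driver and executor lines must carry the same options, one way to keep them in sync is to generate both from a single helper. This is a sketch; the function name alluxio_tls_java_opts is hypothetical, and the truststore path and password are the example values used earlier on this page.

```shell
# alluxio_tls_java_opts TRUSTSTORE_PATH TRUSTSTORE_PASSWORD
# Prints the -D options that enable TLS on the Alluxio client.
alluxio_tls_java_opts() {
  printf '%s %s %s\n' \
    "-Dalluxio.network.tls.enabled=true" \
    "-Dalluxio.network.tls.truststore.path=$1" \
    "-Dalluxio.network.tls.truststore.password=$2"
}

opts="$(alluxio_tls_java_opts /alluxio/truststore.jks trustpass)"

# Lines suitable for spark-defaults.conf:
printf 'spark.driver.extraJavaOptions %s\n' "$opts"
printf 'spark.executor.extraJavaOptions %s\n' "$opts"
```

The same string can also be passed per job through spark-submit with --conf "spark.driver.extraJavaOptions=..." and --conf "spark.executor.extraJavaOptions=...".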

Deployment

Alluxio masters and workers must be started by the same operating system user. If there is a user mismatch, the standby master health check, the command alluxio-start.sh all, and certain file operations may fail because of permission checks. Also make sure the alluxio.security_server.jwks.address property is configured properly on workers; otherwise, workers cannot communicate with the security server correctly.

Security servers need to be started on master nodes if authorization is enabled. Note that the command alluxio-start.sh all does not start security servers. You need to run alluxio-start.sh security_server manually on each master node.