Security
- Authentication
- Authorization
- Access Control Lists
- Data Path Authorization
- Client-Side Hadoop Impersonation
- Auditing
- Encryption
- Credential Management
- Storage Integration Access Token Framework
- Deployment
This document describes the following security-related features in Alluxio.
- User Authentication: The Alluxio filesystem differentiates users accessing the service when the authentication mode is SIMPLE. Alluxio also supports NOSASL and KERBEROS as authentication mechanisms. Authorization requires the authentication mode to be SIMPLE or KERBEROS.
- User Authorization: The Alluxio filesystem grants or denies user access based on the requesting user and the POSIX permissions model of the files or directories being accessed, when alluxio.security.authorization.permission.enabled=true. Note that authentication cannot be NOSASL, because authorization requires user information.
- Access Control Lists: In addition to the POSIX permission model, Alluxio implements an Access Control List (ACL) model similar to those found in Linux and HDFS. The ACL model is more flexible and allows administrators to manage any user or group’s permissions to any file system object.
- Client-Side Hadoop Impersonation: Alluxio supports client-side Hadoop impersonation so the Alluxio client can access Alluxio on behalf of the Hadoop user. This can be useful if the Alluxio client is part of an existing Hadoop service.
- Auditing: If enabled, the Alluxio filesystem writes an audit log for all user accesses.
- Encryption: Alluxio supports TLS for network communication and encryption at rest.
See Security specific configuration for different security properties.
Authentication
The authentication protocol is determined by the configuration property alluxio.security.authentication.type, with a default value of SIMPLE.
Alluxio can integrate with a third-party OIDC provider to handle Third-Party Authentication.
SIMPLE
Authentication is enabled when the authentication type is SIMPLE.
A client must identify itself with a username to the Alluxio service.
If the property alluxio.security.login.username is set on the Alluxio client, its value is used as the login user. Otherwise, the login user is inferred from the operating system user executing the client process.
The provided user information is attached to the corresponding metadata when the client creates directories or files.
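The login user resolution described above can be sketched as follows. This is only an illustration of the documented rule, not Alluxio's client code: the function name resolve_login_user and the injectable os_user parameter are hypothetical, and only the property name alluxio.security.login.username comes from this document.

```python
import getpass
from typing import Callable, Mapping


def resolve_login_user(client_conf: Mapping[str, str],
                       os_user: Callable[[], str] = getpass.getuser) -> str:
    """Return the configured login user, else the OS user running the process."""
    configured = client_conf.get("alluxio.security.login.username", "")
    if configured:
        return configured          # explicit property wins
    return os_user()               # fall back to the operating system user


# Hypothetical client configurations, for illustration only.
explicit = resolve_login_user({"alluxio.security.login.username": "alice"})
inferred = resolve_login_user({}, os_user=lambda: "jack")
```

The OS user lookup is passed in as a parameter so that the fallback path can be exercised without depending on the machine the sketch runs on.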
NOSASL
Authentication is disabled when the authentication type is NOSASL.
The Alluxio service will ignore the user of the client and no user information will be attached to the corresponding metadata when the client creates directories or files.
CUSTOM
Authentication is enabled when the authentication type is CUSTOM.
Alluxio clients retrieve user information via the class provided by the alluxio.security.authentication.custom.provider.class property.
The specified class must implement the interface alluxio.security.authentication.AuthenticationProvider.
This mode is currently experimental and should only be used in tests.
KERBEROS
Authentication is enabled and enforced via Kerberos. Kerberos is an authentication protocol that provides strong and mutual authentication between clients and servers.
The typical Kerberos principal format used for services is "primary/instance@REALM.COM".
Kerberos principals must be prepared as Alluxio service principal names (SPNs).
It is recommended to use a hostname-associated instance name for Alluxio service principals, such as <alluxio-service-name>/<hostname>@REALM.COM. This way each Alluxio server node has a unique service principal. Note that the <hostname> in each principal must match the server (either master or worker) hostname.
Alternatively, Alluxio also supports a cluster-wide unified instance name, like <alluxio-service-name>/<alluxio-cluster-name>@REALM.COM, so that all the Alluxio servers share the same principal. To use this feature, set alluxio.security.kerberos.unified.instance.name=<alluxio-cluster-name>.
Kerberos authentication
Alluxio Enterprise Edition supports Java Kerberos for Kerberos authentication.
Please refer to Kerberos security setup instructions to set up Alluxio with Java Kerberos enabled.
Alluxio system administrators are responsible for specifying the Alluxio servers' Kerberos credentials, with principal name and keytab file. Alluxio clients need valid Kerberos credentials (either keytab files or a local ticket cache) to access a Kerberos-enabled Alluxio cluster.
When KERBEROS authentication is enabled, the login user for each component is obtained as follows:
- For Alluxio servers, the login user is represented by the server-side Kerberos principal in alluxio.security.kerberos.server.principal. A corresponding keytab file must be specified in alluxio.security.kerberos.server.keytab.file. The Alluxio user shown in the Alluxio namespace is the short name of the Kerberos principal, which excludes the hostname and realm parts.
- For Alluxio clients, the login user is represented by the client-side Kerberos principal in alluxio.security.kerberos.client.principal. There are two ways for Alluxio clients to log in via Kerberos. One is to specify a keytab file in alluxio.security.kerberos.client.keytab.file. The other is to perform a kinit Kerberos login for the alluxio.security.kerberos.client.principal name on the client machine. The Alluxio client first checks whether there is a valid alluxio.security.kerberos.client.keytab.file. If there is no valid keytab file that can log in successfully, the Alluxio client falls back to looking for the login information in the ticket cache. If neither of those Kerberos credentials exists, the Alluxio client throws a login failure and asks the user to provide the keytab file or log in via kinit.
Connecting with Secure-HDFS
Note that in Alluxio Enterprise Edition, the way to configure Alluxio with secure HDFS is different from that in Alluxio Community Edition. alluxio.master.keytab.file, alluxio.master.principal, alluxio.worker.keytab.file and alluxio.worker.principal are not used. Please use alluxio.security.underfs.hdfs.kerberos.client.keytab.file and alluxio.security.underfs.hdfs.kerberos.client.principal instead.
Note that an Alluxio server can authenticate to its clients using a (server) principal different from the (client) principal used by the Alluxio server to access secure HDFS.
If alluxio.security.underfs.hdfs.kerberos.client.principal is not specified, then Alluxio falls back to using alluxio.security.kerberos.server.principal.
If you want to set up a Kerberos-enabled Alluxio cluster on top of Kerberos-enabled HDFS, please refer to “Kerberos-enabled Alluxio integration with Secure-HDFS” in the Kerberos setup guide for more details.
Auth-to-local configuration
Alluxio supports configurable translation from Kerberos principal name to operating system user, via the MIT Kerberos auth_to_local configuration. To make it easier to configure together with HDFS, the syntax is the same as Hadoop auth_to_local.
Use the alluxio.security.kerberos.auth.to.local configuration property to set it up.
By default the value is DEFAULT.
Authorization
The Alluxio filesystem implements a permissions model similar to the POSIX permissions model.
Each file and directory is associated with:
- An owner, which is the user of the client process that created the file or directory.
- A group, which is the group fetched from the user-groups-mapping service. See User group mapping.
- Permissions, which consist of three parts:
- Owner permission defines the access privileges of the file owner
- Group permission defines the access privileges of the owning group
- Other permission defines the access privileges of all users that are not in either of the above two classes
Each permission has three actions:
- read (r)
- write (w)
- execute (x)
For files:
- Read permissions are required to read files
- Write permissions are required to write files
For directories:
- Read permissions are required to list its contents
- Write permissions are required to create, rename, or delete files or directories under it
- Execute permissions are required to access a child of the directory
The output of the ls shell command when authorization is enabled looks like:
$ ./bin/alluxio fs ls /
drwxr-xr-x jack staff 24 PERSISTED 06-14-2019 07:02:45:248 DIR /default_tests_files
-rw-r--r-- jack staff 80 NOT_PERSISTED 06-14-2019 07:02:26:487 100% /default_tests_files/BASIC_CACHE_PROMOTE_MUST_CACHE
User group mapping
For a given user, the list of groups is determined by a group mapping service, configured by the alluxio.security.group.mapping.class property, with a default implementation of alluxio.security.group.provider.ShellBasedUnixGroupsMapping.
This implementation executes the groups shell command on the local machine to fetch the group memberships of a particular user.
Running the groups command for every query may be expensive, so the user group mapping is cached, with an expiration period configured by the alluxio.security.group.mapping.cache.timeout property, with a default value of 60s.
If set to a value of 0, caching is disabled.
If the cache timeout is too low or disabled, the groups command will be run very frequently, which may increase latency for operations.
If the cache timeout is too high, the groups command will not be run frequently, but the cached results may become stale.
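The trade-off between lookup cost and staleness can be illustrated with a small time-to-live cache. This is a sketch under stated assumptions rather than Alluxio's implementation: the class GroupMappingCache, its lookup callback, and the injectable clock are all hypothetical, and the 60-second default only mirrors the documented default of alluxio.security.group.mapping.cache.timeout.

```python
import time
from typing import Callable, Dict, List, Tuple


class GroupMappingCache:
    """Hypothetical TTL cache in front of an expensive group lookup."""

    def __init__(self, lookup: Callable[[str], List[str]],
                 timeout_s: float = 60.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._lookup = lookup        # e.g. a wrapper around the `groups` command
        self._timeout_s = timeout_s  # a value of 0 disables caching
        self._clock = clock
        self._entries: Dict[str, Tuple[float, List[str]]] = {}

    def get_groups(self, user: str) -> List[str]:
        if self._timeout_s <= 0:
            return self._lookup(user)        # caching disabled: always query
        now = self._clock()
        cached = self._entries.get(user)
        if cached is not None and now - cached[0] < self._timeout_s:
            return cached[1]                 # entry still fresh: skip the lookup
        groups = self._lookup(user)          # missing or expired: refresh
        self._entries[user] = (now, groups)
        return groups
```

With a short timeout the lookup callback runs on almost every call, while with a long timeout a change in group membership is not observed until the cached entry expires.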
Alluxio has a super user, a user with special privileges typically needed to administer and maintain the system.
The super user is the operating system user executing the Alluxio master process.
The alluxio.security.authorization.permission.supergroup property defines a super group.
Any additional operating system users belonging to this operating system group are also super users.
The default value is supergroup.
LDAP
If your organization uses OpenLDAP or Active Directory to manage identities, it is recommended to sync LDAP users and groups to the operating system of the machines running Alluxio. Alternatively, Alluxio also supports direct connection to OpenLDAP or Active Directory for the group mapping service. To use the Alluxio integration with OpenLDAP or Active Directory, see the examples below.
Example configuration for an LDAP server without SSL:
alluxio.security.group.mapping.class=alluxio.security.group.provider.LdapGroupsMapping
alluxio.security.group.mapping.ldap.url=ldap://example.com:389
alluxio.security.group.mapping.ldap.base=cn=Users,dc=example,dc=com
alluxio.security.group.mapping.ldap.bind.user=cn=alluxio,cn=Users,dc=example,dc=com
alluxio.security.group.mapping.ldap.bind.password=secret
Example configuration for an LDAP server with SSL:
alluxio.security.group.mapping.class=alluxio.security.group.provider.LdapGroupsMapping
alluxio.security.group.mapping.ldap.url=ldaps://example.com:636
alluxio.security.group.mapping.ldap.ssl=true
alluxio.security.group.mapping.ldap.ssl.keystore=/path/to/ldap.jks
alluxio.security.group.mapping.ldap.ssl.keystore.password=secret
alluxio.security.group.mapping.ldap.base=cn=Users,dc=example,dc=com
alluxio.security.group.mapping.ldap.bind.user=cn=alluxio,cn=Users,dc=example,dc=com
alluxio.security.group.mapping.ldap.bind.password=secret
If you have your own way of managing the SSL keystore, configure the properties related to LDAP SSL keystore according to your setup.
Otherwise, here is an example of how to generate the SSL keystore:
# get the LDAP server's certificate by
$ echo | openssl s_client -connect example.com:636 2>/dev/null | openssl x509 > /tmp/ldap.crt
# add the certificate to Java's trusted keystore by
$ sudo keytool -import -noprompt -trustcacerts -alias ldap -file /tmp/ldap.crt -keystore ${JAVA_HOME}/jre/lib/security/cacerts
# generate the keystore in JKS format, in the prompt, specify password as the value for property
# "alluxio.security.group.mapping.ldap.ssl.keystore.password", answer "yes" to the question of whether to trust the certificate
$ keytool -import -keystore /path/to/ldap.jks -file /tmp/ldap.crt
Below is a full list of properties relevant to LDAP configuration:
Property Name | Default | Meaning |
---|---|---|
alluxio.security.group.mapping.ldap.attr.group.name | cn | N/A |
alluxio.security.group.mapping.ldap.attr.member | member | N/A |
alluxio.security.group.mapping.ldap.base | | N/A |
alluxio.security.group.mapping.ldap.bind.password | | N/A |
alluxio.security.group.mapping.ldap.bind.password.file | | N/A |
alluxio.security.group.mapping.ldap.bind.user | | N/A |
alluxio.security.group.mapping.ldap.search.filter.group | (objectClass=group) | N/A |
alluxio.security.group.mapping.ldap.search.filter.user | (&(objectClass=user)(sAMAccountName={0})) | N/A |
alluxio.security.group.mapping.ldap.search.timeout | 10000 | N/A |
alluxio.security.group.mapping.ldap.ssl | false | N/A |
alluxio.security.group.mapping.ldap.ssl.keystore | | N/A |
alluxio.security.group.mapping.ldap.ssl.keystore.password | | N/A |
alluxio.security.group.mapping.ldap.ssl.keystore.password.file | | N/A |
alluxio.security.group.mapping.ldap.url | | N/A |
Use Ranger for Authorization
If you are using Ranger for authorization and want to integrate Alluxio with Ranger, check out the HDP Ranger Integration.
Initialized directory and file permissions
When a file is created, it is initially assigned fully open permissions of 666 by default.
Similarly, a directory is initially assigned 777 permissions.
A umask is applied on the initial permissions; this is configured by the alluxio.security.authorization.permission.umask property, with a default of 022.
Without any property modifications, files and directories are created with 644 and 755 permissions respectively.
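The arithmetic behind these defaults is the standard POSIX umask rule: every bit set in the umask is cleared from the initial mode. The short sketch below works through it; the helper name apply_umask is only illustrative and is not part of any Alluxio API.

```python
def apply_umask(initial_mode: int, umask: int) -> int:
    """Clear every permission bit that is set in the umask."""
    return initial_mode & ~umask & 0o777


# Documented default of alluxio.security.authorization.permission.umask.
DEFAULT_UMASK = 0o022

file_mode = apply_umask(0o666, DEFAULT_UMASK)  # initial file permissions
dir_mode = apply_umask(0o777, DEFAULT_UMASK)   # initial directory permissions
print(f"{file_mode:03o} {dir_mode:03o}")       # prints "644 755"
```

A stricter umask such as 077 would, by the same rule, yield 600 for files and 700 for directories.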
Update directory and file permission model
The owner, group, and permissions can be changed in two ways:
- A user application invokes the setAttribute(...) method of the FileSystem API or Hadoop API.
- CLI commands in the shell. See chown, chgrp, chmod.
The owner attribute can only be changed by a super user. The group and permission attributes can be changed by a super user or the owner of the path.
Access Control Lists
The POSIX permissions model allows administrators to grant permissions to owners, owning groups and other users. The permission bits model is sufficient for most cases. However, to help administrators express more complicated security policies, Alluxio also supports Access Control Lists (ACLs). ACLs allow administrators to grant permissions to any user or group.
A file or directory’s Access Control List consists of multiple entries. The two types of ACL entries are Access ACL entries and Default ACL entries.
1. Access ACL Entries:
This type of ACL entry specifies a particular user or group’s permission to read, write and execute.
Each ACL entry consists of:
- a type, which can be one of user, group or mask
- an optional name
- a permission string similar to the POSIX permission bits
The following table shows the different types of ACL entries that can appear in the access ACL:
ACL Entry Type | Description |
---|---|
user:userid:permission | Sets the access ACLs for a user. Empty userid implies the permission is for the owner of the file. |
group:groupid:permission | Sets the access ACLs for a group. Empty groupid implies the permission is for the owning group of the file. |
other::permission | Sets the access ACLs for all users not specified above. |
mask::permission | Sets the effective rights mask. The ACL mask indicates the maximum permissions allowed for all users other than the owner and for groups. |
Notice that ACL entries describing the owner’s, owning group’s and other’s permissions already exist in the standard POSIX permission bits model.
For example, a standard POSIX permission of 755 translates into an ACL list as follows:
user::rwx
group::r-x
other::r-x
These three entries are always present in each file and directory. When there are entries in addition to these standard entries, the ACL is considered an extended ACL.
A mask entry is automatically generated when an ACL becomes extended. Unless specifically set by the user, the mask’s value is adjusted to be the union of all permissions affected by the mask entry. This includes all the user entries other than the owner and all group entries.
For the ACL entry user::rw-:
- the type is user
- the name is empty, which implies the owner
- the permission string is rw-
As a result, the owner has read and write permissions, but not execute.
For the ACL entry group:interns:rwx and the mask mask::r--:
- the entry grants all permissions to the group interns
- the mask only allows read permissions
As a result, the interns group has only read access, because the mask disallows all other permissions.
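The mask behavior can be checked with a few lines of code. This is a minimal sketch of the rules stated in this section, not Alluxio's ACL implementation: the function names effective_permission and default_mask are hypothetical, and permission strings are assumed to be three characters in rwx order.

```python
from typing import List


def effective_permission(entry_perm: str, mask_perm: str) -> str:
    """Intersect an ACL entry's permission string with the mask."""
    return "".join(e if e != "-" and m != "-" else "-"
                   for e, m in zip(entry_perm, mask_perm))


def default_mask(masked_entries: List[str]) -> str:
    """Union of the permissions of all entries affected by the mask."""
    return "".join(letter if any(p[i] == letter for p in masked_entries) else "-"
                   for i, letter in enumerate("rwx"))


# group:interns:rwx combined with mask::r-- leaves only read access.
interns_access = effective_permission("rwx", "r--")   # "r--"

# Without an explicit mask, the mask is the union of the masked entries,
# here a named user with r-x and a group with rw-.
auto_mask = default_mask(["r-x", "rw-"])              # "rwx"
```

The owner entry (user::...) and the other entry are not passed to either function because, as described above, the mask does not apply to them.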
2. Default ACL Entries:
Default ACLs only apply to directories. Any new file or directory created within a directory with a default ACL will inherit the default ACL as its access ACL. Any new directory created within a directory with a default ACL will also inherit the default ACL as its default ACL.
Default ACLs also consist of ACL entries, similar to those found in access ACLs.
They are distinguished by the default keyword as the prefix.
For example, default:user:alluxiouser:rwx and default:other::r-x are both valid default ACL entries.
Given a documents directory, its default ACL can be set to default:user:alluxiouser:rwx.
The user alluxiouser will have full access to any new files created in the documents directory.
These new files will have an access ACL entry of user:alluxiouser:rwx.
Note that the ACL does not grant the user alluxiouser any additional permissions to the directory.
Managing ACL entries
ACLs can be managed in two ways:
- A user application invokes the setFacl(...) method of the FileSystem API or Hadoop API to change the ACL, and invokes getFacl(...) to obtain the current ACL.
- CLI commands in the shell. See getfacl and setfacl.
The ACL of a file or directory can only be changed by a super user or its owner.
Data Path Authorization
In Alluxio Enterprise Edition, the access control on data transfer path (Client-Workers) is further enforced by an enhanced distributed authorization mechanism. Alluxio worker is able to check whether the client user has the right privilege to access the requested block, even though workers do not know about the file permission info.
This data path authorization feature is disabled by default.
It can be turned on by setting alluxio.security.authorization.capability.enabled=true on all masters and workers.
When the capability feature is enabled, the Alluxio master verifies the permission and grants a signed capability to the client.
The capability is a token which grants the bearer specified access rights. The capability is verified by Alluxio workers to see whether the granted permission matches the client’s access request.
A capability is only valid for a short amount of time, which can be configured via the Alluxio server configuration alluxio.security.authorization.capability.lifetime.ms (defaults to 1 hour) on all masters and workers.
Capabilities are generated using a scheme where the master and all workers share a secret key, called the CapabilityKey.
Only the master and workers know the key, so no third party can forge capabilities.
The capability key is generated and rotated periodically by the Alluxio master.
To avoid bulk capability invalidation errors, the old key remains valid for a short time period during key rotation to allow graceful key expiration.
The workers accept old capabilities for a certain time period (by default 25% of the key lifetime) after receiving a new version of the capability key.
The capability key lifetime can be configured via the Alluxio server configuration alluxio.security.authorization.capability.key.lifetime.ms (defaults to 1 day) on all masters.
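To make the default timings concrete, the sketch below converts the documented defaults into milliseconds and derives the grace period during which workers still accept capabilities signed with the previous key. It is plain arithmetic on the numbers stated above; the variable names are illustrative only.

```python
from datetime import timedelta

# Documented defaults from this section.
capability_lifetime = timedelta(hours=1)  # ...capability.lifetime.ms
key_lifetime = timedelta(days=1)          # ...capability.key.lifetime.ms
OLD_KEY_GRACE_FRACTION = 0.25             # default: 25% of the key lifetime

grace_period = key_lifetime * OLD_KEY_GRACE_FRACTION

capability_lifetime_ms = int(capability_lifetime.total_seconds() * 1000)
key_lifetime_ms = int(key_lifetime.total_seconds() * 1000)

print(grace_period)            # 6:00:00
print(capability_lifetime_ms)  # 3600000
print(key_lifetime_ms)         # 86400000
```

With the defaults, the grace period (6 hours) is much longer than a single capability's lifetime (1 hour), so a capability issued just before a key rotation can still be verified until it expires.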
Client-Side Hadoop Impersonation
When Alluxio is used in a Hadoop environment, a user, or identity, can be specified for both the Hadoop client and the Alluxio client. Since the Hadoop client user and the Alluxio client user can be specified independently, the users could be different from each other. The Hadoop client user may even be in a separate namespace from the Alluxio client user.
Alluxio client-side Hadoop impersonation solves the issues when the Hadoop client user is different from the Alluxio client user. With this feature, the Alluxio client examines the Hadoop client user, and then attempts to impersonate as that Hadoop client user.
For example, a Hadoop application can be configured to run as the Hadoop client user foo, but the Alluxio client user is configured to be yarn. This means any data interactions will be attributed to user yarn. With client-side Hadoop impersonation, the Alluxio client will detect that the Hadoop client user is foo, and then connect to Alluxio servers as user yarn impersonating user foo. With this impersonation, the data interactions will be attributed to user foo.
This feature is only applicable when using the Hadoop-compatible client to access Alluxio.
In order to configure Alluxio for client-side Hadoop impersonation, both client and server configurations (master and worker) are required.
If ASYNC_THROUGH is used, Alluxio persists data asynchronously, and a special directory is used to store the temporary files. The admin should create this temporary directory in HDFS at deployment time.
The temporary directory path can be set via the Alluxio server configuration alluxio.underfs.persistence.async.temp.dir (default .alluxio_ufs_persistence). If the “alluxio” user is granted superuser permission, the “alluxio” user can create the directory automatically and the manual step is not necessary.
With the default setting, the temporary directory must be created at ufs_mount/.alluxio_ufs_persistence with mode 0777.
Server Configuration
To enable a particular Alluxio client user to impersonate other users, server (master and worker) configuration is required.
Set the alluxio.master.security.impersonation.<USERNAME>.users property, where <USERNAME> is the name of the Alluxio client user.
The property value is a comma-separated list of users that <USERNAME> is allowed to impersonate.
The wildcard value * can be used to indicate the user can impersonate any other user.
Some examples:
- alluxio.master.security.impersonation.alluxio_user.users=user1,user2: the Alluxio client user alluxio_user is allowed to impersonate user1 and user2
- alluxio.master.security.impersonation.client.users=*: the Alluxio client user client is allowed to impersonate any user
To enable a particular user to impersonate users from other groups, set the alluxio.master.security.impersonation.<USERNAME>.groups property, where again <USERNAME> is the name of the Alluxio client user.
Similar to above, the value is a comma-separated list of groups and the wildcard value * can be used to indicate all groups.
Some examples:
- alluxio.master.security.impersonation.alluxio_user.groups=group1,group2: the Alluxio client user alluxio_user is allowed to impersonate any users from groups group1 and group2
- alluxio.master.security.impersonation.client.groups=*: the Alluxio client user client is allowed to impersonate users from any group
In summary, to enable an Alluxio client user to impersonate other users, at least one of the two impersonation properties must be set on servers; setting both is allowed for the same Alluxio client user.
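The way the two properties combine can be sketched as a small rule check. This is an illustration of the semantics described in this section and not the server's actual code: the function may_impersonate, its arguments, and the helper parse_list are hypothetical, while the property name pattern is taken from the text above.

```python
from typing import Dict, Iterable, Set

WILDCARD = "*"


def parse_list(value: str) -> Set[str]:
    """Split a comma-separated property value into a set of names."""
    return {item.strip() for item in value.split(",") if item.strip()}


def may_impersonate(conf: Dict[str, str], client_user: str,
                    target_user: str, target_groups: Iterable[str]) -> bool:
    """Return True if client_user is configured to impersonate target_user."""
    prefix = f"alluxio.master.security.impersonation.{client_user}."
    users = parse_list(conf.get(prefix + "users", ""))
    groups = parse_list(conf.get(prefix + "groups", ""))
    if WILDCARD in users or target_user in users:
        return True                      # allowed by the .users property
    if WILDCARD in groups:
        return True                      # allowed for users of any group
    return any(g in groups for g in target_groups)


# Example server configuration reusing the property values shown above.
server_conf = {
    "alluxio.master.security.impersonation.alluxio_user.users": "user1,user2",
    "alluxio.master.security.impersonation.client.groups": "*",
}
```

With this configuration, alluxio_user may impersonate user1 but not user3, client may impersonate anyone, and a client user with neither property set (such as yarn) may impersonate no one.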
Client Configuration
After enabling impersonation on the servers for a given Alluxio client user, the client must indicate which user it wants to impersonate.
This is configured by the alluxio.security.login.impersonation.username property.
If the property is set to an empty string or _NONE_, impersonation is disabled, and the Alluxio client will interact with Alluxio servers as the Alluxio client user.
If the property is set to _HDFS_USER_, the Alluxio client will connect to Alluxio servers as the Alluxio client user, but impersonate the Hadoop client user when using the Hadoop-compatible client. The default value is _HDFS_USER_.
Common Exceptions
The most common impersonation error that applications may see looks like:
Failed to authenticate client user="yarn" connecting to Alluxio server and impersonating as
impersonationUser="foo" to access Alluxio file system. User "yarn" is not configured to
allow any impersonation.
This message means that the user yarn is connecting to Alluxio servers and trying to impersonate the user foo, but the Alluxio servers are not configured to allow impersonation for user yarn.
To fix this, configure the Alluxio servers to enable impersonation for the user in question (yarn in the example error message).
Please read this blog post for more tips.
Auditing
Alluxio supports audit logging to allow system administrators to track users’ access to file metadata.
The audit log file at master_audit.log contains entries corresponding to file metadata access operations.
The format of Alluxio audit log entry is shown in the table below:
key | value |
---|---|
succeeded | True if the command has succeeded. To succeed, it must also have been allowed. |
allowed | True if the command has been allowed. Note that a command can still fail even if it has been allowed. |
ugi | User group information, including username, primary group, and authentication type. |
ip | Client IP address. |
cmd | Command issued by the user. |
src | Path of the source file or directory. |
dst | Path of the destination file or directory. If not applicable, the value is null. |
perm | User:group:mask or null if not applicable. |
This is similar to the format of HDFS audit log.
To enable Alluxio audit logging, set the JVM property alluxio.master.audit.logging.enabled to true in alluxio-env.sh.
See Configuration settings.
Encryption
Alluxio supports encryption of the network communication between services with TLS. It also supports encryption of data stored by the workers, abbreviated as encryption at rest.
TLS Encryption for Network Communication
TLS is a cryptographic protocol that provides end-to-end security of data sent between applications over the Internet. It ensures the secure delivery of data over the Internet, avoiding possible eavesdropping and/or alteration of the content. For Alluxio network communication (RPCs, data transfers), Alluxio supports TLS encryption. In order to configure Alluxio to use TLS encryption, keystores and truststores must be created for Alluxio. A keystore is used by the server side of the TLS connection, and the truststore is used by the client side of the TLS connection.
Keystore
Alluxio servers (masters and workers) require a keystore in order to enable TLS. The keystore typically stores the key and certificate for the server. This keystore file must be readable by the OS user which launches the Alluxio server processes.
An example self-signed keystore can be created as follows:
$ keytool -genkeypair -alias key -keyalg RSA -keysize 2048 -dname "cn=localhost, ou=Department, o=Company, l=City, st=State, c=US" -keystore /alluxio/keystore.jks -keypass keypass -storepass storepass
This will generate a keystore file at /alluxio/keystore.jks, with a key password of keypass and a keystore password of storepass.
Truststore
All clients of a TLS connection must have access to a truststore to trust all the certificates of the servers. Clients include Alluxio clients, as well as Alluxio workers (since Alluxio workers create client connections to the Alluxio master). The truststore stores the trusted certificates, and must be readable by the process initiating the client connection (clients, workers).
An example truststore (based on the previous keystore) can be created as follows:
$ keytool -export -alias key -keystore /alluxio/keystore.jks -storepass storepass -rfc -file selfsigned.cer
$ keytool -import -alias key -noprompt -file selfsigned.cer -keystore /alluxio/truststore.jks -storepass trustpass
The first command extracts the certificate from the previously created keystore (using the keystore password storepass). Then, the second command creates a truststore file using that extracted certificate, and saves the truststore to /alluxio/truststore.jks, with a truststore password of trustpass.
Configuring Alluxio servers and clients
Once the keystores and truststores are created for all the machines involved, Alluxio needs to be configured to understand how to access those files.
On Alluxio servers (masters and workers), you must add these properties to alluxio-site.properties:
# enables TLS
alluxio.network.tls.enabled=true
# keystore properties for the server side of connections
alluxio.network.tls.keystore.path=/alluxio/keystore.jks
alluxio.network.tls.keystore.password=storepass
alluxio.network.tls.keystore.key.password=keypass
# truststore properties for the client side of connections (worker to master, or master to master for embedded journal)
alluxio.network.tls.truststore.path=/alluxio/truststore.jks
alluxio.network.tls.truststore.password=trustpass
The Alluxio servers can explicitly specify which TLS protocols to use with the parameter alluxio.network.tls.server.protocols.
This can be set to a comma-separated list of TLS protocol names, for example: alluxio.network.tls.server.protocols=TLSv1.1,TLSv1.2.
This is useful for preventing the servers from enabling certain TLS protocols, since by default, Java and Netty enable all supported protocols.
The Alluxio servers will use the secret key in the keystore, but sometimes keystores contain multiple keys.
If there are multiple keys in the keystore, the key to use must be specified by providing the alias name via alluxio.network.tls.keystore.alias.
For example, if you want the servers to use the key with alias name serverkey, set alluxio.network.tls.keystore.alias=serverkey.
For the embedded journal, only a single certificate can be loaded from the truststore. If the truststore has only a single certificate, it will work. However, if the truststore contains multiple certificates, then the alias must be specified with alluxio.network.tls.truststore.alias. For example, if the alias name you want to use is cacert, set alluxio.network.tls.truststore.alias=cacert.
Once the servers are configured, additional Alluxio clients need to be configured with the client side properties:
# enables TLS
alluxio.network.tls.enabled=true
# truststore properties for the client side of connections (worker to master)
alluxio.network.tls.truststore.path=/alluxio/truststore.jks
alluxio.network.tls.truststore.password=trustpass
Setting these configuration properties will be dependent on the specific application or computation framework you are using.
Once the servers and clients are configured, all network communication will be encrypted with TLS.
TLS Encryption on Kubernetes
To enable TLS encryption on Alluxio network traffic on Kubernetes, see TLS Encryption on Alluxio in Kubernetes.
Configuring Spark with Alluxio TLS enabled client
Spark users can use JVM system properties to pass Alluxio properties to Spark jobs by adding "-Dproperty=value" to spark.executor.extraJavaOptions for Spark executors and spark.driver.extraJavaOptions for Spark drivers. To enable the TLS connection for the Alluxio client in Spark, you can set the client-side properties in spark-defaults.conf as below:
spark.driver.extraJavaOptions -Dalluxio.network.tls.enabled=true -Dalluxio.network.tls.truststore.path=<TRUSTSTORE_PATH> -Dalluxio.network.tls.truststore.password=<TRUSTSTORE_PASSWORD>
spark.executor.extraJavaOptions -Dalluxio.network.tls.enabled=true -Dalluxio.network.tls.truststore.path=<TRUSTSTORE_PATH> -Dalluxio.network.tls.truststore.password=<TRUSTSTORE_PASSWORD>
Encryption at rest
Alluxio supports encryption at rest. The data is encrypted when it is stored by the worker in its designated medium and decrypted after it is read from those mediums. In conjunction with TLS encryption for network communication, this feature provides end-to-end server-side security. In order to configure Alluxio to use encryption at rest, a Hashicorp Vault server should be configured as key store (Alluxio also supports to store keys in the journal for test). Encryption at rest is configured along a particular file path in the Alluxio namespace, which will be referred to as an encryption zone. There are several steps to use encryption at rest:
- Set the property values for encryption and the key store.
- Use the fsadmin encryptionZone createZone command to define an encryption zone along an Alluxio URI.
- Any data whose path is along an encryption zone’s URI will be encrypted when stored by a worker.
Configuring Alluxio with encryption at rest
To enable encryption at rest, the following configuration properties in conf/alluxio-site.properties
must be set:
alluxio.security.tier.storage.encryption.enabled=true
alluxio.security.tier.storage.encryption.cipher.bit.length=256
alluxio.security.tier.storage.encryption.cipher.type=<CIPHER_TYPE>
alluxio.worker.data.encrypted.block.chunk.size=131072
alluxio.worker.data.encryption.method=<ENCRYPTION_METHOD>
<ENCRYPTION_METHOD>
is the encryption method.
The following are all supported methods:
ENCRYPTED_BY_BLOCK
ENCRYPTED_BY_CHUNK_JCE
ENCRYPTED_BY_CHUNK_OPENSSL
ENCRYPTED_BY_BLOCK treats the whole block as a single encryption unit, while ENCRYPTED_BY_CHUNK_JCE and ENCRYPTED_BY_CHUNK_OPENSSL encrypt the data in chunks of the size defined by alluxio.worker.data.encrypted.block.chunk.size.
ENCRYPTED_BY_BLOCK
does not perform well for random reads compared to the chunk based methods.
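To see why chunking helps random reads, the sketch below computes which chunks overlap a requested byte range. It assumes fixed-size chunks that can each be decrypted independently, which is an illustration of the idea rather than a description of Alluxio's actual on-disk layout. The function names are hypothetical.

```python
# Chunk size from alluxio.worker.data.encrypted.block.chunk.size above.
CHUNK_SIZE = 131072


def chunks_to_decrypt(offset, length, chunk_size=CHUNK_SIZE):
    """Return the indices of the chunks overlapping [offset, offset + length)."""
    if length <= 0:
        return range(0)
    first = offset // chunk_size
    last = (offset + length - 1) // chunk_size
    return range(first, last + 1)


def bytes_decrypted(offset, length, chunk_size=CHUNK_SIZE):
    """Bytes that must be decrypted to serve the read (assumes full chunks)."""
    return len(chunks_to_decrypt(offset, length, chunk_size)) * chunk_size


# A 4 KiB read that starts 100 bytes past the 1 MiB mark touches one chunk.
print(list(chunks_to_decrypt(1024 * 1024 + 100, 4096)))
# A 20-byte read straddling the first chunk boundary touches two chunks.
print(list(chunks_to_decrypt(CHUNK_SIZE - 10, 20)))
```

Under these assumptions a small random read decrypts one or two chunks, whereas a method that treats the whole block as one unit has no smaller unit to decrypt.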
ENCRYPTED_BY_CHUNK_JCE
will use the JCE library to encrypt.
ENCRYPTED_BY_CHUNK_OPENSSL
will use the openssl library to encrypt. The library is expected to be located at /usr/lib64/libcrypto.so
on each worker. Depending on the environment, it may be necessary to create a symlink targeting the specific version of the file. The following command is an example:
ln -s /usr/lib64/libcrypto.so.1.0.2k /usr/lib64/libcrypto.so
<CIPHER_TYPE>
is the encryption algorithm type.
The following are all the supported algorithms:
AES/CBC/NoPadding
AES/GCM/NoPadding
AES/CTR/NoPadding
Based on these recommendations with respect to performance and security level, the default encryption method is ENCRYPTED_BY_CHUNK_JCE
and the default cipher type is AES/GCM/NoPadding
.
Because encryption and decryption require extra data transformation, they consume more CPU and increase access time. With the default settings above, a like-for-like comparison of encrypted and unencrypted data shows SQL query time increasing by roughly 20% when encryption is enabled.
Key Store
The Hashicorp Vault key store is configured as follows:
# Enable Hashicorp Vault key store provider
alluxio.security.tier.keystore.hashicorp.vault.enabled=true
alluxio.security.secret.store.hashicorp.vault.address=<VAULT_SERVER_URI>
alluxio.security.secret.store.hashicorp.vault.ca.cert=<CA_CERT>
alluxio.security.secret.store.hashicorp.vault.client.cert=<CLIENT_SIDE_CERT>
alluxio.security.secret.store.hashicorp.vault.client.key=<CLIENT_SIDE_KEY>
alluxio.security.tier.storage.encryption.keystore.type=KEY_VALUE_STORE_SYSTEM
alluxio.security.secret.store.hashicorp.vault.token=<VAULT_TOKEN>
<VAULT_SERVER_URI>
is the connection URI of the vault server, e.g. https://127.0.0.1:8200/
.
<CA_CERT>
is the path to the CA certificate that issued the server certificate. This is required if the CA is self-signed.
Alternatively, set alluxio.security.secret.store.hashicorp.vault.tls.verify.cert to false to disable verification of the server certificate against a CA. This is insecure and should be used only for testing, never in production.
<CLIENT_SIDE_CERT>
and <CLIENT_SIDE_KEY>
are paths to the client side certificate and private key.
These are required if TLS client authentication is enabled to connect to the vault server.
<VAULT_TOKEN>
is the Hashicorp Vault token for accessing the Hashicorp Vault server.
Alluxio also supports storing the encryption key in the journal. This can be used in a test environment to experiment with the feature, but it is not safe for a production environment.
alluxio.security.tier.storage.encryption.keystore.type=KEY_STORE_JOURNAL
Encryption Zone
The concept of encryption zone is similar to the HDFS Encryption Zone. Alluxio’s encryption zone is Alluxio URI based. The number of keys can be specified for each encryption zone and Alluxio will assign the key in a round-robin way to each newly created block. Alluxio supports the following encryption zone operations:
./bin/alluxio fsadmin encryptionZone
Usage: encryptionZone [createZone] [listKey] [listZone] [removeZone]
Encryption Zone related operations.
createZone <zonePath> <numKeys>
create an encryption zone on zonePath, and create numKeys keys.
listZone
list the existing zones
listKey
list the existing keys
removeZone <zonePath>
remove the encryption zone indicated by zonePath
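The round-robin key assignment mentioned above can be pictured with a small sketch. The class `ZoneKeyAssigner` is hypothetical and only models the behavior described in this section: a zone created with numKeys keys hands out key indices in rotation, one per newly created block. How Alluxio tracks this state internally is not covered here.

```python
from itertools import cycle


class ZoneKeyAssigner:
    """Hypothetical model of round-robin key selection for one encryption zone."""

    def __init__(self, zone_path, num_keys):
        if num_keys < 1:
            raise ValueError("an encryption zone needs at least one key")
        self.zone_path = zone_path
        self._key_indices = cycle(range(num_keys))

    def key_for_new_block(self):
        """Return the index of the key to use for the next new block."""
        return next(self._key_indices)


# A zone created with three keys, as in: createZone /ez1 3
assigner = ZoneKeyAssigner("/ez1", 3)
assigned = [assigner.key_for_new_block() for _ in range(7)]
print(assigned)
```

With three keys, seven consecutive blocks receive key indices 0, 1, 2, 0, 1, 2, 0.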
Nested encryption zones are also supported, e.g.:
/ez1
/ez1/nested1
Files in the directories under /ez1 but not in /ez1/nested1 belong to encryption zone /ez1. Files in /ez1/nested1 and its sub-directories belong to encryption zone /ez1/nested1. This implies that the two sets of files are encrypted and decrypted with different sets of keys.
Alluxio currently supports several encryption zones and is expected to support many more in the future.
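The nesting rule described above amounts to choosing the deepest zone that contains a path. A minimal sketch of that lookup follows; the function `zone_for` is hypothetical and is not an Alluxio API.

```python
def zone_for(path, zones):
    """Return the deepest zone that equals `path` or is an ancestor of it."""
    best = None
    for zone in zones:
        z = zone if zone == "/" else zone.rstrip("/")
        prefix = z if z == "/" else z + "/"
        if path == z or path.startswith(prefix):
            if best is None or len(z) > len(best):
                best = z
    return best


zones = ["/ez1", "/ez1/nested1"]
print(zone_for("/ez1/data/file", zones))
print(zone_for("/ez1/nested1/sub/file", zones))
print(zone_for("/ez10/file", zones))
```

Matching on the zone path followed by a separator keeps a sibling such as /ez10 from being treated as part of /ez1.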
Credential Management
Starting from version 2.4, Alluxio Enterprise Edition supports using a trusted secret store to manage credentials. Users with secret stores can manage the lifecycle of their access tokens and passwords without needing to reconfigure Alluxio after rotations, and administrators no longer need to provide sensitive information directly to the Alluxio system. For example, you can store AWS keys in a Hashicorp Vault server and let Alluxio fetch them from there, instead of storing them in the Alluxio configuration file.
The following are the basic configuration parameters and ways to use the credential management function.
You can create a new secret store connection as follows:
# Enable Alluxio Credential Management
alluxio.security.secret.store.enabled=true
# Enable Hashicorp Vault secret provider
alluxio.security.secret.store.hashicorp.vault.enabled=true
alluxio.security.secret.store.hashicorp.vault.address=<VAULT_SERVER_ADDRESS>:<VAULT_SERVER_PORT>
alluxio.security.secret.store.hashicorp.vault.auth=<VAULT_AUTH_TYPE>
<ALLUXIO_CONF>=${secrets.<SECRET_PROVIDER_TYPE>.<SECRET_KEY_NAME>}
alluxio.security.secret.vault.<SECRET_KEY_NAME>.key=<SECRET_KEY>
alluxio.security.secret.vault.<SECRET_KEY_NAME>.path=<SECRET_PATH>
alluxio.security.secret.vault.<SECRET_KEY_NAME>.version=<VAULT_CLIENT_VERSION>
alluxio.security.secret.vault.<SECRET_KEY_NAME>.type=<VAULT_SECRET_TYPE>
<VAULT_SERVER_ADDRESS>
is the address of the vault server.
<VAULT_SERVER_PORT>
is the port of the vault server.
<VAULT_AUTH_TYPE>
is the authentication credential type for Vault. TOKEN is supported.
<ALLUXIO_CONF>
is the Alluxio property whose value should be supplied by the secret store.
<SECRET_PROVIDER_TYPE>
is the secret provider type. vault is supported.
<SECRET_KEY_NAME>
is the name of the secret. You need to specify a unique name for each secret credential.
<SECRET_KEY>
is the key of the secret. The Alluxio secret store uses this key to fetch the credential value from the secret provider.
<SECRET_PATH>
is the path of the secret in the KV store. The Alluxio secret store uses this path to fetch secret credentials.
alluxio.security.secret.vault.<SECRET_KEY_NAME>.version
is the Vault client version. Vault client version 1 supports K/V store engine 1 and the AWS Secrets Engine; Vault client version 2 supports K/V store engine 2. This property is optional: if it is not set, Alluxio uses Vault client version 2 by default. If multiple secrets are under the same Vault path, you only need to set the version on one of them.
alluxio.security.secret.vault.<SECRET_KEY_NAME>.type
is the type of the Vault secret, either KV or AWS_IAM_USER. If the S3 access key ID and secret key are configured with type AWS_IAM_USER, the Alluxio S3 under storage system tries to rotate the AWS access key ID and secret key automatically when it gets a permission-denied error, which may be caused by the secret expiring or being revoked. If the type is not set, Alluxio uses KV by default. If multiple secrets are under the same Vault path, you only need to set the type on one of them.
For example, suppose you have two paths, each holding two different secrets: path#1 (secret#1, secret#2) with VAULT_CLIENT_VERSION#1 and VAULT_SECRET_TYPE#1, and path#2 (secret#3, secret#4) with VAULT_CLIENT_VERSION#2 and VAULT_SECRET_TYPE#2. You only need to specify the version and type on one of the secrets under each path, as follows:
# Path 1 configurations
alluxio.security.secret.vault.<SECRET_KEY_NAME#1>.key=<SECRET_KEY#1>
alluxio.security.secret.vault.<SECRET_KEY_NAME#1>.path=<SECRET_PATH#1>
alluxio.security.secret.vault.<SECRET_KEY_NAME#2>.key=<SECRET_KEY#2>
alluxio.security.secret.vault.<SECRET_KEY_NAME#2>.path=<SECRET_PATH#1>
alluxio.security.secret.vault.<SECRET_KEY_NAME#1>.version=<VAULT_CLIENT_VERSION#1>
alluxio.security.secret.vault.<SECRET_KEY_NAME#1>.type=<VAULT_SECRET_TYPE#1>
# Path 2 configurations
alluxio.security.secret.vault.<SECRET_KEY_NAME#3>.key=<SECRET_KEY#3>
alluxio.security.secret.vault.<SECRET_KEY_NAME#3>.path=<SECRET_PATH#2>
alluxio.security.secret.vault.<SECRET_KEY_NAME#4>.key=<SECRET_KEY#4>
alluxio.security.secret.vault.<SECRET_KEY_NAME#4>.path=<SECRET_PATH#2>
alluxio.security.secret.vault.<SECRET_KEY_NAME#3>.version=<VAULT_CLIENT_VERSION#2>
alluxio.security.secret.vault.<SECRET_KEY_NAME#3>.type=<VAULT_SECRET_TYPE#2>
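The per-path sharing rule can be sketched as follows: a version or type set on any one secret applies to every secret under the same path, and the documented defaults (Vault client version 2, type KV) apply when nothing is set. This is a hypothetical model of the rule as described here, not Alluxio's implementation. In particular, letting the first value found win when two secrets on one path disagree is an assumption.

```python
DEFAULT_VERSION = "2"  # documented default Vault client version
DEFAULT_TYPE = "KV"    # documented default secret type


def effective_settings(secrets):
    """Map each secret name to the version/type it effectively uses.

    `secrets` maps a secret name to a dict with a 'path' entry and
    optional 'version' and 'type' entries.
    """
    shared_by_path = {}
    for cfg in secrets.values():
        shared = shared_by_path.setdefault(cfg["path"], {})
        for field in ("version", "type"):
            if field in cfg:
                # Assumption: the first value seen for a path wins.
                shared.setdefault(field, cfg[field])

    result = {}
    for name, cfg in secrets.items():
        shared = shared_by_path[cfg["path"]]
        result[name] = {
            "version": shared.get("version", DEFAULT_VERSION),
            "type": shared.get("type", DEFAULT_TYPE),
        }
    return result


secrets = {
    "name1": {"path": "path1", "version": "1", "type": "AWS_IAM_USER"},
    "name2": {"path": "path1"},
    "name3": {"path": "path2"},
}
settings = effective_settings(secrets)
print(settings["name2"])
print(settings["name3"])
```

Here name2 inherits version 1 and type AWS_IAM_USER from name1 because both live under path1, while name3 falls back to the defaults.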
For example, to supply the S3 access key and secret key from the secret store using Vault KV store version 2, add the following configuration:
alluxio.security.secret.store.enabled=true
alluxio.security.secret.store.vault.enabled=true
alluxio.security.secret.store.hashicorp.vault.address=<VAULT_SERVER_ADDRESS>:<VAULT_SERVER_PORT>
alluxio.security.secret.store.hashicorp.vault.auth=<VAULT_AUTH_TYPE>
alluxio.security.secret.store.hashicorp.vault.token=<VAULT_TOKEN>
aws.accessKeyId=${secrets.vault.alluxio_vault_aws_accessKey}
alluxio.security.secret.vault.alluxio_vault_aws_accessKey.key=awsAccessKey
alluxio.security.secret.vault.alluxio_vault_aws_accessKey.path=secret/creds
aws.secretKey=${secrets.vault.alluxio_vault_aws_secretKey}
alluxio.security.secret.vault.alluxio_vault_aws_secretKey.key=awsSecretKey
alluxio.security.secret.vault.alluxio_vault_aws_secretKey.path=secret/creds
The Alluxio secret store will fetch the AWS access key and secret key from secret/creds/awsAccessKey and secret/creds/awsSecretKey, respectively.
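The placeholder mechanism can be illustrated with a short sketch that replaces each ${secrets.<provider>.<name>} reference with a value looked up by the path and key configured for that name. The resolver, the in-memory `fake_vault` dictionary, and the value "AKIA-EXAMPLE" are all hypothetical stand-ins; a real deployment would query the Vault server, and Alluxio's own resolution logic may differ.

```python
import re

# Matches ${secrets.<provider>.<name>} and captures provider and name.
PLACEHOLDER = re.compile(r"\$\{secrets\.([^.}]+)\.([^}]+)\}")


def resolve(props, fetch_secret):
    """Return a copy of `props` with secret placeholders substituted."""

    def substitute(match):
        provider, name = match.group(1), match.group(2)
        prefix = f"alluxio.security.secret.{provider}.{name}"
        return fetch_secret(props[f"{prefix}.path"], props[f"{prefix}.key"])

    return {key: PLACEHOLDER.sub(substitute, value) for key, value in props.items()}


# Stand-in for the secret provider: (path, key) -> secret value.
fake_vault = {("secret/creds", "awsAccessKey"): "AKIA-EXAMPLE"}

props = {
    "aws.accessKeyId": "${secrets.vault.alluxio_vault_aws_accessKey}",
    "alluxio.security.secret.vault.alluxio_vault_aws_accessKey.key": "awsAccessKey",
    "alluxio.security.secret.vault.alluxio_vault_aws_accessKey.path": "secret/creds",
}

resolved = resolve(props, lambda path, key: fake_vault[(path, key)])
print(resolved["aws.accessKeyId"])
```

Only the property holding the placeholder changes; the .key and .path entries that describe where the secret lives are left as they are.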
Secrets Rotation Support
Secrets rotation enables dynamic rotation of secrets for the S3 UFS. If the AWS credential expires or is revoked for some reason, the Alluxio S3 under storage system catches the error and tries to get new credentials from the underlying secret store. Currently, Alluxio supports secrets rotation with HashiCorp Vault for the S3 under storage system, using AWS IAM users. To use secrets rotation, you must enable the AWS Secrets Engine on the Vault server; for the detailed setup, see the Hashicorp Vault AWS Secrets Engine documentation. For example, to use secrets rotation with the Alluxio S3 under storage system, add the following configuration:
alluxio.security.secret.store.enabled=true
alluxio.security.secret.store.vault.enabled=true
alluxio.security.secret.store.hashicorp.vault.address=<VAULT_SERVER_ADDRESS>:<VAULT_SERVER_PORT>
alluxio.security.secret.store.hashicorp.vault.auth=<VAULT_AUTH_TYPE>
alluxio.security.secret.store.hashicorp.vault.token=<VAULT_TOKEN>
aws.accessKeyId=${secrets.vault.alluxio_vault_aws_accessKey}
alluxio.security.secret.vault.alluxio_vault_aws_accessKey.key=access_key
alluxio.security.secret.vault.alluxio_vault_aws_accessKey.path=secret/creds
alluxio.security.secret.vault.alluxio_vault_aws_accessKey.version=1
alluxio.security.secret.vault.alluxio_vault_aws_accessKey.type=AWS_IAM_USER
aws.secretKey=${secrets.vault.alluxio_vault_aws_secretKey}
alluxio.security.secret.vault.alluxio_vault_aws_secretKey.key=secret_key
alluxio.security.secret.vault.alluxio_vault_aws_secretKey.path=secret/creds
Because AWS IAM is eventually consistent, a new credential is not immediately usable. During credential rotation, Alluxio waits until the new AWS credentials are ready to be used, up to a timeout of 20 seconds. AWS credentials usually become usable 5-10 seconds after they are fetched.
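The wait described above is a poll-until-ready loop bounded by a timeout. The sketch below shows that general pattern only. The function `wait_until_usable`, the one-second poll interval, and the injected clock are assumptions made for illustration, and Alluxio's actual retry logic is not documented here. The 20-second limit is the one stated above.

```python
import time


def wait_until_usable(is_usable, timeout_s=20.0, poll_interval_s=1.0,
                      clock=time.monotonic, sleep=time.sleep):
    """Poll `is_usable` until it returns True or `timeout_s` has elapsed."""
    deadline = clock() + timeout_s
    while True:
        if is_usable():
            return True
        if clock() >= deadline:
            return False
        sleep(poll_interval_s)


# Demonstration with a simulated clock so the example finishes instantly.
now = [0.0]


def fake_clock():
    return now[0]


def fake_sleep(seconds):
    now[0] += seconds


# Pretend the new credential becomes usable 7 simulated seconds after it is fetched.
became_usable = wait_until_usable(lambda: now[0] >= 7.0,
                                  clock=fake_clock, sleep=fake_sleep)
print(became_usable, now[0])
```

In real use, `is_usable` would be a cheap call made with the new credential, and the default clock and sleep functions would be left in place.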
Storage Integration Access Token Framework
When Alluxio integrates with different storage systems, the credentials to access them are configured on every Alluxio server node (both masters and workers). In some environments, customers may be concerned about persisting credentials on workers. Alluxio provides a framework for masters to distribute tokens that workers use as temporary credentials. These tokens are kept in memory and are not persisted by the workers. The framework currently has one example implementation: the master obtains an AWS S3 AssumeRole token per user and propagates it to the relevant workers. See Assume Role Propagation/Refresh From the Master for more details. In the future, this framework can be extended to propagate access tokens for more under storage systems.
Deployment
Alluxio masters and workers must be started using the same operating system user. If there is a user mismatch, the standby master health check, the command alluxio-start.sh all, and certain file operations may fail because of permission checks.