The Alluxio S3 API is intended for applications that are designed to communicate with S3-like storage and that would benefit from the additional features Alluxio provides, such as data caching, data sharing with file-system-based applications, and storage system abstraction (e.g., using Ceph instead of S3 as the backing store). For example, a simple application that downloads reports generated by analytics tasks can use the S3 API instead of the more complex file system API.
Only top-level Alluxio directories are treated as buckets by the S3 API.
Alluxio S3 overwrites existing keys, as well as the temporary directory used for multipart uploads.
All sub-directories in Alluxio are returned by ListObjects(V2) as 0-byte folders. This matches the behavior you would see if you had used the AWS S3 console to create all parent folders for each object.
The maximum size for user-defined metadata in PUT requests is 2KB by default, in accordance with S3 object metadata restrictions.
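As a sketch of the metadata limit in practice, the snippet below builds an `x-amz-meta-*` header and shows (commented out, since it needs a running proxy) how it would be attached to an upload. The host, port, bucket, and object names are placeholders, not values from this document.

```shell
# Hypothetical example: host/port, bucket, and object names are placeholders.
# Total user-defined metadata must stay under the 2KB default limit
# (alluxio.proxy.s3.header.metadata.max.size).
META_HEADER='x-amz-meta-report-date: 2024-01-01'

# Requires a running Alluxio proxy:
# curl -i -X PUT -H "$META_HEADER" -T report.csv \
#     http://localhost:39999/api/v1/s3/reports/report.csv

echo "${#META_HEADER}"   # header size in bytes, well under the 2KB limit
```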
| Property Name | Default | Description |
|---------------|---------|-------------|
| `alluxio.proxy.s3.api.nocache.ufs.read.through.enabled` | `false` | (Experimental) If enabled, files with a read type of `NO_CACHE` are read directly from the UFS. |
| `alluxio.proxy.s3.api.noprefix.enabled` | `false` | (Experimental) Removes the `/api/v1/s3` prefix and supports accessing the proxy via `/bucket/object` paths. |
| `alluxio.proxy.s3.bucket.naming.restrictions.enabled` | `false` | Toggles whether the Alluxio S3 API enforces AWS S3 bucket naming restrictions. See https://docs.aws.amazon.com/AmazonS3/latest/userguide/bucketnamingrules.html. |
| `alluxio.proxy.s3.bucketpathcache.timeout` | `0min` | Expiration period for bucket path statistics in the cache. Set to `0min` to disable the cache. If the cache is enabled, be aware that the Alluxio S3 API will behave differently from the AWS S3 API when cached bucket path entries become stale. |
| `alluxio.proxy.s3.complete.multipart.upload.keepalive.enabled` | `false` | Whether to send whitespace characters as a keepalive message during CompleteMultipartUpload. Enabling this causes any errors to be silently ignored; however, the errors will appear in the proxy logs. |
| `alluxio.proxy.s3.complete.multipart.upload.keepalive.time.interval` | `30sec` | The maximum keepalive interval for CompleteMultipartUpload. Keepalive whitespace characters are sent after 1 second, with the interval increasing exponentially up to the configured value. |
| `alluxio.proxy.s3.complete.multipart.upload.min.part.size` | `5MB` | The minimum required size of parts in multipart uploads. Parts smaller than this limit, other than the final part, result in an `EntityTooSmall` error code. Set to `0` to disable the size requirement. |
| `alluxio.proxy.s3.complete.multipart.upload.pool.size` | `20` | The thread pool size for CompleteMultipartUpload. |
| `alluxio.proxy.s3.deletetype` | `ALLUXIO_AND_UFS` | Delete type when deleting buckets and objects through the S3 API. Valid options are `ALLUXIO_AND_UFS` (delete in both Alluxio and the UFS) and `ALLUXIO_ONLY` (delete only in the Alluxio namespace). |
| `alluxio.proxy.s3.global.read.rate.limit.mb` | `0` | Limits the maximum aggregate read speed across all connections. Set to a value less than or equal to 0 to disable rate limiting. |
| `alluxio.proxy.s3.header.metadata.max.size` | `2KB` | The maximum size allowed for user-defined metadata in S3 PUT request headers. Set to `0` to disable the size limit. |
| `alluxio.proxy.s3.multipart.upload.cleaner.enabled` | `false` | Enables automatic cleanup of long-running multipart uploads. |
| `alluxio.proxy.s3.multipart.upload.cleaner.pool.size` | `1` | The thread pool size for the multipart upload cleaner. |
| `alluxio.proxy.s3.multipart.upload.cleaner.retry.count` | `3` | The number of retries when aborting a multipart upload fails. |
| `alluxio.proxy.s3.multipart.upload.cleaner.retry.delay` | `10sec` | The delay between retries when aborting a multipart upload fails. |
| `alluxio.proxy.s3.multipart.upload.cleaner.timeout` | `10min` | The timeout after which a multipart upload is aborted automatically. |
| `alluxio.proxy.s3.multipart.upload.stream.through` | `true` | The complete multipart upload write type. |
| `alluxio.proxy.s3.multipart.upload.write.through` | `false` | The complete multipart upload write type. |
| `alluxio.proxy.s3.single.connection.read.rate.limit.mb` | `0` | Limits the maximum read speed for each connection. Set to a value less than or equal to 0 to disable rate limiting. |
| `alluxio.proxy.s3.tagging.restrictions.enabled` | `true` | Toggles whether the Alluxio S3 API enforces AWS S3 tagging restrictions (10 tags, 128-character keys, 256-character values). See https://docs.aws.amazon.com/AmazonS3/latest/userguide/tagging-managing.html. |
| `alluxio.proxy.s3.throttle.max.wait.time.ms` | `60000` | The maximum waiting time when a request is throttled. |
| `alluxio.proxy.s3.use.position.read.range.size` | `0` | When the requested range length is less than this value, the S3 proxy uses `positionRead` to read data from the worker. A value less than or equal to 0 disables this feature. In the current implementation, each position read request allocates a byte array of the same size as the range to temporarily hold the data, which consumes additional memory; this value is therefore capped at 4MB, and any configured value exceeding 4MB is reduced to 4MB. |
| `alluxio.proxy.s3.v2.async.context.timeout.ms` | `30000` | Timeout (in milliseconds) for the async context. A value of zero or less indicates no timeout. |
| `alluxio.proxy.s3.v2.async.heavy.pool.core.thread.number` | `8` | Core thread count for the async heavy thread pool. |
| `alluxio.proxy.s3.v2.async.heavy.pool.maximum.thread.number` | `64` | Maximum thread count for the async heavy thread pool. |
| `alluxio.proxy.s3.v2.async.heavy.pool.queue.size` | `65536` | Queue size for the async heavy thread pool. |
| `alluxio.proxy.s3.v2.async.light.pool.core.thread.number` | `8` | Core thread count for the async light thread pool. |
| `alluxio.proxy.s3.v2.async.light.pool.maximum.thread.number` | `64` | Maximum thread count for the async light thread pool. |
| `alluxio.proxy.s3.v2.async.light.pool.queue.size` | `65536` | Queue size for the async light thread pool. |
| `alluxio.proxy.s3.v2.async.processing.enabled` | `false` | (Experimental) If enabled, S3 requests are handled asynchronously when the v2 Alluxio S3 proxy service is enabled. |
| `alluxio.proxy.s3.v2.version.enabled` | `false` | (Experimental) Enables v2, an optimized version of the Alluxio S3 proxy service. |
| `alluxio.proxy.s3.writetype` | `CACHE_THROUGH` | Write type when creating buckets and objects through the S3 API. Valid options are `MUST_CACHE` (write only to Alluxio; data must be stored in Alluxio), `CACHE_THROUGH` (try to cache, write to the UnderFS synchronously), and `THROUGH` (no caching; write to the UnderFS synchronously). |
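The properties above are set like any other Alluxio configuration. A minimal `alluxio-site.properties` sketch, with illustrative (not recommended) values:

```properties
# Illustrative overrides for the Alluxio S3 API; values are examples only.
alluxio.proxy.s3.writetype=CACHE_THROUGH
alluxio.proxy.s3.deletetype=ALLUXIO_AND_UFS
alluxio.proxy.s3.bucket.naming.restrictions.enabled=true
alluxio.proxy.s3.multipart.upload.cleaner.enabled=true
alluxio.proxy.s3.multipart.upload.cleaner.timeout=30min
```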
You can use the AWS command line interface to send S3 API requests to the Alluxio S3 API. Note that you must provide the `--endpoint` parameter to specify the location of the Alluxio S3 REST API, including the server's base URI (i.e. `--endpoint "http://{alluxio.proxy.web.hostname}:{alluxio.proxy.web.port}/api/v1/s3/"`).
As a prerequisite for operations that involve the `Authorization` header, you may need to configure AWS credentials.
```shell
$ aws configure --profile alluxio-s3
AWS Access Key ID [None]: {user}
AWS Secret Access Key [None]: {dummy value}
Default region name [None]:
Default output format [None]:
```
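With the profile configured, a hedged sketch of invoking the CLI against the proxy follows. The hostname, port, bucket name, and file name are placeholders; the actual `aws` commands are commented out because they require a running Alluxio proxy.

```shell
# Placeholders: substitute your proxy's actual hostname and port.
ALLUXIO_PROXY_HOST=localhost
ALLUXIO_PROXY_PORT=39999
ENDPOINT="http://${ALLUXIO_PROXY_HOST}:${ALLUXIO_PROXY_PORT}/api/v1/s3/"
echo "$ENDPOINT"

# These require a running Alluxio proxy; bucket/file names are illustrative:
# aws --profile alluxio-s3 --endpoint "$ENDPOINT" s3api create-bucket --bucket reports
# aws --profile alluxio-s3 --endpoint "$ENDPOINT" s3 cp report.csv s3://reports/report.csv
# aws --profile alluxio-s3 --endpoint "$ENDPOINT" s3 ls s3://reports/
```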
You can also use any HTTP client to send S3 API requests directly to the Alluxio S3 API. Note that the base URI for the Alluxio S3 API's REST server is `/api/v1/s3/` (i.e. your requests should be directed to `"http://{alluxio.proxy.web.hostname}:{alluxio.proxy.web.port}/api/v1/s3/"`).
At the moment, the Alluxio S3 API does not validate access keys or secret keys. The `Authorization` header is therefore used purely to specify the user performing a request. The header follows the AWS Signature Version 4 format.
```shell
$ curl -i -H "Authorization: AWS4-HMAC-SHA256 Credential=testuser/... SignedHeaders=... Signature=..." ...
```