S3 Client

Slack Docker Pulls GitHub edit source

Alluxio支持RESTful API,兼容Amazon S3 API 的基本操作。

REST API 手册会在Alluxio构建时生成并且可以通过${ALLUXIO_HOME}/core/server/proxy/target/miredot/index.html获得。

使用HTTP代理会带来一些性能的影响,尤其是在使用代理的时候会增加一个额外的跳计数。为了达到最优的性能,推荐代理服务和一个Alluxio worker运行在一个计算节点上。或者,推荐将所有的代理服务器放到load balancer之后。

特性支持

下表描述了对当前Amazon S3基础特性的支持情况:

S3 FeatureStatus
List Buckets Supported
List Objects Supported
List ObjectsV2 Supported
Delete Buckets Supported
Create Bucket Supported
Bucket Lifecycle Not Supported
Policy (Buckets, Objects) Not Supported
Bucket ACLs (Get, Put) Not Supported
Bucket Location Not Supported
Bucket Notification Not Supported
Bucket Object Versions Not Supported
Get Bucket Info (HEAD) Not Supported
Put Object Supported
Delete Object Supported
Delete Objects Supported
Get Object Supported
Get Object Info (HEAD) Supported
Get Object (Range Query) Supported
Object ACLs (Get, Put) Not Supported
POST Object Not Supported
Copy Object Supported
Multipart Uploads Supported

语言支持

Alluxio S3 客户端支持各种编程语言,比如C++、Java、Python、Golang、Ruby等。在这个文档中,我们使用curl REST调用和python S3 client作为使用示例。

使用示例

REST API

举个例子,你可以使用如下的RESTful API调用方式在本地运行一个Alluxio集群。Alluxio代理会默认在39999端口监听。

创建bucket

$ curl -i -X PUT http://localhost:39999/api/v1/s3/testbucket
HTTP/1.1 200 OK
Date: Tue, 29 Aug 2017 22:34:41 GMT
Content-Length: 0
Server: Jetty(9.2.z-SNAPSHOT)

获取bucket(objects列表)

$ curl -i -X GET http://localhost:39999/api/v1/s3/testbucket
HTTP/1.1 200 OK
Date: Tue, 29 Aug 2017 22:35:00 GMT
Content-Type: application/xml
Content-Length: 200
Server: Jetty(9.2.z-SNAPSHOT)

<ListBucketResult xmlns=""><Name>/testbucket</Name><Prefix/><ContinuationToken/><NextContinuationToken/><KeyCount>0</KeyCount><MaxKeys>1000</MaxKeys><IsTruncated>false</IsTruncated></ListBucketResult>

加入object

假定本地现存一个文件LICENSE

$ curl -i -X PUT -T "LICENSE" http://localhost:39999/api/v1/s3/testbucket/testobject
HTTP/1.1 100 Continue

HTTP/1.1 200 OK
Date: Tue, 29 Aug 2017 22:36:03 GMT
ETag: "9347237b67b0be183499e5893128704e"
Content-Length: 0
Server: Jetty(9.2.z-SNAPSHOT)

获取object

$ curl -i -X GET http://localhost:39999/api/v1/s3/testbucket/testobject
HTTP/1.1 200 OK
Date: Tue, 29 Aug 2017 22:37:34 GMT
Last-Modified: Tue, 29 Aug 2017 22:36:03 GMT
Content-Type: application/xml
Content-Length: 26847
Server: Jetty(9.2.z-SNAPSHOT)

.................. Content of the test file ...................

列出含有单个object的bucket

$ curl -i -X GET http://localhost:39999/api/v1/s3/testbucket
HTTP/1.1 200 OK
Date: Tue, 29 Aug 2017 22:38:48 GMT
Content-Type: application/xml
Content-Length: 363
Server: Jetty(9.2.z-SNAPSHOT)

<ListBucketResult xmlns=""><Name>/testbucket</Name><Prefix/><ContinuationToken/><NextContinuationToken/><KeyCount>1</KeyCount><MaxKeys>1000</MaxKeys><IsTruncated>false</IsTruncated><Contents><Key>testobject</Key><LastModified>2017-08-29T15:36:03.613Z</LastModified><ETag></ETag><Size>26847</Size><StorageClass>STANDARD</StorageClass></Contents></ListBucketResult>

列出含有多个objects的bucket

你可以上传更多的文件并且使用max-keyscontinuation-token作为GET bucket request参数,比如:

$ curl -i -X PUT -T "LICENSE" http://localhost:39999/api/v1/s3/testbucket/key1
$ curl -i -X PUT -T "LICENSE" http://localhost:39999/api/v1/s3/testbucket/key2
$ curl -i -X PUT -T "LICENSE" http://localhost:39999/api/v1/s3/testbucket/key3
$ curl -i -X GET http://localhost:39999/api/v1/s3/testbucket\?max-keys\=2
HTTP/1.1 200 OK
Date: Tue, 29 Aug 2017 22:40:45 GMT
Content-Type: application/xml
Content-Length: 537
Server: Jetty(9.2.z-SNAPSHOT)

<ListBucketResult xmlns=""><Name>/testbucket</Name><Prefix/><ContinuationToken/><NextContinuationToken>key3</NextContinuationToken><KeyCount>2</KeyCount><MaxKeys>2</MaxKeys><IsTruncated>true</IsTruncated><Contents><Key>key1</Key><LastModified>2017-08-29T15:40:42.213Z</LastModified><ETag></ETag><Size>26847</Size><StorageClass>STANDARD</StorageClass></Contents><Contents><Key>key2</Key><LastModified>2017-08-29T15:40:43.269Z</LastModified><ETag></ETag><Size>26847</Size><StorageClass>STANDARD</StorageClass></Contents></ListBucketResult>

# curl -i -X GET http://localhost:39999/api/v1/s3/testbucket\?max-keys\=2\&continuation-token\=key3
HTTP/1.1 200 OK
Date: Tue, 29 Aug 2017 22:41:18 GMT
Content-Type: application/xml
Content-Length: 540
Server: Jetty(9.2.z-SNAPSHOT)

<ListBucketResult xmlns=""><Name>/testbucket</Name><Prefix/><ContinuationToken>key3</ContinuationToken><NextContinuationToken/><KeyCount>2</KeyCount><MaxKeys>2</MaxKeys><IsTruncated>false</IsTruncated><Contents><Key>key3</Key><LastModified>2017-08-29T15:40:44.002Z</LastModified><ETag></ETag><Size>26847</Size><StorageClass>STANDARD</StorageClass></Contents><Contents><Key>testobject</Key><LastModified>2017-08-29T15:36:03.613Z</LastModified><ETag></ETag><Size>26847</Size><StorageClass>STANDARD</StorageClass></Contents></ListBucketResult>

你还可以验证这些对象是否为Alluxio文件,在/testbucket目录下。

$ ./bin/alluxio fs ls -R /testbucket

删除objects

$ curl -i -X DELETE http://localhost:39999/api/v1/s3/testbucket/key1
HTTP/1.1 204 No Content
Date: Tue, 29 Aug 2017 22:43:22 GMT
Server: Jetty(9.2.z-SNAPSHOT)
$ curl -i -X DELETE http://localhost:39999/api/v1/s3/testbucket/key2
$ curl -i -X DELETE http://localhost:39999/api/v1/s3/testbucket/key3
$ curl -i -X DELETE http://localhost:39999/api/v1/s3/testbucket/testobject

初始化multipart upload

$ curl -i -X POST http://localhost:39999/api/v1/s3/testbucket/testobject?uploads
HTTP/1.1 200 OK
Date: Tue, 29 Aug 2017 22:43:22 GMT
Content-Length: 197
Server: Jetty(9.2.z-SNAPSHOT)

<?xml version="1.0" encoding="UTF-8"?>
<InitiateMultipartUploadResult xmlns="">
  <Bucket>testbucket</Bucket>
  <Key>testobject</Key>
  <UploadId>2</UploadId>
</InitiateMultipartUploadResult>

上传分块

$ curl -i -X PUT http://localhost:39999/api/v1/s3/testbucket/testobject?partNumber=1&uploadId=2
HTTP/1.1 200 OK
Date: Tue, 29 Aug 2017 22:43:22 GMT
ETag: "b54357faf0632cce46e942fa68356b38"
Server: Jetty(9.2.z-SNAPSHOT)

罗列已上传的分块

$ curl -i -X GET http://localhost:39999/api/v1/s3/testbucket/testobject?uploadId=2
HTTP/1.1 200 OK
Date: Tue, 29 Aug 2017 22:43:22 GMT
Content-Length: 985
Server: Jetty(9.2.z-SNAPSHOT)

<?xml version="1.0" encoding="UTF-8"?>
<ListPartsResult xmlns="">
  <Bucket>testbucket</Bucket>
  <Key>testobject</Key>
  <UploadId>2</UploadId>
  <StorageClass>STANDARD</StorageClass>
  <IsTruncated>false</IsTruncated>
  <Part>
    <PartNumber>1</PartNumber>
    <LastModified>2017-08-29T20:48:34.000Z</LastModified>
    <ETag>"b54357faf0632cce46e942fa68356b38"</ETag>
    <Size>10485760</Size>
  </Part>
</ListPartsResult>

完成multipart upload

$ curl -i -X POST http://localhost:39999/api/v1/s3/testbucket/testobject?uploadId=2 -d '
<CompleteMultipartUpload>
  <Part>
    <PartNumber>1</PartNumber>
    <ETag>"b54357faf0632cce46e942fa68356b38"</ETag>
  </Part>
</CompleteMultipartUpload>'

HTTP/1.1 200 OK
Date: Tue, 29 Aug 2017 22:43:22 GMT
Server: Jetty(9.2.z-SNAPSHOT)

<?xml version="1.0" encoding="UTF-8"?>
<CompleteMultipartUploadResult xmlns="">
  <Location>/testbucket/testobjectLocation>
  <Bucket>testbucket</Bucket>
  <Key>testobject</Key>
  <ETag>"b54357faf0632cce46e942fa68356b38"</ETag>
</CompleteMultipartUploadResult>

中止multipart upload

$ curl -i -X DELETE http://localhost:39999/api/v1/s3/testbucket/testobject?uploadId=2
HTTP/1.1 204 OK
Date: Tue, 29 Aug 2017 22:43:22 GMT
Content-Length: 0
Server: Jetty(9.2.z-SNAPSHOT)

删除空bucket

$ curl -i -X DELETE http://localhost:39999/api/v1/s3/testbucket
HTTP/1.1 204 No Content
Date: Tue, 29 Aug 2017 22:45:19 GMT

Python S3 Client

创建连接

import boto
import boto.s3.connection

conn = boto.connect_s3(
    aws_access_key_id = '',
    aws_secret_access_key = '',
    host = 'localhost',
    port = 39999,
    path = '/api/v1/s3',
    is_secure=False,
    calling_format = boto.s3.connection.OrdinaryCallingFormat(),
)

创建bucket

bucketName = 'bucket-for-testing'
bucket = conn.create_bucket(bucketName)

加入small object

smallObjectKey = 'small.txt'
smallObjectContent = 'Hello World!'

key = bucket.new_key(smallObjectKey)
key.set_contents_from_string(smallObjectContent)

获取small object

assert smallObjectContent == key.get_contents_as_string()

上传large object

在本地文件系统创建一个8MB文件

$ dd if=/dev/zero of=8mb.data bs=1048576 count=8

使用python S3 client把它作为object上传

largeObjectKey = 'large.txt'
largeObjectFile = '8mb.data'

key = bucket.new_key(largeObjectKey)
with open(largeObjectFile, 'rb') as f:
    key.set_contents_from_file(f)
with open(largeObjectFile, 'rb') as f:
    largeObject = f.read()

获取large objecy

assert largeObject == key.get_contents_as_string()

删除objects

bucket.delete_key(smallObjectKey)
bucket.delete_key(largeObjectKey)

初始化multipart upload

mp = bucket.initiate_multipart_upload(largeObjectFile)

上传分块

import math, os

from filechunkio import FileChunkIO

# Use a chunk size of 1MB (feel free to change this)
sourceSize = os.stat(largeObjectFile).st_size
chunkSize = 1048576
chunkCount = int(math.ceil(sourceSize / float(chunkSize)))

for i in range(chunkCount):
    offset = chunkSize * i
    bytes = min(chunkSize, sourceSize - offset)
    with FileChunkIO(largeObjectFile, 'r', offset=offset, bytes=bytes) as fp:
        mp.upload_part_from_file(fp, part_num=i + 1)

完成multipart upload

mp.complete_upload()

中止multipart upload

mp.cancel_upload()

删除bucket

conn.delete_bucket(bucketName)