S3 Client
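Alluxio supports a RESTful API that is compatible with the basic operations of the Amazon S3 API.
The REST API documentation is generated as part of the Alluxio build and is available at ${ALLUXIO_HOME}/core/server/proxy/target/miredot/index.html.
Going through an HTTP proxy has some performance cost, in particular an extra network hop. For the best performance, run the proxy server and an Alluxio worker on the same compute node. Alternatively, put all proxy servers behind a load balancer.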
Supported Features
The following table describes the support status of the basic Amazon S3 features:
| S3 Feature | Status |
|---|---|
| List Buckets | Supported |
| Delete Buckets | Supported |
| Create Bucket | Supported |
| Bucket Lifecycle | Not Supported |
| Policy (Buckets, Objects) | Not Supported |
| Bucket ACLs (Get, Put) | Not Supported |
| Bucket Location | Not Supported |
| Bucket Notification | Not Supported |
| Bucket Object Versions | Not Supported |
| Get Bucket Info (HEAD) | Not Supported |
| Put Object | Supported |
| Delete Object | Supported |
| Get Object | Supported |
| Get Object Info (HEAD) | Supported |
| Get Object (Range Query) | Not Supported [ALLUXIO-3321] |
| Object ACLs (Get, Put) | Not Supported |
| POST Object | Not Supported |
| Copy Object | Not Supported |
| Multipart Uploads | Supported |
Language Support
The Alluxio S3 client supports various programming languages, such as C++, Java, Python, Golang, and Ruby. This document uses curl REST calls and the Python S3 client as usage examples.
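Usage Examples
REST API
For example, you can issue the following RESTful API calls against an Alluxio cluster running on localhost. The Alluxio proxy listens on port 39999 by default.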
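Every call in this section targets the same URL layout: the proxy host and port, the `/api/v1/s3` base path, then the bucket and optionally the object key and query parameters. As a minimal sketch of that layout, assuming the proxy runs on localhost with the default port, the hypothetical helper `s3_url` below (not part of Alluxio) assembles such URLs:

```python
from urllib.parse import quote, urlencode

# Assumed defaults, taken from the examples in this document.
PROXY_HOST = "localhost"
PROXY_PORT = 39999
S3_BASE_PATH = "/api/v1/s3"


def s3_url(bucket, key=None, params=None):
    """Build the URL of a bucket or object behind the Alluxio proxy (hypothetical helper)."""
    path = S3_BASE_PATH + "/" + quote(bucket)
    if key is not None:
        path += "/" + quote(key)  # quote() leaves "/" inside keys untouched
    url = "http://{}:{}{}".format(PROXY_HOST, PROXY_PORT, path)
    if params:
        url += "?" + urlencode(params)
    return url


print(s3_url("testbucket"))
print(s3_url("testbucket", "testobject"))
print(s3_url("testbucket", params={"max-keys": 2}))
```

Building the query string with `urlencode` also avoids the shell escaping that the curl examples need for `?`, `=`, and `&`.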
Create a bucket

    $ curl -i -X PUT http://localhost:39999/api/v1/s3/testbucket
    HTTP/1.1 200 OK
    Date: Tue, 29 Aug 2017 22:34:41 GMT
    Content-Length: 0
    Server: Jetty(9.2.z-SNAPSHOT)
Get the bucket (list its objects)

    $ curl -i -X GET http://localhost:39999/api/v1/s3/testbucket
    HTTP/1.1 200 OK
    Date: Tue, 29 Aug 2017 22:35:00 GMT
    Content-Type: application/xml
    Content-Length: 200
    Server: Jetty(9.2.z-SNAPSHOT)

    <ListBucketResult xmlns="">
      <Name>/testbucket</Name>
      <Prefix/>
      <ContinuationToken/>
      <NextContinuationToken/>
      <KeyCount>0</KeyCount>
      <MaxKeys>1000</MaxKeys>
      <IsTruncated>false</IsTruncated>
    </ListBucketResult>
Put an object
Assume a file named LICENSE exists in the local working directory.

    $ curl -i -X PUT -T "LICENSE" http://localhost:39999/api/v1/s3/testbucket/testobject
    HTTP/1.1 100 Continue

    HTTP/1.1 200 OK
    Date: Tue, 29 Aug 2017 22:36:03 GMT
    ETag: "9347237b67b0be183499e5893128704e"
    Content-Length: 0
    Server: Jetty(9.2.z-SNAPSHOT)
Get the object

    $ curl -i -X GET http://localhost:39999/api/v1/s3/testbucket/testobject
    HTTP/1.1 200 OK
    Date: Tue, 29 Aug 2017 22:37:34 GMT
    Last-Modified: Tue, 29 Aug 2017 22:36:03 GMT
    Content-Type: application/xml
    Content-Length: 26847
    Server: Jetty(9.2.z-SNAPSHOT)

    .................. Content of the test file ...................
List a bucket containing one object

    $ curl -i -X GET http://localhost:39999/api/v1/s3/testbucket
    HTTP/1.1 200 OK
    Date: Tue, 29 Aug 2017 22:38:48 GMT
    Content-Type: application/xml
    Content-Length: 363
    Server: Jetty(9.2.z-SNAPSHOT)

    <ListBucketResult xmlns="">
      <Name>/testbucket</Name>
      <Prefix/>
      <ContinuationToken/>
      <NextContinuationToken/>
      <KeyCount>1</KeyCount>
      <MaxKeys>1000</MaxKeys>
      <IsTruncated>false</IsTruncated>
      <Contents>
        <Key>testobject</Key>
        <LastModified>2017-08-29T15:36:03.613Z</LastModified>
        <ETag></ETag>
        <Size>26847</Size>
        <StorageClass>STANDARD</StorageClass>
      </Contents>
    </ListBucketResult>
List a bucket containing multiple objects
You can upload more files and then pass max-keys and continuation-token as query parameters of the GET bucket request. When a listing is truncated, the response carries a NextContinuationToken; pass it as continuation-token to fetch the next page. For example:

    $ curl -i -X PUT -T "LICENSE" http://localhost:39999/api/v1/s3/testbucket/key1
    $ curl -i -X PUT -T "LICENSE" http://localhost:39999/api/v1/s3/testbucket/key2
    $ curl -i -X PUT -T "LICENSE" http://localhost:39999/api/v1/s3/testbucket/key3

    $ curl -i -X GET http://localhost:39999/api/v1/s3/testbucket\?max-keys\=2
    HTTP/1.1 200 OK
    Date: Tue, 29 Aug 2017 22:40:45 GMT
    Content-Type: application/xml
    Content-Length: 537
    Server: Jetty(9.2.z-SNAPSHOT)

    <ListBucketResult xmlns="">
      <Name>/testbucket</Name>
      <Prefix/>
      <ContinuationToken/>
      <NextContinuationToken>key3</NextContinuationToken>
      <KeyCount>2</KeyCount>
      <MaxKeys>2</MaxKeys>
      <IsTruncated>true</IsTruncated>
      <Contents>
        <Key>key1</Key>
        <LastModified>2017-08-29T15:40:42.213Z</LastModified>
        <ETag></ETag>
        <Size>26847</Size>
        <StorageClass>STANDARD</StorageClass>
      </Contents>
      <Contents>
        <Key>key2</Key>
        <LastModified>2017-08-29T15:40:43.269Z</LastModified>
        <ETag></ETag>
        <Size>26847</Size>
        <StorageClass>STANDARD</StorageClass>
      </Contents>
    </ListBucketResult>

    $ curl -i -X GET http://localhost:39999/api/v1/s3/testbucket\?max-keys\=2\&continuation-token\=key3
    HTTP/1.1 200 OK
    Date: Tue, 29 Aug 2017 22:41:18 GMT
    Content-Type: application/xml
    Content-Length: 540
    Server: Jetty(9.2.z-SNAPSHOT)

    <ListBucketResult xmlns="">
      <Name>/testbucket</Name>
      <Prefix/>
      <ContinuationToken>key3</ContinuationToken>
      <NextContinuationToken/>
      <KeyCount>2</KeyCount>
      <MaxKeys>2</MaxKeys>
      <IsTruncated>false</IsTruncated>
      <Contents>
        <Key>key3</Key>
        <LastModified>2017-08-29T15:40:44.002Z</LastModified>
        <ETag></ETag>
        <Size>26847</Size>
        <StorageClass>STANDARD</StorageClass>
      </Contents>
      <Contents>
        <Key>testobject</Key>
        <LastModified>2017-08-29T15:36:03.613Z</LastModified>
        <ETag></ETag>
        <Size>26847</Size>
        <StorageClass>STANDARD</StorageClass>
      </Contents>
    </ListBucketResult>
You can also verify that these objects are stored as Alluxio files under the /testbucket directory.

    $ ./bin/alluxio fs ls -R /testbucket
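The paging pattern from the max-keys listings can be sketched in Python as a loop that follows NextContinuationToken until it comes back empty. This is only an illustration: `fetch_page` is a hypothetical caller-supplied function standing in for the HTTP GET, and the canned XML is trimmed to the fields the sketch reads.

```python
import xml.etree.ElementTree as ET


def local_name(tag):
    """Drop any '{namespace}' prefix that ElementTree adds to a tag."""
    return tag.rsplit("}", 1)[-1]


def parse_list_bucket_result(xml_text):
    """Return (keys, next_token) from a ListBucketResult document."""
    keys, next_token = [], None
    for child in ET.fromstring(xml_text):
        name = local_name(child.tag)
        if name == "Contents":
            for field in child:
                if local_name(field.tag) == "Key":
                    keys.append(field.text)
        elif name == "NextContinuationToken":
            # An empty element has no text, which means there are no more pages.
            next_token = child.text or None
    return keys, next_token


def list_all_keys(fetch_page):
    """Collect every key by following continuation tokens.

    fetch_page(token) is a hypothetical callable returning the XML body of
    GET <bucket>?max-keys=N, plus &continuation-token=<token> when token is set.
    """
    keys, token = [], None
    while True:
        page_keys, token = parse_list_bucket_result(fetch_page(token))
        keys.extend(page_keys)
        if token is None:
            return keys


# Canned responses shaped like the two truncated listings (fields trimmed).
PAGES = {
    None: '<ListBucketResult xmlns=""><NextContinuationToken>key3</NextContinuationToken>'
          '<Contents><Key>key1</Key></Contents><Contents><Key>key2</Key></Contents>'
          '</ListBucketResult>',
    "key3": '<ListBucketResult xmlns=""><NextContinuationToken/>'
            '<Contents><Key>key3</Key></Contents><Contents><Key>testobject</Key></Contents>'
            '</ListBucketResult>',
}

print(list_all_keys(PAGES.get))
```

Passing the fetch step in as a function keeps the paging logic independent of the HTTP library, so it can be exercised without a running proxy.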
Delete objects

    $ curl -i -X DELETE http://localhost:39999/api/v1/s3/testbucket/key1
    HTTP/1.1 204 No Content
    Date: Tue, 29 Aug 2017 22:43:22 GMT
    Server: Jetty(9.2.z-SNAPSHOT)

    $ curl -i -X DELETE http://localhost:39999/api/v1/s3/testbucket/key2
    $ curl -i -X DELETE http://localhost:39999/api/v1/s3/testbucket/key3
    $ curl -i -X DELETE http://localhost:39999/api/v1/s3/testbucket/testobject
Initiate a multipart upload

    $ curl -i -X POST 'http://localhost:39999/api/v1/s3/testbucket/testobject?uploads'
    HTTP/1.1 200 OK
    Date: Tue, 29 Aug 2017 22:43:22 GMT
    Content-Length: 197
    Server: Jetty(9.2.z-SNAPSHOT)

    <?xml version="1.0" encoding="UTF-8"?>
    <InitiateMultipartUploadResult xmlns="">
      <Bucket>testbucket</Bucket>
      <Key>testobject</Key>
      <UploadId>2</UploadId>
    </InitiateMultipartUploadResult>
Upload a part
The URL is quoted so that the shell does not interpret the & in the query string.

    $ curl -i -X PUT 'http://localhost:39999/api/v1/s3/testbucket/testobject?partNumber=1&uploadId=2'
    HTTP/1.1 200 OK
    Date: Tue, 29 Aug 2017 22:43:22 GMT
    ETag: "b54357faf0632cce46e942fa68356b38"
    Server: Jetty(9.2.z-SNAPSHOT)
List the uploaded parts

    $ curl -i -X GET 'http://localhost:39999/api/v1/s3/testbucket/testobject?uploadId=2'
    HTTP/1.1 200 OK
    Date: Tue, 29 Aug 2017 22:43:22 GMT
    Content-Length: 985
    Server: Jetty(9.2.z-SNAPSHOT)

    <?xml version="1.0" encoding="UTF-8"?>
    <ListPartsResult xmlns="">
      <Bucket>testbucket</Bucket>
      <Key>testobject</Key>
      <UploadId>2</UploadId>
      <StorageClass>STANDARD</StorageClass>
      <IsTruncated>false</IsTruncated>
      <Part>
        <PartNumber>1</PartNumber>
        <LastModified>2017-08-29T20:48:34.000Z</LastModified>
        <ETag>"b54357faf0632cce46e942fa68356b38"</ETag>
        <Size>10485760</Size>
      </Part>
    </ListPartsResult>
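A client usually needs the part numbers, ETags, and sizes out of this response, for instance to check what has already been uploaded. The sketch below parses a ListPartsResult with Python's standard library. It assumes the response carries an empty default namespace (`xmlns=""`) as in the sample output, so element names can be matched directly; the function name `parse_list_parts` is hypothetical.

```python
import xml.etree.ElementTree as ET


def parse_list_parts(xml_text):
    """Return a list of (part_number, etag, size) tuples from a ListPartsResult.

    Assumes un-namespaced element names, as in the sample response.
    """
    parts = []
    for part in ET.fromstring(xml_text).iter("Part"):
        parts.append((
            int(part.findtext("PartNumber")),
            part.findtext("ETag"),
            int(part.findtext("Size")),
        ))
    return parts


# Sample body copied from the list-parts response.
SAMPLE = (
    '<ListPartsResult xmlns=""><Bucket>testbucket</Bucket><Key>testobject</Key>'
    '<UploadId>2</UploadId><StorageClass>STANDARD</StorageClass>'
    '<IsTruncated>false</IsTruncated><Part><PartNumber>1</PartNumber>'
    '<LastModified>2017-08-29T20:48:34.000Z</LastModified>'
    '<ETag>"b54357faf0632cce46e942fa68356b38"</ETag><Size>10485760</Size></Part>'
    '</ListPartsResult>'
)

print(parse_list_parts(SAMPLE))
```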
Complete the multipart upload

    $ curl -i -X POST 'http://localhost:39999/api/v1/s3/testbucket/testobject?uploadId=2' -d '<CompleteMultipartUpload><Part><PartNumber>1</PartNumber><ETag>"b54357faf0632cce46e942fa68356b38"</ETag></Part></CompleteMultipartUpload>'
    HTTP/1.1 200 OK
    Date: Tue, 29 Aug 2017 22:43:22 GMT
    Server: Jetty(9.2.z-SNAPSHOT)

    <?xml version="1.0" encoding="UTF-8"?>
    <CompleteMultipartUploadResult xmlns="">
      <Location>/testbucket/testobject</Location>
      <Bucket>testbucket</Bucket>
      <Key>testobject</Key>
      <ETag>"b54357faf0632cce46e942fa68356b38"</ETag>
    </CompleteMultipartUploadResult>
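With more than one part, writing the request body by hand becomes error-prone. The following sketch builds the CompleteMultipartUpload body from (part number, ETag) pairs using only the standard library. The function name is hypothetical, and sorting the parts follows the Amazon S3 requirement that parts be listed in ascending order; whether the Alluxio proxy enforces the same ordering is an assumption.

```python
import xml.etree.ElementTree as ET


def complete_multipart_body(parts):
    """Build a CompleteMultipartUpload request body.

    parts: iterable of (part_number, etag) pairs, for example collected from
    the ETag header returned by each part upload.
    """
    root = ET.Element("CompleteMultipartUpload")
    for number, etag in sorted(parts):  # ascending part numbers
        part = ET.SubElement(root, "Part")
        ET.SubElement(part, "PartNumber").text = str(number)
        ET.SubElement(part, "ETag").text = etag
    return ET.tostring(root, encoding="unicode")


body = complete_multipart_body([(1, '"b54357faf0632cce46e942fa68356b38"')])
print(body)
```

The printed string matches the body passed to `curl -d` in the example and can be sent as the POST payload by any HTTP client.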
Abort a multipart upload

    $ curl -i -X DELETE 'http://localhost:39999/api/v1/s3/testbucket/testobject?uploadId=2'
    HTTP/1.1 204 OK
    Date: Tue, 29 Aug 2017 22:43:22 GMT
    Content-Length: 0
    Server: Jetty(9.2.z-SNAPSHOT)
Delete an empty bucket

    $ curl -i -X DELETE http://localhost:39999/api/v1/s3/testbucket
    HTTP/1.1 204 No Content
    Date: Tue, 29 Aug 2017 22:45:19 GMT
Python S3 Client
Create a connection

    import boto
    import boto.s3.connection

    # Point boto at the local Alluxio proxy instead of Amazon S3.
    conn = boto.connect_s3(
        aws_access_key_id='',
        aws_secret_access_key='',
        host='localhost',
        port=39999,
        path='/api/v1/s3',
        is_secure=False,
        calling_format=boto.s3.connection.OrdinaryCallingFormat(),
    )
Create a bucket

    bucketName = 'bucket-for-testing'
    bucket = conn.create_bucket(bucketName)
Put a small object

    smallObjectKey = 'small.txt'
    smallObjectContent = 'Hello World!'
    key = bucket.new_key(smallObjectKey)
    key.set_contents_from_string(smallObjectContent)
Get the small object

    assert smallObjectContent == key.get_contents_as_string()
Upload a large object
Create an 8 MB file on the local file system:

    $ dd if=/dev/zero of=8mb.data bs=1048576 count=8

Then upload it as an object with the Python S3 client:

    largeObjectKey = 'large.txt'
    largeObjectFile = '8mb.data'
    key = bucket.new_key(largeObjectKey)
    with open(largeObjectFile, 'rb') as f:
        key.set_contents_from_file(f)
    # Keep a local copy of the bytes for the comparison in the next step.
    with open(largeObjectFile, 'rb') as f:
        largeObject = f.read()
Get the large object

    assert largeObject == key.get_contents_as_string()
Delete objects

    bucket.delete_key(smallObjectKey)
    bucket.delete_key(largeObjectKey)
Initiate a multipart upload

    mp = bucket.initiate_multipart_upload(largeObjectFile)
Upload parts

    import math, os
    from filechunkio import FileChunkIO

    # Use a chunk size of 1MB (feel free to change this)
    sourceSize = os.stat(largeObjectFile).st_size
    chunkSize = 1048576
    chunkCount = int(math.ceil(sourceSize / float(chunkSize)))

    for i in range(chunkCount):
        offset = chunkSize * i
        # The last part may be shorter than chunkSize.
        partSize = min(chunkSize, sourceSize - offset)
        with FileChunkIO(largeObjectFile, 'r', offset=offset, bytes=partSize) as fp:
            mp.upload_part_from_file(fp, part_num=i + 1)
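The offset arithmetic in the loop is easy to check in isolation. The sketch below factors it into a hypothetical generator, `part_ranges`, that yields the part number, byte offset, and length of each part without touching a file or the network, using the same ceiling-division approach.

```python
import math


def part_ranges(source_size, chunk_size):
    """Yield (part_number, offset, length) for each part of a multipart upload."""
    chunk_count = int(math.ceil(source_size / float(chunk_size)))
    for i in range(chunk_count):
        offset = chunk_size * i
        # The final part covers whatever remains, which may be less than chunk_size.
        yield i + 1, offset, min(chunk_size, source_size - offset)


# An 8 MB file split into 1 MB parts.
ranges = list(part_ranges(8 * 1048576, 1048576))
print(len(ranges), ranges[0], ranges[-1])

# A size that is not a multiple of the chunk size ends with a short part.
print(list(part_ranges(2500, 1000)))
```

Part numbers start at 1, matching the `part_num=i + 1` argument in the upload loop.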
Complete the multipart upload

    mp.complete_upload()
Abort the multipart upload

    mp.cancel_upload()
Delete the bucket

    conn.delete_bucket(bucketName)

