S3 Client

Alluxio supports a RESTful API that is compatible with the basic operations of the Amazon S3 API.

The REST API documentation is generated when Alluxio is built and is available at ${ALLUXIO_HOME}/core/server/proxy/target/miredot/index.html.

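Going through the HTTP proxy has some performance cost, mainly because the proxy adds an extra network hop. For the best performance, run the proxy server on the same compute node as an Alluxio worker. Alternatively, put all proxy servers behind a load balancer.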

Supported Features

The following table describes the current support status for basic Amazon S3 features:

| S3 Feature | Status |
|---|---|
| List Buckets | Supported |
| Delete Buckets | Supported |
| Create Bucket | Supported |
| Bucket Lifecycle | Not Supported |
| Policy (Buckets, Objects) | Not Supported |
| Bucket ACLs (Get, Put) | Not Supported |
| Bucket Location | Not Supported |
| Bucket Notification | Not Supported |
| Bucket Object Versions | Not Supported |
| Get Bucket Info (HEAD) | Not Supported |
| Put Object | Supported |
| Delete Object | Supported |
| Get Object | Supported |
| Get Object Info (HEAD) | Supported |
| Get Object (Range Query) | Not Supported [ALLUXIO-3321] |
| Object ACLs (Get, Put) | Not Supported |
| POST Object | Not Supported |
| Copy Object | Not Supported |
| Multipart Uploads | Supported |

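Language Support

The Alluxio S3 client can be used from various programming languages, such as C++, Java, Python, Golang, and Ruby. This document uses curl REST calls and the Python S3 client as usage examples.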

Usage Examples

REST API

As an example, you can issue the following RESTful API calls against an Alluxio cluster running locally. By default, the Alluxio proxy listens on port 39999.

Create a bucket

  # curl -i -X PUT http://localhost:39999/api/v1/s3/testbucket
  HTTP/1.1 200 OK
  Date: Tue, 29 Aug 2017 22:34:41 GMT
  Content-Length: 0
  Server: Jetty(9.2.z-SNAPSHOT)
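
The URL layout used by these curl calls can be captured in a small Python helper. This is only a sketch: it assumes the proxy address from this example (localhost:39999) and the /api/v1/s3 prefix, and the names `s3_url` and `make_request` are hypothetical helpers, not part of Alluxio. It builds a request object without sending it.

```python
from urllib.parse import quote, urlencode
from urllib.request import Request

# Base endpoint taken from the curl examples in this document.
BASE = "http://localhost:39999/api/v1/s3"


def s3_url(bucket, key=None, query=None):
    """Build the proxy URL for a bucket or object (hypothetical helper)."""
    url = BASE + "/" + quote(bucket)
    if key is not None:
        url += "/" + quote(key)
    if query:
        # urlencode takes care of the '?', '=' and '&' escaping that the
        # shell examples handle with backslashes or quotes.
        url += "?" + urlencode(query)
    return url


def make_request(method, bucket, key=None, query=None, data=None):
    """Return an unsent urllib Request for an S3-style operation."""
    return Request(s3_url(bucket, key, query), data=data, method=method)


# Counterpart of: curl -X PUT http://localhost:39999/api/v1/s3/testbucket
req = make_request("PUT", "testbucket")
print(req.get_method(), req.full_url)
```

Passing the request to `urllib.request.urlopen` would send it to a running proxy.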

Get the bucket (list its objects)

  # curl -i -X GET http://localhost:39999/api/v1/s3/testbucket
  HTTP/1.1 200 OK
  Date: Tue, 29 Aug 2017 22:35:00 GMT
  Content-Type: application/xml
  Content-Length: 200
  Server: Jetty(9.2.z-SNAPSHOT)
  <ListBucketResult xmlns=""><Name>/testbucket</Name><Prefix/><ContinuationToken/><NextContinuationToken/><KeyCount>0</KeyCount><MaxKeys>1000</MaxKeys><IsTruncated>false</IsTruncated></ListBucketResult>

Put an object

Assume a file named LICENSE exists in the local directory.

  # curl -i -X PUT -T "LICENSE" http://localhost:39999/api/v1/s3/testbucket/testobject
  HTTP/1.1 100 Continue
  HTTP/1.1 200 OK
  Date: Tue, 29 Aug 2017 22:36:03 GMT
  ETag: "9347237b67b0be183499e5893128704e"
  Content-Length: 0
  Server: Jetty(9.2.z-SNAPSHOT)
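
The ETag returned by the PUT can be used as an integrity check on the upload. The sketch below assumes that the ETag is the hex-encoded MD5 digest of the uploaded bytes, as it is for a simple Amazon S3 PUT; this document does not state how the proxy computes it, so treat that as an assumption to verify. The helper names are hypothetical.

```python
import hashlib


def md5_hex(path, block_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it block by block."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(block_size), b""):
            digest.update(block)
    return digest.hexdigest()


def etag_matches(path, etag_header):
    """Compare a local file with an ETag header value.

    Assumption: the ETag is the MD5 of the object body. The surrounding
    double quotes of the header value are stripped before comparing.
    """
    return md5_hex(path) == etag_header.strip('"').lower()
```

For the upload above, `etag_matches("LICENSE", '"9347237b67b0be183499e5893128704e"')` would be the check to run.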

Get the object

  # curl -i -X GET http://localhost:39999/api/v1/s3/testbucket/testobject
  HTTP/1.1 200 OK
  Date: Tue, 29 Aug 2017 22:37:34 GMT
  Last-Modified: Tue, 29 Aug 2017 22:36:03 GMT
  Content-Type: application/xml
  Content-Length: 26847
  Server: Jetty(9.2.z-SNAPSHOT)
  .................. Content of the test file ...................

List a bucket containing a single object

  # curl -i -X GET http://localhost:39999/api/v1/s3/testbucket
  HTTP/1.1 200 OK
  Date: Tue, 29 Aug 2017 22:38:48 GMT
  Content-Type: application/xml
  Content-Length: 363
  Server: Jetty(9.2.z-SNAPSHOT)
  <ListBucketResult xmlns=""><Name>/testbucket</Name><Prefix/><ContinuationToken/><NextContinuationToken/><KeyCount>1</KeyCount><MaxKeys>1000</MaxKeys><IsTruncated>false</IsTruncated><Contents><Key>testobject</Key><LastModified>2017-08-29T15:36:03.613Z</LastModified><ETag></ETag><Size>26847</Size><StorageClass>STANDARD</StorageClass></Contents></ListBucketResult>
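
The listing is plain XML, so a client can read it with any XML parser. As a minimal sketch using Python's standard library, the hypothetical helper `parse_listing` below extracts the keys, sizes, and truncation state from the response body shown above.

```python
import xml.etree.ElementTree as ET

# Response body copied from the single-object listing in this document.
SAMPLE = (
    '<ListBucketResult xmlns=""><Name>/testbucket</Name><Prefix/>'
    '<ContinuationToken/><NextContinuationToken/><KeyCount>1</KeyCount>'
    '<MaxKeys>1000</MaxKeys><IsTruncated>false</IsTruncated>'
    '<Contents><Key>testobject</Key>'
    '<LastModified>2017-08-29T15:36:03.613Z</LastModified><ETag></ETag>'
    '<Size>26847</Size><StorageClass>STANDARD</StorageClass></Contents>'
    '</ListBucketResult>'
)


def parse_listing(xml_text):
    """Return (objects, is_truncated, next_token) for a ListBucketResult.

    objects is a list of (key, size_in_bytes) pairs; next_token is None
    when the response carries an empty NextContinuationToken element.
    """
    root = ET.fromstring(xml_text)
    objects = [
        (c.findtext("Key"), int(c.findtext("Size")))
        for c in root.findall("Contents")
    ]
    truncated = root.findtext("IsTruncated") == "true"
    token = root.findtext("NextContinuationToken") or None
    return objects, truncated, token


print(parse_listing(SAMPLE))
```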

List a bucket containing multiple objects

You can upload more files and pass max-keys and continuation-token as parameters of the GET bucket request. For example:

  # curl -i -X PUT -T "LICENSE" http://localhost:39999/api/v1/s3/testbucket/key1
  # curl -i -X PUT -T "LICENSE" http://localhost:39999/api/v1/s3/testbucket/key2
  # curl -i -X PUT -T "LICENSE" http://localhost:39999/api/v1/s3/testbucket/key3
  # curl -i -X GET http://localhost:39999/api/v1/s3/testbucket\?max-keys\=2
  HTTP/1.1 200 OK
  Date: Tue, 29 Aug 2017 22:40:45 GMT
  Content-Type: application/xml
  Content-Length: 537
  Server: Jetty(9.2.z-SNAPSHOT)
  <ListBucketResult xmlns=""><Name>/testbucket</Name><Prefix/><ContinuationToken/><NextContinuationToken>key3</NextContinuationToken><KeyCount>2</KeyCount><MaxKeys>2</MaxKeys><IsTruncated>true</IsTruncated><Contents><Key>key1</Key><LastModified>2017-08-29T15:40:42.213Z</LastModified><ETag></ETag><Size>26847</Size><StorageClass>STANDARD</StorageClass></Contents><Contents><Key>key2</Key><LastModified>2017-08-29T15:40:43.269Z</LastModified><ETag></ETag><Size>26847</Size><StorageClass>STANDARD</StorageClass></Contents></ListBucketResult>
  # curl -i -X GET http://localhost:39999/api/v1/s3/testbucket\?max-keys\=2\&continuation-token\=key3
  HTTP/1.1 200 OK
  Date: Tue, 29 Aug 2017 22:41:18 GMT
  Content-Type: application/xml
  Content-Length: 540
  Server: Jetty(9.2.z-SNAPSHOT)
  <ListBucketResult xmlns=""><Name>/testbucket</Name><Prefix/><ContinuationToken>key3</ContinuationToken><NextContinuationToken/><KeyCount>2</KeyCount><MaxKeys>2</MaxKeys><IsTruncated>false</IsTruncated><Contents><Key>key3</Key><LastModified>2017-08-29T15:40:44.002Z</LastModified><ETag></ETag><Size>26847</Size><StorageClass>STANDARD</StorageClass></Contents><Contents><Key>testobject</Key><LastModified>2017-08-29T15:36:03.613Z</LastModified><ETag></ETag><Size>26847</Size><StorageClass>STANDARD</StorageClass></Contents></ListBucketResult>
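
The two requests above follow a simple loop: fetch a page, collect its keys, and repeat with NextContinuationToken while IsTruncated is true. A minimal sketch of that loop is shown here. `fetch_page` is a hypothetical caller-supplied function that performs the GET bucket request and returns the XML body as text, so the loop itself stays independent of any HTTP library.

```python
import xml.etree.ElementTree as ET


def list_all_keys(fetch_page, max_keys=2):
    """Collect every key in a bucket by following continuation tokens.

    fetch_page(max_keys, token) is a hypothetical callable: it issues
    GET <bucket>?max-keys=<max_keys>[&continuation-token=<token>] and
    returns the response body. token is None for the first page.
    """
    keys, token = [], None
    while True:
        root = ET.fromstring(fetch_page(max_keys, token))
        keys.extend(c.findtext("Key") for c in root.findall("Contents"))
        token = root.findtext("NextContinuationToken")
        # Stop when the server reports the last page, or when no token
        # is available to ask for the next one.
        if root.findtext("IsTruncated") != "true" or not token:
            return keys
```

With the bucket state shown above and max_keys=2, the loop would issue two requests and return key1, key2, key3, and testobject.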

You can also verify that these objects are stored as Alluxio files under the /testbucket directory.

  ./bin/alluxio fs ls -R /testbucket

Delete objects

  # curl -i -X DELETE http://localhost:39999/api/v1/s3/testbucket/key1
  HTTP/1.1 204 No Content
  Date: Tue, 29 Aug 2017 22:43:22 GMT
  Server: Jetty(9.2.z-SNAPSHOT)

  # curl -i -X DELETE http://localhost:39999/api/v1/s3/testbucket/key2
  # curl -i -X DELETE http://localhost:39999/api/v1/s3/testbucket/key3
  # curl -i -X DELETE http://localhost:39999/api/v1/s3/testbucket/testobject

Initiate a multipart upload

  # curl -i -X POST "http://localhost:39999/api/v1/s3/testbucket/testobject?uploads"
  HTTP/1.1 200 OK
  Date: Tue, 29 Aug 2017 22:43:22 GMT
  Content-Length: 197
  Server: Jetty(9.2.z-SNAPSHOT)
  <?xml version="1.0" encoding="UTF-8"?>
  <InitiateMultipartUploadResult xmlns="">
  <Bucket>testbucket</Bucket>
  <Key>testobject</Key>
  <UploadId>2</UploadId>
  </InitiateMultipartUploadResult>

Upload a part

The URL is quoted so that the shell does not interpret the & in the query string.

  # curl -i -X PUT "http://localhost:39999/api/v1/s3/testbucket/testobject?partNumber=1&uploadId=2"
  HTTP/1.1 200 OK
  Date: Tue, 29 Aug 2017 22:43:22 GMT
  ETag: "b54357faf0632cce46e942fa68356b38"
  Server: Jetty(9.2.z-SNAPSHOT)

List the uploaded parts

  # curl -i -X GET "http://localhost:39999/api/v1/s3/testbucket/testobject?uploadId=2"
  HTTP/1.1 200 OK
  Date: Tue, 29 Aug 2017 22:43:22 GMT
  Content-Length: 985
  Server: Jetty(9.2.z-SNAPSHOT)
  <?xml version="1.0" encoding="UTF-8"?>
  <ListPartsResult xmlns="">
  <Bucket>testbucket</Bucket>
  <Key>testobject</Key>
  <UploadId>2</UploadId>
  <StorageClass>STANDARD</StorageClass>
  <IsTruncated>false</IsTruncated>
  <Part>
  <PartNumber>1</PartNumber>
  <LastModified>2017-08-29T20:48:34.000Z</LastModified>
  <ETag>"b54357faf0632cce46e942fa68356b38"</ETag>
  <Size>10485760</Size>
  </Part>
  </ListPartsResult>
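
A client typically reads the part numbers and ETags from this response, since both are needed when the upload is completed. The sketch below parses the body shown above with Python's standard library; `parse_parts` is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

# Response body copied from the part listing in this document.
LIST_PARTS = """<?xml version="1.0" encoding="UTF-8"?>
<ListPartsResult xmlns="">
<Bucket>testbucket</Bucket>
<Key>testobject</Key>
<UploadId>2</UploadId>
<StorageClass>STANDARD</StorageClass>
<IsTruncated>false</IsTruncated>
<Part>
<PartNumber>1</PartNumber>
<LastModified>2017-08-29T20:48:34.000Z</LastModified>
<ETag>"b54357faf0632cce46e942fa68356b38"</ETag>
<Size>10485760</Size>
</Part>
</ListPartsResult>"""


def parse_parts(xml_text):
    """Return [(part_number, etag, size_in_bytes), ...] for each Part."""
    # Encode first: ElementTree rejects str input that carries an XML
    # declaration with an explicit encoding.
    root = ET.fromstring(xml_text.encode("utf-8"))
    return [
        (
            int(p.findtext("PartNumber")),
            p.findtext("ETag"),  # kept verbatim, including the quotes
            int(p.findtext("Size")),
        )
        for p in root.findall("Part")
    ]


print(parse_parts(LIST_PARTS))
```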

Complete the multipart upload

  # curl -i -X POST "http://localhost:39999/api/v1/s3/testbucket/testobject?uploadId=2" -d '
  <CompleteMultipartUpload>
  <Part>
  <PartNumber>1</PartNumber>
  <ETag>"b54357faf0632cce46e942fa68356b38"</ETag>
  </Part>
  </CompleteMultipartUpload>'
  HTTP/1.1 200 OK
  Date: Tue, 29 Aug 2017 22:43:22 GMT
  Server: Jetty(9.2.z-SNAPSHOT)
  <?xml version="1.0" encoding="UTF-8"?>
  <CompleteMultipartUploadResult xmlns="">
  <Location>/testbucket/testobject</Location>
  <Bucket>testbucket</Bucket>
  <Key>testobject</Key>
  <ETag>"b54357faf0632cce46e942fa68356b38"</ETag>
  </CompleteMultipartUploadResult>
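
The request body passed with -d lists every part by number and ETag. When there are many parts, it is easier to generate that body than to write it by hand. The following sketch does so with Python's standard library; `complete_body` is a hypothetical helper, and emitting parts in ascending part-number order follows the Amazon S3 convention rather than anything stated in this document.

```python
import xml.etree.ElementTree as ET


def complete_body(parts):
    """Build a CompleteMultipartUpload XML body.

    parts is an iterable of (part_number, etag) pairs. The pairs are
    sorted so that parts appear in ascending part-number order.
    """
    root = ET.Element("CompleteMultipartUpload")
    for number, etag in sorted(parts):
        part = ET.SubElement(root, "Part")
        ET.SubElement(part, "PartNumber").text = str(number)
        ET.SubElement(part, "ETag").text = etag
    return ET.tostring(root, encoding="unicode")


# Same single-part upload as in the curl example above.
print(complete_body([(1, '"b54357faf0632cce46e942fa68356b38"')]))
```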

Abort the multipart upload

  # curl -i -X DELETE "http://localhost:39999/api/v1/s3/testbucket/testobject?uploadId=2"
  HTTP/1.1 204 No Content
  Date: Tue, 29 Aug 2017 22:43:22 GMT
  Content-Length: 0
  Server: Jetty(9.2.z-SNAPSHOT)

Delete an empty bucket

  # curl -i -X DELETE http://localhost:39999/api/v1/s3/testbucket
  HTTP/1.1 204 No Content
  Date: Tue, 29 Aug 2017 22:45:19 GMT

Python S3 Client

Create a connection

  import boto
  import boto.s3.connection

  # Point boto at the local Alluxio proxy instead of Amazon S3.
  conn = boto.connect_s3(
      aws_access_key_id='',
      aws_secret_access_key='',
      host='localhost',
      port=39999,
      path='/api/v1/s3',
      is_secure=False,
      calling_format=boto.s3.connection.OrdinaryCallingFormat(),
  )

Create a bucket

  bucketName = 'bucket-for-testing'
  bucket = conn.create_bucket(bucketName)

Put a small object

  smallObjectKey = 'small.txt'
  smallObjectContent = 'Hello World!'
  key = bucket.new_key(smallObjectKey)
  key.set_contents_from_string(smallObjectContent)

Get the small object

  assert smallObjectContent == key.get_contents_as_string()

Upload a large object

Create an 8 MB file on the local file system.

  # dd if=/dev/zero of=8mb.data bs=1048576 count=8

Use the Python S3 client to upload it as an object.

  largeObjectKey = 'large.txt'
  largeObjectFile = '8mb.data'
  key = bucket.new_key(largeObjectKey)
  # Upload the file as the content of the object.
  with open(largeObjectFile, 'rb') as f:
      key.set_contents_from_file(f)
  # Keep the local bytes so the download can be compared against them.
  with open(largeObjectFile, 'rb') as f:
      largeObject = f.read()

Get the large object

  assert largeObject == key.get_contents_as_string()

Delete objects

  bucket.delete_key(smallObjectKey)
  bucket.delete_key(largeObjectKey)

Initiate a multipart upload

  mp = bucket.initiate_multipart_upload(largeObjectFile)

Upload parts

  import math, os
  from filechunkio import FileChunkIO

  # Use a chunk size of 1MB (feel free to change this)
  sourceSize = os.stat(largeObjectFile).st_size
  chunkSize = 1048576
  chunkCount = int(math.ceil(sourceSize / float(chunkSize)))

  for i in range(chunkCount):
      offset = chunkSize * i
      # The last chunk may be shorter than chunkSize.
      bytes = min(chunkSize, sourceSize - offset)
      with FileChunkIO(largeObjectFile, 'r', offset=offset, bytes=bytes) as fp:
          mp.upload_part_from_file(fp, part_num=i + 1)
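
The offset arithmetic in the loop above is easy to check in isolation. As a sketch, the hypothetical function `chunk_ranges` below computes the same part numbers, offsets, and lengths without touching the file or the S3 client, which makes the handling of a shorter final chunk visible.

```python
import math


def chunk_ranges(source_size, chunk_size):
    """Return (part_number, offset, length) for each chunk of a file.

    Mirrors the loop above: part numbers start at 1 and the last chunk
    holds whatever remains after the full-size chunks.
    """
    count = int(math.ceil(source_size / float(chunk_size)))
    return [
        (i + 1, chunk_size * i, min(chunk_size, source_size - chunk_size * i))
        for i in range(count)
    ]


# The 8 MB file from this example splits into 8 full 1 MB parts.
print(len(chunk_ranges(8 * 1048576, 1048576)))
# A 2.5 MB file gets two full parts and a final 0.5 MB part.
print(chunk_ranges(2621440, 1048576)[-1])
```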

Complete the multipart upload

  mp.complete_upload()

Abort the multipart upload

  mp.cancel_upload()
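
Completing and aborting are alternatives: an upload is either completed once all parts are in, or aborted if something goes wrong. One way to combine the two calls is sketched below. `upload_or_abort` is a hypothetical wrapper, and `mp` only needs to provide the three methods already used in this document, so the logic can be exercised without a running proxy.

```python
def upload_or_abort(mp, part_files):
    """Upload parts in order; complete on success, abort on failure.

    mp is a multipart upload object offering upload_part_from_file,
    complete_upload and cancel_upload. part_files is an iterable of
    open file-like objects, one per part.
    """
    try:
        for number, fp in enumerate(part_files, start=1):
            mp.upload_part_from_file(fp, part_num=number)
    except Exception:
        # Abort the upload before re-raising the original error.
        mp.cancel_upload()
        raise
    return mp.complete_upload()
```

In the chunked upload shown earlier, each `FileChunkIO` object would be one element of `part_files`.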

Delete the bucket

  conn.delete_bucket(bucketName)