S3 API


Alluxio supports a RESTful API that is compatible with the basic operations of the Amazon S3 API.

The REST API documentation is generated as part of the Alluxio build and is accessible at ${ALLUXIO_HOME}/core/server/proxy/target/miredot/index.html.

The Alluxio S3 API should be used by applications that are designed to communicate with S3-like storage and that would benefit from the additional features Alluxio provides, such as data caching, data sharing with file-system-based applications, and storage system abstraction (e.g., using Ceph instead of S3 as the backing store). For example, a simple application that downloads reports generated by analytic tasks can use the S3 API instead of the more complex file system API.

Using the S3 API has performance implications: requests go through the Alluxio proxy, which introduces an extra network hop. For optimal performance, run a proxy server and an Alluxio worker on each compute node, and put all the proxy servers behind a load balancer.
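As a sketch of that deployment on a single compute node, assuming a standard Alluxio installation (the exact component names accepted by alluxio-start.sh may vary between Alluxio versions):

```shell
# On each compute node: start a worker and a co-located proxy.
# The proxy listens on port 39999 by default (alluxio.proxy.web.port).
${ALLUXIO_HOME}/bin/alluxio-start.sh worker
${ALLUXIO_HOME}/bin/alluxio-start.sh proxy
```

A load balancer (e.g., HAProxy or nginx) can then spread S3 client traffic across the per-node proxies.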

Feature support

The following table describes the support status for current Amazon S3 functional features:

| S3 Feature                | Status                         |
|---------------------------|--------------------------------|
| List Buckets              | Supported                      |
| Delete Buckets            | Supported                      |
| Create Bucket             | Supported                      |
| Bucket Lifecycle          | Not Supported                  |
| Policy (Buckets, Objects) | Not Supported                  |
| Bucket ACLs (Get, Put)    | Not Supported                  |
| Bucket Location           | Not Supported                  |
| Bucket Notification       | Not Supported                  |
| Bucket Object Versions    | Not Supported                  |
| Get Bucket Info (HEAD)    | Not Supported                  |
| Put Object                | Supported                      |
| Delete Object             | Supported                      |
| Get Object                | Supported                      |
| Get Object Info (HEAD)    | Supported                      |
| Get Object (Range Query)  | Not Supported [ALLUXIO-3321]   |
| Object ACLs (Get, Put)    | Not Supported                  |
| POST Object               | Not Supported                  |
| Copy Object               | Not Supported                  |
| Multipart Uploads         | Supported                      |

Language support

The Alluxio S3 API can be used from clients in various programming languages, such as C++, Java, Python, Golang, and Ruby. In this documentation, we use curl REST calls and the Python S3 client as usage examples.

Example Usage

REST API

For example, you can run the following RESTful API calls against an Alluxio cluster running on localhost. The Alluxio proxy listens on port 39999 by default.

Create a bucket

    # curl -i -X PUT http://localhost:39999/api/v1/s3/testbucket
    HTTP/1.1 200 OK
    Date: Tue, 29 Aug 2017 22:34:41 GMT
    Content-Length: 0
    Server: Jetty(9.2.z-SNAPSHOT)

Get the bucket (listing objects)

    # curl -i -X GET http://localhost:39999/api/v1/s3/testbucket
    HTTP/1.1 200 OK
    Date: Tue, 29 Aug 2017 22:35:00 GMT
    Content-Type: application/xml
    Content-Length: 200
    Server: Jetty(9.2.z-SNAPSHOT)
    <ListBucketResult xmlns=""><Name>/testbucket</Name><Prefix/><ContinuationToken/><NextContinuationToken/><KeyCount>0</KeyCount><MaxKeys>1000</MaxKeys><IsTruncated>false</IsTruncated></ListBucketResult>

Put an object

Assuming there is an existing file called LICENSE on the local file system:

    # curl -i -X PUT -T "LICENSE" http://localhost:39999/api/v1/s3/testbucket/testobject
    HTTP/1.1 100 Continue
    HTTP/1.1 200 OK
    Date: Tue, 29 Aug 2017 22:36:03 GMT
    ETag: "9347237b67b0be183499e5893128704e"
    Content-Length: 0
    Server: Jetty(9.2.z-SNAPSHOT)

Get the object:

    # curl -i -X GET http://localhost:39999/api/v1/s3/testbucket/testobject
    HTTP/1.1 200 OK
    Date: Tue, 29 Aug 2017 22:37:34 GMT
    Last-Modified: Tue, 29 Aug 2017 22:36:03 GMT
    Content-Type: application/xml
    Content-Length: 26847
    Server: Jetty(9.2.z-SNAPSHOT)
    .................. Content of the test file ...................

Listing a bucket with one object

    # curl -i -X GET http://localhost:39999/api/v1/s3/testbucket
    HTTP/1.1 200 OK
    Date: Tue, 29 Aug 2017 22:38:48 GMT
    Content-Type: application/xml
    Content-Length: 363
    Server: Jetty(9.2.z-SNAPSHOT)
    <ListBucketResult xmlns=""><Name>/testbucket</Name><Prefix/><ContinuationToken/><NextContinuationToken/><KeyCount>1</KeyCount><MaxKeys>1000</MaxKeys><IsTruncated>false</IsTruncated><Contents><Key>testobject</Key><LastModified>2017-08-29T15:36:03.613Z</LastModified><ETag></ETag><Size>26847</Size><StorageClass>STANDARD</StorageClass></Contents></ListBucketResult>

Listing a bucket with multiple objects

You can upload more files and use max-keys and continuation-token as GET bucket request parameters. For example:

    # curl -i -X PUT -T "LICENSE" http://localhost:39999/api/v1/s3/testbucket/key1
    # curl -i -X PUT -T "LICENSE" http://localhost:39999/api/v1/s3/testbucket/key2
    # curl -i -X PUT -T "LICENSE" http://localhost:39999/api/v1/s3/testbucket/key3
    # curl -i -X GET http://localhost:39999/api/v1/s3/testbucket\?max-keys\=2
    HTTP/1.1 200 OK
    Date: Tue, 29 Aug 2017 22:40:45 GMT
    Content-Type: application/xml
    Content-Length: 537
    Server: Jetty(9.2.z-SNAPSHOT)
    <ListBucketResult xmlns=""><Name>/testbucket</Name><Prefix/><ContinuationToken/><NextContinuationToken>key3</NextContinuationToken><KeyCount>2</KeyCount><MaxKeys>2</MaxKeys><IsTruncated>true</IsTruncated><Contents><Key>key1</Key><LastModified>2017-08-29T15:40:42.213Z</LastModified><ETag></ETag><Size>26847</Size><StorageClass>STANDARD</StorageClass></Contents><Contents><Key>key2</Key><LastModified>2017-08-29T15:40:43.269Z</LastModified><ETag></ETag><Size>26847</Size><StorageClass>STANDARD</StorageClass></Contents></ListBucketResult>
    # curl -i -X GET http://localhost:39999/api/v1/s3/testbucket\?max-keys\=2\&continuation-token\=key3
    HTTP/1.1 200 OK
    Date: Tue, 29 Aug 2017 22:41:18 GMT
    Content-Type: application/xml
    Content-Length: 540
    Server: Jetty(9.2.z-SNAPSHOT)
    <ListBucketResult xmlns=""><Name>/testbucket</Name><Prefix/><ContinuationToken>key3</ContinuationToken><NextContinuationToken/><KeyCount>2</KeyCount><MaxKeys>2</MaxKeys><IsTruncated>false</IsTruncated><Contents><Key>key3</Key><LastModified>2017-08-29T15:40:44.002Z</LastModified><ETag></ETag><Size>26847</Size><StorageClass>STANDARD</StorageClass></Contents><Contents><Key>testobject</Key><LastModified>2017-08-29T15:36:03.613Z</LastModified><ETag></ETag><Size>26847</Size><StorageClass>STANDARD</StorageClass></Contents></ListBucketResult>

You can also verify that those objects are represented as Alluxio files under the /testbucket directory.

    ./bin/alluxio fs ls -R /testbucket

Delete objects

    # curl -i -X DELETE http://localhost:39999/api/v1/s3/testbucket/key1
    HTTP/1.1 204 No Content
    Date: Tue, 29 Aug 2017 22:43:22 GMT
    Server: Jetty(9.2.z-SNAPSHOT)

    # curl -i -X DELETE http://localhost:39999/api/v1/s3/testbucket/key2
    # curl -i -X DELETE http://localhost:39999/api/v1/s3/testbucket/key3
    # curl -i -X DELETE http://localhost:39999/api/v1/s3/testbucket/testobject

Initiate a multipart upload

    # curl -i -X POST http://localhost:39999/api/v1/s3/testbucket/testobject?uploads
    HTTP/1.1 200 OK
    Date: Tue, 29 Aug 2017 22:43:22 GMT
    Content-Length: 197
    Server: Jetty(9.2.z-SNAPSHOT)
    <?xml version="1.0" encoding="UTF-8"?>
    <InitiateMultipartUploadResult xmlns="">
    <Bucket>testbucket</Bucket>
    <Key>testobject</Key>
    <UploadId>2</UploadId>
    </InitiateMultipartUploadResult>

Upload part

    # curl -i -X PUT 'http://localhost:39999/api/v1/s3/testbucket/testobject?partNumber=1&uploadId=2'
    HTTP/1.1 200 OK
    Date: Tue, 29 Aug 2017 22:43:22 GMT
    ETag: "b54357faf0632cce46e942fa68356b38"
    Server: Jetty(9.2.z-SNAPSHOT)

List parts

    # curl -i -X GET http://localhost:39999/api/v1/s3/testbucket/testobject?uploadId=2
    HTTP/1.1 200 OK
    Date: Tue, 29 Aug 2017 22:43:22 GMT
    Content-Length: 985
    Server: Jetty(9.2.z-SNAPSHOT)
    <?xml version="1.0" encoding="UTF-8"?>
    <ListPartsResult xmlns="">
    <Bucket>testbucket</Bucket>
    <Key>testobject</Key>
    <UploadId>2</UploadId>
    <StorageClass>STANDARD</StorageClass>
    <IsTruncated>false</IsTruncated>
    <Part>
    <PartNumber>1</PartNumber>
    <LastModified>2017-08-29T20:48:34.000Z</LastModified>
    <ETag>"b54357faf0632cce46e942fa68356b38"</ETag>
    <Size>10485760</Size>
    </Part>
    </ListPartsResult>

Complete a multipart upload

    # curl -i -X POST http://localhost:39999/api/v1/s3/testbucket/testobject?uploadId=2 -d '<CompleteMultipartUpload>
    <Part>
    <PartNumber>1</PartNumber>
    <ETag>"b54357faf0632cce46e942fa68356b38"</ETag>
    </Part>
    </CompleteMultipartUpload>'
    HTTP/1.1 200 OK
    Date: Tue, 29 Aug 2017 22:43:22 GMT
    Server: Jetty(9.2.z-SNAPSHOT)
    <?xml version="1.0" encoding="UTF-8"?>
    <CompleteMultipartUploadResult xmlns="">
    <Location>/testbucket/testobject</Location>
    <Bucket>testbucket</Bucket>
    <Key>testobject</Key>
    <ETag>"b54357faf0632cce46e942fa68356b38"</ETag>
    </CompleteMultipartUploadResult>

Abort a multipart upload

    # curl -i -X DELETE http://localhost:39999/api/v1/s3/testbucket/testobject?uploadId=2
    HTTP/1.1 204 No Content
    Date: Tue, 29 Aug 2017 22:43:22 GMT
    Content-Length: 0
    Server: Jetty(9.2.z-SNAPSHOT)

Delete an empty bucket

    # curl -i -X DELETE http://localhost:39999/api/v1/s3/testbucket
    HTTP/1.1 204 No Content
    Date: Tue, 29 Aug 2017 22:45:19 GMT

Python S3 Client

Create a connection:

    import boto
    import boto.s3.connection

    conn = boto.connect_s3(
        aws_access_key_id='',
        aws_secret_access_key='',
        host='localhost',
        port=39999,
        path='/api/v1/s3',
        is_secure=False,
        calling_format=boto.s3.connection.OrdinaryCallingFormat(),
    )

Create a bucket

    bucketName = 'bucket-for-testing'
    bucket = conn.create_bucket(bucketName)

PUT a small object

    smallObjectKey = 'small.txt'
    smallObjectContent = 'Hello World!'
    key = bucket.new_key(smallObjectKey)
    key.set_contents_from_string(smallObjectContent)

Get the small object

    assert smallObjectContent == key.get_contents_as_string()

Upload a large object

Create an 8MB file on the local file system.

    # dd if=/dev/zero of=8mb.data bs=1048576 count=8

Then use the Python S3 client to upload it as an object:

    largeObjectKey = 'large.txt'
    largeObjectFile = '8mb.data'
    key = bucket.new_key(largeObjectKey)
    with open(largeObjectFile, 'rb') as f:
        key.set_contents_from_file(f)
    with open(largeObjectFile, 'rb') as f:
        largeObject = f.read()

Get the large object

    assert largeObject == key.get_contents_as_string()

Delete the objects

    bucket.delete_key(smallObjectKey)
    bucket.delete_key(largeObjectKey)

Initiate a multipart upload

    mp = bucket.initiate_multipart_upload(largeObjectFile)

Upload parts

    import math, os
    from filechunkio import FileChunkIO

    # Use a chunk size of 1MB (feel free to change this)
    sourceSize = os.stat(largeObjectFile).st_size
    chunkSize = 1048576
    chunkCount = int(math.ceil(sourceSize / float(chunkSize)))
    for i in range(chunkCount):
        offset = chunkSize * i
        bytes = min(chunkSize, sourceSize - offset)
        with FileChunkIO(largeObjectFile, 'r', offset=offset, bytes=bytes) as fp:
            mp.upload_part_from_file(fp, part_num=i + 1)
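The offset/size arithmetic of the loop above can be sanity-checked without a running server; `part_ranges` is a helper name introduced here for illustration, reproducing the same formulas:

```python
import math

def part_ranges(source_size, chunk_size):
    # Same arithmetic as the upload loop: (offset, bytes) per part.
    count = int(math.ceil(source_size / float(chunk_size)))
    return [(i * chunk_size, min(chunk_size, source_size - i * chunk_size))
            for i in range(count)]

# An 8MB file split into 1MB chunks yields 8 full-size parts.
print(len(part_ranges(8 * 1048576, 1048576)))  # → 8
# A size that is not a multiple of the chunk size gets a short final part.
print(part_ranges(1048576 + 100, 1048576))  # → [(0, 1048576), (1048576, 100)]
```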

Complete the multipart upload

    mp.complete_upload()

Abort the multipart upload

    mp.cancel_upload()

Delete the bucket

    conn.delete_bucket(bucketName)