S3 API
Alluxio supports a RESTful API that is compatible with the basic operations of the Amazon S3 API.
The Alluxio S3 API is intended for applications that are designed to communicate with S3-like storage and that would benefit from the other features provided by Alluxio, such as data caching, data sharing with file system based applications, and storage system abstraction (e.g., using Ceph instead of S3 as the backing store). For example, a simple application that downloads reports generated by analytic tasks can use the S3 API instead of the more complex file system API.
There are performance implications of using the S3 API. The S3 API leverages the Alluxio proxy, introducing an extra hop. For optimal performance, it is recommended to run the proxy server and an Alluxio worker on each compute node. It is also recommended to put all the proxy servers behind a load balancer.
Features support
The following table describes the support status for current Amazon S3 functional features:
| S3 Feature | Status |
|---|---|
| List Buckets | Supported |
| Delete Buckets | Supported |
| Create Bucket | Supported |
| Bucket Lifecycle | Not Supported |
| Policy (Buckets, Objects) | Not Supported |
| Bucket ACLs (Get, Put) | Not Supported |
| Bucket Location | Not Supported |
| Bucket Notification | Not Supported |
| Bucket Object Versions | Not Supported |
| Get Bucket Info (HEAD) | Not Supported |
| Put Object | Supported |
| Delete Object | Supported |
| Get Object | Supported |
| Get Object Info (HEAD) | Supported |
| Get Object (Range Query) | Not Supported [ALLUXIO-3321] |
| Object ACLs (Get, Put) | Not Supported |
| POST Object | Not Supported |
| Copy Object | Not Supported |
| Multipart Uploads | Supported |
Language support
The Alluxio S3 API can be used from S3 clients written in various programming languages, such as C++, Java, Python, Golang, and Ruby. In this documentation, we use curl REST calls and the Python S3 client as usage examples.
Example Usage
REST API
For example, you can run the following RESTful API calls against an Alluxio cluster running on localhost. The Alluxio proxy listens on port 39999 by default.
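The REST calls in this section all target URLs of the form `http://localhost:39999/api/v1/s3/<bucket>[/<object>]`. As a minimal sketch, assuming a Python 3 standard library and the default host and port used in this document, the pattern could be captured in a small helper. The name `s3_url` and the constant `PROXY_BASE` are illustrative and not part of Alluxio.

```python
from urllib.parse import quote

# Assumed default, taken from the examples in this document.
PROXY_BASE = "http://localhost:39999/api/v1/s3"


def s3_url(bucket, key=None, base=PROXY_BASE):
    """Build the REST URL for a bucket, or for an object inside it."""
    url = base.rstrip("/") + "/" + quote(bucket, safe="")
    if key is not None:
        # Percent-encode the key but leave "/" separators untouched.
        url += "/" + quote(key, safe="/")
    return url


print(s3_url("testbucket"))
print(s3_url("testbucket", "testobject"))
```

Building the URL in one place keeps the `/api/v1/s3` prefix out of the individual calls, which may help if the proxy is later placed behind a load balancer with a different host name.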
Create a bucket
$ curl -i -X PUT http://localhost:39999/api/v1/s3/testbucket
HTTP/1.1 200 OK
Date: Tue, 18 Jun 2019 21:23:18 GMT
Content-Length: 0
Server: Jetty(9.2.z-SNAPSHOT)
Get the bucket (listing objects)
$ curl -i -X GET http://localhost:39999/api/v1/s3/testbucket
HTTP/1.1 200 OK
Date: Tue, 18 Jun 2019 21:23:56 GMT
Content-Type: application/xml
Content-Length: 191
Server: Jetty(9.2.z-SNAPSHOT)

<ListBucketResult><Name>/testbucket</Name><Prefix/><ContinuationToken/><NextContinuationToken/><KeyCount>0</KeyCount><MaxKeys>1000</MaxKeys><IsTruncated>false</IsTruncated></ListBucketResult>
Put an object
Assuming there is an existing file called LICENSE on the local file system:
$ curl -i -X PUT -T "LICENSE" http://localhost:39999/api/v1/s3/testbucket/testobject
HTTP/1.1 100 Continue

HTTP/1.1 200 OK
Date: Tue, 18 Jun 2019 21:24:32 GMT
ETag: "911df44b7ff57801ca8d74568e4ebfbe"
Content-Length: 0
Server: Jetty(9.2.z-SNAPSHOT)
Get the object:
$ curl -i -X GET http://localhost:39999/api/v1/s3/testbucket/testobject
HTTP/1.1 200 OK
Date: Tue, 18 Jun 2019 21:24:57 GMT
Last-Modified: Tue, 18 Jun 2019 21:24:33 GMT
Content-Type: application/xml
Content-Length: 27040
Server: Jetty(9.2.z-SNAPSHOT)

.................. Content of the test file ...................
Listing a bucket with one object
$ curl -i -X GET http://localhost:39999/api/v1/s3/testbucket
HTTP/1.1 200 OK
Date: Tue, 18 Jun 2019 21:25:27 GMT
Content-Type: application/xml
Content-Length: 354
Server: Jetty(9.2.z-SNAPSHOT)

<ListBucketResult><Name>/testbucket</Name><Prefix/><ContinuationToken/><NextContinuationToken/><KeyCount>1</KeyCount><MaxKeys>1000</MaxKeys><IsTruncated>false</IsTruncated><Contents><Key>testobject</Key><LastModified>2019-06-18T14:24:33.029Z</LastModified><ETag></ETag><Size>27040</Size><StorageClass>STANDARD</StorageClass></Contents></ListBucketResult>
Listing a bucket with multiple objects
You can upload more files and use max-keys and continuation-token as GET bucket request parameters to page through the listing. For example:
$ curl -i -X PUT -T "LICENSE" http://localhost:39999/api/v1/s3/testbucket/key1
HTTP/1.1 100 Continue

HTTP/1.1 200 OK
Date: Tue, 18 Jun 2019 21:26:05 GMT
ETag: "911df44b7ff57801ca8d74568e4ebfbe"
Content-Length: 0
Server: Jetty(9.2.z-SNAPSHOT)

$ curl -i -X PUT -T "LICENSE" http://localhost:39999/api/v1/s3/testbucket/key2
HTTP/1.1 100 Continue

HTTP/1.1 200 OK
Date: Tue, 18 Jun 2019 21:26:28 GMT
ETag: "911df44b7ff57801ca8d74568e4ebfbe"
Content-Length: 0
Server: Jetty(9.2.z-SNAPSHOT)

$ curl -i -X PUT -T "LICENSE" http://localhost:39999/api/v1/s3/testbucket/key3
HTTP/1.1 100 Continue

HTTP/1.1 200 OK
Date: Tue, 18 Jun 2019 21:26:43 GMT
ETag: "911df44b7ff57801ca8d74568e4ebfbe"
Content-Length: 0
Server: Jetty(9.2.z-SNAPSHOT)

$ curl -i -X GET http://localhost:39999/api/v1/s3/testbucket\?max-keys\=2
HTTP/1.1 200 OK
Date: Tue, 18 Jun 2019 21:26:57 GMT
Content-Type: application/xml
Content-Length: 528
Server: Jetty(9.2.z-SNAPSHOT)

<ListBucketResult><Name>/testbucket</Name><Prefix/><ContinuationToken/><NextContinuationToken>key3</NextContinuationToken><KeyCount>2</KeyCount><MaxKeys>2</MaxKeys><IsTruncated>true</IsTruncated><Contents><Key>key1</Key><LastModified>2019-06-18T14:26:05.694Z</LastModified><ETag></ETag><Size>27040</Size><StorageClass>STANDARD</StorageClass></Contents><Contents><Key>key2</Key><LastModified>2019-06-18T14:26:28.153Z</LastModified><ETag></ETag><Size>27040</Size><StorageClass>STANDARD</StorageClass></Contents></ListBucketResult>

$ curl -i -X GET http://localhost:39999/api/v1/s3/testbucket\?max-keys\=2\&continuation-token\=key3
HTTP/1.1 200 OK
Date: Tue, 18 Jun 2019 21:28:14 GMT
Content-Type: application/xml
Content-Length: 531
Server: Jetty(9.2.z-SNAPSHOT)

<ListBucketResult><Name>/testbucket</Name><Prefix/><ContinuationToken>key3</ContinuationToken><NextContinuationToken/><KeyCount>2</KeyCount><MaxKeys>2</MaxKeys><IsTruncated>false</IsTruncated><Contents><Key>key3</Key><LastModified>2019-06-18T14:26:43.081Z</LastModified><ETag></ETag><Size>27040</Size><StorageClass>STANDARD</StorageClass></Contents><Contents><Key>testobject</Key><LastModified>2019-06-18T14:24:33.029Z</LastModified><ETag></ETag><Size>27040</Size><StorageClass>STANDARD</StorageClass></Contents></ListBucketResult>
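The paging pattern above (request a page, read NextContinuationToken, repeat until it is empty) can be sketched in Python. This is a minimal illustration using only the standard library. The `fetch_page` callable is hypothetical and stands in for whatever HTTP client issues the GET request. The canned pages are trimmed copies of the two responses shown above.

```python
import xml.etree.ElementTree as ET


def parse_listing(xml_text):
    """Return (keys, next_token) from a ListBucketResult document."""
    root = ET.fromstring(xml_text)
    keys = [c.findtext("Key") for c in root.findall("Contents")]
    # An empty or missing <NextContinuationToken/> is normalized to None.
    next_token = root.findtext("NextContinuationToken") or None
    return keys, next_token


def list_all_keys(fetch_page):
    """Collect every key by following continuation tokens.

    fetch_page(token) is a hypothetical callable that returns the XML
    body of GET <bucket>?max-keys=N[&continuation-token=token].
    """
    keys, token = [], None
    while True:
        page_keys, token = parse_listing(fetch_page(token))
        keys.extend(page_keys)
        if token is None:
            return keys


# Trimmed copies of the two responses shown above, used as canned pages.
PAGES = {
    None: "<ListBucketResult><NextContinuationToken>key3</NextContinuationToken>"
          "<Contents><Key>key1</Key></Contents>"
          "<Contents><Key>key2</Key></Contents></ListBucketResult>",
    "key3": "<ListBucketResult><NextContinuationToken/>"
            "<Contents><Key>key3</Key></Contents>"
            "<Contents><Key>testobject</Key></Contents></ListBucketResult>",
}

print(list_all_keys(PAGES.get))
```

Passing the fetch step in as a callable keeps the paging logic independent of the HTTP library, so it can be exercised against canned responses like the ones here.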
You can also verify that those objects are represented as Alluxio files under the /testbucket directory.
$ ./bin/alluxio fs ls -R /testbucket
-rw-r--r-- alluxio staff 27040 PERSISTED 06-18-2019 14:26:05:694 100% /testbucket/key1
-rw-r--r-- alluxio staff 27040 PERSISTED 06-18-2019 14:26:28:153 100% /testbucket/key2
-rw-r--r-- alluxio staff 27040 PERSISTED 06-18-2019 14:26:43:081 100% /testbucket/key3
-rw-r--r-- alluxio staff 27040 PERSISTED 06-18-2019 14:24:33:029 100% /testbucket/testobject
Delete objects
$ curl -i -X DELETE http://localhost:39999/api/v1/s3/testbucket/key1
HTTP/1.1 204 No Content
Date: Tue, 18 Jun 2019 21:31:27 GMT
Server: Jetty(9.2.z-SNAPSHOT)

$ curl -i -X DELETE http://localhost:39999/api/v1/s3/testbucket/key2
HTTP/1.1 204 No Content
Date: Tue, 18 Jun 2019 21:31:44 GMT
Server: Jetty(9.2.z-SNAPSHOT)

$ curl -i -X DELETE http://localhost:39999/api/v1/s3/testbucket/key3
HTTP/1.1 204 No Content
Date: Tue, 18 Jun 2019 21:31:58 GMT
Server: Jetty(9.2.z-SNAPSHOT)

$ curl -i -X DELETE http://localhost:39999/api/v1/s3/testbucket/testobject
HTTP/1.1 204 No Content
Date: Tue, 18 Jun 2019 21:32:08 GMT
Server: Jetty(9.2.z-SNAPSHOT)
Initiate a multipart upload
Since we deleted the testobject in the previous command, you have to create another testobject before initiating a multipart upload.
$ curl -i -X POST 'http://localhost:39999/api/v1/s3/testbucket/testobject?uploads'
HTTP/1.1 200 OK
Date: Tue, 18 Jun 2019 21:32:36 GMT
Content-Type: application/xml
Content-Length: 133
Server: Jetty(9.2.z-SNAPSHOT)

<InitiateMultipartUploadResult><Bucket>testbucket</Bucket><Key>testobject</Key><UploadId>3</UploadId></InitiateMultipartUploadResult>
Note that the multipart upload commands below need the upload ID returned in this response, which is not necessarily 3.
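Because the upload ID can differ between runs, a script would typically read it from the response instead of hard-coding it. The following is a minimal sketch using the Python standard library. The helper name `upload_id` is illustrative, and the response body is copied from the initiate call above.

```python
import xml.etree.ElementTree as ET

# Response body copied from the initiate call above.
body = ("<InitiateMultipartUploadResult><Bucket>testbucket</Bucket>"
        "<Key>testobject</Key><UploadId>3</UploadId>"
        "</InitiateMultipartUploadResult>")


def upload_id(xml_text):
    """Return the UploadId from an InitiateMultipartUploadResult body."""
    value = ET.fromstring(xml_text).findtext("UploadId")
    if not value:
        raise ValueError("response has no UploadId")
    return value


uid = upload_id(body)
# Query string for a follow-up part upload that reuses the extracted ID.
print("partNumber=1&uploadId=" + uid)
```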
Upload part
$ curl -i -X PUT 'http://localhost:39999/api/v1/s3/testbucket/testobject?partNumber=1&uploadId=3'
HTTP/1.1 200 OK
Date: Tue, 18 Jun 2019 21:33:36 GMT
ETag: "d41d8cd98f00b204e9800998ecf8427e"
Content-Length: 0
Server: Jetty(9.2.z-SNAPSHOT)
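The ETag in the response above equals the MD5 digest of an empty byte string, which is consistent with the curl command sending no part data. The short check below illustrates this with the Python standard library. Whether the proxy derives every ETag this way is an assumption, and the helper name `md5_etag` is illustrative.

```python
import hashlib


def md5_etag(data):
    """Return the quoted hex MD5 digest of data, in ETag header form."""
    return '"' + hashlib.md5(data).hexdigest() + '"'


# No body was sent with the part upload above, so hash zero bytes.
print(md5_etag(b""))
```

To upload a part with actual content, a file can presumably be attached with -T, as in the Put an object example.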
List parts
$ curl -i -X GET 'http://localhost:39999/api/v1/s3/testbucket/testobject?uploadId=3'
HTTP/1.1 200 OK
Date: Tue, 18 Jun 2019 21:35:10 GMT
Content-Type: application/xml
Content-Length: 296
Server: Jetty(9.2.z-SNAPSHOT)

<ListPartsResult><Bucket>/testbucket</Bucket><Key>testobject</Key><UploadId>3</UploadId><StorageClass>STANDARD</StorageClass><IsTruncated>false</IsTruncated><Part><PartNumber>1</PartNumber><LastModified>2019-06-18T14:33:36.373Z</LastModified><ETag>""</ETag><Size>0</Size></Part></ListPartsResult>
Complete a multipart upload
$ curl -i -X POST 'http://localhost:39999/api/v1/s3/testbucket/testobject?uploadId=3'
HTTP/1.1 200 OK
Date: Tue, 18 Jun 2019 21:35:47 GMT
Content-Type: application/xml
Content-Length: 201
Server: Jetty(9.2.z-SNAPSHOT)

<CompleteMultipartUploadResult><Location>/testbucket/testobject</Location><Bucket>testbucket</Bucket><Key>testobject</Key><ETag>"d41d8cd98f00b204e9800998ecf8427e"</ETag></CompleteMultipartUploadResult>
Abort a multipart upload
A non-completed upload can be aborted:
$ curl -i -X DELETE 'http://localhost:39999/api/v1/s3/testbucket/testobject?uploadId=3'
HTTP/1.1 204 No Content
Date: Tue, 18 Jun 2019 21:37:27 GMT
Server: Jetty(9.2.z-SNAPSHOT)
Delete an empty bucket
$ curl -i -X DELETE http://localhost:39999/api/v1/s3/testbucket
HTTP/1.1 204 No Content
Date: Tue, 18 Jun 2019 21:38:38 GMT
Server: Jetty(9.2.z-SNAPSHOT)
Python S3 Client
Tested for Python 2.7.
Create a connection:
Note that the boto package must be installed first:
pip install boto
import boto
import boto.s3.connection

conn = boto.connect_s3(
    aws_access_key_id='',
    aws_secret_access_key='',
    host='localhost',
    port=39999,
    path='/api/v1/s3',
    is_secure=False,
    calling_format=boto.s3.connection.OrdinaryCallingFormat(),
)
Create a bucket
bucketName = 'bucket-for-testing'
bucket = conn.create_bucket(bucketName)
PUT a small object
smallObjectKey = 'small.txt'
smallObjectContent = 'Hello World!'
key = bucket.new_key(smallObjectKey)
key.set_contents_from_string(smallObjectContent)
Get the small object
assert smallObjectContent == key.get_contents_as_string()
Upload a large object
Create an 8MB file on the local file system.
$ dd if=/dev/zero of=8mb.data bs=1048576 count=8
Then use the Python S3 client to upload it as an object:
largeObjectKey = 'large.txt'
largeObjectFile = '8mb.data'
key = bucket.new_key(largeObjectKey)
with open(largeObjectFile, 'rb') as f:
    key.set_contents_from_file(f)
with open(largeObjectFile, 'rb') as f:
    largeObject = f.read()
Get the large object
assert largeObject == key.get_contents_as_string()
Delete the objects
bucket.delete_key(smallObjectKey)
bucket.delete_key(largeObjectKey)
Initiate a multipart upload
mp = bucket.initiate_multipart_upload(largeObjectKey)
Upload parts
import math, os
from filechunkio import FileChunkIO

# Use a chunk size of 1MB (feel free to change this)
sourceSize = os.stat(largeObjectFile).st_size
chunkSize = 1048576
chunkCount = int(math.ceil(sourceSize / float(chunkSize)))

for i in range(chunkCount):
    offset = chunkSize * i
    bytes = min(chunkSize, sourceSize - offset)
    with FileChunkIO(largeObjectFile, 'r', offset=offset, bytes=bytes) as fp:
        mp.upload_part_from_file(fp, part_num=i + 1)
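The chunk arithmetic in the upload loop can be checked without contacting the proxy. The sketch below isolates it into a function that returns the part number, offset, and size for each part. The name `plan_parts` is illustrative and not part of boto or Alluxio.

```python
import math


def plan_parts(source_size, chunk_size):
    """Return a list of (part_num, offset, size) tuples covering the file."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    count = int(math.ceil(source_size / float(chunk_size)))
    parts = []
    for i in range(count):
        offset = chunk_size * i
        size = min(chunk_size, source_size - offset)
        parts.append((i + 1, offset, size))
    return parts


# 8MB file, 1MB chunks: eight full-size parts.
print(len(plan_parts(8 * 1048576, 1048576)))
# A 2.5MB file: the last part is the 0.5MB remainder.
print(plan_parts(2621440, 1048576)[-1])
```

For the 8MB file created earlier this yields eight parts of 1048576 bytes each. For a size that is not a multiple of the chunk size, only the final part is shorter.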
Complete the multipart upload
mp.complete_upload()
Abort the multipart upload
Non-completed uploads can be aborted.
mp.cancel_upload()
Delete the bucket
bucket.delete_key(largeObjectKey)
conn.delete_bucket(bucketName)
