547 lines
18 KiB
ReStructuredText
547 lines
18 KiB
ReStructuredText
.. _s3_guide:
|
|
|
|
S3
|
|
==
|
|
|
|
By following this guide, you will learn how to use features of S3 client that
|
|
are unique to the SDK, specifically the generation and use of pre-signed URLs,
|
|
pre-signed POSTs, and the use of the transfer manager. You will also learn how
|
|
to use a few common, but important, settings specific to S3.
|
|
|
|
|
|
Changing the Addressing Style
|
|
-----------------------------
|
|
|
|
S3 supports two different ways to address a bucket, Virtual Host Style and Path
|
|
Style. This guide won't cover all the details of `virtual host addressing`_, but
|
|
you can read up on that in S3's docs. In general, the SDK will handle the
|
|
decision of what style to use for you, but there are some cases where you may
|
|
want to set it yourself. For instance, if you have a CORS configured bucket
|
|
that is only a few hours old, you may need to use path style addressing for
|
|
generating pre-signed POSTs and URLs until the necessary DNS changes have time
|
|
to propagagte.
|
|
|
|
Note: if you set the addressing style to path style, you HAVE to set the correct
|
|
region.
|
|
|
|
The preferred way to set the addressing style is to use the ``addressing_style``
|
|
config parameter when you create your client or resource.::
|
|
|
|
import boto3
|
|
from botocore.client import Config
|
|
|
|
# Other valid options here are 'auto' (default) and 'virtual'
|
|
s3 = boto3.client('s3', 'us-west-2', config=Config(s3={'addressing_style': 'path'}))
|
|
|
|
|
|
Using the Transfer Manager
|
|
--------------------------
|
|
|
|
``boto3`` provides interfaces for managing various types of transfers with
|
|
S3. Functionality includes:
|
|
|
|
* Automatically managing multipart and non-multipart uploads
|
|
* Automatically managing multipart and non-multipart downloads
|
|
* Automatically managing multipart and non-multipart copies
|
|
* Uploading from:
|
|
|
|
* a file name
|
|
* a readable file-like object
|
|
|
|
* Downloading to:
|
|
|
|
* a file name
|
|
* a writeable file-like object
|
|
|
|
* Tracking progress of individual transfers
|
|
* Managing retries of transfers
|
|
* Configuring various transfer settings such as:
|
|
|
|
* Max request concurrency
|
|
* Multipart transfer thresholds
|
|
* Multipart transfer part sizes
|
|
* Number of download retry attempts
|
|
|
|
|
|
Uploads
|
|
~~~~~~~
|
|
The managed upload methods are exposed in both the client and resource
|
|
interfaces of ``boto3``:
|
|
|
|
* :py:class:`S3.Client` method to upload a file by name: :py:meth:`S3.Client.upload_file`
|
|
* :py:class:`S3.Client` method to upload a readable file-like object: :py:meth:`S3.Client.upload_fileobj`
|
|
* :py:class:`S3.Bucket` method to upload a file by name: :py:meth:`S3.Bucket.upload_file`
|
|
* :py:class:`S3.Bucket` method to upload a readable file-like object: :py:meth:`S3.Bucket.upload_fileobj`
|
|
* :py:class:`S3.Object` method to upload a file by name: :py:meth:`S3.Object.upload_file`
|
|
* :py:class:`S3.Object` method to upload a readable file-like object: :py:meth:`S3.Object.upload_fileobj`
|
|
|
|
.. note::
|
|
|
|
Even though there is an ``upload_file`` and ``upload_fileobj`` method for
|
|
a variety of classes, they all share the exact same functionality.
|
|
Other than for convenience, there are no benefits from using one method from
|
|
one class over using the same method for a different class.
|
|
|
|
|
|
To upload a file by name, use one of the ``upload_file`` methods::
|
|
|
|
import boto3
|
|
|
|
# Get the service client
|
|
s3 = boto3.client('s3')
|
|
|
|
# Upload tmp.txt to bucket-name at key-name
|
|
s3.upload_file("tmp.txt", "bucket-name", "key-name")
|
|
|
|
|
|
To upload a readable file-like object, use one of the ``upload_fileobj``
|
|
methods. Note that this file-like object **must** produce binary when read
|
|
from, **not** text::
|
|
|
|
import boto3
|
|
|
|
# Get the service client
|
|
s3 = boto3.client('s3')
|
|
|
|
# Upload a file-like object to bucket-name at key-name
|
|
with open("tmp.txt", "rb") as f:
|
|
s3.upload_fileobj(f, "bucket-name", "key-name")
|
|
|
|
|
|
To upload a file using any extra parameters such as user metadata, use the
|
|
``ExtraArgs`` parameter::
|
|
|
|
|
|
import boto3
|
|
|
|
# Get the service client
|
|
s3 = boto3.client('s3')
|
|
|
|
# Upload tmp.txt to bucket-name at key-name
|
|
s3.upload_file(
|
|
"tmp.txt", "bucket-name", "key-name",
|
|
ExtraArgs={"Metadata": {"mykey": "myvalue"}}
|
|
)
|
|
|
|
|
|
All valid ``ExtraArgs`` are listed at :py:attr:`boto3.s3.transfer.S3Transfer.ALLOWED_UPLOAD_ARGS`
|
|
|
|
To track the progess of a transfer, a progress callback can be provided such
|
|
that the callback gets invoked each time progress is made on the transfer::
|
|
|
|
import os
|
|
import sys
|
|
import threading
|
|
|
|
import boto3
|
|
|
|
class ProgressPercentage(object):
|
|
def __init__(self, filename):
|
|
self._filename = filename
|
|
self._size = float(os.path.getsize(filename))
|
|
self._seen_so_far = 0
|
|
self._lock = threading.Lock()
|
|
def __call__(self, bytes_amount):
|
|
# To simplify we'll assume this is hooked up
|
|
# to a single filename.
|
|
with self._lock:
|
|
self._seen_so_far += bytes_amount
|
|
percentage = (self._seen_so_far / self._size) * 100
|
|
sys.stdout.write(
|
|
"\r%s %s / %s (%.2f%%)" % (
|
|
self._filename, self._seen_so_far, self._size,
|
|
percentage))
|
|
sys.stdout.flush()
|
|
|
|
|
|
# Get the service client
|
|
s3 = boto3.client('s3')
|
|
|
|
# Upload tmp.txt to bucket-name at key-name
|
|
s3.upload_file(
|
|
"tmp.txt", "bucket-name", "key-name",
|
|
Callback=ProgressPercentage("tmp.txt"))
|
|
|
|
|
|
Downloads
|
|
~~~~~~~~~
|
|
The managed download methods are exposed in both the client and resource
|
|
interfaces of ``boto3``:
|
|
|
|
* :py:class:`S3.Client` method to download an object to a file by name: :py:meth:`S3.Client.download_file`
|
|
* :py:class:`S3.Client` method to download an object to a writeable file-like object: :py:meth:`S3.Client.download_fileobj`
|
|
* :py:class:`S3.Bucket` method to download an object to a file by name: :py:meth:`S3.Bucket.download_file`
|
|
* :py:class:`S3.Bucket` method to download an object to a writeable file-like object: :py:meth:`S3.Bucket.download_fileobj`
|
|
* :py:class:`S3.Object` method to download an object to a file by name: :py:meth:`S3.Object.download_file`
|
|
* :py:class:`S3.Object` method to download an object to a writeable file-like object: :py:meth:`S3.Object.download_fileobj`
|
|
|
|
.. note::
|
|
|
|
Even though there is a ``download_file`` and ``download_fileobj`` method for
|
|
a variety of classes, they all share the exact same functionality.
|
|
Other than for convenience, there are no benefits from using one method from
|
|
one class over using the same method for a different class.
|
|
|
|
|
|
To download to a file by name, use one of the ``download_file``
|
|
methods::
|
|
|
|
import boto3
|
|
|
|
# Get the service client
|
|
s3 = boto3.client('s3')
|
|
|
|
# Download object at bucket-name with key-name to tmp.txt
|
|
s3.download_file("bucket-name", "key-name", "tmp.txt")
|
|
|
|
|
|
To download to a writeable file-like object, use one of the
|
|
``download_fileobj`` methods. Note that this file-like object **must**
|
|
allow binary to be written to it, **not** just text::
|
|
|
|
import boto3
|
|
|
|
# Get the service client
|
|
s3 = boto3.client('s3')
|
|
|
|
# Download object at bucket-name with key-name to file-like object
|
|
with open("tmp.txt", "wb") as f:
|
|
s3.download_fileobj("bucket-name", "key-name", f)
|
|
|
|
|
|
To download using any extra parameters such as version ids, use the
|
|
``ExtraArgs`` parameter::
|
|
|
|
|
|
import boto3
|
|
|
|
# Get the service client
|
|
s3 = boto3.client('s3')
|
|
|
|
# Download object at bucket-name with key-name to tmp.txt
|
|
s3.download_file(
|
|
"bucket-name", "key-name", "tmp.txt",
|
|
ExtraArgs={"VersionId": "my-version-id"}
|
|
)
|
|
|
|
|
|
All valid ``ExtraArgs`` are listed at :py:attr:`boto3.s3.transfer.S3Transfer.ALLOWED_DOWNLOAD_ARGS`
|
|
|
|
To track the progess of a transfer, a progress callback can be provided such
|
|
that the callback gets invoked each time progress is made on the transfer::
|
|
|
|
import sys
|
|
import threading
|
|
|
|
import boto3
|
|
|
|
class ProgressPercentage(object):
|
|
def __init__(self, filename):
|
|
self._filename = filename
|
|
self._seen_so_far = 0
|
|
self._lock = threading.Lock()
|
|
def __call__(self, bytes_amount):
|
|
# To simplify we'll assume this is hooked up
|
|
# to a single filename.
|
|
with self._lock:
|
|
self._seen_so_far += bytes_amount
|
|
sys.stdout.write(
|
|
"\r%s --> %s bytes transferred" % (
|
|
self._filename, self._seen_so_far))
|
|
sys.stdout.flush()
|
|
|
|
# Get the service client
|
|
s3 = boto3.client('s3')
|
|
|
|
# Download object at bucket-name with key-name to tmp.txt
|
|
s3.download_file(
|
|
"bucket-name", "key-name", "tmp.txt",
|
|
Callback=ProgressPercentage("tmp.txt"))
|
|
|
|
|
|
Copies
|
|
~~~~~~
|
|
The managed copy methods are exposed in both the client and resource
|
|
interfaces of ``boto3``:
|
|
|
|
* :py:class:`S3.Client` method to copy an s3 object: :py:meth:`S3.Client.copy`
|
|
* :py:class:`S3.Bucket` method to copy an s3 object: :py:meth:`S3.Client.copy`
|
|
* :py:class:`S3.Object` method to copy an s3 object: :py:meth:`S3.Object.copy`
|
|
|
|
|
|
.. note::
|
|
|
|
Even though there is a ``copy`` method for a variety of classes,
|
|
they all share the exact same functionality.
|
|
Other than for convenience, there are no benefits from using one method from
|
|
one class over using the same method for a different class.
|
|
|
|
|
|
To do a managed copy, use one of the ``copy`` methods::
|
|
|
|
import boto3
|
|
|
|
# Get the service client
|
|
s3 = boto3.client('s3')
|
|
|
|
# Copies object located in mybucket at mykey
|
|
# to the location otherbucket at otherkey
|
|
copy_source = {
|
|
'Bucket': 'mybucket',
|
|
'Key': 'mykey'
|
|
}
|
|
s3.copy(copy_source, 'otherbucket', 'otherkey')
|
|
|
|
|
|
To do a managed copy where the region of the source bucket is different than
|
|
the region of the final bucket, provide a ``SourceClient`` that shares the
|
|
same region as the source bucket::
|
|
|
|
import boto3
|
|
|
|
# Get a service client for us-west-2 region
|
|
s3 = boto3.client('s3', 'us-west-2')
|
|
# Get a service client for the eu-central-1 region
|
|
source_client = boto3.client('s3', 'eu-central-1')
|
|
|
|
# Copies object located in mybucket at mykey in eu-central-1 region
|
|
# to the location otherbucket at otherkey in the us-west-2 region
|
|
copy_source = {
|
|
'Bucket': 'mybucket',
|
|
'Key': 'mykey'
|
|
}
|
|
s3.copy(copy_source, 'otherbucket', 'otherkey', SourceClient=source_client)
|
|
|
|
|
|
|
|
To copy using any extra parameters such as replacing user metadata on an
|
|
existing object, use the ``ExtraArgs`` parameter::
|
|
|
|
|
|
import boto3
|
|
|
|
# Get the service client
|
|
s3 = boto3.client('s3')
|
|
|
|
# Copies object located in mybucket at mykey
|
|
# to the location otherbucket at otherkey
|
|
copy_source = {
|
|
'Bucket': 'mybucket',
|
|
'Key': 'mykey'
|
|
}
|
|
s3.copy(
|
|
copy_source, 'bucket', 'mykey',
|
|
ExtraArgs={
|
|
"Metadata": {
|
|
"my-new-key": "my-new-value"
|
|
},
|
|
"MetadataDirective": "REPLACE"
|
|
}
|
|
)
|
|
|
|
|
|
To track the progess of a transfer, a progress callback can be provided such
|
|
that the callback gets invoked each time progress is made on the transfer::
|
|
|
|
import sys
|
|
import threading
|
|
|
|
import boto3
|
|
|
|
class ProgressPercentage(object):
|
|
def __init__(self, filename):
|
|
self._filename = filename
|
|
self._seen_so_far = 0
|
|
self._lock = threading.Lock()
|
|
def __call__(self, bytes_amount):
|
|
# To simplify we'll assume this is hooked up
|
|
# to a single filename.
|
|
with self._lock:
|
|
self._seen_so_far += bytes_amount
|
|
sys.stdout.write(
|
|
"\r%s --> %s bytes transferred" % (
|
|
self._filename, self._seen_so_far))
|
|
sys.stdout.flush()
|
|
|
|
# Get the service client
|
|
s3 = boto3.client('s3')
|
|
|
|
# Copies object located in mybucket at mykey
|
|
# to the location otherbucket at otherkey
|
|
copy_source = {
|
|
'Bucket': 'mybucket',
|
|
'Key': 'mykey'
|
|
}
|
|
s3.copy(copy_source, 'otherbucket', 'otherkey',
|
|
Callback=ProgressPercentage("otherbucket/otherkey"))
|
|
|
|
|
|
Note that the grainularity of these callbacks will be much larger than the
|
|
upload and download methods because copies are all done server side and so
|
|
there is no local file to track the streaming of data.
|
|
|
|
|
|
Configuration Settings
|
|
~~~~~~~~~~~~~~~~~~~~~~
|
|
|
|
To configure the various managed transfer methods, a
|
|
:py:class:`boto3.s3.transfer.TransferConfig` object can be provided to
|
|
the ``Config`` parameter. Please note that the default configuration should
|
|
be well-suited for most scenarios and a ``Config`` should only be provided
|
|
for specific use cases. Here are some common use cases for configuring the
|
|
managed s3 transfer methods:
|
|
|
|
To ensure that multipart uploads only happen when absolutely necessary, you
|
|
can use the ``multipart_threshold`` configuration parameter::
|
|
|
|
import boto3
|
|
from boto3.s3.transfer import TransferConfig
|
|
|
|
# Get the service client
|
|
s3 = boto3.client('s3')
|
|
|
|
GB = 1024 ** 3
|
|
# Ensure that multipart uploads only happen if the size of a transfer
|
|
# is larger than S3's size limit for nonmultipart uploads, which is 5 GB.
|
|
config = TransferConfig(multipart_threshold=5 * GB)
|
|
|
|
# Upload tmp.txt to bucket-name at key-name
|
|
s3.upload_file("tmp.txt", "bucket-name", "key-name", Config=config)
|
|
|
|
|
|
Sometimes depending on your connection speed, it is desired to limit or
|
|
increase potential bandwidth usage. Setting the ``max_concurrency`` can help
|
|
tune the potential bandwidth usage by decreasing or increasing the maximum
|
|
amount of concurrent S3 transfer-related API requests::
|
|
|
|
import boto3
|
|
from boto3.s3.transfer import TransferConfig
|
|
|
|
# Get the service client
|
|
s3 = boto3.client('s3')
|
|
|
|
# Decrease the max concurrency from 10 to 5 to potentially consume
|
|
# less downstream bandwidth.
|
|
config = TransferConfig(max_concurrency=5)
|
|
|
|
# Download object at bucket-name with key-name to tmp.txt with the
|
|
# set configuration
|
|
s3.download_file("bucket-name", "key-name", "tmp.txt", Config=config)
|
|
|
|
# Increase the max concurrency to 20 to potentially consume more
|
|
# downstream bandwidth.
|
|
config = TransferConfig(max_concurrency=20)
|
|
|
|
# Download object at bucket-name with key-name to tmp.txt with the
|
|
# set configuration
|
|
s3.download_file("bucket-name", "key-name", "tmp.txt", Config=config)
|
|
|
|
|
|
Generating Presigned URLs
|
|
-------------------------
|
|
|
|
Pre-signed URLs allow you to give your users access to a specific object in your
|
|
bucket without requiring them to have AWS security credentials or permissions.
|
|
To generate a pre-signed URL, use the
|
|
:py:meth:`S3.Client.generate_presigned_url` method::
|
|
|
|
import boto3
|
|
import requests
|
|
|
|
# Get the service client.
|
|
s3 = boto3.client('s3')
|
|
|
|
# Generate the URL to get 'key-name' from 'bucket-name'
|
|
url = s3.generate_presigned_url(
|
|
ClientMethod='get_object',
|
|
Params={
|
|
'Bucket': 'bucket-name',
|
|
'Key': 'key-name'
|
|
}
|
|
)
|
|
|
|
# Use the URL to perform the GET operation. You can use any method you like
|
|
# to send the GET, but we will use requests here to keep things simple.
|
|
response = requests.get(url)
|
|
|
|
If your bucket requires the use of signature version 4, you can elect to use it
|
|
to sign your URL. This does not fundamentally change how you use generator,
|
|
you only need to make sure that the client used has signature version 4
|
|
configured.::
|
|
|
|
import boto3
|
|
from botocore.client import Config
|
|
|
|
# Get the service client with sigv4 configured
|
|
s3 = boto3.client('s3', config=Config(signature_version='s3v4'))
|
|
|
|
# Generate the URL to get 'key-name' from 'bucket-name'
|
|
url = s3.generate_presigned_url(
|
|
ClientMethod='get_object',
|
|
Params={
|
|
'Bucket': 'bucket-name',
|
|
'Key': 'key-name'
|
|
}
|
|
)
|
|
|
|
Note: if your bucket is new and you require CORS, it is advised that
|
|
you use path style addressing (which is set by default in signature version 4).
|
|
|
|
|
|
Generating Presigned POSTs
|
|
--------------------------
|
|
|
|
Much like pre-signed URLs, pre-signed POSTs allow you to give write access to a
|
|
user without giving them AWS credentials. The information you need to make the
|
|
POST is returned by the :py:meth:`S3.Client.generate_presigned_post` method::
|
|
|
|
import boto3
|
|
import requests
|
|
|
|
# Get the service client
|
|
s3 = boto3.client('s3')
|
|
|
|
# Generate the POST attributes
|
|
post = s3.generate_presigned_post(
|
|
Bucket='bucket-name',
|
|
Key='key-name'
|
|
)
|
|
|
|
# Use the returned values to POST an object. Note that you need to use ALL
|
|
# of the returned fields in your post. You can use any method you like to
|
|
# send the POST, but we will use requests here to keep things simple.
|
|
files = {"file": "file_content"}
|
|
response = requests.post(post["url"], data=post["fields"], files=files)
|
|
|
|
When generating these POSTs, you may wish to auto fill certain fields or
|
|
constrain what your users submit. You can do this by providing those fields and
|
|
conditions when you generate the POST data.::
|
|
|
|
import boto3
|
|
|
|
# Get the service client
|
|
s3 = boto3.client('s3')
|
|
|
|
# Make sure everything posted is publicly readable
|
|
fields = {"acl": "public-read"}
|
|
|
|
# Ensure that the ACL isn't changed and restrict the user to a length
|
|
# between 10 and 100.
|
|
conditions = [
|
|
{"acl": "public-read"},
|
|
["content-length-range", 10, 100]
|
|
]
|
|
|
|
# Generate the POST attributes
|
|
post = s3.generate_presigned_post(
|
|
Bucket='bucket-name',
|
|
Key='key-name',
|
|
Fields=fields,
|
|
Conditions=conditions
|
|
)
|
|
|
|
Note: if your bucket is new and you require CORS, it is advised that
|
|
you use path style addressing (which is set by default in signature version 4).
|
|
|
|
.. _virtual host addressing: http://docs.aws.amazon.com/AmazonS3/latest/dev/VirtualHosting.html
|