.. _s3_guide:

S3
==

By following this guide, you will learn how to use features of the S3 client that
are unique to the SDK, specifically the generation and use of pre-signed URLs,
pre-signed POSTs, and the use of the transfer manager. You will also learn how
to use a few common, but important, settings specific to S3.

Changing the Addressing Style
-----------------------------

S3 supports two different ways to address a bucket, Virtual Host Style and Path
Style. This guide won't cover all the details of `virtual host addressing`_, but
you can read up on that in S3's docs. In general, the SDK will handle the
decision of what style to use for you, but there are some cases where you may
want to set it yourself. For instance, if you have a CORS-configured bucket
that is only a few hours old, you may need to use path style addressing for
generating pre-signed POSTs and URLs until the necessary DNS changes have time
to propagate.

Note: if you set the addressing style to path style, you HAVE to set the correct
region.

The preferred way to set the addressing style is to use the ``addressing_style``
config parameter when you create your client or resource::

    import boto3
    from botocore.client import Config

    # Other valid options here are 'auto' (default) and 'virtual'
    s3 = boto3.client('s3', 'us-west-2', config=Config(s3={'addressing_style': 'path'}))

Using the Transfer Manager
--------------------------

``boto3`` provides interfaces for managing various types of transfers with
S3. Functionality includes:

* Automatically managing multipart and non-multipart uploads
* Automatically managing multipart and non-multipart downloads
* Automatically managing multipart and non-multipart copies
* Uploading from:

  * a file name
  * a readable file-like object

* Downloading to:

  * a file name
  * a writeable file-like object

* Tracking progress of individual transfers
* Managing retries of transfers
* Configuring various transfer settings (shown in the sketch after this list), such as:

  * Max request concurrency
  * Multipart transfer thresholds
  * Multipart transfer part sizes
  * Number of download retry attempts
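
As a rough illustration of the last group of settings, they all map onto a
single :py:class:`boto3.s3.transfer.TransferConfig` object. The values below
are arbitrary placeholders rather than recommendations; the defaults are
usually sufficient::

    import boto3
    from boto3.s3.transfer import TransferConfig

    MB = 1024 ** 2

    # Illustrative values only; every keyword shown here is a TransferConfig
    # setting, but the numbers are placeholders.
    config = TransferConfig(
        multipart_threshold=16 * MB,   # switch to multipart above this size
        multipart_chunksize=16 * MB,   # size of each multipart part
        max_concurrency=10,            # maximum concurrent S3 API requests
        num_download_attempts=5,       # retry attempts for downloads
    )

    # The config can then be passed to any managed transfer method, e.g.:
    s3 = boto3.client('s3')
    s3.upload_file("tmp.txt", "bucket-name", "key-name", Config=config)
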
Uploads
~~~~~~~

The managed upload methods are exposed in both the client and resource
interfaces of ``boto3``:

* :py:class:`S3.Client` method to upload a file by name: :py:meth:`S3.Client.upload_file`
* :py:class:`S3.Client` method to upload a readable file-like object: :py:meth:`S3.Client.upload_fileobj`
* :py:class:`S3.Bucket` method to upload a file by name: :py:meth:`S3.Bucket.upload_file`
* :py:class:`S3.Bucket` method to upload a readable file-like object: :py:meth:`S3.Bucket.upload_fileobj`
* :py:class:`S3.Object` method to upload a file by name: :py:meth:`S3.Object.upload_file`
* :py:class:`S3.Object` method to upload a readable file-like object: :py:meth:`S3.Object.upload_fileobj`

.. note::

    Even though there is an ``upload_file`` and ``upload_fileobj`` method for
    a variety of classes, they all share the exact same functionality.
    Other than for convenience, there are no benefits from using one method from
    one class over using the same method for a different class.
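
For instance, here is a brief sketch of the resource-based variants; the
bucket and key names are placeholders::

    import boto3

    # Get the service resource
    s3 = boto3.resource('s3')

    # Upload tmp.txt via the Bucket resource
    s3.Bucket("bucket-name").upload_file("tmp.txt", "key-name")

    # Upload tmp.txt via the Object resource
    s3.Object("bucket-name", "key-name").upload_file("tmp.txt")
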
To upload a file by name, use one of the ``upload_file`` methods::

    import boto3

    # Get the service client
    s3 = boto3.client('s3')

    # Upload tmp.txt to bucket-name at key-name
    s3.upload_file("tmp.txt", "bucket-name", "key-name")

To upload a readable file-like object, use one of the ``upload_fileobj``
methods. Note that this file-like object **must** produce binary when read
from, **not** text::

    import boto3

    # Get the service client
    s3 = boto3.client('s3')

    # Upload a file-like object to bucket-name at key-name
    with open("tmp.txt", "rb") as f:
        s3.upload_fileobj(f, "bucket-name", "key-name")

To upload a file using any extra parameters such as user metadata, use the
``ExtraArgs`` parameter::

    import boto3

    # Get the service client
    s3 = boto3.client('s3')

    # Upload tmp.txt to bucket-name at key-name
    s3.upload_file(
        "tmp.txt", "bucket-name", "key-name",
        ExtraArgs={"Metadata": {"mykey": "myvalue"}}
    )

All valid ``ExtraArgs`` are listed at :py:attr:`boto3.s3.transfer.S3Transfer.ALLOWED_UPLOAD_ARGS`.
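
For instance, several of the allowed arguments can be combined in a single
call; this is only a sketch and the values are placeholders::

    import boto3

    # Get the service client
    s3 = boto3.client('s3')

    # Upload tmp.txt with a content type, a canned ACL, and SSE-S3
    # server-side encryption (all values are illustrative)
    s3.upload_file(
        "tmp.txt", "bucket-name", "key-name",
        ExtraArgs={
            "ContentType": "text/plain",
            "ACL": "public-read",
            "ServerSideEncryption": "AES256"
        }
    )
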
To track the progress of a transfer, a progress callback can be provided such
that the callback gets invoked each time progress is made on the transfer::

    import os
    import sys
    import threading

    import boto3


    class ProgressPercentage(object):

        def __init__(self, filename):
            self._filename = filename
            self._size = float(os.path.getsize(filename))
            self._seen_so_far = 0
            self._lock = threading.Lock()

        def __call__(self, bytes_amount):
            # To simplify we'll assume this is hooked up
            # to a single filename.
            with self._lock:
                self._seen_so_far += bytes_amount
                percentage = (self._seen_so_far / self._size) * 100
                sys.stdout.write(
                    "\r%s %s / %s (%.2f%%)" % (
                        self._filename, self._seen_so_far, self._size,
                        percentage))
                sys.stdout.flush()


    # Get the service client
    s3 = boto3.client('s3')

    # Upload tmp.txt to bucket-name at key-name
    s3.upload_file(
        "tmp.txt", "bucket-name", "key-name",
        Callback=ProgressPercentage("tmp.txt"))

Downloads
~~~~~~~~~

The managed download methods are exposed in both the client and resource
interfaces of ``boto3``:

* :py:class:`S3.Client` method to download an object to a file by name: :py:meth:`S3.Client.download_file`
* :py:class:`S3.Client` method to download an object to a writeable file-like object: :py:meth:`S3.Client.download_fileobj`
* :py:class:`S3.Bucket` method to download an object to a file by name: :py:meth:`S3.Bucket.download_file`
* :py:class:`S3.Bucket` method to download an object to a writeable file-like object: :py:meth:`S3.Bucket.download_fileobj`
* :py:class:`S3.Object` method to download an object to a file by name: :py:meth:`S3.Object.download_file`
* :py:class:`S3.Object` method to download an object to a writeable file-like object: :py:meth:`S3.Object.download_fileobj`

.. note::

    Even though there is a ``download_file`` and ``download_fileobj`` method for
    a variety of classes, they all share the exact same functionality.
    Other than for convenience, there are no benefits from using one method from
    one class over using the same method for a different class.
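
For instance, here is a brief sketch of the resource-based variants; the
bucket and key names are placeholders::

    import boto3

    # Get the service resource
    s3 = boto3.resource('s3')

    # Download the object via the Bucket resource
    s3.Bucket("bucket-name").download_file("key-name", "tmp.txt")

    # Download the object via the Object resource
    s3.Object("bucket-name", "key-name").download_file("tmp.txt")
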
To download to a file by name, use one of the ``download_file``
methods::

    import boto3

    # Get the service client
    s3 = boto3.client('s3')

    # Download object at bucket-name with key-name to tmp.txt
    s3.download_file("bucket-name", "key-name", "tmp.txt")

To download to a writeable file-like object, use one of the
``download_fileobj`` methods. Note that this file-like object **must**
allow binary to be written to it, **not** just text::

    import boto3

    # Get the service client
    s3 = boto3.client('s3')

    # Download object at bucket-name with key-name to file-like object
    with open("tmp.txt", "wb") as f:
        s3.download_fileobj("bucket-name", "key-name", f)

To download using any extra parameters such as version IDs, use the
``ExtraArgs`` parameter::

    import boto3

    # Get the service client
    s3 = boto3.client('s3')

    # Download object at bucket-name with key-name to tmp.txt
    s3.download_file(
        "bucket-name", "key-name", "tmp.txt",
        ExtraArgs={"VersionId": "my-version-id"}
    )

All valid ``ExtraArgs`` are listed at :py:attr:`boto3.s3.transfer.S3Transfer.ALLOWED_DOWNLOAD_ARGS`.

To track the progress of a transfer, a progress callback can be provided such
that the callback gets invoked each time progress is made on the transfer::

    import sys
    import threading

    import boto3


    class ProgressPercentage(object):

        def __init__(self, filename):
            self._filename = filename
            self._seen_so_far = 0
            self._lock = threading.Lock()

        def __call__(self, bytes_amount):
            # To simplify we'll assume this is hooked up
            # to a single filename.
            with self._lock:
                self._seen_so_far += bytes_amount
                sys.stdout.write(
                    "\r%s --> %s bytes transferred" % (
                        self._filename, self._seen_so_far))
                sys.stdout.flush()


    # Get the service client
    s3 = boto3.client('s3')

    # Download object at bucket-name with key-name to tmp.txt
    s3.download_file(
        "bucket-name", "key-name", "tmp.txt",
        Callback=ProgressPercentage("tmp.txt"))

Copies
~~~~~~

The managed copy methods are exposed in both the client and resource
interfaces of ``boto3``:

* :py:class:`S3.Client` method to copy an S3 object: :py:meth:`S3.Client.copy`
* :py:class:`S3.Bucket` method to copy an S3 object: :py:meth:`S3.Bucket.copy`
* :py:class:`S3.Object` method to copy an S3 object: :py:meth:`S3.Object.copy`

.. note::

    Even though there is a ``copy`` method for a variety of classes,
    they all share the exact same functionality.
    Other than for convenience, there are no benefits from using one method from
    one class over using the same method for a different class.

To do a managed copy, use one of the ``copy`` methods::

    import boto3

    # Get the service client
    s3 = boto3.client('s3')

    # Copies object located in mybucket at mykey
    # to the location otherbucket at otherkey
    copy_source = {
        'Bucket': 'mybucket',
        'Key': 'mykey'
    }
    s3.copy(copy_source, 'otherbucket', 'otherkey')

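
The resource-based variants follow the same pattern; here is a brief sketch
using the same placeholder bucket and key names::

    import boto3

    # Get the service resource
    s3 = boto3.resource('s3')

    copy_source = {
        'Bucket': 'mybucket',
        'Key': 'mykey'
    }

    # Copy via the Bucket resource
    s3.Bucket('otherbucket').copy(copy_source, 'otherkey')

    # Copy via the Object resource
    s3.Object('otherbucket', 'otherkey').copy(copy_source)
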
To do a managed copy where the region of the source bucket is different from
the region of the destination bucket, provide a ``SourceClient`` that shares
the same region as the source bucket::

    import boto3

    # Get a service client for the us-west-2 region
    s3 = boto3.client('s3', 'us-west-2')

    # Get a service client for the eu-central-1 region
    source_client = boto3.client('s3', 'eu-central-1')

    # Copies object located in mybucket at mykey in eu-central-1 region
    # to the location otherbucket at otherkey in the us-west-2 region
    copy_source = {
        'Bucket': 'mybucket',
        'Key': 'mykey'
    }
    s3.copy(copy_source, 'otherbucket', 'otherkey', SourceClient=source_client)

To copy using any extra parameters such as replacing user metadata on an
existing object, use the ``ExtraArgs`` parameter::

    import boto3

    # Get the service client
    s3 = boto3.client('s3')

    # Copies object located in mybucket at mykey
    # to the location otherbucket at otherkey
    copy_source = {
        'Bucket': 'mybucket',
        'Key': 'mykey'
    }
    s3.copy(
        copy_source, 'otherbucket', 'otherkey',
        ExtraArgs={
            "Metadata": {
                "my-new-key": "my-new-value"
            },
            "MetadataDirective": "REPLACE"
        }
    )

To track the progress of a transfer, a progress callback can be provided such
that the callback gets invoked each time progress is made on the transfer::

    import sys
    import threading

    import boto3


    class ProgressPercentage(object):

        def __init__(self, filename):
            self._filename = filename
            self._seen_so_far = 0
            self._lock = threading.Lock()

        def __call__(self, bytes_amount):
            # To simplify we'll assume this is hooked up
            # to a single filename.
            with self._lock:
                self._seen_so_far += bytes_amount
                sys.stdout.write(
                    "\r%s --> %s bytes transferred" % (
                        self._filename, self._seen_so_far))
                sys.stdout.flush()


    # Get the service client
    s3 = boto3.client('s3')

    # Copies object located in mybucket at mykey
    # to the location otherbucket at otherkey
    copy_source = {
        'Bucket': 'mybucket',
        'Key': 'mykey'
    }
    s3.copy(copy_source, 'otherbucket', 'otherkey',
            Callback=ProgressPercentage("otherbucket/otherkey"))

Note that the granularity of these callbacks will be much larger than for the
upload and download methods because copies are all done server side, so
there is no local file to track the streaming of data.

Configuration Settings
~~~~~~~~~~~~~~~~~~~~~~

To configure the various managed transfer methods, a
:py:class:`boto3.s3.transfer.TransferConfig` object can be provided to
the ``Config`` parameter. Please note that the default configuration should
be well-suited for most scenarios and a ``Config`` should only be provided
for specific use cases. Here are some common use cases for configuring the
managed S3 transfer methods:

To ensure that multipart uploads only happen when absolutely necessary, you
can use the ``multipart_threshold`` configuration parameter::

    import boto3
    from boto3.s3.transfer import TransferConfig

    # Get the service client
    s3 = boto3.client('s3')

    GB = 1024 ** 3

    # Ensure that multipart uploads only happen if the size of a transfer
    # is larger than S3's size limit for nonmultipart uploads, which is 5 GB.
    config = TransferConfig(multipart_threshold=5 * GB)

    # Upload tmp.txt to bucket-name at key-name
    s3.upload_file("tmp.txt", "bucket-name", "key-name", Config=config)

Depending on your connection speed, you may want to limit or increase
potential bandwidth usage. Setting ``max_concurrency`` can help tune
potential bandwidth usage by decreasing or increasing the maximum
number of concurrent S3 transfer-related API requests::

    import boto3
    from boto3.s3.transfer import TransferConfig

    # Get the service client
    s3 = boto3.client('s3')

    # Decrease the max concurrency from 10 to 5 to potentially consume
    # less downstream bandwidth.
    config = TransferConfig(max_concurrency=5)

    # Download object at bucket-name with key-name to tmp.txt with the
    # set configuration
    s3.download_file("bucket-name", "key-name", "tmp.txt", Config=config)

    # Increase the max concurrency to 20 to potentially consume more
    # downstream bandwidth.
    config = TransferConfig(max_concurrency=20)

    # Download object at bucket-name with key-name to tmp.txt with the
    # set configuration
    s3.download_file("bucket-name", "key-name", "tmp.txt", Config=config)

Generating Presigned URLs
-------------------------

Pre-signed URLs allow you to give your users access to a specific object in your
bucket without requiring them to have AWS security credentials or permissions.

To generate a pre-signed URL, use the
:py:meth:`S3.Client.generate_presigned_url` method::

    import boto3
    import requests

    # Get the service client.
    s3 = boto3.client('s3')

    # Generate the URL to get 'key-name' from 'bucket-name'
    url = s3.generate_presigned_url(
        ClientMethod='get_object',
        Params={
            'Bucket': 'bucket-name',
            'Key': 'key-name'
        }
    )

    # Use the URL to perform the GET operation. You can use any method you like
    # to send the GET, but we will use requests here to keep things simple.
    response = requests.get(url)

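
The generated URL is only valid for a limited time, which you can control with
the ``ExpiresIn`` parameter (in seconds; it defaults to 3600). A short sketch
using the same placeholder bucket and key::

    import boto3

    # Get the service client
    s3 = boto3.client('s3')

    # Generate a URL for 'key-name' in 'bucket-name' that expires in 10 minutes
    url = s3.generate_presigned_url(
        ClientMethod='get_object',
        Params={
            'Bucket': 'bucket-name',
            'Key': 'key-name'
        },
        ExpiresIn=600
    )
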
If your bucket requires the use of signature version 4, you can elect to use it
to sign your URL. This does not fundamentally change how you use the generator;
you only need to make sure that the client you use has signature version 4
configured::

    import boto3
    from botocore.client import Config

    # Get the service client with sigv4 configured
    s3 = boto3.client('s3', config=Config(signature_version='s3v4'))

    # Generate the URL to get 'key-name' from 'bucket-name'
    url = s3.generate_presigned_url(
        ClientMethod='get_object',
        Params={
            'Bucket': 'bucket-name',
            'Key': 'key-name'
        }
    )

Note: if your bucket is new and you require CORS, it is advised that
you use path style addressing (which is set by default in signature version 4).
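
To follow that advice explicitly, the signature version and the addressing
style covered earlier can be combined in a single client configuration; this
is only a sketch, and the region shown is a placeholder::

    import boto3
    from botocore.client import Config

    # Get a sigv4 client that also forces path style addressing.
    # Remember to pass the bucket's actual region when using path style.
    s3 = boto3.client(
        's3', 'us-west-2',
        config=Config(
            signature_version='s3v4',
            s3={'addressing_style': 'path'}
        )
    )
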
Generating Presigned POSTs
--------------------------

Much like pre-signed URLs, pre-signed POSTs allow you to give write access to a
user without giving them AWS credentials. The information you need to make the
POST is returned by the :py:meth:`S3.Client.generate_presigned_post` method::

    import boto3
    import requests

    # Get the service client
    s3 = boto3.client('s3')

    # Generate the POST attributes
    post = s3.generate_presigned_post(
        Bucket='bucket-name',
        Key='key-name'
    )

    # Use the returned values to POST an object. Note that you need to use ALL
    # of the returned fields in your post. You can use any method you like to
    # send the POST, but we will use requests here to keep things simple.
    files = {"file": "file_content"}
    response = requests.post(post["url"], data=post["fields"], files=files)

When generating these POSTs, you may wish to auto-fill certain fields or
constrain what your users submit. You can do this by providing those fields and
conditions when you generate the POST data::

    import boto3

    # Get the service client
    s3 = boto3.client('s3')

    # Make sure everything posted is publicly readable
    fields = {"acl": "public-read"}

    # Ensure that the ACL isn't changed and restrict the uploaded content
    # to a length between 10 and 100 bytes.
    conditions = [
        {"acl": "public-read"},
        ["content-length-range", 10, 100]
    ]

    # Generate the POST attributes
    post = s3.generate_presigned_post(
        Bucket='bucket-name',
        Key='key-name',
        Fields=fields,
        Conditions=conditions
    )

Note: if your bucket is new and you require CORS, it is advised that
you use path style addressing (which is set by default in signature version 4).

.. _virtual host addressing: http://docs.aws.amazon.com/AmazonS3/latest/dev/VirtualHosting.html