.. _s3_guide:

S3
==

By following this guide, you will learn how to use features of the S3 client
that are unique to the SDK, specifically the generation and use of pre-signed
URLs, pre-signed POSTs, and the use of the transfer manager. You will also
learn how to use a few common, but important, settings specific to S3.

Changing the Addressing Style
-----------------------------

S3 supports two different ways to address a bucket, Virtual Host Style and
Path Style. This guide won't cover all the details of `virtual host
addressing`_, but you can read up on that in S3's docs. In general, the SDK
will handle the decision of what style to use for you, but there are some
cases where you may want to set it yourself. For instance, if you have a
CORS configured bucket that is only a few hours old, you may need to use
path style addressing for generating pre-signed POSTs and URLs until the
necessary DNS changes have time to propagate.

Note: if you set the addressing style to path style, you HAVE to set the
correct region.

The preferred way to set the addressing style is to use the
``addressing_style`` config parameter when you create your client or
resource::

    import boto3
    from botocore.client import Config

    # Other valid options here are 'auto' (default) and 'virtual'
    s3 = boto3.client('s3', 'us-west-2',
                      config=Config(s3={'addressing_style': 'path'}))

Using the Transfer Manager
--------------------------

``boto3`` provides interfaces for managing various types of transfers with
S3. Functionality includes:

* Automatically managing multipart and non-multipart uploads
* Automatically managing multipart and non-multipart downloads
* Automatically managing multipart and non-multipart copies
* Uploading from:

  * a file name
  * a readable file-like object

* Downloading to:

  * a file name
  * a writeable file-like object

* Tracking progress of individual transfers
* Managing retries of transfers
* Configuring various transfer settings such as:

  * Max request concurrency
  * Multipart transfer thresholds
  * Multipart transfer part sizes
  * Number of download retry attempts
  * Enabling/disabling the use of threads

Uploads
~~~~~~~

The managed upload methods are exposed in both the client and resource
interfaces of ``boto3``:

* :py:class:`S3.Client` method to upload a file by name:
  :py:meth:`S3.Client.upload_file`
* :py:class:`S3.Client` method to upload a readable file-like object:
  :py:meth:`S3.Client.upload_fileobj`
* :py:class:`S3.Bucket` method to upload a file by name:
  :py:meth:`S3.Bucket.upload_file`
* :py:class:`S3.Bucket` method to upload a readable file-like object:
  :py:meth:`S3.Bucket.upload_fileobj`
* :py:class:`S3.Object` method to upload a file by name:
  :py:meth:`S3.Object.upload_file`
* :py:class:`S3.Object` method to upload a readable file-like object:
  :py:meth:`S3.Object.upload_fileobj`

.. note::

    Even though there is an ``upload_file`` and ``upload_fileobj`` method
    for a variety of classes, they all share the exact same functionality.
    Other than for convenience, there are no benefits from using one method
    from one class over using the same method for a different class.

To upload a file by name, use one of the ``upload_file`` methods::

    import boto3

    # Get the service client
    s3 = boto3.client('s3')

    # Upload tmp.txt to bucket-name at key-name
    s3.upload_file("tmp.txt", "bucket-name", "key-name")
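As the note above says, the resource-level methods are equivalent. For
reference, here is a minimal sketch of the same upload written against the
resource interface; the bucket and key names are placeholders::

    import boto3

    # Get the service resource
    s3 = boto3.resource('s3')

    # Either of these performs the same managed upload as the
    # client call above
    s3.Bucket('bucket-name').upload_file('tmp.txt', 'key-name')
    s3.Object('bucket-name', 'key-name').upload_file('tmp.txt')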
To upload a readable file-like object, use one of the ``upload_fileobj``
methods. Note that this file-like object **must** produce binary when read
from, **not** text::

    import boto3

    # Get the service client
    s3 = boto3.client('s3')

    # Upload a file-like object to bucket-name at key-name
    with open("tmp.txt", "rb") as f:
        s3.upload_fileobj(f, "bucket-name", "key-name")

When uploading, ``ExtraArgs`` can be used to specify a variety of additional
parameters. For example, to supply user metadata::

    s3.upload_file(
        "tmp.txt", "bucket-name", "key-name",
        ExtraArgs={"Metadata": {"mykey": "myvalue"}}
    )

To set a canned ACL::

    s3.upload_file(
        'tmp.txt', 'bucket-name', 'key-name',
        ExtraArgs={'ACL': 'public-read'}
    )

To set custom or multiple ACLs::

    s3.upload_file(
        'tmp.txt', 'bucket-name', 'key-name',
        ExtraArgs={
            'GrantRead': 'uri="http://acs.amazonaws.com/groups/global/AllUsers"',
            'GrantFullControl': 'id="79a59df900b949e55d96a1e698fbacedfd6e09d98eacf8f8d5218e7cd47ef2be"',
        }
    )

All valid ``ExtraArgs`` are listed at
:py:attr:`boto3.s3.transfer.S3Transfer.ALLOWED_UPLOAD_ARGS`.

To track the progress of a transfer, a progress callback can be provided
such that the callback gets invoked each time progress is made on the
transfer::

    import os
    import sys
    import threading

    import boto3

    class ProgressPercentage(object):
        def __init__(self, filename):
            self._filename = filename
            self._size = float(os.path.getsize(filename))
            self._seen_so_far = 0
            self._lock = threading.Lock()

        def __call__(self, bytes_amount):
            # To simplify we'll assume this is hooked up
            # to a single filename.
            with self._lock:
                self._seen_so_far += bytes_amount
                percentage = (self._seen_so_far / self._size) * 100
                sys.stdout.write(
                    "\r%s  %s / %s  (%.2f%%)" % (
                        self._filename, self._seen_so_far, self._size,
                        percentage))
                sys.stdout.flush()

    # Get the service client
    s3 = boto3.client('s3')

    # Upload tmp.txt to bucket-name at key-name
    s3.upload_file(
        "tmp.txt", "bucket-name", "key-name",
        Callback=ProgressPercentage("tmp.txt"))

Downloads
~~~~~~~~~

The managed download methods are exposed in both the client and resource
interfaces of ``boto3``:

* :py:class:`S3.Client` method to download an object to a file by name:
  :py:meth:`S3.Client.download_file`
* :py:class:`S3.Client` method to download an object to a writeable
  file-like object: :py:meth:`S3.Client.download_fileobj`
* :py:class:`S3.Bucket` method to download an object to a file by name:
  :py:meth:`S3.Bucket.download_file`
* :py:class:`S3.Bucket` method to download an object to a writeable
  file-like object: :py:meth:`S3.Bucket.download_fileobj`
* :py:class:`S3.Object` method to download an object to a file by name:
  :py:meth:`S3.Object.download_file`
* :py:class:`S3.Object` method to download an object to a writeable
  file-like object: :py:meth:`S3.Object.download_fileobj`

.. note::

    Even though there is a ``download_file`` and ``download_fileobj`` method
    for a variety of classes, they all share the exact same functionality.
    Other than for convenience, there are no benefits from using one method
    from one class over using the same method for a different class.

To download to a file by name, use one of the ``download_file`` methods::

    import boto3

    # Get the service client
    s3 = boto3.client('s3')

    # Download object at bucket-name with key-name to tmp.txt
    s3.download_file("bucket-name", "key-name", "tmp.txt")
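Again, the resource interface offers the same functionality; a minimal
sketch of the equivalent download through :py:class:`S3.Bucket` and
:py:class:`S3.Object` (bucket and key names are placeholders)::

    import boto3

    # Get the service resource
    s3 = boto3.resource('s3')

    # Either of these performs the same managed download as the
    # client call above
    s3.Bucket('bucket-name').download_file('key-name', 'tmp.txt')
    s3.Object('bucket-name', 'key-name').download_file('tmp.txt')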
To download to a writeable file-like object, use one of the
``download_fileobj`` methods. Note that this file-like object **must**
allow binary to be written to it, **not** just text::

    import boto3

    # Get the service client
    s3 = boto3.client('s3')

    # Download object at bucket-name with key-name to a file-like object
    with open("tmp.txt", "wb") as f:
        s3.download_fileobj("bucket-name", "key-name", f)

To download using any extra parameters such as version IDs, use the
``ExtraArgs`` parameter::

    import boto3

    # Get the service client
    s3 = boto3.client('s3')

    # Download object at bucket-name with key-name to tmp.txt
    s3.download_file(
        "bucket-name", "key-name", "tmp.txt",
        ExtraArgs={"VersionId": "my-version-id"}
    )

All valid ``ExtraArgs`` are listed at
:py:attr:`boto3.s3.transfer.S3Transfer.ALLOWED_DOWNLOAD_ARGS`.

To track the progress of a transfer, a progress callback can be provided
such that the callback gets invoked each time progress is made on the
transfer::

    import sys
    import threading

    import boto3

    class ProgressPercentage(object):
        def __init__(self, filename):
            self._filename = filename
            self._seen_so_far = 0
            self._lock = threading.Lock()

        def __call__(self, bytes_amount):
            # To simplify we'll assume this is hooked up
            # to a single filename.
            with self._lock:
                self._seen_so_far += bytes_amount
                sys.stdout.write(
                    "\r%s --> %s bytes transferred" % (
                        self._filename, self._seen_so_far))
                sys.stdout.flush()

    # Get the service client
    s3 = boto3.client('s3')

    # Download object at bucket-name with key-name to tmp.txt
    s3.download_file(
        "bucket-name", "key-name", "tmp.txt",
        Callback=ProgressPercentage("tmp.txt"))
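The callback above reports only raw bytes because, unlike the upload case,
there is no complete local file to measure when the download starts. If you
want a percentage, one option is to ask S3 for the object's size up front.
A minimal sketch, assuming you are permitted to call ``head_object`` on the
object::

    import sys
    import threading

    import boto3

    class DownloadProgressPercentage(object):
        def __init__(self, client, bucket, key):
            self._key = key
            # ContentLength is the object's total size in bytes
            self._size = client.head_object(
                Bucket=bucket, Key=key)['ContentLength']
            self._seen_so_far = 0
            self._lock = threading.Lock()

        def __call__(self, bytes_amount):
            with self._lock:
                self._seen_so_far += bytes_amount
                percentage = (self._seen_so_far / float(self._size)) * 100
                sys.stdout.write(
                    "\r%s  %s / %s  (%.2f%%)" % (
                        self._key, self._seen_so_far, self._size,
                        percentage))
                sys.stdout.flush()

    # Get the service client
    s3 = boto3.client('s3')

    # Download object at bucket-name with key-name to tmp.txt
    s3.download_file(
        "bucket-name", "key-name", "tmp.txt",
        Callback=DownloadProgressPercentage(s3, "bucket-name", "key-name"))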
Copies
~~~~~~

The managed copy methods are exposed in both the client and resource
interfaces of ``boto3``:

* :py:class:`S3.Client` method to copy an S3 object:
  :py:meth:`S3.Client.copy`
* :py:class:`S3.Bucket` method to copy an S3 object:
  :py:meth:`S3.Bucket.copy`
* :py:class:`S3.Object` method to copy an S3 object:
  :py:meth:`S3.Object.copy`

.. note::

    Even though there is a ``copy`` method for a variety of classes, they
    all share the exact same functionality. Other than for convenience,
    there are no benefits from using one method from one class over using
    the same method for a different class.

To do a managed copy, use one of the ``copy`` methods::

    import boto3

    # Get the service client
    s3 = boto3.client('s3')

    # Copies object located in mybucket at mykey
    # to the location otherbucket at otherkey
    copy_source = {
        'Bucket': 'mybucket',
        'Key': 'mykey'
    }
    s3.copy(copy_source, 'otherbucket', 'otherkey')

To do a managed copy where the region of the source bucket is different
from the region of the destination bucket, provide a ``SourceClient`` that
shares the same region as the source bucket::

    import boto3

    # Get a service client for the us-west-2 region
    s3 = boto3.client('s3', 'us-west-2')

    # Get a service client for the eu-central-1 region
    source_client = boto3.client('s3', 'eu-central-1')

    # Copies object located in mybucket at mykey in the eu-central-1 region
    # to the location otherbucket at otherkey in the us-west-2 region
    copy_source = {
        'Bucket': 'mybucket',
        'Key': 'mykey'
    }
    s3.copy(copy_source, 'otherbucket', 'otherkey',
            SourceClient=source_client)

To copy using any extra parameters such as replacing user metadata on an
existing object, use the ``ExtraArgs`` parameter::

    import boto3

    # Get the service client
    s3 = boto3.client('s3')

    # Copies object located in mybucket at mykey
    # to the location bucket at mykey, replacing its user metadata
    copy_source = {
        'Bucket': 'mybucket',
        'Key': 'mykey'
    }
    s3.copy(
        copy_source, 'bucket', 'mykey',
        ExtraArgs={
            "Metadata": {
                "my-new-key": "my-new-value"
            },
            "MetadataDirective": "REPLACE"
        }
    )

To track the progress of a transfer, a progress callback can be provided
such that the callback gets invoked each time progress is made on the
transfer::

    import sys
    import threading

    import boto3

    class ProgressPercentage(object):
        def __init__(self, filename):
            self._filename = filename
            self._seen_so_far = 0
            self._lock = threading.Lock()

        def __call__(self, bytes_amount):
            # To simplify we'll assume this is hooked up
            # to a single filename.
            with self._lock:
                self._seen_so_far += bytes_amount
                sys.stdout.write(
                    "\r%s --> %s bytes transferred" % (
                        self._filename, self._seen_so_far))
                sys.stdout.flush()

    # Get the service client
    s3 = boto3.client('s3')

    # Copies object located in mybucket at mykey
    # to the location otherbucket at otherkey
    copy_source = {
        'Bucket': 'mybucket',
        'Key': 'mykey'
    }
    s3.copy(copy_source, 'otherbucket', 'otherkey',
            Callback=ProgressPercentage("otherbucket/otherkey"))

Note that the granularity of these callbacks will be much coarser than for
the upload and download methods, because copies are performed entirely
server side, so there is no local file whose streaming can be tracked.

Configuration Settings
~~~~~~~~~~~~~~~~~~~~~~

To configure the various managed transfer methods, a
:py:class:`boto3.s3.transfer.TransferConfig` object can be provided to the
``Config`` parameter. Note that the default configuration should be
well suited for most scenarios; only provide a ``Config`` for specific use
cases. Here are some common use cases for configuring the managed S3
transfer methods.

To ensure that multipart uploads only happen when absolutely necessary, you
can use the ``multipart_threshold`` configuration parameter::

    import boto3
    from boto3.s3.transfer import TransferConfig

    # Get the service client
    s3 = boto3.client('s3')

    GB = 1024 ** 3

    # Ensure that multipart uploads only happen if the size of a transfer
    # is larger than S3's size limit for non-multipart uploads, which is
    # 5 GB.
    config = TransferConfig(multipart_threshold=5 * GB)

    # Upload tmp.txt to bucket-name at key-name
    s3.upload_file("tmp.txt", "bucket-name", "key-name", Config=config)
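Part sizes and download retries, listed among the configurable settings
earlier, can be tuned the same way. A minimal sketch; the 16 MB part size
and ten download attempts are illustrative values, not recommendations::

    import boto3
    from boto3.s3.transfer import TransferConfig

    # Get the service client
    s3 = boto3.client('s3')

    MB = 1024 ** 2

    # Use 16 MB parts for multipart transfers and allow up to ten
    # attempts when downloading.
    config = TransferConfig(
        multipart_chunksize=16 * MB,
        num_download_attempts=10)

    # Download object at bucket-name with key-name to tmp.txt with the
    # set configuration
    s3.download_file("bucket-name", "key-name", "tmp.txt", Config=config)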
Depending on your connection speed, you may want to limit or increase
potential bandwidth usage. Setting ``max_concurrency`` can help tune
potential bandwidth usage by decreasing or increasing the maximum number of
concurrent S3 transfer-related API requests::

    import boto3
    from boto3.s3.transfer import TransferConfig

    # Get the service client
    s3 = boto3.client('s3')

    # Decrease the max concurrency from 10 to 5 to potentially consume
    # less downstream bandwidth.
    config = TransferConfig(max_concurrency=5)

    # Download object at bucket-name with key-name to tmp.txt with the
    # set configuration
    s3.download_file("bucket-name", "key-name", "tmp.txt", Config=config)

    # Increase the max concurrency to 20 to potentially consume more
    # downstream bandwidth.
    config = TransferConfig(max_concurrency=20)

    # Download object at bucket-name with key-name to tmp.txt with the
    # set configuration
    s3.download_file("bucket-name", "key-name", "tmp.txt", Config=config)

Threads are used by default in the managed transfer methods. To ensure no
threads are used in the transfer process, set ``use_threads`` to ``False``.
Note that when ``use_threads`` is ``False``, the value of
``max_concurrency`` is ignored because only the main thread is ever used::

    import boto3
    from boto3.s3.transfer import TransferConfig

    # Get the service client
    s3 = boto3.client('s3')

    # Ensure that no threads are used.
    config = TransferConfig(use_threads=False)

    # Download object at bucket-name with key-name to tmp.txt with the
    # set configuration
    s3.download_file("bucket-name", "key-name", "tmp.txt", Config=config)

Generating Presigned URLs
-------------------------

Pre-signed URLs allow you to give your users access to a specific object in
your bucket without requiring them to have AWS security credentials or
permissions.

To generate a pre-signed URL, use the
:py:meth:`S3.Client.generate_presigned_url` method::

    import boto3
    import requests

    # Get the service client.
    s3 = boto3.client('s3')

    # Generate the URL to get 'key-name' from 'bucket-name'
    url = s3.generate_presigned_url(
        ClientMethod='get_object',
        Params={
            'Bucket': 'bucket-name',
            'Key': 'key-name'
        }
    )

    # Use the URL to perform the GET operation. You can use any method you
    # like to send the GET, but we will use requests here to keep things
    # simple.
    response = requests.get(url)

If your bucket requires the use of signature version 4, you can elect to
use it to sign your URL. This does not fundamentally change how you use the
generator; you only need to make sure that the client you use has signature
version 4 configured::

    import boto3
    from botocore.client import Config

    # Get the service client with sigv4 configured
    s3 = boto3.client('s3', config=Config(signature_version='s3v4'))

    # Generate the URL to get 'key-name' from 'bucket-name'
    url = s3.generate_presigned_url(
        ClientMethod='get_object',
        Params={
            'Bucket': 'bucket-name',
            'Key': 'key-name'
        }
    )

Note: if your bucket is new and you require CORS, it is advised that you
use path style addressing (which is set by default in signature version 4).
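Pre-signed URLs are also time-limited. The optional ``ExpiresIn`` parameter
sets the lifetime in seconds; a minimal sketch requesting a URL that stays
valid for ten minutes::

    import boto3

    # Get the service client
    s3 = boto3.client('s3')

    # Generate a URL to get 'key-name' from 'bucket-name' that
    # expires after 600 seconds
    url = s3.generate_presigned_url(
        ClientMethod='get_object',
        Params={
            'Bucket': 'bucket-name',
            'Key': 'key-name'
        },
        ExpiresIn=600
    )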
Generating Presigned POSTs
--------------------------

Much like pre-signed URLs, pre-signed POSTs allow you to give write access
to a user without giving them AWS credentials. The information you need to
make the POST is returned by the
:py:meth:`S3.Client.generate_presigned_post` method::

    import boto3
    import requests

    # Get the service client
    s3 = boto3.client('s3')

    # Generate the POST attributes
    post = s3.generate_presigned_post(
        Bucket='bucket-name',
        Key='key-name'
    )

    # Use the returned values to POST an object. Note that you need to use
    # ALL of the returned fields in your post. You can use any method you
    # like to send the POST, but we will use requests here to keep things
    # simple.
    files = {"file": "file_content"}
    response = requests.post(post["url"], data=post["fields"], files=files)

When generating these POSTs, you may wish to auto-fill certain fields or
constrain what your users submit. You can do this by providing those fields
and conditions when you generate the POST data::

    import boto3

    # Get the service client
    s3 = boto3.client('s3')

    # Make sure everything posted is publicly readable
    fields = {"acl": "public-read"}

    # Ensure that the ACL isn't changed and restrict the uploaded content
    # to a length between 10 and 100 bytes.
    conditions = [
        {"acl": "public-read"},
        ["content-length-range", 10, 100]
    ]

    # Generate the POST attributes
    post = s3.generate_presigned_post(
        Bucket='bucket-name',
        Key='key-name',
        Fields=fields,
        Conditions=conditions
    )

Note: if your bucket is new and you require CORS, it is advised that you
use path style addressing (which is set by default in signature version 4).

.. _virtual host addressing: http://docs.aws.amazon.com/AmazonS3/latest/dev/VirtualHosting.html
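Pre-signed POSTs are time-limited as well; ``ExpiresIn`` sets the number of
seconds the POST may be used. A minimal sketch::

    import boto3

    # Get the service client
    s3 = boto3.client('s3')

    # Generate POST attributes that remain valid for one hour
    post = s3.generate_presigned_post(
        Bucket='bucket-name',
        Key='key-name',
        ExpiresIn=3600
    )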