.. _guide_collections:

Collections
===========

Overview
--------

A collection provides an iterable interface to a group of resources.
Collections behave similarly to
`Django QuerySets <https://docs.djangoproject.com/en/1.7/ref/models/querysets/>`_
and expose a similar API. A collection seamlessly handles pagination for
you, making it possible to easily iterate over all items from all pages of
data. Example of a collection::

    # SQS list all queues
    sqs = boto3.resource('sqs')
    for queue in sqs.queues.all():
        print(queue.url)

When collections make requests
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Collections can be created and manipulated without any request being made
to the underlying service. A collection makes a remote service request under
the following conditions:

* **Iteration**::

    for bucket in s3.buckets.all():
        print(bucket.name)

* **Conversion to list()**::

    buckets = list(s3.buckets.all())

* **Batch actions (see below)**::

    s3.Bucket('my-bucket').objects.delete()

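To make the lazy behavior concrete, here is a minimal plain-Python sketch of
a collection that records when requests happen. It is an illustration only,
not boto3's implementation; ``fetch_page`` and ``LazyCollection`` are
invented names::

    # A toy model of a lazy collection. ``pages`` stands in for the
    # remote service's data; nothing is fetched until iteration begins.
    def fetch_page(pages, index):
        # Hypothetical stand-in for a single paginated service call.
        return pages[index] if index < len(pages) else []

    class LazyCollection:
        def __init__(self, pages):
            self.pages = pages
            self.requests_made = 0

        def __iter__(self):
            index = 0
            while True:
                page = fetch_page(self.pages, index)
                self.requests_made += 1
                if not page:
                    return
                for item in page:
                    yield item
                index += 1

    collection = LazyCollection([['a', 'b'], ['c']])
    assert collection.requests_made == 0   # creating it costs nothing
    items = list(collection)               # iteration triggers the requests

Creating ``collection`` makes no request; only ``list(collection)`` walks
the pages, one request per page plus a final call that returns an empty page.
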
Filtering
---------

Some collections support extra arguments to filter the returned data set,
which are passed into the underlying service operation. Use the
:py:meth:`~boto3.resources.collection.Collection.filter` method to filter
the results::

    # S3 list all keys with the prefix 'photos/'
    s3 = boto3.resource('s3')
    for bucket in s3.buckets.all():
        for obj in bucket.objects.filter(Prefix='photos/'):
            print('{0}:{1}'.format(bucket.name, obj.key))

.. warning::

    Behind the scenes, the above example will call ``ListBuckets``,
    ``ListObjects``, and ``HeadObject`` many times. If you have a large
    number of S3 objects then this could incur a significant cost.

Chainability
------------

Collection methods are chainable. They return copies of the collection
rather than modifying the collection, including a deep copy of any
associated operation parameters. For example, this allows you
to build up multiple collections from a base which they all have
in common::

    # EC2 find instances
    ec2 = boto3.resource('ec2')
    base = ec2.instances.filter(InstanceIds=['id1', 'id2', 'id3'])

    filters = [{
        'Name': 'tenancy',
        'Values': ['dedicated']
    }]
    filtered1 = base.filter(Filters=filters)

    # Note, this does NOT modify the filters in ``filtered1``!
    filters.append({'Name': 'instance-type', 'Values': ['t1.micro']})
    filtered2 = base.filter(Filters=filters)

    print('All instances:')
    for instance in base:
        print(instance.id)

    print('Dedicated instances:')
    for instance in filtered1:
        print(instance.id)

    print('Dedicated micro instances:')
    for instance in filtered2:
        print(instance.id)

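The deep-copy guarantee above can be demonstrated in plain Python. The
sketch below uses :py:mod:`copy` to mirror what the chainable methods do
with their operation parameters; it is an analogy, not boto3 internals::

    import copy

    filters = [{'Name': 'tenancy', 'Values': ['dedicated']}]

    # Simulate filter(): the collection keeps a deep copy of its parameters.
    stored = copy.deepcopy(filters)

    # Mutating the original list afterwards does not touch the stored copy.
    filters.append({'Name': 'instance-type', 'Values': ['t1.micro']})

    assert len(stored) == 1    # the collection's parameters are unchanged
    assert len(filters) == 2   # only the caller's list grew
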
Limiting results
----------------

It is possible to limit the number of items returned from a collection
by using the
:py:meth:`~boto3.resources.collection.ResourceCollection.limit` method::

    # S3 iterate over first ten buckets
    for bucket in s3.buckets.limit(10):
        print(bucket.name)

In this case, up to 10 items total will be returned. If you do not
have 10 buckets, then all of your buckets will be returned.

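Conceptually, ``limit(10)`` works like :py:func:`itertools.islice` over the
collection's iterator, stopping as soon as enough items have been produced.
An analogy in plain Python, with ``buckets`` as a hypothetical stand-in for
``s3.buckets.all()``::

    import itertools

    def buckets():
        # Hypothetical generator standing in for s3.buckets.all()
        for n in range(3):
            yield 'bucket-{0}'.format(n)

    # Ten items requested, but only three exist -- all three come back.
    first_ten = list(itertools.islice(buckets(), 10))
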
Controlling page size
---------------------

Collections automatically handle paging through results, but you may want
to control the number of items returned from a single service operation
call. You can do so using the
:py:meth:`~boto3.resources.collection.ResourceCollection.page_size` method::

    # S3 iterate over all objects 100 at a time
    for obj in bucket.objects.page_size(100):
        print(obj.key)

By default, S3 will return 1000 objects at a time, so the above code
would let you process the items in smaller batches, which could be
beneficial for slow or unreliable internet connections.

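The effect of ``page_size`` can be pictured as the chunk size of each
underlying request. A rough sketch, where ``list_objects`` is an invented
stand-in for a single ``ListObjects`` call with ``MaxKeys`` set (the real
service signals completion differently, but the shape is the same)::

    def list_objects(keys, start, page_size):
        # Hypothetical stand-in for one ListObjects call (MaxKeys=page_size).
        return keys[start:start + page_size]

    all_keys = ['key-{0}'.format(n) for n in range(250)]
    seen = []
    calls = 0
    start = 0
    while True:
        page = list_objects(all_keys, start, 100)
        calls += 1
        if not page:
            break
        seen.extend(page)
        start += len(page)

With 250 keys and a page size of 100, this makes four calls: three that
return data (100, 100, and 50 keys) and a final one that returns nothing.
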
Batch actions
-------------

Some collections support batch actions, which are actions that operate
on an entire page of results at a time. They will automatically handle
pagination::

    # S3 delete everything in `my-bucket`
    s3 = boto3.resource('s3')
    s3.Bucket('my-bucket').objects.delete()

.. danger::

    The above example will **completely erase all data** in the ``my-bucket``
    bucket! Please be careful with batch actions.
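
Behind a batch action like the ``delete()`` above, keys are grouped into
service-sized batches (``DeleteObjects`` accepts up to 1000 keys per
request) and one request is made per batch. A plain-Python sketch of just
the grouping step, with ``chunks`` as an invented helper::

    def chunks(items, size):
        # Split keys into service-sized batches, one request per batch.
        for start in range(0, len(items), size):
            yield items[start:start + size]

    keys = ['key-{0}'.format(n) for n in range(2500)]
    batches = list(chunks(keys, 1000))
    # 2500 keys -> batches of 1000, 1000, and 500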