How to fill and empty an S3 bucket with Python

24 April 2017

Say you have an S3 bucket on Amazon Web Services and want to empty it programmatically. To demonstrate emptying a bucket, it first needs objects in it, so I’ll show how to put objects into a bucket and then remove them. We’ll use the excellent boto3 library.

How to fill a bucket

# s3_fill.py

import boto3
from s3_constants import S3_ACCESS_KEY, S3_SECRET_KEY, S3_BUCKET

# fill bucket with objects, assumes bucket exists already
def fill_s3_bucket(num_of_objects):
  client = boto3.client(
    's3',
    aws_access_key_id=S3_ACCESS_KEY,
    aws_secret_access_key=S3_SECRET_KEY,
  )
  for i in range(num_of_objects):
    print('creating file', '{:04d}'.format(i))
    client.put_object(
      Bucket=S3_BUCKET,
      Body=b'foo',
      Key='file_' + '{:04d}'.format(i))

fill_s3_bucket(3000)

Here we are using the Client object to put many small files in the S3 bucket. There’s a Bucket object but I didn’t find it very useful for this task. The script prints its progress because I found writing to S3 fairly slow.

Note I used the str.format() method to zero-pad the key names. This has the nice effect that the keys list in numeric order, because S3 returns keys in lexicographic order. .format replaces each region between curly braces with the supplied argument, formatted as specified. So {:04d} is replaced by i written as a decimal integer at least 4 characters wide, padded on the left with the 0 character.
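To see why the padding matters, here is a small sketch comparing padded and unpadded keys under plain string sorting. The numbers are arbitrary examples, not values from the script above.

```python
# Zero-padded keys sort in numeric order; unpadded keys do not.
numbers = [2, 10, 100]

padded = ['file_' + '{:04d}'.format(n) for n in numbers]
unpadded = ['file_' + str(n) for n in numbers]

print(sorted(padded))    # ['file_0002', 'file_0010', 'file_0100']
print(sorted(unpadded))  # ['file_10', 'file_100', 'file_2']
```

Without padding, file_10 sorts before file_2 because strings are compared character by character.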

I pulled the constants from a separate file as that’s closer to what I was doing in Django.
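The constants file itself isn’t shown here. A minimal sketch of what it might look like follows, assuming the values come from environment variables. The environment variable names and the fallback bucket name are made up for illustration; only the three constant names come from the import line in the script.

```python
# s3_constants.py (hypothetical contents, not the original file)

import os

# Reading from the environment keeps credentials out of source control.
# The environment variable names below are assumptions.
S3_ACCESS_KEY = os.environ.get('S3_ACCESS_KEY', '')
S3_SECRET_KEY = os.environ.get('S3_SECRET_KEY', '')
S3_BUCKET = os.environ.get('S3_BUCKET', 'my-example-bucket')
```

Any module that defines these three names would work with the scripts in this post.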

How to empty a bucket

# s3_empty.py

import boto3
from s3_constants import S3_ACCESS_KEY, S3_SECRET_KEY, S3_BUCKET

# empty existing bucket
def empty_s3_bucket():
  client = boto3.client(
    's3',
    aws_access_key_id=S3_ACCESS_KEY,
    aws_secret_access_key=S3_SECRET_KEY,
  )
  response = client.list_objects_v2(Bucket=S3_BUCKET)
  # 'Contents' is absent from a response that lists no keys
  while 'Contents' in response:
    for item in response['Contents']:
      print('deleting file', item['Key'])
      client.delete_object(Bucket=S3_BUCKET, Key=item['Key'])
    # fewer than 1000 keys means this was the last page
    if response['KeyCount'] < 1000:
      break
    # ask for the next page, starting after the last key just deleted
    response = client.list_objects_v2(
      Bucket=S3_BUCKET,
      StartAfter=response['Contents'][-1]['Key'],
    )

empty_s3_bucket()

Here I empty the bucket. The list_objects_v2 function paginates and returns at most 1000 keys per call. So when a response contains 1000 keys, we request the next page, and the loop keeps deleting and paginating until no objects remain in the bucket.

It is also idempotent and will not throw if the bucket is already empty.
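Deleting one object per request is slow for large buckets. As an alternative sketch, not the approach used in the script above, boto3’s list_objects_v2 paginator can be combined with delete_objects, which accepts up to 1000 keys per request. The function name empty_bucket_batched is my own, and the client is passed in as a parameter so it can be built however you like (for example with boto3.client('s3', ...) as above). Per-key errors reported in the delete_objects response are not checked here.

```python
def empty_bucket_batched(client, bucket):
  """Delete every object in bucket, one delete_objects call per listing page.

  Returns the number of keys submitted for deletion.
  """
  deleted = 0
  paginator = client.get_paginator('list_objects_v2')
  for page in paginator.paginate(Bucket=bucket):
    # 'Contents' is missing from a page that holds no keys
    keys = [{'Key': item['Key']} for item in page.get('Contents', [])]
    if not keys:
      continue
    # a listing page holds at most 1000 keys, which fits one delete_objects call
    client.delete_objects(Bucket=bucket, Delete={'Objects': keys})
    deleted += len(keys)
  return deleted
```

Calling it on an empty bucket returns 0 and sends no delete requests, so it stays idempotent like the original.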

The scripts are available in a gist.

