Amazon S3 Lifecycle Management for Versioned Objects

Today I would like to tell you about a powerful new AWS feature that
bridges a pair of existing AWS services and makes another pair of
existing features far more useful! Let’s start with a quick review.

S3 & Versioned Objects
I’m sure that you already know about Amazon S3. First launched in 2006,
S3 now processes over a million requests per second and stores trillions
of documents, images, backups, and other data, all with high availability
and eleven 9’s (i.e. 99.999999999%) durability. Since the initial launch,
we have added many features and locations, and have also reduced the
price (conveniently measured in pennies per Gigabyte per month) of storage repeatedly.
One notable and popular S3 feature is object versioning. After you enable
versioning for an S3 bucket, successive uploads or
PUTs of a particular object will create distinct, named, individually addressable
versions of the object in order to provide you with protection against
overwrites and deletes. You can preserve, retrieve, and restore every
version of every object in an S3 bucket that has versioning enabled.

You can retrieve previous versions of the
object in order to recover from a human or programmatic error.
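
For readers who would rather script this than click through the console, here is a minimal sketch of how versioning might be switched on from Python. It assumes a boto3-style S3 client and its put_bucket_versioning call; the helper names versioning_request and set_versioning, and the bucket name in the usage note, are hypothetical and do not come from the original post.

```python
def versioning_request(bucket, enabled=True):
    """Build the keyword arguments for an S3 PutBucketVersioning call.

    A bucket that has never been versioned has no status at all; once
    versioning has been enabled it can only be suspended, not removed.
    """
    status = "Enabled" if enabled else "Suspended"
    return {
        "Bucket": bucket,
        "VersioningConfiguration": {"Status": status},
    }


def set_versioning(s3_client, bucket, enabled=True):
    """Send the request through a boto3-style S3 client (assumed interface)."""
    return s3_client.put_bucket_versioning(**versioning_request(bucket, enabled))
```

With boto3 installed and credentials configured, usage would look something like set_versioning(boto3.client("s3"), "my-backup-bucket"). Keeping the request builder separate from the call makes the payload easy to inspect without touching AWS.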

Glacier & Lifecycle Rules
You have probably heard about Amazon Glacier as well. Glacier shares
eleven 9’s of data durability with S3, but offers a lower price
per Gigabyte / month in exchange for a retrieval time that is typically
between three and five hours. Glacier is ideal for long-term storage of
important data that you don’t need to access within seconds or minutes.

S3’s Lifecycle Management integrates S3 and Glacier and makes the details visible via the
Storage Class of each object. The data for objects with a Storage Class
of Standard or RRS (Reduced Redundancy
Storage) is stored in S3. If the Storage Class is
Glacier, then the data is stored in Glacier. Regardless
of the Storage Class, the objects are accessible through the S3 API and
other S3 tools. Lifecycle Management allows you to define time-based rules that can
trigger Transition (changing the Storage Class to Glacier)
and Expiration (deletion of objects). The Expiration rules give you the
ability to delete objects (or versions of objects) that are older than a particular
age. You can use these rules to ensure that the objects remain available in case
of an accidental or planned delete while limiting your storage costs by
deleting them after they are older than your preferred rollback window.


S3 & Glacier & Versioned Objects & Lifecycle Rules

With all of that out of the way, I am finally ready to share today’s news! You can
now create and apply Lifecycle rules to buckets that use versioned objects. This
seemingly simple change makes S3, Glacier, and versioned objects a lot more useful.
For example, you can arrange to keep the current version of an object in S3,
and to transition older versions to Glacier. You can get to the current version
(the one that you are most likely to need) immediately, with older versions
accessible within three to five hours. Depending on your use case, you might
want to transition all of the versions, including the current one, to Glacier. You might
also want to expire each version a few days after it was created (using a rule for the current version)
or overwritten/expired (using a rule based on the successor time for previous versions).
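
The difference between the two clocks can be illustrated with a small, self-contained sketch. This is a simplified model, not how S3 is implemented: it only shows that a rule for the current version counts days from creation, while a rule for a previous version counts days from the moment its successor arrived. It ignores how and when S3 actually schedules the work. The function names are hypothetical.

```python
from datetime import datetime, timedelta


def lifecycle_clock_start(created, superseded=None):
    """Return the instant from which day-based rules are counted.

    created    -- when this version was written
    superseded -- when a newer version replaced it (None if still current)
    """
    return created if superseded is None else superseded


def is_due(now, days, created, superseded=None):
    """True once `days` full days have passed on the applicable clock."""
    start = lifecycle_clock_start(created, superseded)
    return now - start >= timedelta(days=days)


# A version written on January 1 and overwritten on January 10.
created = datetime(2024, 1, 1)
superseded = datetime(2024, 1, 10)

# A two-day rule for previous versions counts from January 10, not January 1.
print(is_due(datetime(2024, 1, 11), 2, created, superseded))  # False
print(is_due(datetime(2024, 1, 12), 2, created, superseded))  # True
```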

In other words, this new feature combines
the flexibility of S3 versioned objects with the extremely low cost of storage
in Glacier, helping you to reduce your overall storage costs.

You can create and apply Lifecycle rules to an S3 bucket to take advantage
of this new feature. You can do this through the S3 API, an AWS SDK,
or from within the AWS Management Console.


Lifecycle Management in the Console

Let’s set up a simple Lifecycle rule using the AWS Management Console.
I will create a fresh bucket to store some backups:

In this example, my backup app is very simple-minded and generates its output to the same file
every time. I’ll enable versioning for the bucket. This will allow me to
upload fresh backups without having to move or rename any files, while gaining
all of the advantages of versioning including protection against overwrites
and deletions. It will also
allow me to archive the previous versions of the file in Glacier. Here’s
how I enable versioning:

Now I need to set up the appropriate Transition and Expiration rules:

The console now includes a wizard to simplify this process! In the first step, I can choose to
create a rule that addresses all of the objects in the bucket, or a subset of
objects that share a common name prefix within the bucket.

After choosing the objects that are addressed by the rule, I now specify the transitions
and expirations for the current and previous versions of the object. Let’s
say that I want to transition the current version of each backup to Glacier after a week,
and the previous versions two days after they have been overwritten. Further, I would like
to permanently delete the previous versions 100 days after they are no longer current.
Here’s how I would set that up (you can also click on
See an Example to get an even better understanding of the Lifecycle rules):
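
The same rule could also be expressed through an SDK. The sketch below builds the configuration as a plain Python dictionary in the shape accepted by boto3's put_bucket_lifecycle_configuration call. The rule ID, the helper names, and the choice of boto3 are assumptions made for illustration; the post itself only shows the console.

```python
def backup_lifecycle_configuration(prefix=""):
    """Build a lifecycle configuration matching the walkthrough.

    An empty prefix addresses every object in the bucket.
    """
    return {
        "Rules": [
            {
                "ID": "archive-backups",  # hypothetical rule name
                "Filter": {"Prefix": prefix},
                "Status": "Enabled",
                # Current version: move to Glacier a week after creation.
                "Transitions": [{"Days": 7, "StorageClass": "GLACIER"}],
                # Previous versions: move to Glacier two days after
                # they have been overwritten.
                "NoncurrentVersionTransitions": [
                    {"NoncurrentDays": 2, "StorageClass": "GLACIER"}
                ],
                # Previous versions: delete permanently 100 days after
                # they stop being current.
                "NoncurrentVersionExpiration": {"NoncurrentDays": 100},
            }
        ]
    }


def apply_lifecycle(s3_client, bucket, configuration):
    """Send the configuration through a boto3-style S3 client (assumed)."""
    return s3_client.put_bucket_lifecycle_configuration(
        Bucket=bucket, LifecycleConfiguration=configuration
    )
```

Note that this call replaces the bucket's entire lifecycle configuration, so any existing rules that should survive need to be included in the Rules list as well.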

The console confirms my intent and then creates and activates the rule:

Once the rules have been established, transitions and expirations will happen automatically. I
can see the current state of each version of an object from the console:

Important Note:
In order to see the versions of my backup file, I clicked the Show button.

Learning More
The example shown above is a good starting point, but things are somewhat complex
behind the scenes and you should plan to spend some time learning more about this feature
before you start using it. In fact, you may want to create a bucket just for testing
and use it to try out your proposed rules.

Here are some things to think about when you design your strategy for versioning,
transitions, and expirations:

  • Versioning Status – This value is maintained on a per-bucket basis. Each bucket can be unversioned
    (the default) or versioned, and you also have the option to suspend versioning. With versioning suspended, you will
    stop accruing new versions of an object. Also, deleting an object when versioning is suspended creates a special Delete
    Marker with a NULL version, and makes this the current version.
  • Actions – The current and previous versions of an object each have
    transition and expiration actions, each of which has behavior that is dependent on the
    versioning status of the associated bucket. Based on your use case, you can choose to just transition,
    just expire, or transition and then expire.
  • Days and Dates – Rules can specify a date or a number of days since the creation of an object. Rules created in the console must be
    day-based; you must use the API to create a rule that includes a date. The lifecycle rules for previous versions
    take effect from the time that a current version is superseded and becomes a previous one. You can control how long a superseded version
    remains in S3 before it is transitioned to Glacier or expired.
  • Existing Rules – The rules that you created prior to the introduction of rules for versioning will still apply and
    will behave as expected. If they reference specific dates, you will need to use the API to edit
    them (you can still view, disable, and delete them in the console). You can set up rules for
    previous versions before you actually enable versioning for a
    bucket. The rules will become applicable only after you do so.
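
As noted under Days and Dates, a rule that names a specific date has to be created through the API. The following is a minimal sketch of what such a rule might look like as a boto3-style dictionary. The helper name and the example values are hypothetical. S3 expects the date at midnight UTC, so the sketch builds it that way.

```python
from datetime import datetime, timezone


def date_based_expiration_rule(rule_id, prefix, year, month, day):
    """Build one lifecycle rule that expires matching objects on a fixed date.

    The date is pinned to midnight UTC, which is what S3 expects for
    date-based lifecycle actions.
    """
    return {
        "ID": rule_id,
        "Filter": {"Prefix": prefix},
        "Status": "Enabled",
        "Expiration": {
            "Date": datetime(year, month, day, tzinfo=timezone.utc)
        },
    }


# Hypothetical example: expire everything under "reports/" on 1 January 2030.
rule = date_based_expiration_rule("expire-reports", "reports/", 2030, 1, 1)
print(rule["Expiration"]["Date"].isoformat())  # 2030-01-01T00:00:00+00:00
```

The resulting dictionary would go into the Rules list of a lifecycle configuration alongside any day-based rules.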

Give it a Try
This new feature is available now and you can start using it today. Give it a spin and let
me know what you think.

Jeff;
