Backups rarely get the attention they deserve—until a bill spikes or a retention audit arrives. Moving archival data from Amazon S3 to Glacier is a common way to cut costs for infrequently accessed objects, but the mechanics are more nuanced than AWS marketing suggests. Here’s a closer look at how to actually execute S3-to-Glacier transitions with minimal friction and maximum predictability.
Problem:
A typical scenario: terabytes of log data or analytics rollups accumulate in S3 at roughly $23/TB/month (Standard). Most of it ages out of daily use within 30 days. Moving it to Glacier cuts storage cost by ~80%, but only if object transitions and restores are handled correctly.
Transitioning S3 Data to Glacier
There are two patterns in production:
- Lifecycle rules (preferred): Configure S3 Lifecycle to transition objects to Glacier (or Glacier Deep Archive) after a policy-defined period.
- Manual transitions: Use the AWS CLI or SDK to explicitly rewrite objects into a Glacier storage class. Rare in steady-state environments; mostly for bulk migrations or one-off corrections.
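The manual path is an in-place copy that changes the storage class. A minimal boto3-flavored sketch: the helper builds the `copy_object` request locally (bucket and key names are hypothetical), and the actual AWS call is left commented out.

```python
def glacier_copy_params(bucket: str, key: str) -> dict:
    """Build copy_object kwargs for an in-place storage-class change.

    Note: copy_object handles objects up to 5 GB; larger objects need a
    multipart copy.
    """
    return {
        "Bucket": bucket,
        "Key": key,
        "CopySource": {"Bucket": bucket, "Key": key},  # copy onto itself
        "StorageClass": "GLACIER",
        "MetadataDirective": "COPY",  # preserve existing metadata
    }

params = glacier_copy_params("my-archive-bucket", "logs/old.gz")
# import boto3  # requires: pip install boto3
# boto3.client("s3").copy_object(**params)  # uncomment to actually transition
```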
Sample Lifecycle Rule (XML):
<LifecycleConfiguration>
  <Rule>
    <ID>ArchiveToGlacierAfter30Days</ID>
    <Filter><Prefix>logs/</Prefix></Filter>
    <Status>Enabled</Status>
    <Transition>
      <Days>30</Days>
      <StorageClass>GLACIER</StorageClass>
    </Transition>
  </Rule>
</LifecycleConfiguration>
Alternatively, the same rule as JSON, in the format aws s3api put-bucket-lifecycle-configuration accepts (the CloudFormation equivalent uses slightly different key names, e.g. TransitionInDays):
{
  "Rules": [
    {
      "ID": "ArchiveToGlacierAfter30Days",
      "Filter": { "Prefix": "logs/" },
      "Status": "Enabled",
      "Transitions": [
        { "Days": 30, "StorageClass": "GLACIER" }
      ]
    }
  ]
}
Note: Don’t transition small objects. By default, S3 Lifecycle does not transition objects smaller than 128 KB, and for tiny objects the per-object transition request plus Glacier’s per-object metadata overhead (roughly 40 KB) can exceed the storage savings.
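A transition script (or a sanity check before enabling a rule) can enforce the size floor itself; a minimal sketch over a hypothetical (key, size) listing:

```python
MIN_TRANSITION_BYTES = 128 * 1024  # S3 Lifecycle's default small-object floor

def glacier_eligible(objects):
    """Yield (key, size) pairs large enough to be worth transitioning."""
    for key, size in objects:
        if size >= MIN_TRANSITION_BYTES:
            yield key, size

listing = [("logs/a.gz", 4_096), ("logs/b.gz", 900_000)]
print(list(glacier_eligible(listing)))  # only logs/b.gz survives the filter
```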
Restoring Data from Glacier
Unlike a storage-class transition, a restore from Glacier is asynchronous: you initiate the restore, then wait for a temporary, readable copy of the object. The archived object itself stays in Glacier.
Example AWS CLI command:
aws s3api restore-object \
  --bucket my-archive-bucket \
  --key logs/access-log-2022-05-01.gz \
  --restore-request 'Days=3,GlacierJobParameters={Tier=Standard}'
- Days=3: how long the temporary restored copy remains available; the archived object itself stays in Glacier.
- Tier=Standard: also supports Bulk (cheaper, slower) and Expedited (faster, significantly more expensive).
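Tier selection is usually just a deadline calculation; a toy sketch (tier timings are the approximate documented ranges for Glacier Flexible Retrieval, not guarantees):

```python
def pick_tier(hours_until_needed: float) -> str:
    """Pick the cheapest retrieval tier that should meet a deadline.

    Approximate timings: Expedited 1-5 min, Standard 3-5 h, Bulk 5-12 h.
    """
    if hours_until_needed >= 12:
        return "Bulk"       # cheapest; worst case ~12 h
    if hours_until_needed >= 5:
        return "Standard"   # worst case ~5 h
    return "Expedited"      # minutes, but priced accordingly

print(pick_tier(24))  # → Bulk
```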
Restore status:
Monitor using:
aws s3api head-object --bucket my-archive-bucket --key logs/access-log-2022-05-01.gz
Look for the Restore field in the response, e.g.:
"Restore": "ongoing-request=\"false\", expiry-date=\"Fri, 14 Jun 2024 00:00:00 GMT\""
Gotcha: a second restore-object call while a restore is in progress fails with RestoreAlreadyInProgress (HTTP 409); a call against an already-restored object succeeds and simply updates the expiry. Check head-object before retrying.
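Automation usually needs to parse that Restore string rather than eyeball it; a small regex-based helper, assuming the header format shown above:

```python
import re

def parse_restore(header):
    """Parse an S3 Restore header into (ongoing, expiry_string).

    Returns (None, None) if the header is absent (no restore ever requested).
    While a restore is in progress the header has ongoing-request="true"
    and no expiry-date yet.
    """
    if header is None:
        return None, None
    ongoing = 'ongoing-request="true"' in header
    m = re.search(r'expiry-date="([^"]+)"', header)
    return ongoing, m.group(1) if m else None

hdr = 'ongoing-request="false", expiry-date="Fri, 14 Jun 2024 00:00:00 GMT"'
print(parse_restore(hdr))  # → (False, 'Fri, 14 Jun 2024 00:00:00 GMT')
```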
Cost and Performance Traps
| Storage Class | Typical Cost ($/TB/month) | Restore Time |
|---|---|---|
| S3 Standard | 23 | Immediate |
| S3 Glacier Flexible Retrieval | ~4 | 1–12 hrs (Standard tier) |
| S3 Glacier Deep Archive | ~1 | 12–48 hrs |
- Small files: Many tiny objects will cost more to index/restore than to store.
- Versioned buckets: Lifecycle applies per-version. Stale delete markers accumulate.
- Billing: each lifecycle transition is billed as a request (priced like a PUT), so transitioning millions of objects shows up as a one-time spike in your bill.
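For versioned buckets, pair the transition rule with cleanup rules so noncurrent versions and stale delete markers don’t linger. A sketch of those extra rules (JSON for aws s3api put-bucket-lifecycle-configuration; IDs and retention periods are hypothetical):

```json
{
  "Rules": [
    {
      "ID": "NoncurrentVersionsToGlacier",
      "Filter": { "Prefix": "logs/" },
      "Status": "Enabled",
      "NoncurrentVersionTransitions": [
        { "NoncurrentDays": 30, "StorageClass": "GLACIER" }
      ],
      "NoncurrentVersionExpiration": { "NoncurrentDays": 365 }
    },
    {
      "ID": "CleanUpDeleteMarkers",
      "Filter": { "Prefix": "logs/" },
      "Status": "Enabled",
      "Expiration": { "ExpiredObjectDeleteMarker": true }
    }
  ]
}
```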
Practical Example
A backup pipeline pushes compressed database dumps:
aws s3 cp dump-2024-06-01.sql.gz s3://prod-backup-bucket/db-dumps/
A lifecycle rule transitions db-dumps/ objects older than 30 days to Glacier. Restores (for audits) use the s3api restore-object command above; Standard-tier requests typically complete within 3–5 hours.
Non-obvious tip
Set a notification on the bucket that fires on restore completion (s3:ObjectRestore:Completed), and have downstream pipelines pick up data only after that event arrives; polling head-object is unreliable for high-volume workflows.
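A sketch of that notification wiring (JSON for aws s3api put-bucket-notification-configuration; the queue ARN and account ID are hypothetical):

```json
{
  "QueueConfigurations": [
    {
      "Id": "restore-complete-to-sqs",
      "QueueArn": "arn:aws:sqs:us-east-1:123456789012:restore-events",
      "Events": ["s3:ObjectRestore:Completed"]
    }
  ]
}
```

From there, a consumer on the queue can kick off whatever pipeline needed the restored data.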
Known issue:
Lifecycle rules are evaluated asynchronously, roughly once a day, and transitions can lag the configured transition date; AWS publishes no SLA for transition timing. Expect occasional day-scale lags in buckets with billions of objects.
Summary: Lifecycle-based S3 to Glacier is robust, but edge cases around object size, versioning, and restore concurrency can bite. Always automate monitoring for restore status and cost anomalies. If full certainty is required, consider triggering Glacier transitions from your backup/retention pipeline directly, not via S3 lifecycle.
YMMV depending on AWS region and bucket scale; test before committing petabytes.