AWS Lambda Write to S3

#Cloud #AWS #Serverless #AWSLambda #AmazonS3 #LambdaS3

Writing Data from AWS Lambda to S3: Techniques, Errors, and Best Practices

Efficient data workflows in AWS often depend on pushing output from event-driven computations straight to S3. Whether exporting user reports, archiving CSVs, or storing raw logs, Lambda-to-S3 is a common integration point. But what’s actually required for robust implementation?


Minimum Requirements

IAM Permissions
A Lambda function can’t access S3 by default. Assign an execution role with access scoped as tightly as possible. At minimum, for basic write:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": ["s3:PutObject"],
      "Resource": ["arn:aws:s3:::my-lambda-output-bucket/*"]
    }
  ]
}

Replace my-lambda-output-bucket with the name of your actual bucket. Avoid wildcard resources unless unavoidable.
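If the function only ever writes under one key prefix, the Resource can be narrowed further. A sketch (exports/ is an illustrative prefix, not something the function above requires):

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": ["s3:PutObject"],
      "Resource": ["arn:aws:s3:::my-lambda-output-bucket/exports/*"]
    }
  ]
}
```

With this policy, a PutObject to any key outside exports/ fails with AccessDenied, which is a useful guardrail against misconfigured key-building code.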

S3 Bucket
The bucket must exist before deployment. Bucket names must be globally unique, lowercase, and 3-63 characters long. For example: app-log-exports-prod-us-east-1.


Lambda Implementation Example (Node.js 18.x)

The following Lambda writes an object to S3 using the AWS SDK for JavaScript v3, which is bundled with the Node.js 18.x runtime (SDK v2 is not). Note the use of Date.now() for key uniqueness. The S3 client is instantiated outside the handler so it is reused across warm invocations.

// Node.js 18.x runtime (bundles AWS SDK for JavaScript v3)
const { S3Client, PutObjectCommand } = require('@aws-sdk/client-s3');

const s3 = new S3Client({});
const BUCKET = process.env.OUTPUT_BUCKET || 'my-lambda-output-bucket';

exports.handler = async (event) => {
    // Date.now() is unique enough at low concurrency; append a UUID
    // if parallel invocations could land on the same millisecond.
    const key = `lambda-output-${Date.now()}.txt`;
    const content = `Log snapshot at ${new Date().toISOString()}`;

    const params = {
        Bucket: BUCKET,
        Key: key,
        Body: content,
        ContentType: 'text/plain'
    };

    try {
        await s3.send(new PutObjectCommand(params));
        return { statusCode: 200, body: `Wrote to S3: ${key}` };
    } catch (err) {
        console.error('S3 PUT error:', err);
        return { statusCode: 500, body: 'S3 write failed' };
    }
};

Tip:
Store the bucket name in an environment variable. Keeps code portable across dev/staging/prod.


Deployment and Testing

  1. Create the Lambda function (Node.js 18.x preferred for LTS).
  2. Assign your IAM role with s3:PutObject.
  3. Set the environment variable OUTPUT_BUCKET to your S3 bucket name.
  4. Paste the function code.
  5. Deploy.

Test in console with any sample event. Check the S3 bucket for the new .txt object.

Common Misconfiguration:
If you see AccessDenied in CloudWatch logs:

Error: AccessDenied: Access Denied

the role's policy is insufficient or mis-targeted (wrong bucket or resource ARN). Double-check the ARNs and the role assignment.


Design Considerations & Trade-offs

  • Consistency: S3 has provided strong read-after-write consistency since December 2020, so a newly written object is immediately visible to subsequent GETs and LISTs.
  • Payload Size: A single PutObject call is capped at 5 GiB, and in practice you are bounded by Lambda's configurable memory (up to 10,240 MB); use multipart upload for larger objects.
  • Retries: Asynchronous invocations are retried on failure (twice by default), so the same event can run more than once. Either derive a deterministic key from the event so a retry overwrites the same object, or use unique keys and tolerate duplicates.
  • Metadata: Add descriptive metadata (e.g., user IDs, content hashes) in S3 object tags or metadata fields for downstream traceability.
  • Concurrency: S3 supports at least 3,500 PUT/COPY/POST/DELETE requests per second per key prefix; spread writes across multiple prefixes for higher throughput.

Gotchas

  • "Block Public Access" settings don't affect writes made through the execution role, but a PutObject call that sets a public ACL (e.g., ACL: 'public-read') will be rejected with AccessDenied regardless of how permissive the IAM policy is.
  • S3 PUTs are last-writer-wins: parallel Lambdas writing the same key silently overwrite each other. Choose a sufficiently unique prefix or add a UUID.
  • Lambda's ephemeral /tmp (512 MB by default, configurable up to 10,240 MB, not persisted across invocations) often comes up when buffering large payloads; for moderate workloads, skip it and build the payload in memory.

Best Practices

Concern           Recommendation
IAM scope         Restrict the policy to the target bucket/prefix only
Key naming        Use timestamps and/or UUIDs for uniqueness
Error visibility  Output error details to CloudWatch (console.error)
Bulk operations   Consider S3 Batch Operations or Athena for later analytics
Sensitive data    Enable SSE-S3 or SSE-KMS encryption at rest

Non-obvious tip:
For high-throughput or batched writes, avoid frequent short invocations. Aggregate data in memory during the Lambda’s run, then upload a single object at the end.
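The aggregation idea can be sketched as a small in-memory buffer that turns many records into one newline-delimited body for a single PutObject at the end of the invocation (Batcher is a hypothetical helper, not part of any SDK):

```javascript
// Accumulates records in memory; flushBody() returns one newline-delimited
// JSON payload suitable for a single S3 PutObject, then resets the buffer.
class Batcher {
    constructor() {
        this.lines = [];
    }

    add(record) {
        this.lines.push(JSON.stringify(record));
    }

    flushBody() {
        const body = this.lines.join('\n');
        this.lines = [];
        return body;
    }
}
```

Newline-delimited JSON is a convenient choice here because Athena and S3 Select can query it directly.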


Related Patterns

  • For multi-format outputs, detect content type dynamically:
    ContentType: require('mime-types').lookup(filename) || 'application/octet-stream'
    
  • Consider triggering additional Lambdas from S3 PUT events for post-processing pipelines.
  • Use CloudTrail or S3 access logs if regulatory compliance or audits are a concern.

Summary

S3 remains the default choice for scalable AWS storage, and Lambda integration is mostly straightforward—but production readiness demands careful IAM scope, error management, and planning for throughput. The real friction comes from operational details: key naming, permissions, concurrency bottlenecks, and debugging subtle failures. Reviewing recent CloudWatch logs and S3 server access logs can be invaluable when troubleshooting subtle bugs.

Got a case that doesn’t fit this simple pattern? Multipart upload or Kinesis Firehose may be more appropriate for large/batched data.

