📦 S3 & Storage Services#
Learning Objectives#
- Understand S3 storage classes, lifecycle policies, and versioning
- Implement S3 security (encryption, bucket policies, pre-signed URLs)
- Choose between S3, EBS, EFS, and S3 Glacier based on use case
- Design static website hosting and data lake architectures
1. Amazon S3 Overview#
Amazon Simple Storage Service (S3) is object storage built to store and retrieve any amount of data from anywhere. It is designed for 99.999999999% (11 9’s) durability.
S3 Fundamentals#
┌──────────────────────────────────────────────────────┐
│                     S3 (Global)                      │
│                                                      │
│  ┌────────────────────────────────────────────────┐  │
│  │ Bucket (my-bucket)                             │  │
│  │                                                │  │
│  │  ┌──────────────────────────────────────────┐  │  │
│  │  │ Object: reports/2024/q1.pdf              │  │  │
│  │  │ - Key: reports/2024/q1.pdf               │  │  │
│  │  │ - Value: <binary data>                   │  │  │
│  │  │ - Metadata: Content-Type, Size, ETag     │  │  │
│  │  │ - Version ID: abc123                     │  │  │
│  │  └──────────────────────────────────────────┘  │  │
│  │                                                │  │
│  │  ┌──────────────────────────────────────────┐  │  │
│  │  │ Object: images/logo.png                  │  │  │
│  │  └──────────────────────────────────────────┘  │  │
│  └────────────────────────────────────────────────┘  │
└──────────────────────────────────────────────────────┘
Key Concepts:
- Bucket: Container for objects (globally unique name)
- Object: File + metadata + version ID
- Key: Object’s unique identifier in a bucket (full path)
- Region: Buckets are created in a specific region
- ARN: arn:aws:s3:::bucket-name/key
⚡ Exam Tip: S3 is a regional service: you create buckets in a specific region, but bucket names must be globally unique across all AWS accounts.
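The ARN layout above can be taken apart mechanically. The following is a minimal sketch, assuming the standard `aws` partition; `S3Arn` and `parse_s3_arn` are illustrative names, not part of any AWS SDK.

```python
from typing import NamedTuple, Optional


class S3Arn(NamedTuple):
    bucket: str
    key: Optional[str]  # None when the ARN names the bucket itself


def parse_s3_arn(arn: str) -> S3Arn:
    """Split an ARN of the form arn:aws:s3:::bucket-name/key."""
    prefix = "arn:aws:s3:::"  # assumption: standard "aws" partition only
    if not arn.startswith(prefix):
        raise ValueError(f"not an S3 ARN: {arn!r}")
    resource = arn[len(prefix):]
    # Only the first "/" separates bucket from key; later slashes belong to the key.
    bucket, sep, key = resource.partition("/")
    if not bucket:
        raise ValueError("ARN has no bucket name")
    return S3Arn(bucket, key if sep else None)


print(parse_s3_arn("arn:aws:s3:::my-bucket/reports/2024/q1.pdf"))
```

Note that the region and account fields of the ARN are empty, which is consistent with bucket names being globally unique.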
S3 Storage Classes#
| Storage Class | Durability | Availability | Min Storage | Retrieval | Use Case |
|---|---|---|---|---|---|
| S3 Standard | 11 9’s | 99.99% | None | Instant | Frequently accessed data |
| S3 Intelligent-Tiering | 11 9’s | 99.9% | None | Instant | Unknown access patterns |
| S3 Standard-IA | 11 9’s | 99.9% | 30 days | Instant | Infrequent, quick access |
| S3 One Zone-IA | 11 9’s | 99.5% | 30 days | Instant | Recreatable data |
| S3 Glacier Instant Retrieval | 11 9’s | 99.9% | 90 days | ms | Archived data, instant access |
| S3 Glacier Flexible Retrieval | 11 9’s | 99.99% | 90 days | 1-5 min (expedited); 3-5 hrs (standard); 5-12 hrs (bulk) | Archived data, occasional access |
| S3 Glacier Deep Archive | 11 9’s | 99.99% | 180 days | 12 hrs (standard); 48 hrs (bulk) | Long-term archives, compliance |
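The "Min Storage" column matters for cost: an object deleted or transitioned before the minimum duration is still billed for the remaining days. A minimal sketch of that rule follows; the dictionary keys are the storage class names used by the S3 API, and `billable_days` is a hypothetical helper. Intelligent-Tiering is left out of the sketch on purpose.

```python
# Minimum storage duration in days, taken from the table above.
MIN_STORAGE_DAYS = {
    "STANDARD": 0,
    "STANDARD_IA": 30,
    "ONEZONE_IA": 30,
    "GLACIER_IR": 90,    # Glacier Instant Retrieval
    "GLACIER": 90,       # Glacier Flexible Retrieval
    "DEEP_ARCHIVE": 180,
}


def billable_days(storage_class: str, days_stored: int) -> int:
    """Days of storage billed for an object kept for days_stored days."""
    if days_stored < 0:
        raise ValueError("days_stored must be non-negative")
    # Early deletion is charged as if the object stayed for the minimum duration.
    return max(days_stored, MIN_STORAGE_DAYS[storage_class])


print(billable_days("STANDARD_IA", 10))    # deleted after 10 days
print(billable_days("DEEP_ARCHIVE", 200))  # kept past the minimum
```

This is why short-lived objects usually belong in S3 Standard even when they are rarely read.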
Lifecycle Policy Example:
S3 Standard → Standard-IA (day 30) → Glacier Flexible Retrieval (day 90) → Deep Archive (day 365) → delete (day 2555, about 7 years)
{
  "Rules": [
    {
      "ID": "Archive old data",
      "Status": "Enabled",
      "Filter": {"Prefix": "logs/"},
      "Transitions": [
        {"Days": 30, "StorageClass": "STANDARD_IA"},
        {"Days": 90, "StorageClass": "GLACIER"},
        {"Days": 365, "StorageClass": "DEEP_ARCHIVE"}
      ],
      "Expiration": {"Days": 2555}
    }
  ]
}
2. S3 Security#
2.1 Encryption Options#
┌──────────────────────────┐
│      S3 Encryption       │
├──────────────────────────┤
│ Server-Side Encryption   │
│ ┌──────────────────────┐ │
│ │ SSE-S3 (AES-256)     │ │ ← AWS manages keys
│ ├──────────────────────┤ │
│ │ SSE-KMS              │ │ ← KMS for audit/control
│ ├──────────────────────┤ │
│ │ SSE-C (Customer)     │ │ ← You provide key
│ └──────────────────────┘ │
│                          │
│ Client-Side Encryption   │
│ ┌──────────────────────┐ │
│ │ Encrypt before       │ │ ← You manage keys
│ │ uploading to S3      │ │
│ └──────────────────────┘ │
└──────────────────────────┘
| Option | Key Management | Audit Trail | Use Case |
|---|---|---|---|
| SSE-S3 | AWS manages | No | General encryption |
| SSE-KMS | AWS KMS | Yes (CloudTrail) | Compliance, key rotation |
| SSE-C | You manage | No | Regulatory requirements |
| Client-Side | You manage fully | No | End-to-end encryption |
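In boto3, the three server-side options in the table map to different `put_object` parameters. The sketch below only builds the keyword arguments; `encryption_args` is a hypothetical helper, and the KMS alias in the usage note is a placeholder.

```python
from typing import Optional


def encryption_args(option: str,
                    kms_key_id: Optional[str] = None,
                    customer_key: Optional[bytes] = None) -> dict:
    """Return the put_object keyword arguments for one server-side option."""
    if option == "SSE-S3":
        return {"ServerSideEncryption": "AES256"}
    if option == "SSE-KMS":
        args = {"ServerSideEncryption": "aws:kms"}
        if kms_key_id:  # omitted: S3 falls back to the AWS managed key for S3
            args["SSEKMSKeyId"] = kms_key_id
        return args
    if option == "SSE-C":
        if customer_key is None or len(customer_key) != 32:
            raise ValueError("SSE-C needs a 32-byte (256-bit) key")
        return {"SSECustomerAlgorithm": "AES256", "SSECustomerKey": customer_key}
    raise ValueError(f"unknown option: {option!r}")


print(encryption_args("SSE-KMS", kms_key_id="alias/example-key"))
```

A call would then look like `s3.put_object(Bucket="my-bucket", Key="report.pdf", Body=data, **encryption_args("SSE-S3"))`. With SSE-C the same key must be supplied again on every read, since S3 does not store it.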
2.2 Bucket Policies vs ACLs#
| Feature | Bucket Policy | ACL |
|---|---|---|
| Scope | Entire bucket or prefix | Individual objects |
| Cross-account | Yes | Yes (limited) |
| Conditions | Yes (IP, MFA, time) | No |
| Complexity | Simple JSON | Legacy |
| Recommendation | ✅ Use this | ❌ Avoid (legacy) |
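The "Conditions" row is the main reason to prefer bucket policies. As a rough illustration, the sketch below builds a policy that allows reads only from one IP range; the function name, the `Sid`, and the CIDR block (a documentation-only range) are assumptions for the example.

```python
import json


def ip_restricted_read_policy(bucket: str, cidr: str) -> str:
    """Build a bucket policy allowing s3:GetObject only from one CIDR range."""
    policy = {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "AllowReadFromKnownRange",  # illustrative statement ID
                "Effect": "Allow",
                "Principal": "*",
                "Action": "s3:GetObject",
                "Resource": f"arn:aws:s3:::{bucket}/*",
                # The condition is what an ACL cannot express.
                "Condition": {"IpAddress": {"aws:SourceIp": cidr}},
            }
        ],
    }
    return json.dumps(policy, indent=2)


print(ip_restricted_read_policy("my-bucket", "203.0.113.0/24"))
```

The resulting JSON could be applied with `aws s3api put-bucket-policy --bucket my-bucket --policy file://policy.json`.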
2.3 Block Public Access Settings#
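Four settings, applied at the bucket or account level:
1. BlockPublicAcls: rejects new public ACLs and uploads that carry a public ACL
2. IgnorePublicAcls: ignores any public ACLs already set on the bucket and its objects
3. BlockPublicPolicy: rejects new bucket policies that grant public access
4. RestrictPublicBuckets: limits buckets that already have a public policy to AWS service principals and authorized users in the bucket owner's account (blocks public and cross-account access)
Enabling all four blocks all public access.
2.4 Pre-Signed URLs#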
Generate temporary URLs for time-limited access:
# Generate a pre-signed URL valid for 1 hour
aws s3 presign s3://my-bucket/report.pdf --expires-in 3600

import boto3

s3 = boto3.client('s3')
# Signing happens locally; this call does not contact S3
url = s3.generate_presigned_url(
    'get_object',
    Params={'Bucket': 'my-bucket', 'Key': 'report.pdf'},
    ExpiresIn=3600  # 1 hour
)
print(url)  # Temporary URL carrying a signature and expiry, not the secret key
Use Case: Share private files with users who don’t have AWS credentials (e.g., premium content download).
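When debugging an expired link, it can help to read the expiry straight from the URL. The sketch below assumes a Signature Version 4 URL, which carries `X-Amz-Date` and `X-Amz-Expires` query parameters; the sample URL and its signature value are made up.

```python
from datetime import datetime, timedelta, timezone
from urllib.parse import parse_qs, urlsplit


def presigned_expiry(url: str) -> datetime:
    """Return the UTC time at which a SigV4 pre-signed URL stops working."""
    query = parse_qs(urlsplit(url).query)
    signed_at = datetime.strptime(
        query["X-Amz-Date"][0], "%Y%m%dT%H%M%SZ"
    ).replace(tzinfo=timezone.utc)
    lifetime = timedelta(seconds=int(query["X-Amz-Expires"][0]))
    return signed_at + lifetime


# Hypothetical URL for illustration only; the signature is not real.
sample = (
    "https://my-bucket.s3.amazonaws.com/report.pdf"
    "?X-Amz-Algorithm=AWS4-HMAC-SHA256"
    "&X-Amz-Date=20240101T120000Z"
    "&X-Amz-Expires=3600"
    "&X-Amz-Signature=abc123"
)
print(presigned_expiry(sample))
```

A URL signed with temporary credentials can stop working earlier than this, because it is only valid while the signing credentials are.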
3. S3 Features#
3.1 Versioning#
- Protects against accidental deletes and overwrites
- Enabled at the bucket level (once enabled, it can only be suspended, not disabled)
- A delete adds a delete marker instead of removing data; earlier versions remain retrievable
# Enable versioning
aws s3api put-bucket-versioning --bucket my-bucket \
--versioning-configuration Status=Enabled
# List object versions
aws s3api list-object-versions --bucket my-bucket --prefix report.pdf
3.2 Static Website Hosting#
┌──────────────┐      ┌──────────────┐      ┌──────────────┐
│   Route53    │      │  CloudFront  │      │  S3 Bucket   │
│     DNS      │─────>│ CDN + HTTPS  │─────>│  (website)   │
│ example.com  │      │  (Optional)  │      │              │
└──────────────┘      └──────────────┘      └──────────────┘
# Enable static website hosting
aws s3 website s3://my-website-bucket/ \
  --index-document index.html --error-document error.html
# Turn off Block Public Access for this bucket, then add a bucket policy:
# {
#   "Version": "2012-10-17",
#   "Statement": [
#     {
#       "Effect": "Allow",
#       "Principal": "*",
#       "Action": "s3:GetObject",
#       "Resource": "arn:aws:s3:::my-website-bucket/*"
#     }
#   ]
# }
3.3 S3 Object Lock#
WORM (Write Once Read Many) protection:
- Retention period: Fixed time (e.g., 7 years for compliance)
- Legal hold: Indefinite hold (no expiration)
- Modes: Governance (users with the s3:BypassGovernanceRetention permission can override) vs Compliance (no one, including the root user, can shorten or remove the retention)
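A retention period is expressed as a mode plus a retain-until date. The sketch below builds the `Retention` structure that boto3's `put_object_retention` accepts; `retention_settings` is a hypothetical helper, and 2555 days is used as a stand-in for 7 years.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def retention_settings(mode: str, days: int,
                       now: Optional[datetime] = None) -> dict:
    """Build the Retention argument for put_object_retention."""
    if mode not in ("GOVERNANCE", "COMPLIANCE"):
        raise ValueError("mode must be GOVERNANCE or COMPLIANCE")
    if days <= 0:
        raise ValueError("days must be positive")
    start = now or datetime.now(timezone.utc)
    return {"Mode": mode, "RetainUntilDate": start + timedelta(days=days)}


# Fixed start date so the printed result is reproducible.
start = datetime(2024, 1, 1, tzinfo=timezone.utc)
print(retention_settings("COMPLIANCE", 2555, now=start))
```

It would be passed as `s3.put_object_retention(Bucket=..., Key=..., Retention=retention_settings(...))` on a bucket created with Object Lock enabled. In Compliance mode the date can later be extended but not shortened.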
3.4 S3 Replication#
graph LR
subgraph Source["Source Bucket"]
SRC["my-bucket-source\nus-east-1"]
end
subgraph Dest["Destination Bucket"]
DST_SRR["my-bucket-srr\nus-east-1\nSame-Region"]
DST_CRR["my-bucket-dr\neu-west-1\nCross-Region"]
end
SRC -->|SRR: Log aggregation, Dev/Test sync| DST_SRR
SRC -->|CRR: Disaster recovery, Compliance / Latency| DST_CRR
style SRC fill:#ff9900,color:#fff
style DST_SRR fill:#01ab5c,color:#fff
style DST_CRR fill:#527fff,color:#fff
Replication Rules (JSON):
{
  "Rules": [
    {
      "Status": "Enabled",
      "Priority": 1,
      "Filter": {"Prefix": "production/"},
      "Destination": {
        "Bucket": "arn:aws:s3:::my-bucket-dr",
        "StorageClass": "STANDARD_IA"
      },
      "DeleteMarkerReplication": {"Status": "Disabled"}
    }
  ]
}
| Feature | Same-Region (SRR) | Cross-Region (CRR) |
|---|---|---|
| Use Case | Aggregate logs, prod/test sync | Disaster recovery, latency |
| Source/Dest | Same region | Different regions |
| Ownership | Same or different account | Same or different account |
| Requirements | Versioning enabled on both | Versioning enabled on both |
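The two rows that decide a setup in practice are region and versioning. A small sketch of both checks follows; the function names are illustrative, and the status strings mirror the `Status` field returned by `get_bucket_versioning` (`"Enabled"`, `"Suspended"`, or absent if versioning was never configured).

```python
from typing import List, Optional


def replication_type(source_region: str, dest_region: str) -> str:
    """Classify a replication pair as SRR or CRR by comparing regions."""
    return "SRR" if source_region == dest_region else "CRR"


def versioning_problems(source_status: Optional[str],
                        dest_status: Optional[str]) -> List[str]:
    """Return the reasons replication cannot be configured yet (empty if none)."""
    problems = []
    for name, status in (("source", source_status), ("destination", dest_status)):
        if status != "Enabled":
            problems.append(f"versioning is not enabled on the {name} bucket")
    return problems


print(replication_type("us-east-1", "eu-west-1"))
print(versioning_problems("Enabled", None))
```

Running a check like this before applying a replication configuration gives a clearer message than the API error returned when versioning is missing.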
4. Storage Comparison: S3 vs EBS vs EFS#
| Feature | S3 (Object) | EBS (Block) | EFS (File) |
|---|---|---|---|
| Type | Object storage | Block storage | File storage (NFS) |
| Access | HTTP(S) API | Attached to one EC2 instance (Multi-Attach on io1/io2 only) | Multiple EC2 instances |
| Use Case | Static files, backups, data lakes | Boot volumes, databases | Shared file systems |
| Performance | Up to 100 Gbps | Up to 256K IOPS | Up to 10+ GB/s |
| Max Size | 5 TB per object | 16 TiB per volume (64 TiB on io2 Block Express) | Unlimited (petabytes) |
| Persistence | 11 9’s durable | Replicated in AZ | Regional (Multi-AZ) |
| Pricing | Pay per GB stored | Pay per GB provisioned | Pay per GB used |
| Protocol | REST API | NVMe/SCSI | NFSv4 |
# EBS: Create and attach volume
aws ec2 create-volume --volume-type gp3 --size 100 --availability-zone us-east-1a
aws ec2 attach-volume --volume-id vol-123 --instance-id i-abc --device /dev/sdf
# EFS: Create file system
aws efs create-file-system --performance-mode generalPurpose
aws efs create-mount-target --file-system-id fs-123 --subnet-id subnet-abc
# S3: Upload object
aws s3 cp large-file.zip s3://my-bucket/backups/
⚡ Exam Tip: Choose S3 for static content (images, videos, backups), EBS for databases/OS volumes, EFS for shared file systems across multiple EC2 instances.
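The exam tip reduces to two questions, which can be written as a tiny decision helper. This is a deliberately simplified sketch with hypothetical parameter names; real choices also weigh cost, throughput, and latency.

```python
def choose_storage(needs_mounted_filesystem: bool,
                   shared_by_many_instances: bool) -> str:
    """Pick S3, EBS, or EFS from the two questions behind the exam tip."""
    if not needs_mounted_filesystem:
        # Accessed over the HTTP(S) API: static content, backups, data lakes.
        return "S3"
    # A mounted volume: shared NFS access means EFS, a single instance means EBS.
    return "EFS" if shared_by_many_instances else "EBS"


print(choose_storage(needs_mounted_filesystem=False, shared_by_many_instances=True))
print(choose_storage(needs_mounted_filesystem=True, shared_by_many_instances=True))
print(choose_storage(needs_mounted_filesystem=True, shared_by_many_instances=False))
```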
5. Real-World Use Cases#
Use Case 1: Data Lake Architecture#
┌──────────┐    ┌──────────┐    ┌──────────┐    ┌───────────┐
│On-premise│───>│ S3 (Raw) │───>│ Glue ETL │───>│ S3 (Clean)│
│   Data   │    │ Data Lake│    │          │    │   Data    │
└──────────┘    └──────────┘    └──────────┘    └───────────┘
                                                      │
                                               ┌──────┴──────┐
                                               │             │
                                          ┌────────┐   ┌──────────┐
                                          │ Athena │   │ Redshift │
                                          │ (SQL)  │   │ Spectrum │
                                          └────────┘   └──────────┘
Use Case 2: Static Website with CloudFront + HTTPS#
User → Route53 → CloudFront (SSL/TLS) → S3 Bucket (Origin)
                                        └── Loads index.html
Use Case 3: Backup Strategy with Lifecycle#
Daily Backups → S3 Standard (7 days) → S3 Glacier (1 year) → Deep Archive (7 years) → Delete
Use Case 4: VPC Endpoint for Private S3 Access#
Access S3 from EC2 without going through the internet:
┌──────────────┐     Private Network     ┌──────────────┐
│  EC2 (VPC)   │────────────────────────>│  S3 Bucket   │
│  (No public  │  VPC Gateway Endpoint   │  (Private)   │
│     IP)      │                         │              │
└──────────────┘                         └──────────────┘
6. ⚡ Exam Tips#
- S3 durability: 99.999999999% (11 9’s)
- S3 consistency: Strong read-after-write consistency for all PUTs and DELETEs, including overwrites and list operations (since December 2020)
- Multipart upload: Recommended for objects > 100 MB, required for objects > 5 GB
- S3 Transfer Acceleration: Fast uploads over long distances using Edge Locations
- S3 Select: Retrieve only subset of data from an object (SQL-like)
- S3 Inventory: Audit object storage and encryption status
- Requester Pays: Bucket owner pays for storage; requester pays for requests and data transfer
- Batch Operations: Perform actions on billions of objects at scale
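The multipart thresholds above interact with two further S3 limits: at most 10,000 parts per upload, and each part between 5 MiB and 5 GiB (the last part may be smaller). The sketch below picks a part size under those limits; `plan_parts` and its 100 MiB default are assumptions for illustration.

```python
import math
from typing import Tuple

MIB = 1024 ** 2
GIB = 1024 ** 3
TIB = 1024 ** 4

MIN_PART_SIZE = 5 * MIB     # smallest allowed part (except the last)
MAX_PARTS = 10_000          # maximum parts in one multipart upload
MAX_OBJECT_SIZE = 5 * TIB   # largest object S3 stores


def plan_parts(object_size: int, preferred_part: int = 100 * MIB) -> Tuple[int, int]:
    """Return (part_size, part_count) for a multipart upload of object_size bytes."""
    if object_size <= 0:
        raise ValueError("object_size must be positive")
    if object_size > MAX_OBJECT_SIZE:
        raise ValueError("object exceeds the 5 TiB S3 limit")
    # Grow the part size if the preferred one would need more than 10,000 parts.
    smallest_allowed = math.ceil(object_size / MAX_PARTS)
    part_size = max(MIN_PART_SIZE, preferred_part, smallest_allowed)
    return part_size, math.ceil(object_size / part_size)


print(plan_parts(6 * GIB))  # a 6 GiB video with the default 100 MiB parts
```

High-level clients such as `aws s3 cp` and boto3's `upload_file` switch to multipart automatically, so a calculation like this is mainly needed when driving the low-level multipart API directly.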
Chapter Quiz#
1. Which S3 storage class provides the lowest cost for long-term archival data that rarely needs access?
   - A) S3 Standard
   - B) S3 Glacier Flexible Retrieval
   - C) S3 Glacier Deep Archive
   - D) S3 One Zone-IA
2. What is the minimum retention period for objects stored in S3 Glacier Deep Archive?
   - A) 30 days
   - B) 90 days
   - C) 180 days
   - D) 365 days
3. Which S3 encryption option provides an audit trail of when keys were used?
   - A) SSE-S3
   - B) SSE-KMS
   - C) SSE-C
   - D) Client-Side
4. You need a shared file system accessible by hundreds of EC2 instances simultaneously. Which service?
   - A) S3
   - B) EBS
   - C) EFS
   - D) Instance Store
5. S3 Transfer Acceleration uses which AWS infrastructure to speed up uploads?
   - A) Direct Connect
   - B) VPN
   - C) Edge Locations
   - D) Availability Zones
6. A company wants to host a static website on S3 with a custom domain and HTTPS. Which combination of services should be used?
   - A) S3 + CloudFront + Route53
   - B) S3 + ALB + Route53
   - C) S3 + EC2 + ELB
   - D) S3 + Global Accelerator
7. A company needs to grant cross-account access to an S3 bucket. Which policy should be modified?
   - A) IAM policy in the source account
   - B) S3 bucket policy in the destination account
   - C) S3 bucket policy in the source account
   - D) IAM policy in the destination account
8. Which S3 feature can be used to automatically transition objects between storage tiers based on age?
   - A) S3 Replication
   - B) S3 Lifecycle policies
   - C) S3 Batch Operations
   - D) S3 Object Lock
9. A company uses S3 to store 10 TB of data that is accessed twice per month. After 30 days, access decreases to once per year. Which S3 storage class transition strategy is MOST cost-effective?
   - A) Standard → Standard-IA → Glacier Deep Archive
   - B) Standard → One Zone-IA → Glacier
   - C) Standard → Glacier → Glacier Deep Archive
   - D) Standard → Intelligent-Tiering
10. An application writes thousands of small objects to S3 every second. Users report high latency and occasional 503 SlowDown errors. What should be done to improve performance?
    - A) Add a prefix to S3 object keys to distribute across partitions
    - B) Use S3 Transfer Acceleration
    - C) Increase the S3 bucket limit
    - D) Enable S3 Versioning
11. A company wants to ensure that all objects uploaded to an S3 bucket are automatically encrypted. Which approach meets this requirement with minimal operational overhead?
    - A) Use S3 bucket policies to deny PutObject requests without encryption headers
    - B) Enable default encryption on the S3 bucket
    - C) Require all applications to enable encryption in their PutObject calls
    - D) Enable S3 Object Lock on the bucket
12. A solutions architect needs to share an S3 object with an external user who does not have AWS credentials. The object must only be accessible for a limited time. Which approach should be used?
    - A) Change the bucket policy to allow public access to the object
    - B) Generate a pre-signed URL with a defined expiration period
    - C) Make the entire bucket publicly readable
    - D) Copy the object to a public S3 bucket
13. An application needs to upload 6 GB video files to an S3 bucket. Which S3 feature is REQUIRED for objects larger than 5 GB?
    - A) S3 Transfer Acceleration
    - B) Multipart upload
    - C) S3 Object Lock
    - D) S3 Versioning
14. A company needs to replicate S3 objects from a bucket in us-east-1 to a bucket in eu-west-1 for disaster recovery. Which replication type should be configured?
    - A) Same-Region Replication (SRR)
    - B) Cross-Region Replication (CRR)
    - C) Cross-Region Replication with S3 Batch Operations
    - D) S3 Replication Time Control (RTC)
15. Which S3 storage class is the MOST cost-effective for data that is accessed infrequently but requires millisecond retrieval when requested?
    - A) S3 Standard
    - B) S3 Intelligent-Tiering
    - C) S3 Standard-IA
    - D) S3 Glacier Deep Archive
16. A financial services company needs to store records for 7 years with WORM (Write Once Read Many) protection. The data must be immutable and cannot be modified or deleted by anyone, including the root user. Which S3 feature should be used?
    - A) S3 Versioning with MFA Delete
    - B) S3 Object Lock in Compliance mode
    - C) S3 Lifecycle policies with expiration
    - D) S3 Replication with delete marker replication disabled
17. Which of the following are valid S3 server-side encryption options? (Select TWO)
    - A) SSE-S3
    - B) SSE-KMS
    - C) SSE-CloudHSM
    - D) SSE-ACM
    - E) SSE-RDS
18. An application running on EC2 in a VPC must access an S3 bucket without traversing the public internet. How can this be achieved?
    - A) Attach an Internet Gateway to the VPC
    - B) Create a VPC Gateway Endpoint for S3
    - C) Place a NAT Gateway in the public subnet
    - D) Use AWS Direct Connect
19. What is the minimum storage duration for objects transitioned to S3 Glacier Deep Archive?
    - A) 30 days
    - B) 90 days
    - C) 180 days
    - D) 365 days
20. A company wants to host a static website with a custom domain and HTTPS. Which combination of services provides the MOST secure and scalable solution?
    - A) S3 + Route 53 + CloudFront with SSL/TLS
    - B) S3 + Application Load Balancer + EC2
    - C) S3 + Route 53 + Direct Connect
    - D) S3 + API Gateway + Lambda
21. A security team needs to be notified if any S3 bucket in the account is made publicly accessible. How can this be achieved?
    - A) Enable S3 server access logs on all buckets
    - B) Enable S3 Block Public Access at the account level
    - C) Use AWS Config managed rules to detect public access
    - D) Enable S3 Transfer Acceleration
22. A company stores sensitive data in S3 and requires an audit trail of KMS key usage when objects are decrypted. Which S3 encryption option should be selected?
    - A) SSE-S3
    - B) SSE-KMS
    - C) SSE-C
    - D) Client-side encryption
23. A data analyst needs to query a subset of data from a 2 GB CSV file stored in S3 without downloading the entire object. Which S3 feature supports this?
    - A) S3 Select
    - B) S3 Batch Operations
    - C) S3 Inventory
    - D) S3 Multipart Upload
24. What happens when an object is deleted in a version-enabled S3 bucket?
    - A) The object and all its versions are permanently deleted
    - B) A delete marker is placed, and previous versions remain accessible
    - C) The object is moved to the S3 Glacier storage class
    - D) The object is quarantined for 30 days
25. A company wants to automatically move objects between S3 storage classes based on changing access patterns without manual intervention. Which S3 feature should be used?
    - A) S3 Intelligent-Tiering
    - B) S3 Lifecycle policies
    - C) S3 Replication
    - D) S3 Batch Operations
Answer Key
1. C: Glacier Deep Archive is the cheapest class, at roughly $0.001/GB/month in us-east-1.
2. C: Deep Archive has a 180-day minimum retention period.
3. B: SSE-KMS records key usage in CloudTrail, which provides the audit trail.
4. C: EFS (Elastic File System) is an NFS file system that many EC2 instances can mount at once.
5. C: Edge Locations accept the upload close to the client and route it to S3 over the AWS network.
6. A: S3 + CloudFront + Route53 provides a custom domain, HTTPS, and low-latency static website hosting.
7. C: The S3 bucket policy in the source (resource-owning) account grants access to the destination account.
8. B: S3 Lifecycle policies automate transitions between storage classes based on object age.
9. A: Standard for the first 30 days, then Standard-IA, then Glacier Deep Archive matches the decreasing access frequency.
10. A: Spreading keys across multiple prefixes distributes writes across partitions, because S3 request-rate limits apply per prefix.
11. B: Default encryption on the bucket automatically encrypts all new objects without application changes.
12. B: Pre-signed URLs grant temporary access to specific objects without making them publicly accessible.
13. B: Multipart upload is required for objects larger than 5 GB and is recommended for objects over 100 MB.
14. B: CRR (Cross-Region Replication) copies objects across different AWS regions for DR or compliance.
15. C: S3 Standard-IA provides millisecond retrieval for infrequently accessed data at lower cost than Standard.
16. B: S3 Object Lock in Compliance mode prevents any user, including root, from deleting or overwriting objects.
17. A, B: SSE-S3 (keys managed by S3) and SSE-KMS (keys held in AWS KMS) are the valid options listed; SSE-C is also a server-side option but is not among the choices.
18. B: A VPC Gateway Endpoint for S3 provides private connectivity without internet traffic or NAT Gateway costs.
19. C: S3 Glacier Deep Archive has a minimum storage duration of 180 days.
20. A: S3 + CloudFront + Route 53 provides secure HTTPS, low latency, and global scalability for static websites.
21. C: AWS Config managed rules (e.g., s3-bucket-public-read-prohibited) detect publicly accessible buckets.
22. B: SSE-KMS integrates with CloudTrail to provide an audit trail of when KMS keys were used for decryption.
23. A: S3 Select retrieves a subset of data from an object using SQL expressions without downloading the full object.
24. B: In a version-enabled bucket, deleting an object adds a delete marker; previous versions remain accessible.
25. A: S3 Intelligent-Tiering automatically moves data between access tiers based on changing access patterns.