🖥️ EC2 & Compute Services#

Learning Objectives#

  • Understand EC2 instance types, purchasing options, and placement groups
  • Configure EC2 with user data, security groups, and IAM roles
  • Implement Auto Scaling and load balancing
  • Choose between EC2, Lightsail, and Bare Metal

1. Amazon EC2 Overview#

Amazon Elastic Compute Cloud (EC2) provides virtual servers in the cloud. You can provision and scale compute capacity within minutes.

EC2 Architecture#

┌─────────────────────────────────────────────────────┐
│                    VPC / Subnet                      │
│                                                      │
│  ┌──────────────────────────────────────────────┐   │
│  │           EC2 Instance (i-abc123)             │   │
│  │                                               │   │
│  │  ┌──────────┐  ┌──────────┐  ┌──────────┐    │   │
│  │  │  vCPU    │  │  Memory  │  │  EBS     │    │   │
│  │  │  (2-128) │  │ (1-384GB)│  │  Volume  │    │   │
│  │  └──────────┘  └──────────┘  └────┬─────┘    │   │
│  │                                    │          │   │
│  │  ┌──────────┐  ┌──────────┐       │          │   │
│  │  │  ENI     │  │ Instance │       │          │   │
│  │  │ (Network)│  │  Store   │       │          │   │
│  │  └──────────┘  └──────────┘       │          │   │
│  └────────────────────────────────────┼──────────┘   │
│                                       │              │
│  ┌────────────────────────────────────┴──────────┐   │
│  │           Security Group (Firewall)            │   │
│  └───────────────────────────────────────────────┘   │
└─────────────────────────────────────────────────────┘

1.1 EC2 Instance Types#

AWS provides instance families optimized for different workloads:

| Family | Instance Types | Use Case | Example |
|---|---|---|---|
| General Purpose | T3, T4g, M6i, M7g | Web servers, small DBs | t3.micro (Free tier) |
| Compute Optimized | C6i, C7g | Batch processing, gaming | c6i.large |
| Memory Optimized | R6i, X2iedn | In-memory caches, large DBs | r6i.large |
| Storage Optimized | I3, D2 | Data warehousing, logs | i3.large |
| Accelerated | P4, G4ad, F1 | ML training, video transcoding | p4d.24xlarge |

Instance Naming Convention:

m5.large
│ │    │
│ │    └── Instance size (small, medium, large, xlarge, etc.)
│ └── Generation (5th gen)
└── Instance family (General purpose)

Exam Tip: Know which instance family for which workload: T/M = general, C = compute, R/X = memory, I/D = storage, P/G/F = GPU/FPGA.
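
The naming convention can be split mechanically. The sketch below is a minimal illustration, assuming names of the form `<family><generation>[attributes].<size>`; `parse_instance_type` is a hypothetical helper (not an AWS CLI command), and it reports extra letters such as the `i` in `c6i` or the `g` in `t4g` as a separate "attributes" field.

```shell
# Split an EC2 instance type such as "m5.large" or "c6i.xlarge" into its parts.
# parse_instance_type is a hypothetical helper, not part of the AWS CLI.
# Prints nothing if the name does not follow the assumed pattern.
parse_instance_type() {
  echo "$1" | sed -n -E \
    's/^([a-z]+)([0-9]+)([a-z0-9-]*)\.([a-z0-9-]+)$/family=\1 generation=\2 attributes=\3 size=\4/p'
}
```

For example, `parse_instance_type p4d.24xlarge` reports family `p`, generation `4`, attributes `d`, and size `24xlarge`.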

1.2 EC2 Purchasing Options#

| Option | Pricing | Commitment | Use Case |
|---|---|---|---|
| On-Demand | Highest ($/hr) | None | Short-term, spiky workloads |
| Reserved (RI) | Up to 72% off | 1 or 3 years | Steady-state production |
| Savings Plans | Up to 72% off | 1 or 3 years ($/hr commitment) | Flexible across instance families |
| Spot | Up to 90% off | None (can be terminated) | Fault-tolerant, batch jobs |
| Dedicated Host | Billed per host (whole physical server) | On-Demand, or 1 or 3 year reservation | Licensing, compliance |
| Dedicated Instance | Billed per instance (single-tenant HW) | On-Demand or RI | Isolation requirements |
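
To make the discount ceilings concrete, the sketch below compares monthly costs for one instance. The $0.10/hr On-Demand rate and the 730-hour month are assumptions chosen for easy arithmetic, not real prices, and `monthly_cost` is a hypothetical helper; the percentages are the "up to" figures from the table, so actual savings are usually smaller.

```shell
# Monthly cost for one instance: hourly rate x hours x (1 - discount).
# Arguments: <on-demand $/hr> <discount %> [hours per month, default 730]
monthly_cost() {
  awk -v rate="$1" -v pct="$2" -v hours="${3:-730}" \
    'BEGIN { printf "%.2f\n", rate * hours * (100 - pct) / 100 }'
}

# Hypothetical On-Demand rate of $0.10/hr; the discounts are the table's ceilings.
for option in "On-Demand 0" "Reserved/Savings-Plan 72" "Spot 90"; do
  set -- $option
  printf '%-22s $%s\n' "$1" "$(monthly_cost 0.10 "$2")"
done
```

Under these assumptions the three rows come to $73.00, $20.44, and $7.30 per month.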

Spot Instance Lifecycle:

Request Spot → Request active → Instance provisioned → Running
         → Spot Instance Interruption Notice (2 min) → Terminated
(interruption happens when EC2 needs the capacity back or the Spot price exceeds your maximum price)

# Request a Spot Instance
aws ec2 request-spot-instances \
  --spot-price "0.05" \
  --instance-count 2 \
  --launch-specification file://spot-config.json

# Check spot price history
aws ec2 describe-spot-price-history \
  --instance-types m5.large \
  --product-descriptions "Linux/UNIX" \
  --availability-zone us-east-1a

Exam Tip: Spot Instances are NOT suitable for stateful workloads, databases, or anything that can’t handle interruption. Use them for batch processing, CI/CD workers, and stateless web servers.
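
A workload can watch for the two-minute notice from inside the instance. The sketch below is one possible approach, assuming IMDSv2 is reachable: the metadata path `spot/instance-action` returns 404 until an interruption is scheduled, so `curl -f` fails in the normal case. The function name `spot_interruption_notice` is hypothetical.

```shell
# Print the interruption notice (JSON) and return 0 if one is scheduled;
# return non-zero otherwise. Intended to be called in a polling loop.
spot_interruption_notice() {
  token=$(curl -s -X PUT "http://169.254.169.254/latest/api/token" \
    -H "X-aws-ec2-metadata-token-ttl-seconds: 60") || return 1
  # -f makes curl fail on the 404 that IMDS returns when no notice is pending
  curl -s -f -H "X-aws-ec2-metadata-token: $token" \
    "http://169.254.169.254/latest/meta-data/spot/instance-action"
}
```

A worker might call this every few seconds and, once it succeeds, checkpoint its state and stop accepting new work.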

1.3 Placement Groups#

| Type | Strategy | Use Case |
|---|---|---|
| Cluster | Low latency, high throughput (same rack) | HPC, big data analytics |
| Spread | Isolated hardware (max 7 instances per AZ) | Critical applications |
| Partition | Isolated racks (per partition) | Cassandra, Kafka, Hadoop |

graph LR
    subgraph Cluster["Cluster Placement Group"]
        C1["EC2 App #1"] --- C2["EC2 App #2"]
        C2 --- C3["EC2 App #3"]
    end
    
    subgraph Spread["Spread Placement Group"]
        S1["EC2 (Rack A)"] -.- S2["EC2 (Rack B)"]
        S2 -.- S3["EC2 (Rack C)"]
    end
    
    subgraph Partition["Partition Placement Group"]
        P1["Partition 1: [EC2][EC2]"] --- P2["Partition 2: [EC2][EC2]"]
        P2 --- P3["Partition 3: [EC2][EC2]"]
    end

| Type | Key Characteristics |
|---|---|
| Cluster | ✅ Same rack, lowest latency ⚠️ Single rack failure risk |
| Spread | ✅ Different hardware, fault isolation ⚠️ Max 7 per AZ |
| Partition | ✅ Per-partition isolation ✅ Good for Cassandra, Kafka |
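
Each strategy maps to a value of `--strategy` on `aws ec2 create-placement-group`. The wrapper below is a small sketch; `create_placement_group` is a hypothetical helper name, and the fallback of 3 partitions is an arbitrary choice for this example rather than an AWS default.

```shell
# Create a placement group with the given strategy.
# Arguments: <group-name> <cluster|spread|partition> [partition-count]
create_placement_group() {
  name=$1; strategy=$2; partitions=${3:-3}
  case "$strategy" in
    cluster|spread)
      aws ec2 create-placement-group --group-name "$name" --strategy "$strategy" ;;
    partition)
      aws ec2 create-placement-group --group-name "$name" --strategy partition \
        --partition-count "$partitions" ;;
    *)
      echo "unknown strategy: $strategy" >&2; return 2 ;;
  esac
}
```

Instances join the group at launch, for example by passing `--placement GroupName=<group-name>` to `aws ec2 run-instances`.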

2. EC2 Networking & Security#

2.1 Security Groups (Stateful Firewall)#

Security groups act as a virtual firewall for EC2 instances:

| Feature | Security Group | NACL |
|---|---|---|
| State | Stateful | Stateless |
| Rules | Allow only | Allow + Deny |
| Evaluation | All rules evaluated | Rule number order |
| Scope | Instance-level | Subnet-level |

# Create security group
aws ec2 create-security-group \
  --group-name web-sg \
  --description "Web server security group" \
  --vpc-id vpc-abc123

# Add inbound rules
aws ec2 authorize-security-group-ingress \
  --group-id sg-abc123 \
  --protocol tcp --port 80 --cidr 0.0.0.0/0

aws ec2 authorize-security-group-ingress \
  --group-id sg-abc123 \
  --protocol tcp --port 443 --cidr 0.0.0.0/0

aws ec2 authorize-security-group-ingress \
  --group-id sg-abc123 \
  --protocol tcp --port 22 --cidr 203.0.113.0/24  # SSH only from office
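
The "Evaluation" row of the Security Group vs NACL comparison is the one that most often trips people up. The toy model below illustrates the difference under heavy simplification: rules match on port only, and both function names and the input formats are made up for this sketch. It is not how AWS implements either feature.

```shell
# NACL: rules are checked in ascending rule-number order; the first match decides.
# stdin lines: "<rule-number> <allow|deny> <port|*>"; no match means implicit deny.
nacl_decision() {
  sort -n | awk -v port="$1" '
    $3 == port || $3 == "*" { print $2; found = 1; exit }
    END { if (!found) print "deny" }'
}

# Security group: only allow rules exist, so any matching rule admits the traffic.
# stdin lines: "<port|*>"; no match means implicit deny.
sg_decision() {
  awk -v port="$1" '
    $1 == port || $1 == "*" { found = 1 }
    END { print (found ? "allow" : "deny") }'
}
```

With the NACL rules `100 allow 443` and `90 deny 443`, port 443 is denied because rule 90 is evaluated first; a security group has no way to express that deny.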

2.2 EC2 User Data (Bootstrapping)#

By default, user data scripts run once, as root, on the instance's first boot:

#!/bin/bash
yum update -y
yum install -y httpd
systemctl enable httpd
systemctl start httpd
echo "<h1>Hello from $(hostname -f)</h1>" > /var/www/html/index.html

# Launch with user data
aws ec2 run-instances \
  --image-id ami-0abcdef1234567890 \
  --instance-type t3.micro \
  --user-data file://bootstrap.sh \
  --security-group-ids sg-abc123 \
  --subnet-id subnet-abc123 \
  --iam-instance-profile Name=EC2-WebRole

2.3 EC2 Instance Metadata (IMDS)#

Access instance metadata from within the instance:

# Get instance metadata (IMDSv1)
curl http://169.254.169.254/latest/meta-data/

# Get instance ID
curl http://169.254.169.254/latest/meta-data/instance-id

# Get public IP
curl http://169.254.169.254/latest/meta-data/public-ipv4

# Get IAM role credentials  
curl http://169.254.169.254/latest/meta-data/iam/security-credentials/MyRole

# IMDSv2 (token-based, more secure)
TOKEN=$(curl -X PUT "http://169.254.169.254/latest/api/token" \
  -H "X-aws-ec2-metadata-token-ttl-seconds: 21600")
curl -H "X-aws-ec2-metadata-token: $TOKEN" \
  http://169.254.169.254/latest/meta-data/

Exam Tip: IMDSv2 is session-based and more secure. Always prefer IMDSv2 over IMDSv1. You can enforce IMDSv2 at instance launch.
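
Enforcing IMDSv2 means setting the instance's metadata option `HttpTokens` to `required`. A minimal sketch for an existing instance follows; `require_imdsv2` is a hypothetical wrapper name, while the underlying command and flags are standard AWS CLI.

```shell
# Require IMDSv2 session tokens on an existing instance (IMDSv1 requests then fail).
# Argument: <instance-id>
require_imdsv2() {
  aws ec2 modify-instance-metadata-options \
    --instance-id "$1" \
    --http-tokens required \
    --http-endpoint enabled
}
```

At launch time the equivalent is to add `--metadata-options "HttpTokens=required"` to `aws ec2 run-instances`.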


3. Elastic Block Store (EBS)#

3.1 EBS Volume Types#

| Type | Max IOPS | Max Throughput | Use Case |
|---|---|---|---|
| gp3 (SSD) | 16,000 | 1,000 MB/s | General purpose, boot volumes |
| io2 Block Express (SSD) | 256,000 | 4,000 MB/s | Large databases, SAP |
| st1 (HDD) | 500 | 500 MB/s | Big data, logs (throughput-optimized) |
| sc1 (HDD) | 250 | 250 MB/s | Cold data, infrequent access |
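
With gp3, IOPS and throughput are provisioned independently of size. The sketch below wraps `aws ec2 create-volume` and checks a request against the gp3 maxima from the table before calling it; `create_gp3_volume` is a hypothetical helper, and it does not check other constraints such as the IOPS-per-GiB ratio.

```shell
# Create a gp3 volume after checking the request against the gp3 limits in the table.
# Arguments: <size GiB> <IOPS> <throughput MB/s> <availability zone>
create_gp3_volume() {
  size=$1; iops=$2; throughput=$3; az=$4
  if [ "$iops" -gt 16000 ] || [ "$throughput" -gt 1000 ]; then
    echo "gp3 tops out at 16,000 IOPS and 1,000 MB/s; consider io2 Block Express" >&2
    return 1
  fi
  aws ec2 create-volume --volume-type gp3 --size "$size" \
    --iops "$iops" --throughput "$throughput" --availability-zone "$az"
}
```

For instance, `create_gp3_volume 500 6000 250 us-east-1a` would request a 500 GiB volume with 6,000 IOPS and 250 MB/s.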

3.2 EBS Snapshots#

# Create snapshot
aws ec2 create-snapshot \
  --volume-id vol-abc123 \
  --description "Web server backup - $(date +%Y-%m-%d)"

# Copy snapshot to another region (run the command against the destination region)
aws ec2 copy-snapshot \
  --region eu-west-1 \
  --source-region us-east-1 \
  --source-snapshot-id snap-abc123

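# Create AMI from snapshot (declare architecture, HVM, and root device so it can boot)
aws ec2 register-image \
  --name "MyApp-v1.0.0" \
  --architecture x86_64 \
  --virtualization-type hvm \
  --root-device-name /dev/xvda \
  --block-device-mappings "DeviceName=/dev/xvda,Ebs={SnapshotId=snap-abc123}"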

Snapshot Features:

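  • Incremental: Only blocks changed since the previous snapshot are saved
  • Hibernation: An instance feature rather than a snapshot feature; it writes RAM contents to the EBS root volume for fast resume
  • Recycle Bin: Recover accidentally deleted snapshots (retention 1 day to 1 year)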

3.3 EBS Encryption#

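  • Encryption by default is opt-in per Region; once enabled, all new volumes and snapshot copies in that Region are encrypted
  • Uses KMS: the AWS managed key (aws/ebs) unless you specify a customer managed key
  • Copying an unencrypted snapshot with encryption enabled produces an encrypted snapshot, and volumes created from it are encrypted
  • Minimal performance impact (encryption runs on the servers that host the instances)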
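
Both encryption paths are available from the CLI. The sketch below wraps the two calls; the function names are hypothetical, and the copy is kept within one Region for simplicity. When no key is passed, the Region's default EBS key is used.

```shell
# Opt the current Region in to encryption of all new EBS volumes and snapshot copies.
enable_default_encryption() {
  aws ec2 enable-ebs-encryption-by-default
}

# Copy a snapshot within a Region, encrypting the copy.
# Arguments: <region> <snapshot-id> [kms-key-id]
encrypt_snapshot_copy() {
  region=$1; snapshot=$2; key=${3:-}
  set -- --region "$region" --source-region "$region" \
    --source-snapshot-id "$snapshot" --encrypted
  if [ -n "$key" ]; then
    set -- "$@" --kms-key-id "$key"
  fi
  aws ec2 copy-snapshot "$@"
}
```

A volume created from the encrypted copy is itself encrypted, which is the usual way to encrypt the data of an existing unencrypted volume.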

4. Real-World Use Cases#

Use Case 1: Web Application with Auto Scaling#

                     ┌──────────────┐
                     │  Route53 DNS │
                     └──────┬───────┘
                     ┌──────┴───────┐
                     │  ALB         │
                     │ (HTTPS)      │
                     └──────┬───────┘
              ┌─────────────┼─────────────┐
              │             │             │
        ┌─────┴──┐   ┌─────┴──┐   ┌─────┴──┐
        │ EC2 #1 │   │ EC2 #2 │   │ EC2 #3 │
        │(t3.med)│   │(t3.med)│   │(t3.med)│
        ├────────┤   ├────────┤   ├────────┤
        │Web App │   │Web App │   │Web App │
        └────┬───┘   └────┬───┘   └────┬───┘
             │            │            │
        ┌────┴────────────┴────────────┴───┐
        │         RDS Database              │
        │         (Multi-AZ)                │
        └──────────────────────────────────┘

Auto Scaling Config:

{
  "AutoScalingGroupName": "web-asg",
  "MinSize": 2,
  "MaxSize": 10,
  "DesiredCapacity": 2,
  "VPCZoneIdentifier": "subnet-1a,subnet-1b",
  "TargetGroupARNs": ["arn:aws:elasticloadbalancing:...:targetgroup/web-tg/abc"],
  "HealthCheckType": "ELB",
  "HealthCheckGracePeriod": 300
}

Scaling Policy:

{
  "PolicyName": "scale-out-cpu",
  "PolicyType": "TargetTrackingScaling",
  "TargetTrackingConfiguration": {
    "PredefinedMetricSpecification": {
      "PredefinedMetricType": "ASGAverageCPUUtilization"
    },
    "TargetValue": 70.0
  }
}
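
As a rough mental model, target tracking resizes the group in proportion to how far the metric is from its target. The sketch below approximates that as `ceil(current * metric / target)` clamped to the group's min and max; it is a simplification (the real service also uses CloudWatch alarms and warm-up periods and scales in more conservatively), and `target_tracking_desired` is a hypothetical helper that takes integer arguments only.

```shell
# Approximate target tracking: desired = ceil(current * metric / target), clamped.
# Arguments (integers): <current capacity> <metric %> <target %> <min size> <max size>
target_tracking_desired() {
  desired=$(( ($1 * $2 + $3 - 1) / $3 ))   # integer ceiling division
  if [ "$desired" -lt "$4" ]; then desired=$4; fi
  if [ "$desired" -gt "$5" ]; then desired=$5; fi
  echo "$desired"
}
```

With the configuration above (min 2, max 10, target 70), two instances averaging 90% CPU give `target_tracking_desired 2 90 70 2 10`, which suggests a capacity of 3.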

Use Case 2: Batch Processing with Spot Instances#

Scenario: Process 10 TB of nightly log files. Can be interrupted and restarted.

Solution: Spot Fleet with mixed instance types + EBS-optimized instances + checkpointing to S3

Use Case 3: Burstable vs Non-Burstable#

Scenario: A development web server has low CPU most of the time but occasional spikes.

Solution: Use t3.medium with CPU Credits. It accumulates credits during idle and uses them during bursts.

CPU Usage:
100% ┤     ██
 75% ┤   ██████
 50% ┤ ████████████
 25% ┤████████████████████████████████████
     └─────────────────────────────────→ Time
       Burst  Idle  Burst  Idle  (T3 Unlimited)

Exam Tip: T2/T3 instances have CPU credits. Unlimited mode lets you burst beyond credits (extra cost). Use for variable workloads, not consistent high CPU.
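
The credit arithmetic can be worked through directly. One CPU credit is one vCPU at 100% for one minute, so a 2-vCPU instance at U% average utilization spends 1.2 × U credits per hour. The sketch below assumes t3.medium figures (2 vCPUs, 24 credits earned per hour, which corresponds to a 20% baseline) and standard mode; `hours_until_throttled` is a hypothetical helper.

```shell
# Hours until a t3.medium in standard mode drains its credit balance.
# Arguments: <starting credit balance> <average CPU % across both vCPUs>
# Assumes t3.medium figures: 2 vCPUs, 24 credits earned per hour (20% baseline).
hours_until_throttled() {
  awk -v balance="$1" -v util="$2" 'BEGIN {
    earned_x10 = 240          # 24 credits per hour, in tenths of a credit
    spent_x10  = 12 * util    # 2 vCPUs * 60 min * util/100, in tenths of a credit
    if (spent_x10 <= earned_x10) { print "never"; exit }
    printf "%.1f\n", balance * 10 / (spent_x10 - earned_x10)
  }'
}
```

Starting from a full balance of 576 credits, `hours_until_throttled 576 90` gives about 6.9 hours before the instance drops to baseline, which is why sustained high CPU is a poor fit for burstable types.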


5. ⚡ Exam Tips#

  1. Termination Protection: Disabled by default. Once enabled, it must be turned off before the instance can be terminated
  2. Instance Metadata: 169.254.169.254; always use IMDSv2
  3. ENA vs VF: ENA (Elastic Network Adapter) provides enhanced networking at up to 100 Gbps; the older Intel 82599 Virtual Function (VF) interface tops out at 10 Gbps
  4. Hibernate: Preserves RAM to the EBS root volume. Resumes faster than a cold boot. An instance can stay hibernated for at most 60 days
  5. Placement Groups: Cluster (low latency), Spread (isolation), Partition (big data)
  6. EBS vs Instance Store: EBS is persistent. Instance store is ephemeral but higher performance
  7. Nitro System: Underlying virtualization for newer instances. Better performance, security
  8. Dedicated Hosts: Physical server for existing socket/core licenses (BYOL)
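
Termination protection is controlled by the instance's `disableApiTermination` attribute. A minimal sketch for toggling it follows; `set_termination_protection` is a hypothetical wrapper around the standard `modify-instance-attribute` command.

```shell
# Turn termination protection on or off for an instance.
# Arguments: <instance-id> <on|off>
set_termination_protection() {
  case "$2" in
    on)  flag=--disable-api-termination ;;
    off) flag=--no-disable-api-termination ;;
    *)   echo "usage: set_termination_protection <instance-id> <on|off>" >&2; return 2 ;;
  esac
  aws ec2 modify-instance-attribute --instance-id "$1" "$flag"
}
```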

✅ Chapter Quiz#

  1. Which EC2 purchase option is best for a batch processing job that can be interrupted?

    • A) On-Demand
    • B) Reserved
    • C) Spot
    • D) Dedicated
  2. You need the lowest latency between EC2 instances for HPC. Which placement group?

    • A) Spread
    • B) Partition
    • C) Cluster
    • D) Distributed
  3. How does an EBS snapshot store data relative to earlier snapshots of the same volume?

    • A) Full copy on every snapshot
    • B) Incremental (only changed blocks)
    • C) Differential (changed since last full)
    • D) Always encrypted
  4. Which EC2 feature preserves RAM state for faster resume?

    • A) Stop
    • B) Terminate
    • C) Hibernate
    • D) Reboot
  5. What IP address do you use to access EC2 instance metadata from within the instance?

    • A) 10.0.0.1
    • B) 169.254.169.254
    • C) 127.0.0.1
    • D) 192.168.1.1
  6. A company needs to launch EC2 instances physically close together within a single AZ for low-latency networking. Which placement group should be used?

    • A) Cluster
    • B) Spread
    • C) Partition
    • D) Distributed
  7. You need to attach 50 TB of block storage to a single EC2 instance. Which storage option supports this requirement?

    • A) EBS volumes striped via RAID 0
    • B) Instance Store
    • C) EFS
    • D) S3
  8. An EC2 instance hosting a critical application failed. The administrator cannot connect via SSH and the system log shows a kernel panic. What should be done to restore the application?

    • A) Stop the instance and start it again
    • B) Terminate the instance and launch a new one from the same AMI
    • C) Use EC2 Auto Recovery
    • D) Reboot the instance
  9. An application running on a t3.medium instance consistently uses 90% CPU for extended periods. What happens when CPU credit balance is exhausted in standard mode?

    • A) The instance is stopped
    • B) The instance is throttled to baseline CPU
    • C) The instance is automatically upgraded to a larger type
    • D) AWS charges additional fees for burst performance
  10. A company wants to migrate a legacy application that requires a specific physical server for licensing purposes. Which EC2 option should be used?

    • A) Reserved Instance
    • B) Dedicated Host
    • C) Dedicated Instance
    • D) Spot Instance
  11. A company runs a production database on an EC2 instance and needs the highest possible IOPS. Which EBS volume type should be selected?

    • A) gp3
    • B) io2 Block Express
    • C) st1
    • D) sc1
  12. An application needs to be highly available across multiple Availability Zones. Which EC2 feature distributes instances across different physical locations?

    • A) Placement groups
    • B) Auto Scaling groups with instances in multiple AZs
    • C) EC2 Dedicated Hosts
    • D) EC2 Instance Store
  13. A batch processing workload runs for 6 hours each night and can tolerate interruptions. Which purchasing option provides the LOWEST cost?

    • A) On-Demand Instances
    • B) Reserved Instances
    • C) Spot Instances
    • D) Dedicated Hosts
  14. A company needs to migrate an application to AWS that requires a specific CPU socket and core license. Which EC2 option should be used?

    • A) On-Demand Instances
    • B) Dedicated Hosts
    • C) Spot Instances
    • D) Reserved Instances
  15. Which EC2 instance family is optimized for memory-intensive workloads like in-memory databases and real-time analytics?

    • A) C family (Compute Optimized)
    • B) R family (Memory Optimized)
    • C) I family (Storage Optimized)
    • D) T family (Burstable)
  16. An EC2 instance launched in a private subnet needs to download security patches from the internet. Which component must be configured?

    • A) An Internet Gateway attached to the VPC
    • B) A NAT Gateway in a public subnet with a route from the private subnet
    • C) A VPC Peering connection to a public subnet
    • D) An AWS Direct Connect connection
  17. Which statements about Security Groups are correct? (Select TWO)

    • A) Security Groups are stateless
    • B) Security Groups support allow rules only
    • C) Security Groups support both allow and deny rules
    • D) Security Groups are stateful
    • E) Security Groups filter traffic at the subnet level
  18. A solutions architect needs to create a custom AMI from a running EC2 instance to standardize future deployments. What is the correct approach?

    • A) Stop the instance, create an EBS snapshot of the root volume, and register it as an AMI
    • B) Create an EBS snapshot from the root volume while the instance is running, then register the snapshot as an AMI
    • C) Copy the root volume to a new instance and snapshot that
    • D) Use AWS CloudFormation to create the AMI from the running instance
  19. What is the key difference between EBS volumes and Instance Store volumes?

    • A) EBS is ephemeral; Instance Store is persistent
    • B) EBS is persistent; Instance Store is ephemeral
    • C) Both provide persistent storage
    • D) Both provide ephemeral storage
  20. An EC2 instance with a 500 GB gp3 EBS volume is running low on disk space. How can the volume be resized with minimal downtime?

    • A) Launch a new instance with a larger volume and migrate data
    • B) Modify the EBS volume size, then extend the file system
    • C) Add an Instance Store volume
    • D) Create a new larger volume and attach it in place of the old volume
  21. A web application behind an Application Load Balancer needs to scale EC2 instances based on average CPU utilization. Which Auto Scaling policy type is MOST appropriate?

    • A) Simple scaling policy
    • B) Target tracking scaling policy
    • C) Scheduled scaling policy
    • D) Manual scaling
  22. Which placement group type provides the lowest possible network latency and highest throughput between EC2 instances?

    • A) Spread placement group
    • B) Partition placement group
    • C) Cluster placement group
    • D) Distributed placement group
  23. How does an application running on an EC2 instance with an IAM role obtain temporary AWS credentials?

    • A) Read from an environment variable set at launch
    • B) Retrieve them from the EC2 instance metadata service (IMDS)
    • C) Read from a configuration file downloaded from S3
    • D) Retrieve them from AWS Secrets Manager
  24. What happens when a t3.micro EC2 instance in standard mode exhausts its accumulated CPU credits?

    • A) The instance is immediately stopped
    • B) The instance continues running but CPU performance is throttled to the baseline
    • C) The instance automatically acquires additional credits at no charge
    • D) The instance is terminated and relaunched
  25. A company needs to encrypt an existing unencrypted EBS volume. What is the MOST efficient approach?

    • A) Enable encryption directly on the volume using the EC2 console
    • B) Create a snapshot of the volume, copy the snapshot with encryption enabled, and create a new encrypted volume from the copied snapshot
    • C) Format the volume with an encrypted file system
    • D) Attach the volume to an instance and use OS-level encryption tools
📝 Answer Key
  1. C: Spot Instances cost up to 90% less than On-Demand but can be interrupted.
  2. C: Cluster placement group places instances in a single AZ for low latency.
  3. B: EBS snapshots are incremental (only changed blocks).
  4. C: Hibernate preserves RAM state to EBS for faster resume.
  5. B: 169.254.169.254 is the link-local address for instance metadata.
  6. A: Cluster placement groups provide low-latency networking within a single AZ.
  7. A: Multiple EBS volumes can be striped (RAID 0) to provide block storage exceeding single volume limits.
  8. B: A kernel panic is an OS-level fault, which EC2 automatic recovery does not address (it handles failures of the underlying host); launch a fresh instance from the known-good AMI.
  9. B: T2/T3 instances in standard mode are throttled to baseline CPU when credits are exhausted.
  10. B: Dedicated Hosts provide physical servers for BYOL and server-bound software licenses.
  11. B: io2 Block Express provides up to 256,000 IOPS for latency-sensitive, high-throughput database workloads.
  12. B: Auto Scaling groups configured with multiple AZs distribute instances across Availability Zones for HA.
  13. C: Spot Instances provide up to 90% discount but can be interrupted with a 2-minute warning.
  14. B: Dedicated Hosts provide physical servers for existing socket/core licenses and compliance needs.
  15. B: The R instance family (e.g., R6i) is optimized for memory-intensive workloads; the X family (e.g., X2iedn) is also memory optimized.
  16. B: A NAT Gateway in a public subnet with a route from the private subnet provides outbound internet access.
  17. B, D: Security Groups are stateful and only support allow rules (no explicit deny rules).
  18. B: An AMI can be built from a running instance by snapshotting its root volume and registering the snapshot; stopping the instance first gives a more consistent image but is not required.
  19. B: EBS volumes are persistent (data can survive instance termination); Instance Store is ephemeral.
  20. B: Modify the EBS volume size (elastic volumes) and extend the file system online with minimal downtime.
  21. B: Target tracking scaling policy automatically adjusts capacity based on a target metric like CPU.
  22. C: Cluster placement groups place instances in the same rack within a single AZ for lowest latency.
  23. B: EC2 instance metadata at 169.254.169.254/latest/meta-data/iam/security-credentials/ provides temporary credentials.
  24. B: In standard mode, the instance is throttled to baseline CPU when credits are exhausted.
  25. B: Create a snapshot of the unencrypted volume, copy it with encryption enabled, then create a new encrypted volume from the copy.
