⚖️ Elastic Load Balancing & Auto Scaling#

Learning Objectives#

  • Choose between ALB, NLB, and CLB based on requirements
  • Configure target groups, listeners, and health checks
  • Implement Auto Scaling policies (simple, step, target tracking)
  • Design for high availability across AZs

1. Elastic Load Balancing (ELB)#

1.1 Load Balancer Types#

| Feature | ALB (Layer 7) | NLB (Layer 4) | CLB (Legacy) |
|---|---|---|---|
| OSI Layer | 7 (HTTP/HTTPS) | 4 (TCP/UDP/TLS) | 4 & 7 |
| Protocols | HTTP, HTTPS, gRPC | TCP, UDP, TLS | HTTP, HTTPS, TCP, SSL |
| Target Type | Instance, IP, Lambda | Instance, IP, ALB | Instance only |
| SSL/TLS | Yes (termination) | Yes (TLS termination or TCP passthrough) | Yes |
| WebSocket | Yes | Yes (passed through as TCP) | No |
| Sticky Sessions | Yes (cookies) | Yes (source IP) | Yes |
| Fixed Response | Yes | No | No |
| Path-based routing | Yes | No | No |
| Host-based routing | Yes | No | No |
| Static IP | No (put an NLB or Global Accelerator in front) | Yes (per AZ) | No |
| Slow start | Yes | No | No |
| Price | $0.0225/hr | $0.0225/hr | $0.025/hr |
| Use Case | Microservices, containers | TCP/UDP, extreme performance | Legacy apps |

Exam Tip: ALB for HTTP/HTTPS with path-based routing. NLB for TCP/UDP with static IPs or extreme performance. CLB is legacy — avoid on new projects.
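
The selection rule in the exam tip can be written down as a small decision function. This is a minimal Python sketch, not an AWS API: the function name and its parameters are illustrative assumptions, and it only encodes the criteria listed in the comparison above.

```python
def choose_load_balancer(protocol: str,
                         needs_static_ip: bool = False,
                         needs_content_routing: bool = False) -> str:
    """Return "ALB" or "NLB" for a new workload (CLB is legacy and never chosen)."""
    protocol = protocol.upper()
    layer7 = protocol in {"HTTP", "HTTPS", "GRPC"}
    layer4 = protocol in {"TCP", "UDP", "TLS"}
    if not (layer7 or layer4):
        raise ValueError(f"unsupported protocol: {protocol}")
    if needs_content_routing:
        # Path-based and host-based routing only exist at Layer 7.
        if not layer7:
            raise ValueError("content routing requires an HTTP-family protocol")
        return "ALB"
    if layer4 or needs_static_ip:
        # Static IPs per AZ and raw TCP/UDP handling are NLB features.
        return "NLB"
    return "ALB"
```

For example, `choose_load_balancer("https", needs_content_routing=True)` picks the ALB, while `choose_load_balancer("udp")` picks the NLB.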

1.2 ALB Deep Dive#

graph TD
    User["User / Client"]
    Route53["Route53 DNS"]
    ALB["ALB (Layer 7)\nmy-alb-123.elb.amazonaws.com"]
    Listener["Listener: Port 443 (HTTPS)\nCertificate: ACM"]
    Rules["Rules:\n/api/* → API-TG\n/* → Web-TG"]
    TG_WEB["Web Target Group\nEC2 × 3 (t3.medium)\nHealth: /health"]
    TG_API["API Target Group\nECS Fargate × 2\nHealth: /api/health"]

    User --> Route53
    Route53 --> ALB
    ALB --> Listener
    Listener --> Rules
    Rules --> TG_WEB
    Rules --> TG_API

    style User fill:#888,color:#fff
    style ALB fill:#ff9900,color:#fff
    style TG_WEB fill:#527fff,color:#fff
    style TG_API fill:#01ab5c,color:#fff

Connection Flow (ALB → Target):

sequenceDiagram
    participant User as User
    participant DNS as Route53
    participant ALB as ALB
    participant TG as Target Group
    participant EC2 as EC2 Instance

    User->>DNS: myapp.com
    DNS->>User: ALB DNS name
    User->>ALB: HTTPS request (TLS termination)
    ALB->>ALB: Evaluate listener rules
    ALB->>ALB: Path-based routing (/api/*)
    ALB->>TG: Forward to target group
    TG->>EC2: HTTP to instance:80
    EC2-->>TG: HTTP 200 OK
    TG-->>ALB: Response
    ALB-->>User: HTTPS response
    Note over ALB,EC2: Health checks every 30s
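
The rule evaluation step in the flow above can be illustrated with a few lines of Python. This is a simplified sketch: the rule priorities are assumed values, and `fnmatchcase` only approximates ALB path patterns (it also accepts `[...]` character sets, which this example does not use).

```python
from fnmatch import fnmatchcase

# (priority, path pattern, target group), mirroring the rules in the diagram.
# Priorities 10 and 20 are assumptions; lower numbers are evaluated first.
RULES = [
    (10, "/api/*", "API-TG"),
    (20, "/*", "Web-TG"),
]

def route(path: str, rules=RULES, default: str = "Web-TG") -> str:
    """Return the target group of the first rule whose pattern matches the path."""
    for _priority, pattern, target_group in sorted(rules):
        if fnmatchcase(path, pattern):  # case-sensitive match
            return target_group
    return default  # the listener's default action

print(route("/api/users"))   # API-TG
print(route("/index.html"))  # Web-TG
```

Note that `/api` without a trailing slash does not match `/api/*` and therefore falls through to the web target group.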

Create ALB:

# Create target group
aws elbv2 create-target-group \
  --name web-targets \
  --protocol HTTP \
  --port 80 \
  --vpc-id vpc-abc123 \
  --health-check-path /health \
  --healthy-threshold-count 3 \
  --unhealthy-threshold-count 3 \
  --matcher HttpCode="200,301"

# Create ALB
aws elbv2 create-load-balancer \
  --name my-alb \
  --subnets subnet-abc subnet-def \
  --security-groups sg-web

# Create listener
aws elbv2 create-listener \
  --load-balancer-arn arn:aws:elasticloadbalancing:...:loadbalancer/app/my-alb/abc \
  --protocol HTTPS \
  --port 443 \
  --certificates CertificateArn=arn:aws:acm:...certificate/abc \
  --default-actions Type=forward,TargetGroupArn=arn:aws:...:targetgroup/web-targets/abc

# Register targets
aws elbv2 register-targets \
  --target-group-arn arn:aws:...:targetgroup/web-targets/abc \
  --targets Id=i-abc123 Id=i-def456
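
The healthy and unhealthy threshold counts set on the target group above work on consecutive results. A minimal Python sketch of that counting logic follows; it is a simplification (real targets start in an `initial` state, which is modeled here as unhealthy), and the function name is illustrative.

```python
def health_states(results, healthy_threshold=3, unhealthy_threshold=3):
    """Return the target state after each check (True means the check passed)."""
    healthy = False          # simplification: a new target starts as not healthy
    passes = failures = 0    # lengths of the current consecutive streaks
    states = []
    for passed in results:
        if passed:
            passes, failures = passes + 1, 0
            if passes >= healthy_threshold:
                healthy = True
        else:
            passes, failures = 0, failures + 1
            if failures >= unhealthy_threshold:
                healthy = False
        states.append("healthy" if healthy else "unhealthy")
    return states

# Three passes mark the target healthy; three failures in a row mark it unhealthy.
print(health_states([True, True, True, False, False, False]))
```

A single failed check between passes resets the passing streak but does not change the state, which is why the thresholds trade detection speed against flapping.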

1.3 Sticky Sessions (Session Affinity)#

  • ALB: Uses cookies (AWSALB or a custom application cookie)
  • NLB: Uses source IP (stickiness based on the client IP)

Use case: Stateful apps where user sessions must stay on the same instance.

# Enable sticky sessions on ALB
aws elbv2 modify-target-group-attributes \
  --target-group-arn arn:aws:...:targetgroup/web-targets/abc \
  --attributes Key=stickiness.enabled,Value=true \
               Key=stickiness.type,Value=lb_cookie \
               Key=stickiness.lb_cookie.duration_seconds,Value=86400
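
To see what duration-based cookie stickiness does to request routing, here is a toy Python model. It is a sketch under stated assumptions, not AWS's implementation: the class name, the round-robin choice for new sessions, the cookie values, and the fixed expiry set when the cookie is issued are all illustrative.

```python
import itertools

class StickyBalancer:
    """Toy model of duration-based cookie stickiness."""

    def __init__(self, targets, duration_seconds=86400):
        self._round_robin = itertools.cycle(targets)
        self._duration = duration_seconds
        self._sessions = {}                    # cookie -> (target, expiry time)
        self._cookie_ids = itertools.count(1)

    def handle(self, cookie, now):
        """Return (target, cookie) for a request arriving at time `now` in seconds."""
        entry = self._sessions.get(cookie)
        if entry is not None and now < entry[1]:
            return entry[0], cookie            # valid cookie: same target again
        # No cookie or expired cookie: pick a target and issue a new cookie.
        target = next(self._round_robin)
        new_cookie = f"cookie-{next(self._cookie_ids)}"
        self._sessions[new_cookie] = (target, now + self._duration)
        return target, new_cookie

lb = StickyBalancer(["i-abc123", "i-def456"], duration_seconds=10)
target, cookie = lb.handle(None, now=0)
print(lb.handle(cookie, now=5)[0] == target)   # True: still sticky
print(lb.handle(cookie, now=20)[0] == target)  # False: cookie expired
```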

1.4 Cross-Zone Load Balancing#

  • ALB: Enabled by default (requests are spread evenly across all registered targets in every enabled AZ)
  • NLB: Disabled by default (each load balancer node sends traffic only to targets in its own AZ)
  • When disabled, each AZ receives an equal share of traffic (50/50 with two AZs) and splits it only among its own targets

Cross-Zone ON:
    us-east-1a: [EC2] [EC2]    (66.7% traffic, 33.3% per instance)
    us-east-1b: [EC2]          (33.3% traffic, all to 1 instance)

Cross-Zone OFF:
    us-east-1a: [EC2] [EC2]    (50% traffic, 25% per instance)
    us-east-1b: [EC2]          (50% traffic, all to 1 instance)
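
The per-instance arithmetic for the two-AZ layout above can be checked with a short Python sketch. It assumes clients are spread evenly across the load balancer nodes (one per AZ) and that every instance is healthy; the function name is illustrative.

```python
from fractions import Fraction

def per_instance_share(instances_per_az: dict, cross_zone: bool) -> dict:
    """Map each AZ to the fraction of total traffic one instance in it receives."""
    azs = {az: n for az, n in instances_per_az.items() if n > 0}
    if cross_zone:
        # Every instance is an equal target, regardless of its AZ.
        total = sum(azs.values())
        return {az: Fraction(1, total) for az in azs}
    # Each AZ gets an equal slice, divided among that AZ's instances only.
    return {az: Fraction(1, len(azs)) / n for az, n in azs.items()}

layout = {"us-east-1a": 2, "us-east-1b": 1}
print(per_instance_share(layout, cross_zone=True))   # 1/3 per instance everywhere
print(per_instance_share(layout, cross_zone=False))  # 1/4 in 1a, 1/2 in 1b
```

With cross-zone off, the single instance in us-east-1b carries twice the load of each instance in us-east-1a, which is the imbalance cross-zone load balancing removes.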

2. Auto Scaling#

2.1 Auto Scaling Groups (ASG)#

┌─────────────────────────────────────────────────────┐
│                 Auto Scaling Group                   │
│                                                      │
│  Launch Template:                                    │
│  ├── AMI: ami-0abc123 (latest app version)           │
│  ├── Instance Type: t3.medium                        │
│  ├── Security Group: sg-web                          │
│  ├── IAM Role: EC2-AppRole                           │
│  └── User Data: bootstrap script                     │
│                                                      │
│  Scaling Config:                                     │
│  ├── Min: 2                                          │
│  ├── Desired: 2                                      │
│  └── Max: 10                                         │
│                                                      │
│  Health Check: ELB (grace period: 300s)              │
│  Scaling Policy: Target tracking (CPU @ 70%)         │
└─────────────────────────────────────────────────────┘

Create ASG:

# Create launch template
aws ec2 create-launch-template \
  --launch-template-name web-template \
  --version-description v1 \
  --launch-template-data '{
    "ImageId": "ami-0abcdef1234567890",
    "InstanceType": "t3.medium",
    "SecurityGroupIds": ["sg-abc123"],
    "IamInstanceProfile": {"Name": "EC2-AppRole"},
    "UserData": "'$(base64 -w0 bootstrap.sh)'"
  }'

# Create Auto Scaling Group
aws autoscaling create-auto-scaling-group \
  --auto-scaling-group-name web-asg \
  --launch-template LaunchTemplateName=web-template,Version=1 \
  --min-size 2 \
  --max-size 10 \
  --desired-capacity 2 \
  --vpc-zone-identifier "subnet-abc,subnet-def" \
  --target-group-arns arn:aws:...:targetgroup/web-targets/abc \
  --health-check-type ELB \
  --health-check-grace-period 300

2.2 Scaling Policies#

| Policy Type | How it Works | Use Case |
|---|---|---|
| Simple | Alarm threshold breached → scale by a fixed amount → wait for cooldown | Basic scenarios |
| Step | Multiple thresholds → different scaling amounts | Fine-grained control |
| Target Tracking | Automatically maintain a target metric | Most common (CPU 70%) |
| Scheduled | Scale at specified dates and times | Known traffic patterns |
| Predictive | ML-based forecast + proactive scaling | Cyclic workloads |

Target Tracking Policy:

aws autoscaling put-scaling-policy \
  --auto-scaling-group-name web-asg \
  --policy-name cpu-target \
  --policy-type TargetTrackingScaling \
  --target-tracking-configuration '{
    "TargetValue": 70.0,
    "PredefinedMetricSpecification": {"PredefinedMetricType": "ASGAverageCPUUtilization"}
  }'
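
Target tracking adjusts capacity roughly in proportion to how far the metric is from the target. The Python sketch below shows that proportional idea only; it is an approximation, not the exact algorithm AWS uses (the real service works through CloudWatch alarms and scales in more conservatively). The function name and defaults are illustrative and mirror the `web-asg` settings above.

```python
import math

def target_tracking_capacity(current_capacity: int, metric_value: float,
                             target_value: float = 70.0,
                             min_size: int = 2, max_size: int = 10) -> int:
    """Estimate the capacity that would bring the average metric back to target."""
    desired = math.ceil(current_capacity * metric_value / target_value)
    # The group never leaves its configured min/max bounds.
    return max(min_size, min(max_size, desired))

print(target_tracking_capacity(2, 90.0))   # 3: two instances at 90% CPU need a third
print(target_tracking_capacity(8, 140.0))  # 10: capped at max size
```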

Step Scaling Policy:

aws autoscaling put-scaling-policy \
  --auto-scaling-group-name web-asg \
  --policy-name cpu-step \
  --policy-type StepScaling \
  --adjustment-type ChangeInCapacity \
  --step-adjustments '[
    {"MetricIntervalLowerBound": 0, "MetricIntervalUpperBound": 20, "ScalingAdjustment": 1},
    {"MetricIntervalLowerBound": 20, "ScalingAdjustment": 3}
  ]'
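
The step bounds in the policy above are offsets from the CloudWatch alarm threshold, not absolute CPU values. The Python sketch below shows how a metric value maps to an adjustment; the alarm threshold of 70% is an assumption, since the alarm itself is not shown in this section, and the function name is illustrative.

```python
STEP_ADJUSTMENTS = [
    {"MetricIntervalLowerBound": 0, "MetricIntervalUpperBound": 20, "ScalingAdjustment": 1},
    {"MetricIntervalLowerBound": 20, "ScalingAdjustment": 3},
]

def step_adjustment(metric_value: float, alarm_threshold: float,
                    steps=STEP_ADJUSTMENTS) -> int:
    """Return the capacity change for a metric value, or 0 if no step applies."""
    breach = metric_value - alarm_threshold  # bounds are relative to the threshold
    for step in steps:
        lower = step.get("MetricIntervalLowerBound", float("-inf"))
        upper = step.get("MetricIntervalUpperBound", float("inf"))
        if lower <= breach < upper:          # lower inclusive, upper exclusive
            return step["ScalingAdjustment"]
    return 0

# With an assumed alarm threshold of 70% CPU:
print(step_adjustment(75, 70))  # 1 (breach of 5 falls in [0, 20))
print(step_adjustment(95, 70))  # 3 (breach of 25 falls in [20, +inf))
```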

2.3 Lifecycle Hooks#

Pause instance launch/termination for custom actions:

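1. Scale-out event → instance enters Pending:Wait (held until the heartbeat timeout; 600 seconds in the hook below, 3600 seconds by default)
2. The SNS notification triggers a Lambda function (configures the app, registers with monitoring)
3. Complete lifecycle action → InService

# The role ARN is a placeholder: it must allow Auto Scaling to publish to the SNS topic
aws autoscaling put-lifecycle-hook \
  --lifecycle-hook-name web-configure \
  --auto-scaling-group-name web-asg \
  --lifecycle-transition autoscaling:EC2_INSTANCE_LAUNCHING \
  --default-result CONTINUE \
  --heartbeat-timeout 600 \
  --notification-target-arn arn:aws:sns:...:asg-lifecycle \
  --role-arn arn:aws:iam::...:role/asg-lifecycle-role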
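
The "complete lifecycle action" step could be handled by a Lambda function subscribed to the SNS topic. The following Python sketch shows one possible shape. The function name, the `configure` callback, and passing the client as a parameter (so the logic can be exercised without AWS access) are assumptions; `complete_lifecycle_action` is the boto3 Auto Scaling client call.

```python
import json

def handle_lifecycle_event(event, autoscaling_client, configure=lambda instance_id: True):
    """Run custom setup for a launching instance, then release it from Pending:Wait."""
    message = json.loads(event["Records"][0]["Sns"]["Message"])
    if message.get("LifecycleTransition") != "autoscaling:EC2_INSTANCE_LAUNCHING":
        return None  # ignore anything that is not a launch notification
    instance_id = message["EC2InstanceId"]
    # CONTINUE puts the instance in service; ABANDON terminates it.
    result = "CONTINUE" if configure(instance_id) else "ABANDON"
    autoscaling_client.complete_lifecycle_action(
        LifecycleHookName=message["LifecycleHookName"],
        AutoScalingGroupName=message["AutoScalingGroupName"],
        LifecycleActionToken=message["LifecycleActionToken"],
        LifecycleActionResult=result,
        InstanceId=instance_id,
    )
    return result
```

In a real Lambda handler the client would typically be created with `boto3.client("autoscaling")` and `configure` would hold the app-specific setup.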

3. Real-World Use Cases#

Use Case 1: E-Commerce Traffic Spike#

Scenario: Your e-commerce site gets 10x traffic during Black Friday.

Solution:

  1. ALB with path-based routing: /api/* → API servers; serve /static/* from S3 through CloudFront, since S3 is not an ALB target type
  2. ASG with CPU target tracking (target: 60%)
  3. Predictive scaling based on last year’s data
  4. Scheduled scaling to pre-warm before the event

Use Case 2: Blue/Green Deployment#

Blue (Current): ASG with v1 instances
Green (New): ASG with v2 instances
Swap: Update ALB listener to point to Green TG

┌──────────────────────────────────────────────────┐
│        ALB                                        │
│    /v1/* → Blue-TG (instances running v1)         │
│    /v2/* → Green-TG (instances running v2)        │
│    /* → Blue-TG (until cutover)                   │
└──────────────────────────────────────────────────┘

Switch: Update default rule to Green-TG → Shift traffic
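
The cutover does not have to be all-or-nothing: a forward action can split traffic between the two target groups by weight. The Python sketch below builds such an action in the shape the ELBv2 API expects for a listener's default action. The helper name is hypothetical and the ARNs in the example are placeholders.

```python
def weighted_forward_action(blue_tg_arn: str, green_tg_arn: str, green_percent: int) -> dict:
    """Build a forward action sending `green_percent` of requests to Green."""
    if not 0 <= green_percent <= 100:
        raise ValueError("green_percent must be between 0 and 100")
    return {
        "Type": "forward",
        "ForwardConfig": {
            "TargetGroups": [
                {"TargetGroupArn": blue_tg_arn, "Weight": 100 - green_percent},
                {"TargetGroupArn": green_tg_arn, "Weight": green_percent},
            ]
        },
    }

# Placeholder ARNs: shift 10% of traffic to Green, then 100% at cutover.
canary = weighted_forward_action("blue-tg-arn", "green-tg-arn", 10)
cutover = weighted_forward_action("blue-tg-arn", "green-tg-arn", 100)
```

The resulting dictionary could be passed as the single element of `DefaultActions` in a boto3 `modify_listener` call.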

Use Case 3: Spot + On-Demand Mix#

Scenario: Save costs by using Spot Instances for workers, On-Demand for critical services.

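# Group name, sizes, and subnets are placeholder values
aws autoscaling create-auto-scaling-group \
  --auto-scaling-group-name worker-asg \
  --min-size 2 --max-size 10 \
  --vpc-zone-identifier "subnet-abc,subnet-def" \
  --mixed-instances-policy '{"LaunchTemplate": {"LaunchTemplateSpecification": {"LaunchTemplateName": "worker", "Version": "1"}},
    "InstancesDistribution": {"OnDemandBaseCapacity": 2, "OnDemandPercentageAboveBaseCapacity": 20, "SpotAllocationStrategy": "capacity-optimized"}
  }'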
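
How the base capacity and the percentage interact is easiest to see with numbers. This Python sketch splits a desired capacity into On-Demand and Spot counts using the values from the policy above. Rounding fractional results up in favor of On-Demand is an assumption about the service's behavior, and the function name is illustrative.

```python
import math

def split_capacity(desired: int, on_demand_base: int = 2,
                   on_demand_pct_above_base: int = 20) -> tuple:
    """Return (on_demand, spot) instance counts for a desired capacity."""
    base = min(desired, on_demand_base)      # the base is filled with On-Demand first
    above_base = desired - base
    # Assumption: fractional instances round up in favor of On-Demand.
    extra_on_demand = math.ceil(above_base * on_demand_pct_above_base / 100)
    on_demand = base + extra_on_demand
    return on_demand, desired - on_demand

print(split_capacity(12))  # (4, 8): 2 base + 20% of the remaining 10
print(split_capacity(2))   # (2, 0): everything fits in the On-Demand base
```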

4. ⚡ Exam Tips#

  1. ALB vs NLB — ALB = HTTP/HTTPS path-based. NLB = TCP/UDP with static IPs
  2. Cross-zone LB — ALB = on by default. NLB = off by default
  3. Sticky sessions — ALB uses cookies. NLB uses source IP
  4. Target tracking — Easiest scaling policy (just set target value)
  5. Health check grace period — Allow app to start before health checks count against the instance (default 300s in the console; 0 when the group is created through the CLI or SDK)
  6. Cooldown — Time between scaling activities (default 300s for simple)
  7. ELB with Lambda — ALB can directly invoke Lambda as target
  8. NLB + Static IP — NLB provides one static IP per AZ; add Global Accelerator for fixed anycast IPs across regions

✅ Chapter Quiz#

  1. You need a public-facing load balancer with path-based routing. Which type?

    • A) CLB
    • B) NLB
    • C) ALB
    • D) Route53
  2. What is the default health check grace period in Auto Scaling?

    • A) 60 seconds
    • B) 120 seconds
    • C) 300 seconds
    • D) 600 seconds
  3. Which scaling policy automatically maintains a target metric?

    • A) Simple
    • B) Step
    • C) Target Tracking
    • D) Scheduled
  4. By default, cross-zone load balancing is enabled for which ELB type?

    • A) NLB only
    • B) ALB only
    • C) Both ALB and NLB
    • D) Neither
  5. Which ELB type can directly invoke Lambda functions as targets?

    • A) CLB
    • B) NLB
    • C) ALB
    • D) None
  6. A web application behind an ALB returns intermittent 503 errors during traffic spikes. What is the MOST likely cause?

    • A) The SSL certificate has expired
    • B) The target group has insufficient healthy capacity
    • C) Cross-zone load balancing is disabled
    • D) The security group blocks inbound traffic
  7. A company needs to route TCP traffic with static IP addresses across multiple AWS regions. Which combination of services should be used?

    • A) ALB + Route53
    • B) NLB + Global Accelerator
    • C) CLB + CloudFront
    • D) NLB + Route53 latency routing
  8. An Auto Scaling group has min=2, desired=2, max=10. A target tracking policy is configured for 60% CPU. During a deployment, new instances fail health checks and are terminated. What happens next?

    • A) The ASG scales down to 0 instances
    • B) The ASG launches replacement instances to maintain desired capacity
    • C) The ASG suspends all scaling activities
    • D) The ASG remains at the current count
  9. A solutions architect needs user sessions to persist on the same backend instance across multiple requests behind an ALB. Which feature should be enabled?

    • A) Cross-zone load balancing
    • B) Sticky sessions
    • C) Connection draining
    • D) Slow start
  10. After placing an application behind an ALB, all access logs show the ALB’s private IP instead of the client IP. How can the application retrieve the original client IP?

    • A) Enable Proxy Protocol on the ALB
    • B) Read the X-Forwarded-For header
    • C) Enable VPC Flow Logs
    • D) Configure the ALB to use the client’s source IP
  11. An ASG using simple scaling adds 2 instances when CPU exceeds 80%. After scaling out, CPU drops below the threshold. The ASG waits 300 seconds before allowing another scaling activity. What is this period called?

    • A) Health check grace period
    • B) Cooldown period
    • C) Warm-up period
    • D) Termination delay
  12. A solutions architect needs to route /api/ requests to backend services and / requests to web servers. Which load balancer type supports this?

    • A) NLB
    • B) CLB
    • C) ALB
    • D) Gateway Load Balancer
  13. A web application behind an ALB must be protected from SQL injection and cross-site scripting attacks. Which service should be associated with the ALB?

    • A) Network ACLs
    • B) Security Groups
    • C) AWS WAF
    • D) AWS Shield Advanced
  14. During rolling updates, in-flight requests are interrupted when instances deregister from a target group. What feature should be configured to allow requests to complete?

    • A) Slow start
    • B) Connection draining (deregistration delay)
    • C) Sticky sessions
    • D) Cross-zone load balancing
  15. An ASG is at max capacity (10 instances) of t3.micro instances, yet the application remains CPU-bound during peak hours. What is the MOST effective solution?

    • A) Increase max capacity to 20
    • B) Change the instance type to a larger size
    • C) Switch from target tracking to step scaling
    • D) Reduce the cooldown period
  16. A company uses lifecycle hooks to run a custom configuration script before instances serve traffic. Which lifecycle state should the hook target?

    • A) InService
    • B) Pending:Wait
    • C) Terminating:Wait
    • D) Detaching:Wait
  17. An NLB target group hosts HTTPS services on port 443. What health check type is MOST appropriate?

    • A) HTTP on port 80
    • B) HTTPS with certificate validation
    • C) TCP on port 443
    • D) HTTP with status code matcher
  18. An ALB distributes traffic to 4 instances across 3 AZs (us-east-1a has 2, us-east-1b has 1, us-east-1c has 1). Cross-zone load balancing is enabled. How is traffic distributed?

    • A) Each AZ receives 33.3% of traffic
    • B) Each instance receives 25% of traffic
    • C) us-east-1a receives 50%, others receive 25% each
    • D) Traffic is distributed randomly
  19. A company needs to scale infrastructure in anticipation of known traffic patterns, such as a flash sale starting at 9:00 AM. Which scaling policy should be used?

    • A) Target tracking
    • B) Step scaling
    • C) Scheduled scaling
    • D) Simple scaling
  20. A gaming company runs a UDP-based multiplayer game. Which load balancer handles UDP traffic with the lowest latency?

    • A) ALB
    • B) NLB
    • C) CLB
    • D) Gateway Load Balancer
  21. An ASG has min=3, max=10 with a step scaling policy: add 2 instances when CPU > 70%, remove 1 when CPU < 30%. After a spike, the ASG is at 8 instances. Traffic normalizes and CPU drops below 30%. How many instances terminate in the next scale-in event?

    • A) 0
    • B) 1
    • C) 5
    • D) 8
  22. A company wants to store ALB access logs in S3 for analysis. What must be configured?

    • A) Enable access logging on the ALB and specify an S3 bucket
    • B) Install the CloudWatch agent on the ALB
    • C) Enable VPC Flow Logs to S3
    • D) Configure CloudTrail for the ALB
  23. An ALB in public subnets must accept HTTPS traffic only. Which configurations are required? (Choose two.)

    • A) Security group allowing inbound HTTPS from 0.0.0.0/0
    • B) Network ACL allowing HTTPS on ALB subnets
    • C) HTTPS listener with an ACM certificate
    • D) Cross-zone load balancing enabled
    • E) Sticky sessions configured
  24. An ASG uses a mixed instances policy with On-Demand and Spot Instances. During a Spot interruption, what does the ASG do?

    • A) Terminates without replacement
    • B) Launches replacement instances to maintain desired capacity
    • C) Reduces desired capacity
    • D) Switches all instances to On-Demand
  25. A company needs to gradually shift 10% of traffic to a new application version behind an ALB. Which feature supports this?

    • A) Multiple target groups with weighted routing on the listener rule
    • B) Route53 weighted routing
    • C) Auto Scaling instance refresh
    • D) ALB connection draining
📝 Answer Key
  1. C — ALB supports path-based and host-based routing at Layer 7.
  2. C — 300 seconds (5 minutes) is the default health check grace period for groups created in the console (the CLI and SDK default to 0).
  3. C — Target Tracking automatically maintains the target value (e.g., 70% CPU).
  4. B — Cross-zone is enabled by default on ALB, disabled by default on NLB.
  5. C — ALB can target Lambda functions, making it ideal for serverless APIs.
  6. B — 503 errors from an ALB indicate no healthy targets are available to handle the request due to insufficient capacity.
  7. B — NLB handles TCP/UDP traffic and Global Accelerator provides 2 static Anycast IPs with global traffic optimization.
  8. B — The ASG automatically replaces terminated instances to maintain the desired or minimum capacity.
  9. B — Sticky sessions (session affinity) route the same client to the same target using cookies.
  10. B — ALB adds the X-Forwarded-For header with the client’s IP; the application must read this header.
  11. B — The cooldown period (default 300s for simple scaling) prevents additional scaling actions before the previous one takes effect.
  12. C — ALB supports path-based routing rules to direct requests to different target groups.
  13. C — AWS WAF protects web applications from SQL injection, XSS, and other web exploits when associated with an ALB.
  14. B — Connection draining (deregistration delay) allows in-flight requests to complete while the instance is being deregistered.
  15. B — When the ASG is at max capacity and instances are still CPU-bound, increasing instance type provides more resources per instance.
  16. B — Pending:Wait allows custom actions after launch but before the instance enters InService.
  17. C — A TCP health check on port 443 verifies that the target accepts connections on the service port; NLB also supports HTTP and HTTPS health checks when an application-level check is needed.
  18. B — With cross-zone LB enabled, traffic is distributed evenly across all instances regardless of AZ distribution.
  19. C — Scheduled scaling allows time-based scaling actions for known traffic patterns.
  20. B — NLB handles UDP traffic at Layer 4 with extreme performance and low latency.
  21. B — Step scaling policies execute exact adjustments per event; the policy specifies remove 1 instance.
  22. A — ALB can be configured to deliver access logs directly to a specified S3 bucket.
  23. A, C — The security group must allow HTTPS inbound, and an HTTPS listener with an ACM certificate is required.
  24. B — The ASG treats Spot interruptions as failures and launches replacement instances to maintain desired capacity.
  25. A — ALB supports weighted target groups on a listener rule, enabling gradual traffic shifting between versions.
