💼 AWS Interview Questions
This chapter contains 50+ AWS interview questions organized by topic. These are commonly asked in Solutions Architect interviews at mid-to-senior level positions.
1. General AWS Questions
Q1: What is the difference between scalability, elasticity, and high availability?
Answer:
| Concept | Description |
|---|---|
| Scalability | Ability to handle increased load by adding resources (up/down or out/in) |
| Elasticity | Ability to dynamically scale resources up/down based on demand |
| High Availability | System designed to operate continuously without failure for extended periods |
Example: Auto Scaling gives elasticity. Multi-AZ gives HA.
Q2: Explain the Shared Responsibility Model.
Answer: AWS is responsible for security OF the cloud (physical security, hardware, networking, managed services). The customer is responsible for security IN the cloud (OS patching, IAM, data encryption, network configuration).
Q3: What is the Well-Architected Framework?
Answer: A framework of 6 pillars for building reliable, secure, efficient, and cost-effective systems:
- Operational Excellence
- Security
- Reliability
- Performance Efficiency
- Cost Optimization
- Sustainability
Q4: Explain the difference between vertical and horizontal scaling.
| Type | Description | AWS Example |
|---|---|---|
| Vertical | Increase size of instance (t3.micro → t3.large) | Resize EC2, RDS |
| Horizontal | Increase number of instances | Auto Scaling, ELB |
2. Compute (EC2, Lambda)
Q5: When would you use EC2 vs Lambda?
| Factor | EC2 | Lambda |
|---|---|---|
| Duration | Any duration | Max 15 min |
| State | Stateful (EBS) | Stateless |
| Control | Full OS access | No OS access |
| Cost | Pay for instance uptime (per second or per hour) | Pay per request and execution duration |
| Use Case | Legacy apps, custom runtime | Event-driven, APIs, processing |
Q6: Explain the different EC2 purchasing options.
Answer:
- On-Demand — Pay per hour, no commitment. Spiky/unpredictable workloads.
- Reserved — 1-3 year commitment. Steady-state workloads (up to 72% off).
- Spot — Up to 90% off but can be terminated. Batch, fault-tolerant.
- Dedicated — Physical server. Licensing/compliance requirements.
Q7: What happens when a Spot Instance is terminated?
Answer: You get a 2-minute interruption notice. Use this time to:
- Save checkpoints/data to S3 or EFS
- Gracefully shut down processes
- Use EventBridge rules or Auto Scaling lifecycle hooks to trigger automation
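One way to notice the interruption from inside the instance is to poll the instance metadata service, which answers HTTP 404 at `/latest/meta-data/spot/instance-action` until an interruption is scheduled. The sketch below is only an illustration of that loop: `fetch` and `checkpoint` are hypothetical callables introduced here so the logic runs without a real instance, and a real `fetch` would also need to send an IMDSv2 token.

```python
import json
from typing import Callable, Optional

# Path on the EC2 instance metadata service. It returns 404 until an
# interruption is scheduled, then a small JSON document with the action and time.
SPOT_ACTION_URL = "http://169.254.169.254/latest/meta-data/spot/instance-action"


def check_interruption(fetch: Callable[[str], Optional[str]]) -> Optional[dict]:
    """Return the pending interruption notice, or None if there is none.

    `fetch` is a hypothetical helper: it takes a URL and returns the response
    body, or None on HTTP 404.
    """
    body = fetch(SPOT_ACTION_URL)
    return json.loads(body) if body else None


def handle_tick(fetch: Callable[[str], Optional[str]],
                checkpoint: Callable[[], None]) -> bool:
    """Run once per polling interval; True means shutdown should begin."""
    notice = check_interruption(fetch)
    if notice is None:
        return False
    checkpoint()  # e.g. flush state to S3 or EFS before the instance goes away
    return True


# Simulated run: the first poll sees nothing, the second sees a notice.
responses = iter([None, '{"action": "terminate", "time": "2024-01-01T12:00:00Z"}'])
saved = []
results = [
    handle_tick(lambda url: next(responses), lambda: saved.append("checkpointed"))
    for _ in range(2)
]
```

Polling every few seconds leaves most of the 2-minute window for the checkpoint itself.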
Q8: How do you make EC2 instances highly available?
Answer:
- Deploy across at least 2 Availability Zones
- Use Auto Scaling Group with min 2 instances
- Use ALB with cross-zone load balancing
- Use RDS Multi-AZ for database
- Use Route 53 health checks and failover routing
3. Storage (S3, EBS, EFS)
Q9: When would you choose S3 vs EBS vs EFS?
Answer:
| Service | Use Case |
|---|---|
| S3 | Object storage, static website, backups, data lake |
| EBS | Block storage for EC2 boot volumes, databases |
| EFS | Shared file system for multiple EC2 instances, NFS |
Q10: How do you secure data in S3?
Answer:
- Encryption at rest: SSE-S3, SSE-KMS, SSE-C
- Encryption in transit: HTTPS (TLS), enforced with a bucket policy that denies requests where aws:SecureTransport is false
- IAM policies: least privilege
- Bucket policies: restrict access by IP, VPC endpoint, or MFA
- Block Public Access settings
- Versioning: protect against accidental deletion
- Object Lock: WORM compliance
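Encryption in transit can be enforced rather than just recommended: a bucket policy can deny every request that does not arrive over TLS by testing the `aws:SecureTransport` condition key. A minimal sketch that builds such a policy document follows; the bucket name is a placeholder, and attaching the policy to a bucket is left out.

```python
import json


def deny_insecure_transport_policy(bucket: str) -> dict:
    """Build a bucket policy that rejects any request not made over TLS."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "DenyInsecureTransport",
                "Effect": "Deny",
                "Principal": "*",
                "Action": "s3:*",
                # Cover both the bucket itself and every object in it.
                "Resource": [
                    f"arn:aws:s3:::{bucket}",
                    f"arn:aws:s3:::{bucket}/*",
                ],
                "Condition": {"Bool": {"aws:SecureTransport": "false"}},
            }
        ],
    }


policy = deny_insecure_transport_policy("example-bucket")  # placeholder name
policy_json = json.dumps(policy, indent=2)
```

Because this is an explicit Deny, it wins over any Allow granted elsewhere to the same caller.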
Q11: Explain S3 storage classes and when to use each.
Answer:
| Class | Use When |
|---|---|
| S3 Standard | Frequent access, critical data |
| Intelligent-Tiering | Unknown or changing access patterns |
| Standard-IA | Infrequent but quick access needed |
| One Zone-IA | Infrequently accessed data that can be recreated if the AZ is lost |
| Glacier Flexible Retrieval | Archives, retrieval in minutes to hours |
| Glacier Deep Archive | Long-term compliance archives, retrieval within 12-48 hours |
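Storage classes are usually combined through a lifecycle rule that moves objects to colder tiers as they age. The sketch below builds a rule in the shape that boto3's `put_bucket_lifecycle_configuration` accepts as `LifecycleConfiguration`. The day thresholds and the `logs/` prefix are illustrative assumptions, not recommendations from this chapter.

```python
def archive_lifecycle_rule(prefix: str, ia_days: int = 30,
                           glacier_days: int = 90,
                           deep_archive_days: int = 365) -> dict:
    """Build a lifecycle configuration that tiers objects down as they age."""
    if not (ia_days < glacier_days < deep_archive_days):
        raise ValueError("transition days must increase from tier to tier")
    return {
        "Rules": [
            {
                "ID": f"archive-{prefix.strip('/') or 'all'}",
                "Status": "Enabled",
                "Filter": {"Prefix": prefix},
                "Transitions": [
                    {"Days": ia_days, "StorageClass": "STANDARD_IA"},
                    {"Days": glacier_days, "StorageClass": "GLACIER"},
                    {"Days": deep_archive_days, "StorageClass": "DEEP_ARCHIVE"},
                ],
            }
        ]
    }


config = archive_lifecycle_rule("logs/")
tiers = [t["StorageClass"] for t in config["Rules"][0]["Transitions"]]
```

When the access pattern is genuinely unknown, Intelligent-Tiering avoids having to pick these thresholds at all.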
Q12: What is S3 Transfer Acceleration?
Answer: Uses AWS Edge Locations to accelerate uploads over long distances. Best for large files uploaded from far away. Not beneficial for small files or nearby regions.
4. Networking (VPC)
Q13: Design a VPC for a 3-tier web application.
Answer:
├── Public Subnets (AZ-1a, AZ-1b)
│ ├── ALB
│ └── NAT Gateway
├── Private Subnets (AZ-1a, AZ-1b)
│ ├── EC2 (Auto Scaling)
│ └── Application Tier
├── Database Subnets (AZ-1a, AZ-1b)
│ ├── RDS (Multi-AZ)
│ └── ElastiCache
Q14: What’s the difference between a Security Group and a NACL?
Answer:
| Feature | Security Group | NACL |
|---|---|---|
| Level | Instance (ENI) | Subnet |
| State | Stateful | Stateless |
| Rules | Allow only | Allow + Deny |
| Evaluation | All rules | Rule number order |
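The "Evaluation" row is the one candidates most often get wrong, so here is a toy model of it. It is a simplified sketch: rules match on a single port only, the names are hypothetical, and the stateful versus stateless difference for return traffic is not modeled.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class NaclRule:
    number: int
    port: int
    allow: bool


def nacl_allows(rules: List[NaclRule], port: int) -> bool:
    """NACL: lowest rule number first, first match wins, default is deny."""
    for rule in sorted(rules, key=lambda r: r.number):
        if rule.port == port:
            return rule.allow
    return False  # the implicit final rule denies everything else


def security_group_allows(allowed_ports: List[int], port: int) -> bool:
    """Security group: allow-only rules, any matching rule admits traffic."""
    return port in allowed_ports


# Rule 100 denies SSH before rule 200 can allow it.
nacl = [NaclRule(200, 22, True), NaclRule(100, 22, False), NaclRule(110, 443, True)]
ssh_through_nacl = nacl_allows(nacl, 22)
https_through_nacl = nacl_allows(nacl, 443)
ssh_through_sg = security_group_allows([22, 443], 22)
```

The same allow rule for port 22 exists in both structures, yet only the security group admits the traffic, because the NACL stops at the lower-numbered deny.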
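Q15: How would you connect two VPCs in different regions?
Answer: Use inter-Region VPC peering when only a few VPCs are involved. Peering is not transitive, so the number of connections grows quickly as VPCs are added; at that point, use a Transit Gateway in each Region and peer the Transit Gateways.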
5. Databases
Q16: What’s the difference between Multi-AZ and Read Replicas in RDS?
Answer:
| Feature | Multi-AZ | Read Replicas |
|---|---|---|
| Purpose | High Availability | Read scaling |
| Replication | Synchronous | Asynchronous |
| Writes | Primary only | Primary only |
| Reads | Primary only | Any replica |
| Failover | Automatic | Manual (promote) |
Q17: When should you use DynamoDB vs RDS?
Answer:
| Use DynamoDB | Use RDS |
|---|---|
| Key-value lookups | Complex queries, joins |
| High-scale, low-latency | ACID transactions |
| Serverless, auto-scaling | Existing SQL skills |
| IoT, gaming, real-time | ERP, CRM, traditional apps |
Q18: What is DynamoDB DAX and when would you use it?
Answer: DAX (DynamoDB Accelerator) is an in-memory cache that provides microsecond latency for read-heavy workloads. Use for real-time apps, gaming leaderboards, or hot read-heavy data.
6. High Availability & Disaster Recovery
Q19: Explain the four DR strategies on AWS.
Answer:
- Backup & Restore — Lowest cost, highest RTO (hours). Recover from snapshots.
- Pilot Light — Core services running in DR. Scale up on failover. RTO: ~30min.
- Warm Standby - Scaled-down prod running in DR. Scale up on failover. RTO: ~5min.
- Multi-Site Active-Active — Full prod in 2+ regions. RTO: near zero. Highest cost.
Q20: What’s the difference between RTO and RPO?
Answer:
| Metric | Definition | Example |
|---|---|---|
| RTO | Time to recover after disaster | 1 hour to restore service |
| RPO | Maximum acceptable data loss | 15 minutes of lost data |
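A quick way to reason about the two metrics is to check a recovery plan against both targets at once. The sketch below uses the example targets from the table (RTO of 1 hour, RPO of 15 minutes). The two plans and their numbers are hypothetical, and the model assumes worst-case data loss equals the interval between backups.

```python
from dataclasses import dataclass


@dataclass
class RecoveryPlan:
    backup_interval_min: float  # time between backups or replication syncs
    restore_time_min: float     # time needed to bring the service back


def meets_targets(plan: RecoveryPlan, rto_min: float, rpo_min: float) -> bool:
    """Downtime is bounded by restore time; data loss by the backup interval."""
    return (plan.restore_time_min <= rto_min
            and plan.backup_interval_min <= rpo_min)


nightly_snapshots = RecoveryPlan(backup_interval_min=24 * 60, restore_time_min=180)
continuous_replication = RecoveryPlan(backup_interval_min=1, restore_time_min=30)

snapshots_ok = meets_targets(nightly_snapshots, rto_min=60, rpo_min=15)
replication_ok = meets_targets(continuous_replication, rto_min=60, rpo_min=15)
```

Nightly snapshots fail both targets here, which is why tighter RTO and RPO requirements push a design from Backup & Restore toward the replicated strategies.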
7. Security & Compliance
Q21: How does KMS envelope encryption work?
Answer:
- Generate a Data Encryption Key (DEK) from KMS
- Encrypt data with plaintext DEK (locally)
- Store encrypted DEK with data
- Decrypt: KMS decrypts DEK → Use DEK to decrypt data
This avoids the 4 KB limit on data that KMS encrypts directly, and it keeps bulk data from being sent to KMS at all.
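The steps above can be sketched end to end. This is an illustration of the flow only: `FakeKms` is a hypothetical in-memory stand-in for KMS, and `_keystream_xor` is a toy cipher that must not be used for real data. With boto3, the equivalent calls would be `generate_data_key` (which returns `Plaintext` and `CiphertextBlob`) and `decrypt`, with a real cipher such as AES-GCM used locally.

```python
import hashlib
import secrets


def _keystream_xor(key: bytes, data: bytes) -> bytes:
    """TOY cipher for illustration only (SHA-256 in counter mode). Not secure."""
    out = bytearray()
    for offset in range(0, len(data), 32):
        pad = hashlib.sha256(key + offset.to_bytes(8, "big")).digest()
        out.extend(b ^ p for b, p in zip(data[offset:offset + 32], pad))
    return bytes(out)


class FakeKms:
    """Hypothetical stand-in for KMS; the master key never leaves this object."""

    def __init__(self) -> None:
        self._master_key = secrets.token_bytes(32)

    def generate_data_key(self) -> tuple:
        plaintext_dek = secrets.token_bytes(32)
        return plaintext_dek, _keystream_xor(self._master_key, plaintext_dek)

    def decrypt(self, encrypted_dek: bytes) -> bytes:
        return _keystream_xor(self._master_key, encrypted_dek)


def envelope_encrypt(kms: FakeKms, data: bytes) -> dict:
    dek, encrypted_dek = kms.generate_data_key()  # generate a DEK
    ciphertext = _keystream_xor(dek, data)        # encrypt locally with it
    del dek                                       # drop the plaintext key
    # Store the encrypted DEK next to the data it protects.
    return {"encrypted_dek": encrypted_dek, "ciphertext": ciphertext}


def envelope_decrypt(kms: FakeKms, envelope: dict) -> bytes:
    dek = kms.decrypt(envelope["encrypted_dek"])  # KMS unwraps the DEK
    return _keystream_xor(dek, envelope["ciphertext"])


kms = FakeKms()
message = b"x" * 10_000  # far more than KMS would encrypt directly
envelope = envelope_encrypt(kms, message)
recovered = envelope_decrypt(kms, envelope)
```

Only the 32-byte data key ever travels to the key service; the 10,000-byte payload is encrypted and decrypted locally.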
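Q22: What’s the difference between IAM roles and resource-based policies?
Answer:
| IAM Roles | Resource-based Policies |
|---|---|
| An identity that a user or service assumes; permissions come from policies attached to the role | Attached to resources (S3 bucket, SQS queue) |
| Caller must assume the role and then uses temporary credentials | Caller accesses the resource directly with its own identity |
| Cross-account via the role's trust policy | Cross-account via the Principal element |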
Q23: Explain how you would implement least privilege access.
Answer:
- Start from the default implicit deny (nothing is allowed until a policy grants it); reserve explicit Deny for guardrails
- Grant only necessary permissions
- Use conditions (IP, time, MFA)
- Use permission boundaries to limit max permissions
- Use SCPs for organizational guardrails
- Use IAM Access Analyzer to identify over-permissive policies
- Review and rotate keys every 90 days
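As a concrete illustration of "grant only necessary permissions" combined with conditions, the sketch below builds an identity policy that allows a single read action on one S3 prefix, and only from a given network range with MFA present. The bucket, prefix, and CIDR are placeholders; `aws:SourceIp` and `aws:MultiFactorAuthPresent` are standard global condition keys.

```python
import json


def read_only_prefix_policy(bucket: str, prefix: str, cidr: str) -> dict:
    """Allow s3:GetObject on one prefix, from one network, with MFA."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "ReadOnePrefix",
                "Effect": "Allow",
                "Action": ["s3:GetObject"],  # one action, not s3:*
                "Resource": f"arn:aws:s3:::{bucket}/{prefix}*",
                "Condition": {
                    "IpAddress": {"aws:SourceIp": cidr},
                    "Bool": {"aws:MultiFactorAuthPresent": "true"},
                },
            }
        ],
    }


policy = read_only_prefix_policy("example-bucket", "reports/", "203.0.113.0/24")
policy_json = json.dumps(policy, indent=2)
```

Everything not listed stays denied by default, so the policy never needs a "deny all" statement of its own.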
8. Cost Optimization
Q24: How would you reduce EC2 costs by 40%?
Answer:
- Right-size instances (downsize over-provisioned)
- Use Savings Plans or Reserved Instances (30-72% off)
- Use Spot Instances for fault-tolerant workloads (up to 90% off)
- Stop unused instances during non-business hours
- Use Auto Scaling to match demand
- Use T3 burstable for variable workloads
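In an interview it helps to show the arithmetic behind the 40% figure. The worked example below blends commitment and Spot discounts across a fleet. Every number in it (the bill, the workload shares, and the discount rates) is a hypothetical input chosen from within the ranges quoted above, not a measured result.

```python
def blended_monthly_cost(baseline: float, steady_share: float,
                         commit_discount: float, spot_share: float,
                         spot_discount: float) -> float:
    """Cost after moving shares of the fleet to commitments and Spot."""
    on_demand_share = 1.0 - steady_share - spot_share
    if on_demand_share < 0:
        raise ValueError("shares must sum to at most 1")
    return baseline * (
        steady_share * (1 - commit_discount)   # Savings Plans / Reserved
        + spot_share * (1 - spot_discount)     # fault-tolerant work on Spot
        + on_demand_share                      # remainder stays On-Demand
    )


baseline = 10_000.0  # hypothetical monthly On-Demand bill in dollars
new_cost = blended_monthly_cost(
    baseline,
    steady_share=0.6, commit_discount=0.45,
    spot_share=0.2, spot_discount=0.70,
)
saving_pct = round(100 * (1 - new_cost / baseline), 1)
```

With these inputs the bill drops from $10,000 to about $5,900, a 41% reduction, before counting any right-sizing or scheduled shutdowns.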
Q25: What’s the difference between a Cost Budget and a Usage Budget?
Answer:
| Budget Type | Tracks | Use Case |
|---|---|---|
| Cost | Dollar amount | “Don’t exceed $10K/month” |
| Usage | Resource quantity | “Don’t exceed 100 TB of S3 storage” |
| Savings Plans | Utilization & coverage | “Ensure 80% SP coverage” |
9. Scenario-Based Questions
Q26: A company has a web app that stores session data on EC2 instances. Users are getting errors during high traffic. What do you recommend?
Answer: Make the app stateless by moving session data to ElastiCache (Redis). This allows any instance to serve any user, enabling Auto Scaling.
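The effect of externalizing sessions can be shown with a small model. `SessionStore` below is a hypothetical in-memory stand-in for a shared cache such as ElastiCache (Redis), and `AppInstance` represents one EC2 instance; neither is a real client library.

```python
import time
from typing import Dict, Optional, Tuple


class SessionStore:
    """Hypothetical stand-in for a shared cache such as ElastiCache (Redis)."""

    def __init__(self) -> None:
        self._data: Dict[str, Tuple[float, dict]] = {}

    def put(self, session_id: str, session: dict, ttl_seconds: int = 1800) -> None:
        # Store the expiry time alongside the session, like a key with a TTL.
        self._data[session_id] = (time.monotonic() + ttl_seconds, session)

    def get(self, session_id: str) -> Optional[dict]:
        entry = self._data.get(session_id)
        if entry is None or entry[0] < time.monotonic():
            return None
        return entry[1]


class AppInstance:
    """One EC2 instance; it keeps no session state of its own."""

    def __init__(self, name: str, store: SessionStore) -> None:
        self.name = name
        self.store = store

    def login(self, session_id: str, user: str) -> None:
        self.store.put(session_id, {"user": user})

    def whoami(self, session_id: str) -> Optional[str]:
        session = self.store.get(session_id)
        return session["user"] if session else None


shared = SessionStore()
instance_a = AppInstance("a", shared)
instance_b = AppInstance("b", shared)
instance_a.login("sid-123", "alice")
served_by_b = instance_b.whoami("sid-123")  # a different instance, same session
```

Because instance B can serve a session created on instance A, the load balancer no longer needs sticky sessions and Auto Scaling can add or remove instances freely.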
Q27: A company needs to share large files (10GB+) with external partners securely. What’s the best approach?
Answer: Store files in S3 and share them with pre-signed URLs (time-limited access). If partners need faster downloads globally, serve the files through CloudFront and use CloudFront signed URLs instead.
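Generating the link is a single client call. The sketch below wraps boto3's `generate_presigned_url` for `get_object`; the client is passed in so the function can be exercised with the hypothetical `StubS3Client` shown here, and the bucket and key names are placeholders. In real use you would pass `boto3.client("s3")`.

```python
def share_link(s3_client, bucket: str, key: str, hours: int = 24) -> str:
    """Return a time-limited download URL for a private object."""
    return s3_client.generate_presigned_url(
        "get_object",
        Params={"Bucket": bucket, "Key": key},
        ExpiresIn=hours * 3600,  # lifetime in seconds
    )


class StubS3Client:
    """Hypothetical stand-in so the sketch runs without boto3 or credentials."""

    def generate_presigned_url(self, operation, Params, ExpiresIn):
        return (f"https://{Params['Bucket']}.s3.amazonaws.com/{Params['Key']}"
                f"?X-Amz-Expires={ExpiresIn}&X-Amz-Signature=stub")


url = share_link(StubS3Client(), "partner-drop", "exports/report.bin", hours=2)
```

The object stays private; anyone holding the URL can download it until the expiry passes, so keep the lifetime as short as the partner workflow allows.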
Q28: A database migration from Oracle to Aurora PostgreSQL requires minimal downtime. What do you use?
Answer: Use AWS DMS (Database Migration Service) with ongoing replication (CDC). Use AWS SCT for schema conversion.
10. Architecture Design Questions
Q29: Design a real-time analytics pipeline for 10TB of daily data.
Answer:
IoT Devices → Kinesis Data Streams → Kinesis Data Analytics → S3
↓
Lambda (transform)
↓
OpenSearch Service
Q30: Design a global e-commerce platform with < 100ms latency worldwide.
Answer:
- Route 53 with latency-based routing
- CloudFront CDN at Edge Locations for static content
- DynamoDB Global Tables for active-active multi-region DB
- Global Accelerator for dynamic content routing
- Lambda@Edge for user-specific logic at Edge
✅ Chapter Quiz
1. What is the difference between scalability and elasticity?
   - A) They are the same
   - B) Scalability handles increased load; elasticity dynamically adjusts resources based on demand
   - C) Elasticity handles increased load; scalability dynamically adjusts resources
   - D) Neither relates to resource management
2. Under the Shared Responsibility Model, who is responsible for patching the OS on an EC2 instance?
   - A) AWS
   - B) The customer
   - C) Both
   - D) Third-party vendor
3. Which of the Well-Architected Framework pillars focuses on recovering from failures?
   - A) Security
   - B) Reliability
   - C) Performance Efficiency
   - D) Operational Excellence
4. What is the longest finite retention period that can be configured for CloudWatch Logs?
   - A) 90 days
   - B) 1 year
   - C) 5 years
   - D) 10 years
5. Which EC2 purchasing option is best for a fault-tolerant batch processing workload?
   - A) On-Demand
   - B) Reserved
   - C) Spot
   - D) Dedicated
6. What is the difference between Multi-AZ and Read Replicas in RDS?
   - A) Multi-AZ is for HA with synchronous replication; Read Replicas are for read scaling with async replication
   - B) They are the same feature
   - C) Read Replicas provide HA; Multi-AZ provides read scaling
   - D) Multi-AZ requires manual failover
7. Which AWS service is used to create a private, dedicated network connection from on-premises to AWS?
   - A) VPN
   - B) Direct Connect
   - C) VPC Peering
   - D) Transit Gateway
8. What is the purpose of an S3 pre-signed URL?
   - A) To make objects public
   - B) To grant time-limited access to a private object
   - C) To accelerate uploads
   - D) To encrypt objects
9. Which AWS service provides managed threat detection using ML?
   - A) Inspector
   - B) GuardDuty
   - C) Macie
   - D) Security Hub
10. What is the typical RPO of Aurora Global Database?
   - A) Under 1 second
   - B) 5 seconds
   - C) 1 minute
   - D) 5 minutes
11. Which AWS service would you use to store application configuration parameters with encryption?
   - A) DynamoDB
   - B) Parameter Store (SecureString)
   - C) S3
   - D) CloudFormation
12. How does KMS handle data larger than its 4 KB direct-encryption limit?
   - A) It can encrypt any size
   - B) It uses envelope encryption with a data key
   - C) It compresses the data first
   - D) It rejects the request
13. What is the difference between a Security Group and a NACL?
   - A) Security Groups are stateless; NACLs are stateful
   - B) Security Groups are stateful; NACLs are stateless
   - C) Both are stateful
   - D) Both are stateless
14. Which feature provides automatic failover for an RDS database across Availability Zones?
   - A) Read Replicas
   - B) Multi-AZ
   - C) Automated backups
   - D) Snapshots
15. What is the primary use case for DynamoDB DAX?
   - A) Cross-region replication
   - B) In-memory caching for microsecond read latency
   - C) Data warehousing
   - D) Stream processing
16. Which S3 storage class is best for data with unknown or changing access patterns?
   - A) S3 Standard
   - B) S3 Intelligent-Tiering
   - C) S3 Glacier
   - D) S3 One Zone-IA
17. What is the purpose of an IAM role?
   - A) To create a user with permanent credentials
   - B) To grant temporary permissions to a trusted entity
   - C) To set a password policy
   - D) To manage billing
18. Which AWS service should be used to decouple microservices for asynchronous communication?
   - A) SQS
   - B) API Gateway
   - C) ELB
   - D) CloudFront
19. What is the maximum time a Lambda function can run?
   - A) 5 minutes
   - B) 10 minutes
   - C) 15 minutes
   - D) 30 minutes
20. Which DR strategy has the lowest cost but highest RTO?
   - A) Multi-Site Active-Active
   - B) Warm Standby
   - C) Pilot Light
   - D) Backup & Restore
21. What is the purpose of AWS Control Tower?
   - A) To set up and govern a secure multi-account AWS environment
   - B) To monitor EC2 instances
   - C) To manage database migrations
   - D) To deploy containers
22. Which AWS service provides a managed Apache Kafka service?
   - A) SQS
   - B) MSK (Managed Streaming for Kafka)
   - C) Kinesis
   - D) SNS
23. What is the primary benefit of using Fargate over EC2 launch type for ECS?
   - A) Full control over the operating system
   - B) No need to manage underlying servers
   - C) Lower cost
   - D) Higher performance
24. Which routing policy in Route 53 is used to route traffic based on geographic location of the user?
   - A) Latency
   - B) Geolocation
   - C) Failover
   - D) Weighted
25. What does the AWS Well-Architected Framework’s Sustainability pillar focus on?
   - A) Reducing carbon footprint and energy consumption
   - B) Improving security
   - C) Reducing costs
   - D) Increasing performance
📝 Answer Key
1. B: Scalability handles increased load; elasticity dynamically adjusts resources based on demand.
2. B: The customer is responsible for OS patching on EC2 (security IN the cloud).
3. B: Reliability focuses on recovering from failures and meeting business continuity requirements.
4. D: Retention can be set from 1 day up to 10 years; by default, log events never expire.
5. C: Spot Instances offer up to 90% discount and are ideal for fault-tolerant workloads.
6. A: Multi-AZ uses synchronous replication for HA; Read Replicas use async replication for read scaling.
7. B: Direct Connect provides a dedicated fiber connection from on-premises to AWS.
8. B: Pre-signed URLs grant temporary access to private S3 objects without making them public.
9. B: GuardDuty uses ML to detect malicious activity from CloudTrail, VPC Flow Logs, and DNS logs.
10. A: Aurora Global Database typically has under 1 second of replication lag between regions.
11. B: Parameter Store stores configuration values; the SecureString type encrypts them with KMS.
12. B: KMS encrypts at most 4 KB directly; larger data is protected with envelope encryption using a data key (DEK).
13. B: Security Groups are stateful; NACLs are stateless.
14. B: Multi-AZ provides automatic failover to a standby in a different AZ.
15. B: DAX is an in-memory cache for DynamoDB providing microsecond read performance.
16. B: S3 Intelligent-Tiering automatically moves data between tiers based on changing access patterns.
17. B: IAM roles grant temporary security credentials for trusted entities to access AWS resources.
18. A: SQS decouples microservices by providing asynchronous message queuing.
19. C: Lambda has a maximum execution timeout of 15 minutes (900 seconds).
20. D: Backup & Restore is the cheapest DR strategy with the highest RTO (hours).
21. A: Control Tower provides automated landing zone setup and guardrails for multi-account governance.
22. B: MSK is a fully managed Apache Kafka service on AWS.
23. B: Fargate removes the need to manage underlying EC2 instances (serverless containers).
24. B: Geolocation routing routes traffic based on the geographic location that DNS queries originate from.
25. A: The Sustainability pillar focuses on minimizing environmental impact and energy consumption.