Netflix streams to over 250 million subscribers across 190 countries. Behind every video play is a sophisticated event-driven architecture that handles ingestion, transcoding, DRM, metadata management, and CDN delivery — all at a scale that makes most engineering challenges look trivial. This post reverse-engineers that architecture using AWS-native services, explaining every design decision a senior engineer or system design candidate needs to know.
Problem Statement
A video streaming platform at scale must solve: ingesting raw video files (some over 100GB), transcoding into 20+ format/quality combinations per title, storing and serving metadata for millions of titles, delivering video globally with under 200ms start time, personalising recommendations in real time, and handling 50,000+ concurrent uploads during peak content release windows — all while maintaining 99.99% availability.
Why Naive Solutions Fail
- Single-region monolith: One AZ failure takes the entire platform down. Latency to non-US users is 300–500ms, killing start-time SLAs.
- Synchronous transcoding: A 100GB 4K file takes 8 hours to transcode. A synchronous API call times out after 30 seconds. Transcoding must be fully async.
- Direct S3 streaming: S3 is not a CDN. Serving video directly from S3 creates hot partitions, costs 10x more in egress, and gives poor latency outside AWS regions.
- Single DynamoDB table for everything: Video metadata, user sessions, playback state, and recommendations have completely different access patterns. Mixed tables create hot partitions.
Architecture Overview
┌─────────────────────────────────────────────────────────────────────┐
│ CONTENT INGESTION PLANE │
│ │
│ Studio Upload ──► S3 Raw Bucket ──► EventBridge ──► SQS │
│ │ ▼ │
│ │ ECS Transcoder │
│ │ Cluster (Spot) │
│ ▼ │ │
│ S3 CDN Bucket ◄───────────────┘ │
└─────────────────────────────────────────────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────────────────────────┐
│ DELIVERY PLANE │
│ │
│ CloudFront (450+ PoPs) ──► Lambda@Edge ──► S3 CDN Bucket │
│ │ │
│ ▼ │
│ API Gateway ──► Lambda ──► DynamoDB (Metadata) │
│ │ │
│ ▼ │
│ ElastiCache ──► OpenSearch (Search) │
└─────────────────────────────────────────────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────────────────────────┐
│ ANALYTICS & ML PLANE │
│ │
│ Kinesis Data Streams ──► Kinesis Firehose ──► S3 Data Lake │
│ │ │ │
│ ▼ ▼ │
│ Lambda (Real-time) Glue + Athena │
│ │ │ │
│ ▼ ▼ │
│ DynamoDB (Watch State) SageMaker (Recommendations)│
└─────────────────────────────────────────────────────────────────────┘
Detailed Architecture Breakdown
1. Content Ingestion — S3 + EventBridge + ECS
Studios upload raw video via a presigned S3 URL to a dedicated ingestion bucket in us-east-1. S3 Event Notifications trigger EventBridge, which routes the event to an SQS queue. An ECS cluster running on Spot Instances pulls from the queue and runs FFmpeg-based transcoding jobs. Using Spot saves 70% over On-Demand — transcoding jobs are retryable so Spot interruptions are handled gracefully via SQS message visibility timeout.
# EventBridge rule pattern — matches S3 Object Created events on the raw bucket
{
  "source": ["aws.s3"],
  "detail-type": ["Object Created"],
  "detail": {
    "bucket": { "name": ["netflix-raw-ingestion-prod"] },
    "object": { "key": [{ "suffix": ".mp4" }, { "suffix": ".mov" }, { "suffix": ".mxf" }] }
  }
}
# ECS task definition — transcoder (the # annotations are explanatory, not valid JSON)
{
  "family": "video-transcoder",
  "cpu": "4096",      # 4 vCPU — FFmpeg is CPU-intensive
  "memory": "16384",  # 16GB — for 4K frame buffers
  "requiresCompatibilities": ["FARGATE"],  # Spot is selected via the FARGATE_SPOT capacity provider on the service, not here
  "containerDefinitions": [{
    "name": "transcoder",
    "image": "netflix-transcoder:latest",
    "environment": [
      { "name": "OUTPUT_PROFILES", "value": "360p,480p,720p,1080p,4K" },
      { "name": "OUTPUT_BUCKET", "value": "netflix-cdn-content-prod" }
    ]
  }]
}
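The Spot-safety argument above rests on SQS visibility timeout: a message is hidden while a task works on it and only deleted after success, so an interrupted job simply reappears. A minimal sketch of that loop, with an in-memory `FakeQueue` standing in for SQS (names are illustrative; a real worker would use the AWS SDK):

```javascript
// In-memory stand-in for SQS, showing why Spot interruptions are safe:
// a received message stays "in flight" (invisible) until explicitly deleted,
// and reappears on the queue if the worker dies before deleting it.
class FakeQueue {
  constructor() { this.messages = []; this.inFlight = new Map(); }
  send(body) { this.messages.push({ id: String(Math.random()), body }); }
  receive() {
    const msg = this.messages.shift();
    if (msg) this.inFlight.set(msg.id, msg); // hidden until deleted or timed out
    return msg;
  }
  delete(id) { this.inFlight.delete(id); }   // acknowledge: job succeeded
  expireVisibility(id) {                     // visibility timeout elapsed
    const msg = this.inFlight.get(id);
    if (msg) { this.inFlight.delete(id); this.messages.push(msg); }
  }
}

function processOnce(queue, transcode) {
  const msg = queue.receive();
  if (!msg) return null;
  try {
    transcode(msg.body);  // idempotent: safe to run twice for the same video
    queue.delete(msg.id); // only acknowledge after the output is written
    return 'done';
  } catch (e) {
    queue.expireVisibility(msg.id); // simulate interruption: message reappears
    return 'requeued';
  }
}
```

A job killed mid-transcode is never deleted, so the next healthy task picks it up after the visibility timeout; idempotent outputs make the duplicate run harmless.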
2. Transcoding Pipeline — Step Functions Orchestration
Each video goes through a Step Functions workflow: validate format → extract metadata → fan out to parallel transcoding jobs (one per output profile) → generate thumbnails → create DRM manifests → update DynamoDB metadata → invalidate CloudFront cache. The Parallel state in Step Functions handles the fan-out — all 20 output profiles transcode simultaneously, reducing total time from sequential hours to the duration of the slowest single profile.
# Step Functions state machine (excerpt — thumbnail, DRM, and InvalidateCDN states omitted)
{
  "Comment": "Video processing pipeline",
  "StartAt": "ValidateInput",
  "States": {
    "ValidateInput": {
      "Type": "Task",
      "Resource": "arn:aws:lambda:us-east-1:...:function:validate-video",
      "Next": "ExtractMetadata"
    },
    "ExtractMetadata": {
      "Type": "Task",
      "Resource": "arn:aws:lambda:us-east-1:...:function:extract-metadata",
      "Next": "TranscodeParallel"
    },
    "TranscodeParallel": {
      "Type": "Parallel",
      "Branches": [
        {"StartAt": "Transcode360p", "States": {"Transcode360p": {"Type": "Task", "Resource": "arn:aws:states:::ecs:runTask.sync", "End": true}}},
        {"StartAt": "Transcode1080p", "States": {"Transcode1080p": {"Type": "Task", "Resource": "arn:aws:states:::ecs:runTask.sync", "End": true}}},
        {"StartAt": "Transcode4K", "States": {"Transcode4K": {"Type": "Task", "Resource": "arn:aws:states:::ecs:runTask.sync", "End": true}}}
      ],
      "Next": "UpdateMetadata"
    },
    "UpdateMetadata": {
      "Type": "Task",
      "Resource": "arn:aws:states:::dynamodb:updateItem",
      "Parameters": {
        "TableName": "video-metadata",
        "Key": {"videoId": {"S.$": "$.videoId"}},
        "UpdateExpression": "SET #status = :ready",
        "ExpressionAttributeNames": {"#status": "status"},
        "ExpressionAttributeValues": {":ready": {"S": "READY"}}
      },
      "Next": "InvalidateCDN"
    }
  }
}
3. Metadata Storage — DynamoDB Single-Table Design
Video metadata lives in a single DynamoDB table with carefully overloaded keys. Video entity, episode list, season entity, and availability windows all share the table using the prefix pattern. DynamoDB Streams propagate changes to ElastiCache for hot metadata and to OpenSearch for full-text search indexing. DAX sits in front of DynamoDB for sub-millisecond reads on the most popular titles.
# DynamoDB access patterns:
# PK: VIDEO#tt1234567 SK: METADATA → video entity
# PK: VIDEO#tt1234567 SK: SEASON#1 → season metadata
# PK: VIDEO#tt1234567 SK: EPISODE#S01E01 → episode entity
# PK: USER#u789 SK: WATCHLIST#tt1234 → watchlist item
# PK: USER#u789 SK: HISTORY#2026-04 → watch history
# GSI1: GSI1PK = GENRE#thriller, GSI1SK = RATING#8.5 → browse by genre
# GSI2: GSI2PK = COUNTRY#IN, GSI2SK = RELEASE#2026 → regional availability
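The overloaded-key pattern above can be made concrete with a small key-builder helper (the helper and its names are purely illustrative, not any real codebase's API):

```javascript
// Build the overloaded PK/SK pairs from the access-pattern table above.
// Keys are designed around queries, not entity relationships.
const keys = {
  videoMetadata: (videoId) => ({ PK: `VIDEO#${videoId}`, SK: 'METADATA' }),
  season: (videoId, n) => ({ PK: `VIDEO#${videoId}`, SK: `SEASON#${n}` }),
  episode: (videoId, s, e) => ({
    PK: `VIDEO#${videoId}`,
    SK: `EPISODE#S${String(s).padStart(2, '0')}E${String(e).padStart(2, '0')}`,
  }),
  watchlistItem: (userId, videoId) => ({ PK: `USER#${userId}`, SK: `WATCHLIST#${videoId}` }),
};
```

With these keys, one Query with PK = VIDEO#tt1234567 and SK begins_with "EPISODE#" returns every episode of a title in a single request, with no joins or scans.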
4. Global Delivery — CloudFront + Lambda@Edge
CloudFront with 450+ Points of Presence serves all video content. Lambda@Edge functions run at CloudFront's regional edge caches (only the lighter-weight CloudFront Functions run at every PoP) to: verify JWT tokens (auth at edge = no origin round-trip for auth), rewrite URLs based on device capability (serve AV1 to supported devices, H.264 to others), inject DRM headers, and perform A/B testing of different CDN configurations. Origin Shield sits between CloudFront and S3 to absorb cache misses at a single regional point, reducing S3 GET requests by roughly 80%.
// Lambda@Edge — viewer-request trigger (runs at regional edge caches)
exports.handler = async (event) => {
  const request = event.Records[0].cf.request;
  const headers = request.headers;

  // 1. Verify JWT at edge — no origin round trip.
  //    verifyJWT is assumed to wrap a JWT library such as jsonwebtoken.
  const token = headers['authorization']?.[0]?.value;
  if (!token || !verifyJWT(token)) {
    return { status: '401', statusDescription: 'Unauthorized', body: 'Unauthorized' };
  }

  // 2. Device-based format negotiation (deliberately simplified:
  //    real detection parses the browser version or uses client hints,
  //    since a prefix match misses e.g. Chrome 100+).
  const ua = headers['user-agent']?.[0]?.value || '';
  const supportsAV1 = ua.includes('Chrome/9') || ua.includes('Firefox/1');
  if (supportsAV1 && request.uri.includes('/hls/')) {
    request.uri = request.uri.replace('/hls/', '/av1/');
  }

  // 3. Geo-based content filtering. CloudFront-Viewer-Country is only
  //    present once CloudFront adds it, so guard against its absence.
  const country = headers['cloudfront-viewer-country']?.[0]?.value;
  if (country) {
    headers['x-viewer-country'] = [{ key: 'X-Viewer-Country', value: country }];
  }
  return request;
};
5. Real-Time Analytics — Kinesis + Lambda + DynamoDB
Every play, pause, seek, and quality switch event is written to Kinesis Data Streams (100 shards for 100K events/second). Two Lambda consumers process the stream: one updates real-time watch state in DynamoDB (for resume playback), another aggregates viewing data for CloudWatch metrics. Kinesis Firehose delivers a copy to S3 in Parquet format. Glue crawls S3 hourly, Athena queries run for batch analytics, and SageMaker re-trains recommendation models nightly.
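Replayed or reordered Kinesis records only stay harmless because watch-state updates are last-write-wins. A minimal sketch of that merge, assuming each event carries a client timestamp (in DynamoDB this would become a conditional UpdateItem; the function and field names are illustrative):

```javascript
// Last-write-wins merge for resume-playback state. A stale or duplicate
// record never overwrites newer state, so the consumer is idempotent
// under Kinesis replay and out-of-order delivery.
function applyWatchEvent(current, event) {
  if (current && current.eventTimestamp >= event.eventTimestamp) {
    return current; // stale or duplicate record: ignore
  }
  return {
    userId: event.userId,
    videoId: event.videoId,
    positionSeconds: event.positionSeconds,
    eventTimestamp: event.eventTimestamp,
  };
}
```

In DynamoDB terms this maps to an UpdateItem with a ConditionExpression such as `attribute_not_exists(eventTimestamp) OR eventTimestamp < :ts`.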
Data Flow: Complete Request Lifecycle
- User opens app → CloudFront edge PoP serves cached homepage HTML/JS (straight from the edge cache, no origin round trip)
- App requests title list → API Gateway → Lambda → ElastiCache (cache hit: 1ms) or DynamoDB + DAX (cache miss: 5ms)
- User clicks play → API Gateway → Lambda validates entitlement → DynamoDB → returns signed CloudFront URL with 1hr expiry
- Video manifest fetch → CloudFront → Lambda@Edge validates JWT → S3 CDN bucket (HLS/DASH manifest)
- Video segments → CloudFront PoP (cache hit: <10ms) or Origin Shield → S3 CDN bucket (cache miss: 50–100ms)
- Playback events → Kinesis Producer Library → Kinesis Data Streams → Lambda → DynamoDB (watch state)
- Session end → Kinesis → Firehose → S3 → Glue ETL → Athena analytics
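The signed URL in step 3 is built from a policy document containing an epoch expiry. A sketch of the canned policy CloudFront expects for a 1-hour window follows; the RSA signing step (done with a CloudFront key pair, e.g. via the AWS SDK's CloudFront signer) is omitted:

```javascript
// Build the canned-policy document for a CloudFront signed URL.
// Only the policy construction is shown; signing it is a separate step.
function cannedPolicy(resourceUrl, nowMs = Date.now(), ttlSeconds = 3600) {
  const expires = Math.floor(nowMs / 1000) + ttlSeconds; // epoch seconds
  return {
    Statement: [{
      Resource: resourceUrl,
      Condition: { DateLessThan: { 'AWS:EpochTime': expires } },
    }],
  };
}
```

The 1-hour expiry bounds how long a leaked URL is useful, which is why entitlement is re-checked at the edge on every manifest fetch.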
Scaling Strategy
- ECS Transcoding Cluster: Target Tracking Scaling on SQS queue depth — scale out when queue > 10 messages per task, scale in after 15-min cooldown. Fargate Spot for 70% cost saving.
- Kinesis Shards: Auto-scale with Enhanced Fan-Out. Each shard ingests up to 1,000 records/sec; at peak, 100 shards handle 100K events/sec. Shard splitting is triggered when GetRecords.IteratorAgeMilliseconds exceeds 60 seconds.
- DynamoDB: On-Demand billing mode for unpredictable traffic. A DAX cluster (3 nodes across 3 AZs) absorbs read spikes. Global Tables replicate to eu-west-1 and ap-southeast-1 for low-latency regional reads.
- Lambda: Provisioned Concurrency on the title metadata function (100 units) eliminates cold starts for the hottest API path. Reserved Concurrency on the streaming auth function (500) guarantees it capacity that other functions cannot consume.
- CloudFront: No scaling needed — it is a global network. Cache hit ratio target: 95%+. Cache TTL: 24 hours for video segments, 5 minutes for metadata, 0 for auth responses.
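The scale-out targets above reduce to simple arithmetic. A sketch using the thresholds stated in this list (10 queued messages per task, 1,000 records per shard); the min/max task bounds are illustrative assumptions:

```javascript
// Target tracking on SQS backlog: keep backlog-per-task at or below 10.
function desiredTranscoderTasks(queueDepth, backlogPerTask = 10, min = 1, max = 500) {
  return Math.min(max, Math.max(min, Math.ceil(queueDepth / backlogPerTask)));
}

// Kinesis capacity planning: each shard ingests up to 1,000 records/sec.
function requiredShards(eventsPerSecond, recordsPerShard = 1000) {
  return Math.ceil(eventsPerSecond / recordsPerShard);
}
```

At the stated peak of 100K events/sec this yields the 100 shards quoted above; a backlog of 350 transcode jobs would scale the cluster to 35 tasks.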
Cost Optimization Techniques
- S3 Intelligent-Tiering: New content in Standard tier. After 30 days without access, auto-transition to Infrequent Access. After 90 days, Glacier Instant Retrieval. Saves 60% on storage for catalog tail.
- Spot Instances for Transcoding: ECS Fargate Spot is 70% cheaper than On-Demand. Transcoding jobs are idempotent — Spot interruptions handled by re-queuing to SQS.
- CloudFront Origin Shield: Consolidates cache misses to one region, reducing S3 GET requests by roughly 80%. S3 GETs cost $0.0004 per 1,000 requests — tiny per request, but at Netflix scale this saves millions per month.
- Kinesis Enhanced Fan-Out selectively: Only the real-time watch state Lambda uses Enhanced Fan-Out ($0.015/shard-hour). Batch analytics consumers use standard GetRecords (free).
- DynamoDB On-Demand vs Provisioned: Use On-Demand for unpredictable workloads (new title launches). Switch to Provisioned + Auto Scaling for stable baseline traffic (20–30% savings).
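The storage-tier progression in the first bullet corresponds roughly to an S3 lifecycle configuration like the one below (bucket prefix and rule ID are illustrative; the exact rule syntax should be checked against current S3 documentation):

```json
{
  "Rules": [{
    "ID": "catalog-tail-tiering",
    "Status": "Enabled",
    "Filter": { "Prefix": "catalog/" },
    "Transitions": [
      { "Days": 30, "StorageClass": "STANDARD_IA" },
      { "Days": 90, "StorageClass": "GLACIER_IR" }
    ]
  }]
}
```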
Security Best Practices
- IAM: Lambda execution roles use least-privilege — transcode function can only write to CDN bucket, never read raw bucket credentials. Cross-account roles for multi-account setup (separate accounts for prod/staging/dev).
- VPC: Lambda functions that need ElastiCache run inside the VPC (Lambda@Edge cannot attach to one). DynamoDB, S3, and Kinesis are accessed via VPC Endpoints, with no internet egress. ElastiCache sits in private subnets only.
- Encryption: S3 server-side encryption with KMS CMKs. DynamoDB encrypted at rest with AWS-managed keys. Kinesis streams encrypted. TLS 1.3 minimum enforced on CloudFront.
- DRM: Widevine + FairPlay + PlayReady via AWS Elemental MediaConvert. Content keys stored in AWS Secrets Manager, rotated every 24 hours.
- Presigned URLs: Video access via short-lived CloudFront signed URLs (1-hour expiry). URL contains user ID embedded — Lambda@Edge validates entitlement at edge.
Failure Handling and Resilience
- Transcoding failures: SQS dead-letter queue after 3 retries. Failed jobs trigger SNS alert → PagerDuty. Step Functions execution history retained 90 days for debugging.
- DynamoDB unavailability: Global Tables let traffic shift to a healthy replica region. DynamoDB Accelerator (DAX) continues serving last-known metadata from cache for up to 5 minutes during table degradation.
- Kinesis shard failures: Enhanced Fan-Out consumers automatically re-read from last checkpoint. Lambda retry with exponential backoff. Watch state updates are idempotent (last-write-wins).
- CDN origin failures: CloudFront origin failover — primary origin is S3 CDN bucket, secondary is a separate region S3 bucket. Failover triggers on 5xx responses or connection timeout.
- Multi-region: Route53 health-check-based failover. Primary region is us-east-1, with failover to eu-west-1. DynamoDB Global Tables keep both regions consistent, typically within a second.
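The exponential backoff mentioned for the Kinesis consumers typically includes jitter, so parallel consumers do not retry in lockstep after a shared failure. A generic sketch (not the AWS SDK's internal implementation; base and cap values are illustrative):

```javascript
// Exponential backoff with full jitter: the delay ceiling doubles per
// attempt up to a cap, and randomization spreads retries out in time.
function backoffDelayMs(attempt, baseMs = 100, capMs = 20000, rand = Math.random) {
  const ceiling = Math.min(capMs, baseMs * 2 ** attempt);
  return Math.floor(rand() * ceiling);
}
```

Because watch-state updates are last-write-wins, retries are safe even if an earlier attempt actually succeeded.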
Trade-offs and Alternatives
- ECS vs EKS for transcoding: ECS simpler to operate for a single workload. EKS chosen when transcoding shares cluster with other services — better bin-packing and GPU scheduling for ML workloads.
- DynamoDB vs Aurora for metadata: DynamoDB chosen for horizontal scalability and Global Tables. Aurora chosen for complex queries (genre browsing with multiple filters). In practice: DynamoDB for hot access paths, Aurora for content management/search.
- Kinesis vs SQS for event streaming: Kinesis for ordered, replayable event streams (playback analytics). SQS for decoupled job queues (transcoding). Both are used — different access patterns justify different tools.
- Lambda@Edge vs CloudFront Functions: Lambda@Edge for complex auth/personalization logic (runs on regional edge). CloudFront Functions for simple URL rewrites (runs at every PoP, sub-millisecond).
Real-World Parallels
Netflix uses a mix of AWS and their own Open Connect CDN. Disney+ runs almost entirely on AWS with a similar CloudFront + ECS + DynamoDB architecture. Twitch (Amazon-owned) uses Kinesis Video Streams instead of S3 raw ingest — a variant for live streaming vs VOD. The pattern described here maps directly to any large-scale VOD platform and is a canonical system design answer for video streaming platforms at FAANG interviews.
Key Interview Insights
- Always start with scale numbers: How many concurrent users? How many uploads per day? These determine whether you need Kinesis vs SQS, DynamoDB vs RDS, single-region vs multi-region.
- Async everything that can wait: Transcoding is inherently async. Notification delivery is async. Analytics is async. Synchronous paths should be limited to: auth, entitlement check, manifest fetch.
- CDN cache ratio is the most important metric: A 95% cache hit ratio means 95% of video requests never reach your origin. Optimizing cache TTL and key strategy has more impact than any origin scaling.
- DynamoDB key design determines scalability: Hot partition keys (using videoId as PK for all access patterns) will throttle under load. Design keys around access patterns, not entity relationships.
- Cost is an architecture concern: Egress costs from S3 without CDN at Netflix scale = hundreds of millions per year. CloudFront + Origin Shield is not optional — it is an architectural requirement.
- Event-driven reduces coupling: EventBridge between S3 ingestion and transcoding means you can add new consumers (DRM generation, thumbnail creation, ML tagging) without changing the ingestion service.
This architecture builds directly on the Lambda cold start optimization guide — the metadata Lambda functions use Provisioned Concurrency for the exact reasons covered there. For the DynamoDB key design patterns shown here, the DynamoDB single-table design deep dive covers the theory behind the access pattern-first approach. Official reference: AWS Architecture Blog.
Recommended Reading
→ Designing Data-Intensive Applications — The essential book for understanding distributed systems, databases, and the infrastructure behind architectures like these.
→ System Design Interview Vol. 2 — Covers many of the architectures in this post in interview format with trade-off analysis.
Affiliate links. We earn a small commission at no extra cost to you.