
Search systems collapse when index freshness is coupled to the write path or when teams mistake the search cluster for the source of truth. At scale, you need safe async indexing and reliable rebuild paths. An e-commerce or marketplace platform must serve rich keyword and faceted search over millions of products, keep results fresh despite frequent catalog updates, and preserve a source of truth that search cluster issues cannot corrupt.
TL;DR: Persist canonical catalog data first, publish product change events through EventBridge, queue indexing in SQS, and build OpenSearch as a derived serving layer with Redis caching in front of hot queries.
Why Naive Solutions Break
Using the search index as the primary database leads to inconsistency, painful reindexing, and brittle write paths. Synchronously indexing every catalog update inside product write APIs also turns catalog mutations into latency spikes and operational outages during index contention.
Architecture Overview
Store canonical catalog data in DynamoDB or Aurora depending on the domain, publish change events through EventBridge, buffer indexing jobs in SQS, build denormalized search documents in Lambda or ECS workers, and serve query traffic from OpenSearch behind CloudFront and API Gateway.
Service-by-Service Breakdown
- API Gateway: Public search endpoint with auth, throttling, and request normalization.
- Lambda or ECS: Query-serving layer for ranking, filtering logic, and fallback behavior.
- DynamoDB: Canonical product metadata store for flexible catalog entities and rapid point reads.
- EventBridge: Product lifecycle event bus for ProductUpdated, PriceChanged, and InventoryChanged.
- SQS: Indexing backlog buffer that protects catalog writes from OpenSearch incidents.
- Lambda or ECS workers: Build denormalized search documents and push bulk updates to OpenSearch.
- OpenSearch: Full-text search, faceting, autocomplete, and ranking features.
- ElastiCache Redis: Query result cache for hot search pages and autocomplete prefixes.
- S3: Snapshot storage, offline reindex inputs, and export data.
- CloudWatch and X-Ray: Index lag, query latency, bulk-failure metrics, and tracing.
Request Lifecycle and Data Flow
- Clients send search queries through CloudFront and API Gateway.
- The search API checks Redis for hot cached responses.
- On a miss, the API queries OpenSearch for relevant results and, if needed, enriches them with point reads from DynamoDB.
- Catalog updates write only to the canonical store first.
- Change events flow through EventBridge into SQS.
- Indexing workers build denormalized search documents and bulk-update OpenSearch asynchronously.
- Reindex jobs can be rebuilt from canonical data and S3 snapshots if the index becomes inconsistent.
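The read path above can be sketched as a cache-aside function. This is a minimal illustration, not a production client: the cache and search clients are injected (any objects exposing Redis-style get/setex and an OpenSearch-style search method), and the index name and query shape are assumptions.

```python
import json

def search_products(query, cache, search_client, ttl_seconds=60):
    """Cache-aside read path: check Redis first, fall back to OpenSearch.

    `cache` needs get/setex (e.g. redis.Redis); `search_client` needs a
    search(index=..., body=...) method (e.g. opensearchpy.OpenSearch).
    """
    cache_key = f"search:{query}"
    cached = cache.get(cache_key)
    if cached is not None:
        return json.loads(cached)

    # Miss: query the search cluster, then populate the cache with a TTL
    # so stale results expire on their own.
    response = search_client.search(
        index="catalog-v3",
        body={"query": {"match": {"title": query}}},
    )
    hits = [hit["_source"] for hit in response["hits"]["hits"]]
    cache.setex(cache_key, ttl_seconds, json.dumps(hits))
    return hits
```

A short TTL keeps hot queries cheap while bounding how stale a cached page can get relative to the asynchronously updated index.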
Production Code Patterns
Bulk indexing worker against OpenSearch
import os

from opensearchpy import OpenSearch, helpers

client = OpenSearch(
    hosts=[{"host": os.environ["OS_HOST"], "port": 443}],
    use_ssl=True,
)

def index_batch(documents):
    actions = [
        {"_index": "catalog-v3", "_id": doc["productId"], "_source": doc}
        for doc in documents
    ]
    helpers.bulk(client, actions, chunk_size=500, request_timeout=30)
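The worker feeding index_batch assembles denormalized documents from canonical records. A minimal sketch of that step, with illustrative field names rather than a fixed schema:

```python
def build_search_document(product, price, inventory):
    """Denormalize only search-relevant fields into one flat document.

    Everything else (descriptions, seller details, etc.) stays in the
    canonical store and is fetched by point read at render time.
    Field names here are illustrative assumptions.
    """
    return {
        "productId": product["productId"],
        "title": product["title"],
        "brand": product.get("brand"),
        "category": product.get("category"),
        "price": price["amount"],
        "currency": price["currency"],
        "inStock": inventory["available"] > 0,
    }
```

Keeping the document small is what makes bulk indexing cheap and shards predictable as the catalog grows.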
EventBridge rule for product-change indexing
resource "aws_cloudwatch_event_rule" "catalog_changes" {
  name = "catalog-product-updated"
  event_pattern = jsonencode({
    source        = ["catalog.service"]
    "detail-type" = ["ProductUpdated", "InventoryChanged", "PriceChanged"]
  })
}
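On the publishing side, the catalog service emits events whose source and detail-type match this rule. A small helper that builds such an entry (the payload fields are illustrative; the result is what you would pass to boto3's events client via put_events):

```python
import json

def product_event(detail_type, detail, bus_name="default"):
    """Build a PutEvents entry matching the catalog_changes rule.

    Publish with boto3.client("events").put_events(Entries=[entry]).
    """
    allowed = {"ProductUpdated", "InventoryChanged", "PriceChanged"}
    if detail_type not in allowed:
        raise ValueError(f"unrouted detail-type: {detail_type}")
    return {
        "Source": "catalog.service",
        "DetailType": detail_type,
        "Detail": json.dumps(detail),
        "EventBusName": bus_name,
    }
```

Note the rule only filters events; you still attach an SQS queue as the rule's target so matched events land in the indexing backlog.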
Scaling Strategy
- Scale query-serving nodes separately from indexing workers.
- Bulk index through SQS to smooth write bursts.
- Use OpenSearch shard sizing based on index volume and query concurrency, not default settings.
- Keep large product documents minimal; denormalize only search-relevant fields.
- Cache hot queries and autocomplete aggressively.
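Shard sizing deserves a concrete starting point. A common rule of thumb (an assumption to validate against your own query concurrency, not a hard rule) is to keep each primary shard in the 30-50 GB range:

```python
import math

def suggest_primary_shards(index_size_gb, target_shard_gb=40):
    """Rough primary-shard count: size each shard near a target (30-50 GB
    is common guidance), then load-test under real query concurrency.
    """
    return max(1, math.ceil(index_size_gb / target_shard_gb))
```

So a 200 GB catalog index would start at 5 primaries rather than whatever the cluster default happens to be.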
Cost Optimization Techniques
- Only index fields that matter for ranking or filtering.
- Use DynamoDB point reads for non-search metadata instead of bloating the search document.
- Snapshot and restore from S3 rather than overprovisioning for rare rebuilds.
- Expire low-value query caches quickly and measure hit rate versus memory cost.
Security Best Practices
- Keep OpenSearch in a VPC and front it with an application layer rather than direct public access.
- Restrict indexing and query roles separately.
- Encrypt data at rest and in transit.
- Audit admin reindex and mapping-change operations carefully.
Failure Handling and Resilience
- Never make OpenSearch the only copy of product data.
- Let SQS absorb indexing outages while catalog writes continue.
- Use bulk retry with backoff and failed-document quarantine.
- Degrade gracefully to cached results or top-sellers if the search cluster is impaired.
- Snapshot indexes regularly to S3.
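The retry-with-quarantine step can be sketched as a small wrapper around the bulk send. This is a simplified illustration: `send_batch` stands in for a real bulk call that raises on failure, and `quarantine` for wherever failed documents are parked (a DLQ or an S3 prefix, for example).

```python
import time

def bulk_with_retry(send_batch, documents, quarantine,
                    max_attempts=3, base_delay=1.0):
    """Retry a bulk send with exponential backoff; after max_attempts,
    quarantine the batch instead of blocking the indexing pipeline.
    Returns True on success, False if the batch was quarantined.
    """
    for attempt in range(max_attempts):
        try:
            send_batch(documents)
            return True
        except Exception:
            if attempt + 1 == max_attempts:
                # Park the batch for offline inspection and replay.
                quarantine.extend(documents)
                return False
            # Exponential backoff: base, 2x base, 4x base, ...
            time.sleep(base_delay * (2 ** attempt))
```

Because the canonical store is untouched, quarantined batches can always be rebuilt and replayed later.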
Trade-offs and Alternatives
OpenSearch is powerful for text and facets, but it adds operational tuning around shards, mappings, and reindexing. If search needs are simple, DynamoDB plus prefix or exact-match indexes may be enough. At scale, dedicated search infrastructure becomes worthwhile.
Real-World Use Case
An Amazon-style product catalog with fast-moving pricing, inventory, and seller attributes fits this architecture well.
Key Interview Insights
- Stress that search is a derived view, not the source of truth.
- Explain asynchronous indexing and freshness trade-offs.
- Mention shard sizing, bulk ingestion, and denormalized documents.
- Discuss graceful degradation when search fails but commerce must continue.
Recommended Reading
→ Designing Data-Intensive Applications — The essential book for understanding distributed systems, databases, and the infrastructure behind architectures like these.
→ System Design Interview Vol. 2 — Covers many of the architectures in this post in interview format with trade-off analysis.
