Most developers use CloudWatch Logs like a text file. CloudWatch Logs Insights transforms your logs into a queryable database. These queries find production bugs that manual log searching misses entirely.
⚡ TL;DR: CloudWatch Logs Insights runs a pipe-delimited query language (commands chained with |) against your log groups.
filter, stats, sort, and parse handle most production debugging. Combine them for database-style analytics on raw logs.
The 4 Core Commands
# The 4 commands that handle everything:
filter @message like /ERROR/ # Select matching lines
stats count() as errors by bin(5m) # Aggregate data
sort @timestamp desc # Order results
parse @message "duration: * ms" as ms # Extract fields
# Chain them:
filter @type = "REPORT"
| stats avg(@duration) as avgMs, pct(@duration, 99) as p99Ms
    by bin(1h)
| sort avgMs desc
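To make the glob-style parse pattern concrete, here is a minimal Python sketch of the same idea: each * becomes a capture group and everything else is matched literally. The helper name parse_glob and the sample log line are hypothetical, and this is only an approximation of how parse behaves, not the service's implementation.

```python
import re

def parse_glob(message, pattern):
    """Return the text captured by each * in pattern, or None if nothing matches."""
    # Escape the literal pieces, then join them with non-greedy capture groups.
    regex = "(.*?)".join(re.escape(part) for part in pattern.split("*"))
    if pattern.endswith("*"):
        regex += "$"  # let a trailing * capture the rest of the line
    match = re.search(regex, message)
    return list(match.groups()) if match else None

# Hypothetical log line, for illustration only.
line = "request finished, duration: 153 ms, status 200"
print(parse_glob(line, "duration: * ms"))  # ['153']
```

The captured values come back as strings here, so convert them before doing arithmetic on them.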
Query 1: Find All Lambda Cold Starts
filter @type = "REPORT"
| filter ispresent(@initDuration)
| stats count() as coldStarts,
avg(@initDuration) as avgInitMs,
max(@initDuration) as maxInitMs,
pct(@initDuration, 95) as p95InitMs
| sort coldStarts desc
# @initDuration only appears on cold start invocations
Query 2: API Latency Percentiles Over Time
filter @type = "REPORT"
| stats pct(@duration, 50) as p50,
pct(@duration, 95) as p95,
pct(@duration, 99) as p99,
count() as requests
by bin(5m)
| sort @timestamp asc
# Use p99, not avg — averages hide tail latency problems
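A small Python sketch shows why the average can look healthy while the tail is not. The sample durations are made up, and the nearest-rank percentile used here is one common definition; CloudWatch may compute percentiles slightly differently.

```python
import math

def percentile(values, pct):
    """Nearest-rank percentile: the smallest value with at least pct% of samples at or below it."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]

# Hypothetical sample: 98 fast requests and 2 very slow ones.
durations_ms = [100] * 98 + [5000] * 2

avg_ms = sum(durations_ms) / len(durations_ms)
print(avg_ms)                        # 198.0
print(percentile(durations_ms, 50))  # 100
print(percentile(durations_ms, 99))  # 5000
```

The average of 198 ms suggests a mild slowdown, while p99 shows that some users waited five seconds.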
Query 3: Lambda Memory Utilization
filter @type = "REPORT"
| parse @message "Memory Size: * MB" as memorySize
| parse @message "Max Memory Used: * MB" as memoryUsed
| stats max(memoryUsed) as maxUsedMB,
        max(memorySize) as allocatedMB,
        maxUsedMB / allocatedMB * 100 as utilizationPct
| filter utilizationPct > 80
# Above 80% means OOM risk. Increase the memory allocation.
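The arithmetic behind Query 3 can be checked by hand on a single REPORT line. The sketch below assumes the usual tab-separated Lambda REPORT format; the request ID and numbers are placeholders, and memory_utilization_pct is a hypothetical helper name.

```python
import re

# Placeholder REPORT line; real lines carry your own request ID and values.
REPORT_LINE = (
    "REPORT RequestId: 00000000-0000-0000-0000-000000000000\t"
    "Duration: 12.34 ms\tBilled Duration: 13 ms\t"
    "Memory Size: 512 MB\tMax Memory Used: 430 MB"
)

def memory_utilization_pct(report_line):
    """Return Max Memory Used as a percentage of Memory Size, or None if either is missing."""
    size = re.search(r"Memory Size: (\d+) MB", report_line)
    used = re.search(r"Max Memory Used: (\d+) MB", report_line)
    if not (size and used):
        return None
    return int(used.group(1)) / int(size.group(1)) * 100

print(memory_utilization_pct(REPORT_LINE))  # 83.984375
```

At roughly 84%, this invocation would pass the query's 80% filter and be worth a closer look.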
Query 4: Slow-Request Rate by Time Window
filter @type = "REPORT"
| stats count(@duration > 3000) as slowRequests,
        count() as total,
        slowRequests / total * 100 as slowRate
    by bin(5m)
| sort @timestamp asc
# Shows exactly when your API slowed down
Query 5: Top Endpoints by Error Rate
parse @message "[*] * * * *" as ts, method, path, status, latency
| stats count(status >= 400) as errors,
        count() as total,
        avg(latency) as avgLatency,
        errors / total * 100 as errorRate
    by path, method
| filter total > 10
| sort errorRate desc
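To sanity-check what Query 5 should return, the same aggregation can be sketched in Python over a few lines. The "[timestamp] method path status latency" layout mirrors the parse pattern above; the sample lines and the error_rates helper are hypothetical.

```python
from collections import defaultdict

def error_rates(lines, min_requests=10):
    """Return (path, method, error_rate_pct, total) rows, highest error rate first."""
    counts = defaultdict(lambda: [0, 0])  # (path, method) -> [errors, total]
    for line in lines:
        parts = line.split(" ")
        if len(parts) != 5 or not parts[3].isdigit():
            continue  # skip lines that do not fit the expected layout
        _ts, method, path, status, _latency = parts
        bucket = counts[(path, method)]
        bucket[1] += 1
        if int(status) >= 400:
            bucket[0] += 1
    rows = [
        (path, method, errors / total * 100, total)
        for (path, method), (errors, total) in counts.items()
        if total > min_requests  # same cutoff as "filter total > 10"
    ]
    return sorted(rows, key=lambda row: row[2], reverse=True)

# Hypothetical access-log lines.
sample = ["[2024-05-01T12:00:00Z] GET /orders 500 120"] * 5
sample += ["[2024-05-01T12:00:01Z] GET /orders 200 40"] * 15
sample += ["[2024-05-01T12:00:02Z] GET /health 200 3"] * 12
sample += ["[2024-05-01T12:00:03Z] POST /login 401 25"] * 3

for row in error_rates(sample):
    print(row)
```

The POST /login rows are dropped because three requests is below the minimum, which keeps low-traffic endpoints from dominating the ranking.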
Query 6: Detect Memory Leaks Over Time
filter @type = "REPORT"
| parse @message "Max Memory Used: * MB" as memUsed
| stats max(memUsed) as maxMem,
        min(memUsed) as minMem,
        count() as invocations,
        maxMem - minMem as memGrowth
    by @logStream
| filter invocations > 10
| filter memGrowth > 50
| sort memGrowth desc
# One log stream (one execution environment) growing by more than 50 MB suggests a memory leak
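The grouping logic in Query 6 is easy to mirror in Python if you export the samples. This is a rough sketch: leak_suspects is a hypothetical helper, the stream names and readings are invented, and the defaults copy the thresholds from the query.

```python
from collections import defaultdict

def leak_suspects(samples, min_invocations=10, min_growth_mb=50):
    """samples: iterable of (log_stream, max_memory_used_mb) pairs.

    Returns (log_stream, growth_mb) for streams with more than min_invocations
    samples and more than min_growth_mb between lowest and highest reading.
    """
    by_stream = defaultdict(list)
    for stream, mem_mb in samples:
        by_stream[stream].append(mem_mb)
    suspects = []
    for stream, readings in by_stream.items():
        growth = max(readings) - min(readings)
        if len(readings) > min_invocations and growth > min_growth_mb:
            suspects.append((stream, growth))
    return sorted(suspects, key=lambda item: item[1], reverse=True)

# Invented data: stream-a climbs steadily, stream-b is flat, stream-c has too few samples.
samples = [("stream-a", 100 + 6 * i) for i in range(12)]
samples += [("stream-b", 120)] * 12
samples += [("stream-c", 100), ("stream-c", 400)]

print(leak_suspects(samples))  # [('stream-a', 66)]
```

Max minus min does not prove steady growth, since a single spike produces the same number, so treat a hit as a prompt to plot that stream's memory over time.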
Query 7: Cost Analysis by Day
filter @type = "REPORT"
| parse @message "Billed Duration: * ms" as billedMs
| parse @message "Memory Size: * MB" as memMB
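| fields (billedMs / 1000) * (memMB / 1024) as gbSeconds
| fields gbSeconds * 0.0000166667 as costUSD
| stats sum(costUSD) as totalCostUSD,
        count() as invocations
    by bin(1d)
| sort totalCostUSD desc
# Find your most expensive days. 0.0000166667 is the x86 price per GB-second in us-east-1;
# check the rate for your region and architecture. Per-request charges are not included.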
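The cost formula in Query 7 is simple enough to verify by hand. The sketch below repeats it in Python with the same per-GB-second rate used in the query; the function name and the example numbers are illustrative, and the rate should be treated as an assumption to check against current pricing for your region.

```python
PRICE_PER_GB_SECOND_USD = 0.0000166667  # same rate as Query 7; verify for your region

def invocation_cost_usd(billed_ms, memory_mb, price=PRICE_PER_GB_SECOND_USD):
    """Compute-only cost of one invocation (per-request charges excluded)."""
    gb_seconds = (billed_ms / 1000) * (memory_mb / 1024)
    return gb_seconds * price

# Example: a 512 MB function billed for 200 ms uses 0.1 GB-seconds.
one_call = invocation_cost_usd(200, 512)
print(f"{one_call:.10f}")              # 0.0000016667
print(f"{one_call * 1_000_000:.2f}")   # 1.67 (USD per million such invocations)
```

Doubling memory doubles the per-millisecond price, so a larger allocation only pays off if it cuts billed duration by a similar factor.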
CloudWatch Insights Cheat Sheet
- ✅ Use bin(5m) for time series: it shows spikes clearly
- ✅ Use pct(@duration, 99), not avg: averages hide tail latency
- ✅ Use ispresent(@initDuration) to filter cold starts specifically
- ✅ Use parse to extract fields from unstructured messages
- ✅ Save frequent queries as CloudWatch saved queries
- ❌ Never query a wider time range than you need: Logs Insights bills per GB scanned
- ✅ Switch to structured JSON logging for production: much faster to query
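When running these queries from code rather than the console, the time range is an explicit argument. Here is a minimal Python sketch, assuming a boto3 CloudWatch Logs client is passed in; run_insights_query, the polling defaults, and the log group name in the usage comment are hypothetical, while start_query and get_query_results are the boto3 calls it relies on.

```python
import time

def run_insights_query(client, log_group, query, start_epoch, end_epoch,
                       poll_seconds=1.0, max_polls=60):
    """Run a Logs Insights query over a bounded time range and return rows as dicts."""
    started = client.start_query(
        logGroupName=log_group,
        startTime=int(start_epoch),  # epoch seconds; keep the window narrow
        endTime=int(end_epoch),
        queryString=query,
    )
    for _ in range(max_polls):
        response = client.get_query_results(queryId=started["queryId"])
        if response["status"] in ("Complete", "Failed", "Cancelled", "Timeout"):
            break
        time.sleep(poll_seconds)
    else:
        raise TimeoutError("query did not finish within the polling budget")
    if response["status"] != "Complete":
        raise RuntimeError(f"query ended with status {response['status']}")
    # Each result row is a list of {"field": ..., "value": ...} cells.
    return [{cell["field"]: cell["value"] for cell in row}
            for row in response["results"]]

# Usage (requires boto3 and AWS credentials; the log group name is a placeholder):
#   import boto3
#   now = int(time.time())
#   rows = run_insights_query(boto3.client("logs"), "/aws/lambda/my-function",
#                             'filter @type = "REPORT" | stats count() as invocations',
#                             now - 3600, now)
```

Passing the client in keeps the function easy to test with a stub and leaves region and credential handling to the caller.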
These queries pair directly with the Lambda cold start optimization guide — use Query 1 to measure cold starts before and after applying fixes. For DynamoDB-backed functions, the AWS security guide shows how to log presigned URL generation events. Official reference: CloudWatch Insights query syntax.