On October 20th, 2025, a disruption in a single AWS service rippled through the backbone of the internet. What started as elevated error rates in Amazon’s DynamoDB service in the US-EAST-1 region quickly escalated into a full-blown outage across dozens of AWS products. For anyone who relies on Amazon Web Services, and that’s nearly everyone, this was a stark reminder of just how fragile cloud infrastructure can be.
The failure wasn’t flashy. There was no cyberattack or major hardware meltdown. It began when DNS queries to DynamoDB’s API endpoint started timing out. That alone was enough to grind AWS’s control plane to a halt. Dozens of internal AWS services depend on DynamoDB to function—not just for storing application data, but for managing metadata, service states, and configurations behind the scenes. When clients couldn’t resolve the DynamoDB address, chaos followed.
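To make that failure mode concrete, here is a minimal client-side sketch in Python. It is not AWS tooling, just an illustration of where a lookup like this breaks: the hostname is the standard public DynamoDB endpoint for US-EAST-1, and the probe function itself is hypothetical.

```python
import socket

# Hypothetical probe, not AWS tooling: check whether the regional DynamoDB
# endpoint resolves at all before treating the data path as healthy.
ENDPOINT = "dynamodb.us-east-1.amazonaws.com"

def endpoint_resolves(hostname: str) -> bool:
    try:
        # This is the same lookup an SDK client performs before opening a
        # connection; during the outage, this was the step that failed.
        socket.getaddrinfo(hostname, 443, proto=socket.IPPROTO_TCP)
        return True
    except socket.gaierror as exc:
        # Resolution errors (and resolver timeouts) surface here.
        print(f"DNS resolution failed for {hostname}: {exc}")
        return False

if __name__ == "__main__":
    print("resolvable:", endpoint_resolves(ENDPOINT))
```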
Just as in the 2015 DynamoDB incident, retries kicked in. Applications and internal AWS systems kept hammering the unreachable endpoint. Each attempt consumed more network and compute capacity, which caused more failures, which triggered still more retries, a classic retry storm. The runaway loop choked off access across AWS.
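If you run clients against AWS, the practical counterpart to this is capping your own retries so you don't add to the pile-up. As a rough sketch, and not a reconstruction of anything AWS did internally, a boto3 client can be configured with tight timeouts and the SDK's bounded retry modes, which apply exponential backoff with jitter and a hard attempt limit:

```python
import boto3
from botocore.config import Config

# Defensive client settings: fail fast and retry a bounded number of times.
# "adaptive" layers client-side rate limiting on top of the standard mode's
# exponential backoff with jitter; max_attempts includes the first attempt.
defensive_config = Config(
    connect_timeout=2,
    read_timeout=5,
    retries={"max_attempts": 3, "mode": "adaptive"},
)

dynamodb = boto3.client("dynamodb", region_name="us-east-1", config=defensive_config)
```

The exact numbers are placeholders. The point is that unbounded retries against a dead endpoint only make the storm worse.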
Over 36 AWS services were affected. These included some of the most widely used ones, from Lambda and API Gateway to S3, CloudWatch, and CloudFormation. That, in turn, broke functionality across a wide swath of the public internet. Venmo users couldn’t send or receive money. Delta’s app stopped showing flight details. McDonald’s mobile ordering system failed in parts of the U.S. and Australia. Ticketmaster users ran into blank pages. Even Amazon’s own Alexa services briefly glitched out. The outage exposed just how much modern life runs on this one cloud provider.
AWS engineers didn’t wait for a perfect fix. Instead, they launched multiple recovery efforts in parallel. The key was restoring DNS resolution for DynamoDB. This likely involved restarting or reconfiguring DNS resolvers, redirecting queries through alternate paths, and adjusting caching behavior in real time. The priority wasn’t elegance. It was speed.
Incidents like this expose how interconnected everything has become. A failure in one region, inside one service, can cripple the broader internet within minutes. DynamoDB isn’t just a database. It’s a linchpin buried deep inside the automation that runs AWS. And when it breaks, there’s nowhere to hide.
If you’re building on AWS, now’s a good time to revisit your fault-tolerance assumptions. High availability isn’t the same thing as resilience when your dependencies include core control systems. And if your fallback path depends on resolving the DNS name of a service that is itself failing, that fallback may be useless exactly when you need it.
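One concrete pattern worth pressure-testing is a regional fallback on the read path. The sketch below is illustrative only: it assumes the table is replicated to a second region (for example via DynamoDB Global Tables), that the fallback region's endpoint still resolves, and that the table name and key are placeholders supplied by the caller.

```python
import boto3
from botocore.config import Config
from botocore.exceptions import BotoCoreError, ClientError

# Primary region first, fallback second. This only helps if the data is
# already replicated and the fallback endpoint is reachable and resolvable.
REGIONS = ["us-east-1", "us-west-2"]
FAST_FAIL = Config(connect_timeout=2, read_timeout=5,
                   retries={"max_attempts": 2, "mode": "standard"})

def get_item_with_fallback(table: str, key: dict):
    last_error = None
    for region in REGIONS:
        client = boto3.client("dynamodb", region_name=region, config=FAST_FAIL)
        try:
            return client.get_item(TableName=table, Key=key).get("Item")
        except (BotoCoreError, ClientError) as exc:
            # DNS and connection failures surface as BotoCoreError subclasses;
            # move on to the next region instead of retrying indefinitely.
            last_error = exc
    raise RuntimeError(f"All regions failed; last error: {last_error}")
```

Even this fallback quietly depends on DNS working for at least one region, which is exactly the kind of assumption the outage exposed.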
The cloud is powerful, scalable, and everywhere. It’s also held together by fewer threads than most people think.