When the internet went dark, our tools helped us stay ahead.
On October 20, 2025, an Amazon Web Services (AWS) DNS resolution failure in the US-EAST-1 region caused a widespread outage that rippled across industries and continents. For several hours, applications and services that millions depend on, from gaming platforms to global banks, experienced slowdowns or downtime.
Within minutes, our teams identified affected customers, communicated transparently, and provided leadership with clear, data-backed insight. It was not just a test of technology; it was proof that resilience is measurable, repeatable, and real.
When Infrastructure Fails, Resilience Must Prevail
Although AWS restored its infrastructure within hours, the downstream business impacts were far slower to recover. For many organizations, the effects cascaded for days or even weeks as critical systems were brought back online, backlogs were cleared, and customer commitments were met again. The incident exposed how heavily modern enterprises depend on a small number of critical infrastructure providers, and how a single event can create systemic risk across global operations, brand reputation, and revenue long after the immediate outage ends.
At Fusion, we provide resilience software to some of the world’s most critical organizations, where every minute of downtime carries real consequences. Our own response teams operate under the same pressure and are expected to move with the speed and accuracy our customers demand. During the incident, they used the Incident Commander Agent within Fusion Intelligence to turn what could have been a time-consuming data hunt into a clear, coordinated response within minutes.
Seeing the Outage Through a Resilience Lens
For resilience and risk leaders, events like this reveal a fundamental truth: disruption challenges not only technology but also visibility and coordination. Cloud concentration risk has become one of the defining exposures of modern business. The AWS incident showed how dependent entire ecosystems are on a few critical nodes. When those nodes fail, customer trust and operational stability are immediately at risk.
In moments like these, waiting even ten minutes to identify affected customers or systems can amplify the impact. The organizations that recover fastest are the ones that can move from question to insight quickly and act with confidence.
Fusion Intelligence in Action
The Incident Commander Agent, powered by Fusion Intelligence, gave our teams rapid access to operational and resilience data in a single system, eliminating delays and uncertainty when decisions mattered most. Because we operate from the same unified platform that we help our customers build, our leaders had the clarity and confidence to act decisively under pressure.
During the AWS outage, our response team used the Incident Commander Agent to:
- Analyze impact by identifying, within minutes, which customers were connected to affected instances.
- Visualize risk exposure through data-driven charts that revealed where customer concentration was highest.
- Communicate clearly by generating targeted messages to impacted stakeholders, reviewed and approved directly within the agent before sending.
- Coordinate faster by linking affected customer accounts with open support cases to streamline triage.
- Log and report automatically, creating a ready-to-share summary for executive leadership without switching tools.
What once required multiple dashboards, spreadsheets, and cross-team calls became a single conversation that delivered actionable insight in minutes.