
Why this series exists
Every time there’s a major technology outage — AWS, Azure, Cloudflare, or even your favourite streaming service — we’re flooded with two kinds of reactions:
- Frustration from users, who just want to know why something isn’t working.
- Jargon-heavy explanations from engineers and companies, full of acronyms and abstractions that mean little outside their circles.
Neither side is wrong. But somewhere between the outage map and the incident report, the human story gets lost.
That’s where this series comes in.
“Technical difficulties? Let’s speak plainly…” is about translating the complex, often intimidating language of incident retrospectives into something we can all understand and learn from. Each article takes a real-world service disruption and reframes it as an everyday story anyone can relate to.
Who this is for
| Audience | What they get out of it |
|---|---|
| Non-technical readers | Clear, relatable explanations for events that shape the digital world they depend on daily. |
| Technical practitioners | A way to communicate impact and causality to leadership, customers, and peers without losing clarity or credibility. |
| Leaders and executives | Context for why these events happen, without oversimplification or blame, and insight into what reliability and resilience really look like. |
| SREs and engineers | A reminder that every outage is more than a technical failure — it’s an opportunity to tell a better story about systems, people, opportunities and limitations. |
Why this matters
Technology runs everything — from hospitals to grocery stores to government services — but when it falters, explanations often sound alien.
By using everyday analogies — a hospital contact list gone missing, a hotel lobby where no one’s keycard works — this series makes reliability a shared language.
Because when we all understand failure better, we can build and respond better, too.
What to expect
Each piece will:
- Break down a real-world outage or service disruption.
- Reimagine it as a human-scale story everyone can relate to.
- End with a lesson — not about blame, but about clarity, resilience, and communication.
One guiding principle
The better we can explain failure,
the faster we can learn from it —
and the more trust we can build in what we create next.
✳️ Recent blog posts in this series:
- Ep.05 – No One Can Reach Dispatch: Inspired by the 2021 Meta outage – The day communication itself became the incident. It started like…
- Ep.04 – A hospital contact list gone missing: Inspired by the October 2026 AWS us-east-1 outage – Imagine a large hospital that updates its emergency…
- Ep.03 – The lobby doors are jammed: Inspired by the Oct 2026 Azure service disruption – Imagine you manage a large hotel. Your main…
- Ep.02 – A stalled assembly line: Inspired by the 2024 CrowdStrike Channel File 291 incident – Imagine a precision-parts factory, where every process…
- Ep.01 – The Water Is Out Again: Inspired by 3 related and unrelated Anthropic incidents in 2026 – Imagine you manage a large apartment…





