Blog

Topics Discussed

01.

Fundamentals

Exploring the basic principles and practices that underpin effective SRE strategies.

02.

Tools

Reviewing the latest SRE tools that enhance system reliability and efficiency.

03.

Practices

Discussing best practices for implementing SRE within diverse environments.

04.

Challenges

Addressing common challenges and pitfalls in SRE and how to overcome them.

05.

Metrics

Analyzing key performance indicators and metrics crucial for SRE success.

06.

Culture

Building a supportive culture that fosters growth and innovation in SRE.

07.

Case Studies

Learning from real-world case studies of SRE implementations and their outcomes.

08.

Trends

Examining emerging trends and future directions in Site Reliability Engineering.

Most recent ...

From an AI-driven increase in change to Safe Velocity, guided by SRE

AI Doesn’t Break Software Engineering. It Moves the Constraints.

Why reliability, invariants, and guardrails matter more as AI accelerates code production. AI has already changed the rate at which software can be produced. ...
Picture with the series titles and an anecdote about an emergency center's dispatch terminals showing "no signal" on their displays

Ep.05 – No One Can Reach Dispatch

Inspired by the 2021 Meta outage – The day communication itself became the incident It started like any other morning at the city’s emergency coordination ...
Picture shows a robotic arm trying to attach a bolt

Ep.02 – A stalled assembly line

Inspired by the 2024 CrowdStrike Channel File 291 incident – Imagine a precision-parts factory, where every process is automated. Each morning, headquarters sends a digital ...
A picture showing a maintenance worker at an apartment building trying to fix water problems.

Ep.01 – The Water Is Out Again

Inspired by 3 related and unrelated Anthropic incidents in 2026 – Imagine you manage a large apartment building. Dozens of floors, hundreds of tenants, and ...
Hotel guest trying to get into the lobby but one of 3 doors is not working. Confusion occurs as folks navigate to the remaining 2 entrances.

Ep.03 – The lobby doors are jammed

Inspired by the Oct 2026 Azure service disruption – Imagine you manage a large hotel. Your main revolving doors — the high-traffic entryways that connect ...
Two hospital workers standing next to an empty contact list

Ep.04 – A hospital contact list gone missing

Inspired by the October 2026 AWS us-east-1 outage – Imagine a large hospital that updates its emergency contact list every hour. Suddenly, nobody can call ...