
SRE — Humans and Organizations

Read the full guide on docs.beyondyou.my.id
Tags: sre, culture, organization, team-structure, devops, reliability

Technology is the easy part of SRE. The hard part is people and organizations — how do you structure teams, build psychological safety for incident response, and navigate the tension between feature velocity and reliability? SRE is fundamentally a cultural practice, and without organizational buy-in, even the best tooling won’t make your services reliable.

Key Takeaways

  • SRE teams can be structured as embedded (SREs sit inside dev teams), consulting (a central SRE team advises dev teams), or platform (SREs build a self-service platform for dev teams)
  • Blameless postmortems: Focus on system improvements, not individual blame — this requires psychological safety
  • SRE teams should have a clear charter: what services they support, what their SLOs are, and how handoffs work
  • On-call should be sustainable: compensated, limited in frequency, and with adequate tooling and runbooks
  • SRE is not a silo — it requires deep collaboration between development, operations, and product teams
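
The charter and on-call points above can be made concrete as a small data record. The following is a minimal sketch, not a standard format: the class names, fields, team name, service name, and the limit of four shifts per month are all hypothetical, chosen only to illustrate what a charter might record and how an unsustainable rotation could be flagged.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class ServiceSLO:
    """One supported service and its availability target (hypothetical fields)."""
    service: str
    availability_target: float  # e.g. 0.999 means 99.9% of requests succeed


@dataclass
class SRECharter:
    """Hypothetical charter: supported services, SLOs, handoffs, on-call limit."""
    team: str
    slos: list[ServiceSLO] = field(default_factory=list)
    handoff_policy: str = ""
    max_oncall_shifts_per_month: int = 0  # assumed sustainability limit

    def problems(self, shifts_per_engineer: dict[str, int]) -> list[str]:
        """Return human-readable gaps in the charter or the on-call rotation."""
        issues = []
        if not self.slos:
            issues.append("no supported services or SLOs listed")
        if not self.handoff_policy.strip():
            issues.append("handoff policy is missing")
        for engineer, shifts in sorted(shifts_per_engineer.items()):
            if shifts > self.max_oncall_shifts_per_month:
                issues.append(
                    f"{engineer} has {shifts} on-call shifts, "
                    f"limit is {self.max_oncall_shifts_per_month}"
                )
        return issues


# Example values are invented for illustration.
charter = SRECharter(
    team="payments-sre",
    slos=[ServiceSLO("checkout-api", 0.999)],
    handoff_policy="Dev team owns the service until it meets its SLO for 30 days.",
    max_oncall_shifts_per_month=4,
)
print(charter.problems({"alice": 3, "bob": 6}))
```

Writing the charter down in a reviewable form like this is one way to keep the scope, the SLOs, and the on-call limits visible to development and product teams as well as to the SREs themselves.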

Quick Overview

Google’s published guidance on SRE team structure describes several team models, including Kitchen Sink (one SRE team handles everything, which doesn’t scale), Infrastructure (SREs focus on platform reliability), and Product/Embedded (SREs work within product teams). Many organizations move between these models as they scale, typically starting with a small central SRE team whose members gradually embed into product teams.

The cultural foundations — blameless postmortems, psychological safety, on-call sustainability, and shared ownership — are harder to establish than any technical practice. SRE leaders must model these behaviors and protect their teams from burnout and blame cultures.


The full guide covers team topology patterns, incident response frameworks, and organizational maturity models.