4. Postmortems | Real-World SRE

Real-World SRE

By: Nat Welch

4.5 (10)

Overview of this book

Real-World SRE is the go-to survival guide for the software developer in the middle of a catastrophic website failure. Site Reliability Engineering (SRE) has moved to the front line as businesses strive to maximize uptime. This book provides a step-by-step framework to follow when your website is down and the countdown to fix it has begun. Nat Welch has battle-hardened experience in reliability engineering at some of the biggest outage-sensitive companies on the internet. Arm yourself with his tried-and-tested methods for monitoring modern web services, setting up alerts, and evaluating your incident response. Real-World SRE goes beyond reacting to disaster: it covers the tools and strategies needed to test and release software safely, plan for long-term growth, and foresee future bottlenecks. It also helps you build your own robust plan of action to see you through a company-wide website crisis. The final chapter is dedicated to acing SRE interviews, whether you are seeking your first SRE job or a promotion.

Preface
  Who this book is for
  What this book covers
  To get the most out of this book
  Get in touch

1. Introduction
  A brief history
  What is SRE?
  What is in the book?
  SRE as a framework for new projects
  Summary
  References

2. Monitoring
  Why monitoring?
  Instrumenting an application
  Collecting and saving monitoring data
  Displaying monitoring information
  Managing and maintaining monitoring data
  Communicating about monitoring
  References and related reading
  Summary

3. Incident Response
  What is an incident?
  What is incident response?
  Alerting
  Being on call
  Communication
  Recovering the system
  Calling all clear
  Summary

4. Postmortems
  What is a postmortem?
  Why write a postmortem?
  When to write a postmortem document
  Carrying out incident analysis
  How to write a postmortem document
  Blameless postmortems
  Holding a postmortem meeting
  Analyzing past postmortems
  Summary
  References

5. Testing and Releasing
  Testing
  Releasing
  Automation
  Summary

6. Capacity Planning
  A quick introduction to business finance
  Why plan?
  Defining a plan
  Architecture–where performance changes come from
  Tech as a profit center and procurement
  Summary

7. Building Tools
  Finding projects
  Defining projects
  Planning projects
  Building projects
  Documenting and maintaining projects
  Summary

8. User Experience
  An introduction to design and UX
  User testing
  Developer experience
  Experience of tools
  Performance budgets
  Security
  ACM code of ethics
  Summary
  References

9. Networking Foundations
  The internet
  Sending an HTTP request
  Tools for watching the network
  Summary

10. Linux and Cloud Foundations
  Linux fundamentals
  Cloud fundamentals
  Units of scale
  Example architecture interview
  Summary
  References

Other Books You May Enjoy

Leave a review - let other readers know what you think

Index

Customer Reviews

5 star: 80%
3 star: 10%
2 star: 10%

Analyzing past postmortems

MTTR and MTBF
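Both metrics can be derived from the timestamps that past postmortems record. The following is a minimal Python sketch, not the book's own method, assuming each postmortem notes when the failure started and when service was restored; the `Incident` type and its field names are hypothetical. Definitions vary between organizations: here MTTR (mean time to recovery) is the average start-to-resolution duration, and MTBF (mean time between failures) is the average healthy interval from one incident's resolution to the next incident's start.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Incident:
    """Hypothetical record distilled from one postmortem (assumed fields)."""
    started: datetime   # when the failure began
    resolved: datetime  # when service was restored


def mttr(incidents: list[Incident]) -> timedelta:
    """Mean time to recovery: average time from start to resolution."""
    if not incidents:
        raise ValueError("need at least one incident")
    total = sum((i.resolved - i.started for i in incidents), timedelta())
    return total / len(incidents)


def mtbf(incidents: list[Incident]) -> timedelta:
    """Mean time between failures: average healthy time from one
    incident's resolution to the next incident's start."""
    if len(incidents) < 2:
        raise ValueError("need at least two incidents")
    # Order chronologically so each gap runs from one incident to the next.
    ordered = sorted(incidents, key=lambda i: i.started)
    # Assumes incidents do not overlap; overlapping ones yield negative gaps.
    gaps = [nxt.started - prev.resolved for prev, nxt in zip(ordered, ordered[1:])]
    return sum(gaps, timedelta()) / len(gaps)
```

For example, three incidents lasting 60, 30, and 90 minutes give an MTTR of one hour. Tracking both numbers across quarters shows whether recovery is getting faster (falling MTTR) and whether failures are getting rarer (rising MTBF).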
