
Real-World SRE

Defining a plan involves a series of steps, each of which requires answering a question.
Let us work through each step. Note that I am using long periods of time in a lot of the examples below. If you are using systems that can automatically add and remove capacity, you may be working with data on the scale of minutes instead of days.
There are many ways to define the capacity of your infrastructure. You can use aggregated metrics such as CPU usage, disk storage availability, requests per minute, packets per second, or any application metric. Usually, the metrics you want to focus on are those for the resources that you use the most or that are most important to you. What is most important often comes from your SLOs (Service Level Objectives) and the SLIs (Service Level Indicators) behind them. Note that the metrics do not have...
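As a rough illustration of comparing several capacity metrics side by side, the following Python sketch computes how much of each resource is in use and picks out the one closest to its limit. The metric names, limits, and sample values are invented for the example. Treating the metric with the highest peak utilization as the one to watch is also an assumption; it is one simple heuristic, not a rule taken from the text.

```python
from dataclasses import dataclass
from statistics import mean
from typing import List


@dataclass
class CapacityMetric:
    """One aggregated metric and the most the resource can provide."""
    name: str             # hypothetical label, e.g. "cpu_percent"
    limit: float          # assumed maximum for this resource
    samples: List[float]  # observed usage over the period being studied

    def peak_utilization(self) -> float:
        # Fraction of the limit used at the busiest observed moment.
        return max(self.samples) / self.limit

    def mean_utilization(self) -> float:
        # Fraction of the limit used on average.
        return mean(self.samples) / self.limit


def most_constrained(metrics: List[CapacityMetric]) -> CapacityMetric:
    # Assumption: the metric with the highest peak utilization is the
    # one to focus on first.
    return max(metrics, key=lambda m: m.peak_utilization())


# Invented sample data, purely for illustration.
metrics = [
    CapacityMetric("cpu_percent", 100.0, [40.0, 55.0, 70.0]),
    CapacityMetric("disk_gb", 500.0, [300.0, 320.0, 450.0]),
    CapacityMetric("requests_per_minute", 12000.0, [3000.0, 6000.0, 9000.0]),
]

for m in metrics:
    print(f"{m.name}: peak {m.peak_utilization():.0%}, mean {m.mean_utilization():.0%}")

print("Most constrained:", most_constrained(metrics).name)
```

In practice the samples would come from your monitoring system, and the limits would reflect whatever your SLOs say you must be able to serve.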