Published: May 30, 2025
Written By: Ng Jun Hao

Overview
An introduction to Site Reliability Engineering (SRE) and the principles
that will guide our testing.
History
The concept of SRE originated at Google.
Goals
SRE helps software engineers meet business objectives in areas such as
performance and reliability. More reliable software can also improve a
product's reputation, for example in the rankings published by
third-party software ranking organisations.
Core responsibilities
- Availability: Ensure that the system is operational
and accessible when needed.
- Latency: Minimize the delay in processing and
responding to requests.
- Performance: Optimize the system to handle
workloads efficiently.
- Efficiency: Achieve the desired output with as few
resources as possible.
- Monitoring: Continuously observe and check the
health and performance of the system.
- Change Management: Implement changes in a
controlled and systematic manner.
- Emergency Response: React promptly and effectively
to incidents and outages.
- Capacity Planning: Predict and prepare for future
resource requirements.
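
To make the first two responsibilities concrete, here is a minimal sketch of how availability and latency could be measured from a batch of request records. The `Request` type, its field names, and the sample values are assumptions made up for illustration; a real system would read this data from its monitoring stack.

```python
import math
from dataclasses import dataclass


@dataclass
class Request:
    """Hypothetical request record: outcome plus response time."""
    ok: bool
    latency_ms: float


def availability(requests):
    """Fraction of requests that succeeded (0.0 to 1.0)."""
    if not requests:
        raise ValueError("no requests to measure")
    return sum(r.ok for r in requests) / len(requests)


def latency_percentile(requests, pct):
    """Nearest-rank percentile of latency, in milliseconds."""
    if not requests:
        raise ValueError("no requests to measure")
    ordered = sorted(r.latency_ms for r in requests)
    # Nearest-rank method: smallest value with at least pct% of samples at or below it.
    rank = max(math.ceil(pct / 100 * len(ordered)), 1)
    return ordered[rank - 1]


# Illustrative sample data (not real measurements).
sample = [
    Request(True, 120.0),
    Request(True, 80.0),
    Request(False, 900.0),
    Request(True, 200.0),
]
print(availability(sample))            # 0.75
print(latency_percentile(sample, 95))  # 900.0
```

Percentiles are used here rather than an average because a single slow request (900 ms above) is easy to miss in a mean but shows up clearly at the 95th percentile.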
Inspiration
“If you can’t measure it, you can’t improve it.” (commonly attributed to Peter Drucker)
How to break down an application?
- Business objectives
- Features/components
- How to trigger them
How to model a product?
- Objective, Step, Description
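
One possible way to capture the Objective, Step, Description model is as a small data structure that flattens into table rows. This is only a sketch: the class names and the "Checkout" example are hypothetical and not taken from any particular product.

```python
from dataclasses import dataclass, field


@dataclass
class Step:
    """One user-visible step towards an objective."""
    name: str
    description: str


@dataclass
class Objective:
    """A business objective broken down into ordered steps."""
    name: str
    steps: list = field(default_factory=list)

    def rows(self):
        """Flatten into (objective, step, description) rows."""
        return [(self.name, s.name, s.description) for s in self.steps]


# Hypothetical example objective, for illustration only.
checkout = Objective(
    "Checkout",
    [
        Step("Add to cart", "User adds an item to the cart"),
        Step("Pay", "User submits payment details"),
    ],
)
for row in checkout.rows():
    print(row)
```

Keeping each step as its own row makes it straightforward to attach a trigger or a test to every step later on.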
Deliverables?
- Severity, Description
- Criticality, Description
- Automation: Ansible, Terraform
- Monitoring: Prometheus, Grafana
Terminology
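- SLI (Service Level Indicator): A measured value that
describes service behaviour, such as the fraction of
successful requests.
- SLO (Service Level Objective): A target value or range
for an SLI.
- SLA (Service Level Agreement): A commitment to users
that states what happens if the service level is not met.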
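
As a worked illustration of how these terms relate, an availability SLO implies an amount of downtime that can be tolerated within a given window, often called an error budget. The sketch below assumes a 30-day window and treats the SLO as a simple fraction of time; the function names are hypothetical.

```python
def error_budget_minutes(slo_target, window_days=30):
    """Minutes of downtime allowed by an availability SLO over the window.

    slo_target is a fraction, e.g. 0.999 for "99.9% available".
    """
    if not 0 < slo_target <= 1:
        raise ValueError("slo_target must be in (0, 1]")
    window_minutes = window_days * 24 * 60
    return (1 - slo_target) * window_minutes


def budget_remaining(slo_target, downtime_minutes, window_days=30):
    """Error budget left after the observed downtime (negative if the SLO is missed)."""
    return error_budget_minutes(slo_target, window_days) - downtime_minutes


# A 99.9% SLO over 30 days allows roughly 43.2 minutes of downtime.
print(round(error_budget_minutes(0.999), 1))
print(round(budget_remaining(0.999, downtime_minutes=10.0), 1))
```

Here the measured downtime plays the role of the SLI, the 99.9% target is the SLO, and an SLA would typically set a looser target than the internal SLO so that there is room to react before the agreement is breached.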
Resources