Implementing SLAs, SLOs, and SLIs: A Practical Guide for SREs

Explore definitions along with how SLAs, SLOs, and SLIs help in effective monitoring and maintaining system performance.

By Karthigayan Devan · Jun. 13, 24 · Analysis

As digital transformation continues across Information Technology (IT), more applications are hosted in cloud environments every day. Monitoring and maintaining these applications day to day is challenging, and it requires proper metrics to measure performance and act on it. This is where SLAs, SLOs, and SLIs come in: they support effective monitoring and help maintain system performance.

Defining SLA, SLO, SLI, and SRE

What Is an SLA? (Commitment)

A Service Level Agreement (SLA) is an agreement between the cloud provider and the client/user about measurable metrics, such as uptime. It is normally handled by the company's legal department in line with business and legal terms. It covers the factors considered part of the agreement and the consequences if the provider fails to meet it, such as credits or penalties. It mostly applies to paid services rather than free ones.

What Is an SLO? (Objective)

A Service Level Objective (SLO) is an objective the cloud provider must meet to satisfy the agreement made with the client. It states the specific, individual metric expectations (e.g., availability) that the provider must meet. Tracking SLOs helps improve overall service quality and reliability.

What Is an SLI? (How Did We Do?)

A Service Level Indicator (SLI) is the actual measurement used to check compliance with an SLO. It gives a quantified view of the service's performance (e.g., 99.92% of requests served within a latency threshold).

Who Is an SRE?

A Site Reliability Engineer (SRE) is an engineer who focuses on minimizing the gaps between software development and operations. The role is related to DevOps, which focuses on identifying those gaps. An SRE creates and uses automation tools to monitor and observe software reliability in production environments.

In this article, we will discuss the importance of SLAs, SLOs, and SLIs and how a Site Reliability Engineer (SRE) can implement them for production applications.

Implementation of SLOs and SLIs

Let’s assume we have an application service that is up and running in a production environment. The first step is to determine what the SLO should be and what it should cover.

Example of SLOs

  • SLO = Target
    • At or above this target: GOOD
    • Below this target: BAD (needs an action item)
      • When setting a target, do not aim for 100% reliability. It is not practically achievable, because patches, deployments, and other downtime will cause the target to be missed. This is where the error budget (EB) comes in. The EB is the maximum amount of time that a service can fail without contractual consequences.

For example:

  • SLA = 99.99% uptime
    • EB = about 52 mins and 36 secs per year, or 4 mins and 23 secs per month, during which the system can be down without consequences.

The next step is how to measure this SLO. This is where the SLI comes in: it is an indicator of the level of service that you are providing.
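
The error budget arithmetic for an uptime target can be sketched in a few lines of Python. This is only an illustration: the function name `error_budget` and the choice of a 365.25-day year (with a month taken as one twelfth of it) are assumptions, not something the article prescribes.

```python
from datetime import timedelta


def error_budget(target_percent: float, period: timedelta) -> timedelta:
    """Return the downtime allowed by an availability target over a period."""
    if not 0 <= target_percent <= 100:
        raise ValueError("target_percent must be between 0 and 100")
    allowed_fraction = 1 - target_percent / 100
    return timedelta(seconds=period.total_seconds() * allowed_fraction)


# Assumption: an average year of 365.25 days, and a month as 1/12 of that.
YEAR = timedelta(days=365.25)
MONTH = YEAR / 12

yearly_budget = error_budget(99.99, YEAR)
monthly_budget = error_budget(99.99, MONTH)

print(f"Yearly budget:  {yearly_budget.total_seconds() / 60:.1f} minutes")
print(f"Monthly budget: {monthly_budget.total_seconds() / 60:.1f} minutes")
```

With these assumptions, a 99.99% target allows roughly 52.6 minutes of downtime per year and about 4.4 minutes per month. A different calendar convention (for example, a 30-day month) shifts the figures slightly.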

Example of SLIs

  • HTTP request success rate = number of successful requests / total number of requests
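
As a minimal sketch, the ratio above can be computed from two request counters. The function name `success_ratio_sli`, the sample counts, and the decision to report 100% when there is no traffic are all assumptions made for illustration.

```python
def success_ratio_sli(successful: int, total: int) -> float:
    """Return the percentage of requests that succeeded."""
    if successful < 0 or total < 0 or successful > total:
        raise ValueError("counts must satisfy 0 <= successful <= total")
    if total == 0:
        # Assumption: with no traffic there were no failures, so report 100%.
        return 100.0
    return 100.0 * successful / total


# Hypothetical counters, e.g. collected over one day.
sli = success_ratio_sli(successful=99_920, total=100_000)
print(f"Request success SLI: {sli:.2f}%")
```

How the no-traffic case is handled is a design choice; some teams prefer to exclude such windows from the SLI instead of counting them as fully successful.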

Common SLI Metrics

  • Durability
  • Response time
  • Latency
  • Availability
  • Error rate
  • Throughput

Use automated monitoring and reporting tools (e.g., Prometheus, Grafana) to check SLIs and detect deviations from SLOs in real time.

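Category     | SLO                                                  | SLI
-------------|------------------------------------------------------|---------------------------------------------
Availability | 99.92% uptime/month                                  | X% of the time the app is available
Latency      | 92% of requests with response time under 240 ms      | X average response time for user requests
Error rate   | Less than 0.8% of requests result in errors          | X% of requests that fail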
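
One way to compare measured SLIs against the three SLOs in the table is sketched below. The `Slo` class, the metric names, and the measured values are hypothetical. The sketch also assumes the latency SLI is reported as the percentage of requests under 240 ms, so that it is directly comparable with the latency objective.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Slo:
    name: str
    target: float
    higher_is_better: bool  # False for metrics like error rate

    def is_met(self, sli: float) -> bool:
        """Check a measured SLI value against this objective."""
        if self.higher_is_better:
            return sli >= self.target
        return sli < self.target  # "less than 0.8%" is a strict bound


# Targets taken from the table; names are illustrative.
SLOS = [
    Slo("availability_percent", 99.92, higher_is_better=True),
    Slo("requests_under_240ms_percent", 92.0, higher_is_better=True),
    Slo("error_rate_percent", 0.8, higher_is_better=False),
]


def find_violations(slos: list[Slo], measured: dict[str, float]) -> list[str]:
    """Return the names of SLOs whose measured SLI misses the target."""
    return [slo.name for slo in slos if not slo.is_met(measured[slo.name])]


# Hypothetical measurements for one month.
measured = {
    "availability_percent": 99.95,
    "requests_under_240ms_percent": 90.5,
    "error_rate_percent": 0.3,
}

violations = find_violations(SLOS, measured)
print("SLO violations:", violations)
```

In practice the measured values would come from a monitoring system rather than a hard-coded dictionary, and a violation would typically raise an alert or create an action item.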

Challenges

  • SLA: SLAs are normally written by business or legal teams with no input from technical teams, so key aspects that should be measured are often missed.
  • SLO: Objectives are often not measurable or are too broad to calculate.
  • SLI: There are too many metrics, and they are captured and calculated in different ways. This creates a lot of effort for SREs and yields less useful results.

Best Practices

  • SLA: Involve the technical team when the company's business/legal team and the provider write the SLA. This helps the agreement reflect the actual technical scenarios.
  • SLO: Keep each objective simple and easily measurable, so it is clear whether the service is in line with it.
  • SLI: Define a standard set of metrics to monitor and measure. This helps SREs check the reliability and performance of the services.

Conclusion

SLAs, SLOs, and SLIs should be included as part of the system requirements and design, and they should be continuously improved. SREs need to understand and take responsibility for how the systems serve the business needs, and take the necessary measures to minimize the impact of failures.
