Becoming a Rockstar SRE: Electrify your site reliability engineering mindset to build reliable, resilient, and efficient systems

Jeremy Proffitt, Rod Anami

  • Publisher: Packt Publishing
  • Publication date: 2023-04-28
  • List price: $1,650
  • VIP price: $1,568 (5% off)
  • Language: English
  • Pages: 420
  • Binding: Quality Paper - also called trade paper
  • ISBN: 1803239220
  • ISBN-13: 9781803239224
  • Related categories: DevOps
  • Ships immediately (stock = 1)

Product Description

Excel in site reliability engineering by learning from field-driven lessons on observability and reliability in code, architecture, process, systems management, costs, and people to minimize downtime and enhance developers' output

Purchase of the print or Kindle book includes a free eBook in the PDF format


Key Features:

  • Understand the goals of an SRE in terms of reliability, efficiency, and constant improvement
  • Master highly resilient architecture in server, serverless, and containerized workloads
  • Learn the why and when of employing Kubernetes, GitHub, Prometheus, Grafana, Terraform, Python, Argo CD, and GitOps


Book Description:

Site reliability engineering is all about continuous improvement, finding the balance between business and product demands while working within technological limitations to drive higher revenue. But quantifying and understanding reliability, handling resources, and meeting developer requirements can sometimes be overwhelming. With a focus on reliability from an infrastructure and coding perspective, Becoming a Rockstar SRE brings forth the site reliability engineer (SRE) persona using real-world examples.

This book will acquaint you with the role of an SRE, followed by the why and how of site reliability engineering. It walks you through the jobs of an SRE, from the automation of CI/CD pipelines and reducing toil to reliability best practices. You'll learn what creates bad code and how to circumvent it with reliable design and patterns. The book also guides you through interacting and negotiating with businesses and vendors on various technical matters, and explores observability, outages, and why and how to craft an excellent runbook. Finally, you'll learn how to elevate your site reliability engineering career, including certifications and interview tips and questions.

By the end of this book, you'll be able to identify and measure reliability, reduce downtime, troubleshoot outages, and enhance productivity to become a true rockstar SRE!


What You Will Learn:

  • Get insights into the SRE role and its evolution, starting from Google's original vision
  • Understand the key terms, such as golden signals, SLO, SLI, MTBF, MTTR, and MTTD
  • Overcome the challenges in adopting site reliability engineering
  • Employ reliable architecture and deployments with serverless, containerization, and release strategies
  • Identify monitoring targets and determine observability strategy
  • Reduce toil and leverage root cause analysis to enhance efficiency and reliability
  • Realize how business decisions can impact quality and reliability


Who this book is for:

This book is for IT professionals, including developers looking to advance into an SRE role, system administrators mastering technologies, and executives experiencing repeated downtime in their organizations. Anyone interested in bringing reliability and automation to their organization to drive down customer impact and revenue loss while increasing development throughput will find this book useful. A basic understanding of API and web architecture and some experience with cloud computing and services will assist with understanding the concepts covered.
