Site Reliability Engineering

5/1/2020

link

https://openlibrary.org/books/OL27208603M

summary

Site Reliability Engineering (SRE) is vital for any organization running a large-scale system. In this collection of essays and articles, Google's Site Reliability Team shares its experience and expertise in building, deploying, monitoring, and maintaining some of the largest software systems in the world. The book is divided into four sections: an introduction to site reliability engineering, the principles that guide a site reliability engineer's work, the theory and practice of an SRE's day-to-day work, and Google's best practices for training, communication, and meetings. You'll learn how Google implemented site reliability engineering within its organization and how it makes its systems more scalable, reliable, and efficient. The insights and practices it describes can be applied to your own organization's reliability work. Whether you are a software engineer, product manager, or system administrator, this book is a must-read for anyone working with large-scale computing systems.

tags

software development ꞏ google ꞏ site reliability engineering ꞏ computer engineering ꞏ essays ꞏ scalability ꞏ reliability ꞏ efficiency ꞏ large-scale computing systems ꞏ distributed systems ꞏ it industry practices ꞏ management ꞏ training ꞏ communication ꞏ meetings ꞏ computing ꞏ technology ꞏ information technology ꞏ production systems ꞏ lifecycle ꞏ principles ꞏ practices