Deep Learning Systems: Algorithms, Compilers, and Processors for Large-Scale Production

Rodriguez, Andres


Product Description

This book describes deep learning systems: the algorithms, compilers, and processor components needed to efficiently train and deploy deep learning models for commercial applications.

The exponential growth in computational power is slowing at a time when the amount of compute consumed by state-of-the-art deep learning (DL) workloads is growing rapidly. Model size, serving latency, and power constraints are significant challenges in deploying DL models for many applications. It is therefore imperative to co-design algorithms, compilers, and hardware, accelerating advances in this field with holistic system-level and algorithmic solutions that improve performance, power, and efficiency.

Advancing DL systems generally involves three types of engineers: (1) data scientists who use and develop DL algorithms in partnership with domain experts, such as medical, economic, or climate scientists; (2) hardware designers who develop specialized hardware to accelerate the components of DL models; and (3) performance and compiler engineers who optimize software to run more efficiently on given hardware. Hardware engineers should understand the characteristics and components of production models, and of academic models likely to be adopted by industry, so that these can guide design decisions for future hardware. Data scientists should be aware of deployment platform constraints when designing models. Performance engineers should support optimizations across diverse models, libraries, and hardware targets.

The purpose of this book is to provide a solid understanding of (1) the design, training, and applications of DL algorithms in industry; (2) the compiler techniques that map deep learning code to hardware targets; and (3) the critical hardware features that accelerate DL systems. The book aims to facilitate co-innovation for the advancement of DL systems. It is written for engineers working in one or more of these areas who want to understand the entire system stack so that they can collaborate better with engineers working on other parts of it.

The book details the advancement and adoption of DL models in industry, explains the training and deployment process, describes the essential hardware architectural features needed for today's and future models, and covers advances in DL compilers that efficiently execute algorithms across various hardware targets.

Unique to this book are its holistic exposition of the entire DL system stack, its emphasis on commercial applications, and its practical techniques for designing models and accelerating their performance. The author is fortunate to work with hardware, software, data science, and research teams across many high-technology companies with hyperscale data centers. These companies use many of the examples and methods presented throughout the book.
