Deep Learning - Hardware Design

Albert Chun Chen Liu, Oscar Ming Kin Law

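  • Publication date: 2020-03-26
  • Price: $680
  • Language: English
  • Pages: 107
  • Binding: Paperback
  • ISBN: 9869890202
  • ISBN-13: 9789869890205
  • Related category: Deep Learning
  • Related translation: 深度學習 - 硬體設計 (Traditional Chinese edition)
  • Sales ranking: 🥉 No. 3, English-language books, 2020/10
    🥇 No. 1, English-language books, 2020/8
    🥇 No. 1, English-language books, 2020/7
    🥇 No. 1, English-language books, 2020/6
    🥇 No. 1, English-language books, 2020/5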

Product Description

Preface

In 2012, convolutional neural network (CNN) technology achieved major breakthroughs. Since then, deep learning has become widely integrated into daily life through automotive, retail, healthcare and finance products. In 2016, the triumph of AlphaGo, enabled by reinforcement learning (RL), further showed that the AI revolution is set to transform society, much as the personal computer (1977), the internet (1994), and the smartphone (2007) did. Nonetheless, innovation in this revolution has so far focused on software development. Major hardware challenges, such as the following, remain largely unaddressed:

•    Big input data
•    Deep neural network
•    Massively parallel processing
•    Reconfigurable network
•    Memory bottleneck
•    Intensive computation
•    Network pruning
•    Data sparsity

This book reviews various hardware designs, including the CPU, GPU and NPU, and surveys special features aimed at resolving the above challenges. New hardware may be derived from the following design approaches to improve performance and power efficiency:

•    Parallel architecture
•    Convolution optimization
•    In-memory computation
•    Near-memory architecture
•    Network optimization

The book is organized as follows:

•    Chapter 1: The neural network and its history
•    Chapter 2: The convolutional neural network model, its layer functions, and examples
•    Chapter 3: Parallel architectures (the Intel CPU, Nvidia GPU, Google TPU and Microsoft NPU)
•    Chapter 4: Convolution optimization (the UCLA DCNN accelerator and MIT Eyeriss DNN accelerator)
•    Chapter 5: The GT Neurocube architecture and Stanford Tetris DNN processor, with in-memory computation using the Hybrid Memory Cube (HMC)
•    Chapter 6: Near-memory architecture (the ICT DaDianNao supercomputer and UofT Cnvlutin DNN accelerator)
•    Chapter 7: Energy-efficient inference engines for network pruning


Future revisions will incorporate new approaches for enhancing deep learning hardware designs alongside other topics, including:

•    Distributive graph theory
•    High-speed arithmetic
•    3D neural processing

About the Authors

劉峻誠 Albert Chun Chen Liu

Founder and CEO

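Albert Chun Chen Liu is the founder and CEO of Kneron, which he founded in San Diego in 2015. After graduating from National Cheng Kung University in Taiwan, he received scholarships from Raytheon and the University of California to pursue graduate studies in the United States, enrolling in a joint MS/PhD research program of UC Berkeley, UCLA and UC San Diego, and later earned a PhD in electronic engineering from UCLA. He has held various R&D and management positions at Qualcomm, the Samsung Electronics R&D Center, MStar Semiconductor and Wireless Info. At Qualcomm, he led an R&D team that obtained 9 core technology patents and received the company's ImpaQt R&D award.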

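Liu has been invited to teach lecture courses on computer vision and artificial intelligence at the University of California and serves as a technical reviewer for many well-known international academic journals. He has also taken part in cutting-edge collaborative technology development with the US IARPA, Bell Labs and NASA. He holds more than 30 international patents in artificial intelligence, computer vision and image processing, and has published over 70 papers in major international journals.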


羅明健 Oscar Ming Kin Law

Table of Contents

1 Introduction
  1.1 History
  1.2 Neural Network
2 Deep Learning
  2.1 Network Model
    2.1.1 Convolutional Layer
    2.1.2 Activation Layer
    2.1.3 Pooling
    2.1.4 Normalization
  2.2 Deep Learning Challenges
3 Parallel Architecture
  3.1 Intel Central Processing Unit (CPU)
    3.1.1 Skylake Mesh Architecture
    3.1.2 Intel Ultra Path Interconnect (UPI)
    3.1.3 Sub-NUMA Clustering (SNC)
    3.1.4 Cache Hierarchy Changes
    3.1.5 Advanced Vector Software Extension
    3.1.6 Math Kernel Library for Deep Neural Network (MKL-DNN)
  3.2 Nvidia Graphics Processing Unit (GPU)
    3.2.1 Tensor Core Architecture
    3.2.2 Simultaneous Multi-Threading (SMT)
    3.2.3 High Bandwidth Memory (HBM2)
    3.2.4 NVLink2 Configuration
  3.3 Google Tensor Processing Unit (TPU)
    3.3.1 System Architecture
    3.3.2 Multiply-Accumulate (MAC) Systolic Array
    3.3.3 New Brain Floating Point Format
    3.3.4 Cloud TPU Configuration
    3.3.5 Cloud Software Architecture
  3.4 Microsoft Catapult Fabric NPU Processor
    3.4.1 System Configuration
    3.4.2 Neural Processor Architecture
    3.4.3 Matrix-Vector Multiplier
    3.4.4 Sparse Matrix-Vector Multiplication
4 Convolution Optimization
  4.1 UCLA DCNN Accelerator
    4.1.1 System Architecture
    4.1.2 Filter Decomposition
    4.1.3 Streaming Architecture
    4.1.4 Convolution Unit (CU) Engine
    4.1.5 Accumulation (ACCU) Buffer
    4.1.6 Max Pooling
  4.2 MIT Eyeriss DNN Accelerator
    4.2.1 Convolution Mapping
    4.2.2 Row Stationary (RS) Dataflow
    4.2.3 Run-Length Compression (RLC)
    4.2.4 Network-on-Chip (NoC)
    4.2.5 Row Stationary Plus (RS+) Dataflow
5 In-Memory Hierarchy
  5.1 GT Neurocube Architecture
    5.1.1 Hybrid Memory Cube (HMC)
    5.1.2 Memory Centric Neural Computing (MCNC)
    5.1.3 Programmable Neurosequence Generator (PNG)
  5.2 Stanford Tetris DNN Processor
    5.2.1 Memory Hierarchy
    5.2.2 In-Memory Accumulation
    5.2.3 Data Scheduling
    5.2.4 NN Partitioning across Vaults
6 Near-Memory Architecture
  6.1 ICT DaDianNao Supercomputer
    6.1.1 Memory Configuration
    6.1.2 Neural Functional Unit (NFU)
  6.2 UofT Cnvlutin DNN Accelerator
    6.2.1 System Architecture
    6.2.2 Zero-Free Neuron Array Format (ZFNAf)
    6.2.3 Network Pruning
    6.2.4 Raw or Encoded Format (RoE)
    6.2.5 Vector Ineffectual Activation Identifier Format (VIAI)
    6.2.6 Zero Memory Overhead Ineffectual Activation Skipping
7 Network Pruning
  7.1 Energy Efficient Inference Engine (EIE)
    7.1.1 Compressed DNN Model
    7.1.2 Central Control Unit (CCU)