# Python Machine Learning Handbook: From Data Preprocessing to Deep Learning (Python機器學習手冊：從數據預處理到深度學習)

### By Chris Albon; translated by 韓慧昌 (Han Huichang), 林然 (Lin Ran), 徐江 (Xu Jiang)

• Publisher:
• Publication date: 2019-07-01
• List price: \$534
• Sale price: \$401 (75% of list price)
• Language: Simplified Chinese
• ISBN: 7121369621
• ISBN-13: 9787121369629
• Related categories:

## Table of Contents

1.0 Introduction
1.1 Creating a Vector
1.2 Creating a Matrix
1.3 Creating a Sparse Matrix
1.4 Selecting Elements
1.5 Describing the Properties of a Matrix
1.6 Applying an Operation to Multiple Elements at Once
1.7 Finding the Maximum and Minimum Values
1.8 Calculating the Mean, Variance, and Standard Deviation
1.9 Reshaping a Matrix
1.10 Transposing a Vector or Matrix
1.11 Flattening a Matrix
1.12 Finding the Rank of a Matrix
1.13 Calculating the Determinant
1.14 Getting the Diagonal of a Matrix
1.15 Calculating the Trace of a Matrix
1.16 Finding Eigenvalues and Eigenvectors
1.17 Calculating Dot Products
1.18 Adding and Subtracting Matrices
1.19 Multiplying Matrices
1.20 Inverting a Matrix
1.21 Generating Random Values

2.0 Introduction
2.1 Loading a Sample Dataset
2.2 Creating a Simulated Dataset
2.3 Loading a CSV File
2.4 Loading an Excel File
2.5 Loading a JSON File
2.6 Querying a SQL Database

3.0 Introduction
3.1 Creating a DataFrame
3.2 Describing the Data
3.3 Navigating a DataFrame
3.4 Selecting Rows Based on Conditionals
3.5 Replacing Values
3.6 Renaming Columns
3.7 Finding the Minimum, Maximum, Sum, Average, and Count
3.8 Finding Unique Values
3.9 Handling Missing Values
3.10 Deleting a Column
3.11 Deleting a Row
3.12 Dropping Duplicate Rows
3.13 Grouping Rows by Values
3.14 Grouping Rows by Time Period
3.15 Looping over a Column
3.16 Applying a Function to All Elements in a Column
3.17 Applying a Function to Groups
3.18 Concatenating DataFrames
3.19 Merging Two DataFrames

4.0 Introduction
4.1 Rescaling a Feature
4.2 Standardizing a Feature
4.3 Normalizing Observations
4.4 Generating Polynomial and Interaction Features
4.5 Transforming Features
4.6 Detecting Outliers
4.7 Handling Outliers
4.8 Discretizing Features
4.9 Grouping Observations Using Clustering
4.10 Deleting Observations with Missing Values
4.11 Imputing Missing Values

5.0 Introduction
5.1 Encoding Nominal Categorical Features
5.2 Encoding Ordinal Categorical Features
5.3 Encoding Dictionaries of Features
5.4 Imputing Missing Class Values
5.5 Handling Imbalanced Classes

6.0 Introduction
6.1 Cleaning Text
6.2 Parsing and Cleaning HTML
6.3 Removing Punctuation
6.4 Tokenizing Text
6.5 Removing Stop Words
6.6 Stemming Words
6.7 Tagging Parts of Speech
6.8 Encoding Text as a Bag of Words
6.9 Weighting Words by Importance

7.0 Introduction
7.1 Converting Strings to Dates
7.2 Handling Time Zones
7.3 Selecting Dates and Times
7.4 Breaking Up Date Data into Multiple Features
7.5 Calculating the Difference Between Two Dates
7.6 Encoding Days of the Week
7.7 Creating a Lagged Feature
7.8 Using Rolling Time Windows
7.9 Handling Missing Data in Time Series

8.0 Introduction
8.1 Loading Images
8.2 Saving Images
8.3 Resizing Images
8.4 Cropping Images
8.5 Smoothing (Blurring) Images
8.6 Sharpening Images
8.7 Enhancing Contrast
8.8 Isolating Colors
8.9 Binarizing Images
8.10 Removing Backgrounds
8.11 Detecting Edges
8.12 Detecting Corners
8.13 Creating Features for Machine Learning
8.14 Encoding Mean Color as a Feature
8.15 Encoding Color Histograms as Features

9.0 Introduction
9.1 Reducing Features Using Principal Components
9.2 Reducing Features When Data Is Linearly Inseparable
9.3 Reducing Features by Maximizing Class Separability
9.4 Reducing Features Using Matrix Factorization
9.5 Reducing Features on Sparse Data

10.0 Introduction
10.1 Thresholding Numerical Feature Variance
10.2 Thresholding Binary Feature Variance
10.3 Handling Highly Correlated Features
10.4 Removing Features Irrelevant to Classification
10.5 Recursive Feature Elimination

11.0 Introduction
11.1 Cross-Validating Models
11.2 Creating a Baseline Regression Model
11.3 Creating a Baseline Classification Model
11.4 Evaluating Binary Classifiers
11.5 Evaluating Binary Classifier Thresholds
11.6 Evaluating Multiclass Classifiers
11.7 Visualizing Classifier Performance
11.8 Evaluating Regression Models
11.9 Evaluating Clustering Models
11.10 Creating a Custom Evaluation Metric
11.11 Visualizing the Effect of Training Set Size
11.12 Creating a Report of Evaluation Metrics
11.13 Visualizing the Effect of Hyperparameter Values

12.0 Introduction
12.1 Selecting the Best Model Using Exhaustive Search
12.2 Selecting the Best Model Using Randomized Search
12.3 Selecting the Best Model from Multiple Learning Algorithms
12.4 Including Data Preprocessing in Model Selection
12.5 Speeding Up Model Selection with Parallelization
12.6 Speeding Up Model Selection Using Algorithm-Specific Methods
12.7 Evaluating Performance After Model Selection

13.0 Introduction
13.1 Fitting a Line
13.2 Handling Interaction Effects Between Features
13.3 Fitting a Nonlinear Relationship
13.4 Reducing Variance with Regularization
13.5 Reducing Features with Lasso Regression

14.0 Introduction
14.1 Training a Decision Tree Classifier
14.2 Training a Decision Tree Regressor
14.3 Visualizing a Decision Tree Model
14.4 Training a Random Forest Classifier
14.5 Training a Random Forest Regressor
14.6 Identifying Important Features in Random Forests
14.7 Selecting Important Features in Random Forests
14.8 Handling Imbalanced Classes
14.9 Controlling Tree Size
14.10 Improving Performance Through Boosting
14.11 Evaluating Random Forests with Out-of-Bag Error

15.0 Introduction
15.1 Finding an Observation's Nearest Neighbors
15.2 Creating a KNN Classifier
15.3 Identifying the Best Neighborhood Size
15.4 Creating a Radius-Based Nearest Neighbor Classifier

16.0 Introduction
16.1 Training a Binary Classifier
16.2 Training a Multiclass Classifier
16.3 Reducing Variance Through Regularization
16.4 Training a Classifier on Very Large Data
16.5 Handling Imbalanced Classes

17.0 Introduction
17.1 Training a Linear Classifier
17.2 Handling Linearly Inseparable Data Using Kernels
17.3 Creating Predicted Probabilities
17.4 Identifying Support Vectors
17.5 Handling Imbalanced Classes

18.0 Introduction
18.1 Training a Classifier for Continuous Features
18.2 Training a Classifier for Discrete and Count Features
18.3 Training a Naive Bayes Classifier for Binary Features
18.4 Calibrating Predicted Probabilities

19.0 Introduction
19.1 Clustering Using K-Means
19.2 Speeding Up K-Means Clustering
19.3 Clustering Using Meanshift
19.4 Clustering Using DBSCAN
19.5 Clustering Using Hierarchical Merging (Agglomerative Clustering)

20.0 Introduction
20.1 Preprocessing Data for Neural Networks
20.2 Designing a Neural Network
20.3 Training a Binary Classifier
20.4 Training a Multiclass Classifier
20.5 Training a Regressor
20.6 Making Predictions
20.7 Visualizing Training History
20.8 Reducing Overfitting with Weight Regularization
20.9 Reducing Overfitting with Early Stopping
20.10 Reducing Overfitting with Dropout
20.11 Saving Model Training Progress
20.12 Evaluating Neural Networks with k-Fold Cross-Validation
20.13 Tuning Neural Networks
20.14 Visualizing Neural Networks
20.15 Classifying Images
20.16 Improving Convolutional Neural Network Performance with Image Augmentation
20.17 Classifying Text

21.0 Introduction
21.1 Saving and Loading a scikit-learn Model
21.2 Saving and Loading a Keras Model