Python Big Data Analysis and Applications in Practice (Python 大數據分析與應用實戰)

Yu Benguo (餘本國), Liu Ning (劉寧), Li Chunbao (李春報)

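  • Publisher: Publishing House of Electronics Industry (電子工業出版社)
  • Publication date: 2021-12-01
  • List price: $654
  • Sale price: $556 (85% of list price)
  • Language: Simplified Chinese
  • Pages: 356
  • Binding: Paperback
  • ISBN: 7121421976
  • ISBN-13: 9787121421976
  • Related categories: Big Data, Data Science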

Product Description

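This book is a hands-on guide to data processing and analysis with Python. It covers Python language fundamentals, data processing, data analysis, data visualization, database operations with Python, and optimizing models with deep learning techniques. The book is divided into three parts: Part 1 (Chapter 1) covers the basics of Python; Part 2 (Chapters 2 to 6) presents practical case studies; and Part 3 (Chapters 7 to 8) extends big data analysis with deep learning and collaborative filtering. The content is broad and the explanations are accessible, making the book suitable for undergraduate and graduate students, as well as anyone interested in Python or looking to use it for data analysis.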

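About the Authors

Yu Benguo (餘本國) holds a Ph.D. and is a master's supervisor at the School of Biomedical Information and Engineering, Hainan Medical University, teaching courses that include Advanced Mathematics, Calculus, the Python Language, and Fundamentals of Big Data Analysis. In 2012, Yu was a visiting scholar at York University in Canada.
Published books include 《Python數據分析基礎》 (Python Data Analysis Fundamentals), 《基於Python的大數據分析基礎及實戰》 (Fundamentals and Practice of Big Data Analysis with Python), 《Python在機器學習中的應用》 (Applications of Python in Machine Learning), 《PyTorch深度學習入門與實戰》 (PyTorch Deep Learning: Introduction and Practice), and 《Python編程與數據分析應用》 (Python Programming and Data Analysis Applications).

Liu Ning (劉寧) holds a master's degree in Signal and Information Processing from Shenzhen University and currently works on smart city and digital government projects.
Liu has published the SCI-indexed paper "Content-based image retrieval using high-dimensional information geometry" and authored works including 《高維信息幾何與幾何不變量》 (High-Dimensional Information Geometry and Geometric Invariants).

Li Chunbao (李春報) is a senior laboratory technician at the Modern Educational Technology Center of Hainan Medical University, working on information technology research in education. Li also serves as chief supervisor of the Hainan Informatization Association and as an expert with the Hainan Provincial Cybersecurity Association, among other roles.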

Table of Contents

Chapter 1 Python Syntax Basics
1.1 Installing Anaconda
1.1.1 Code Completion
1.1.2 Variable Explorer
1.1.3 Installing Third-Party Libraries
1.2 Syntax Basics
1.2.1 Strings, Lists, Tuples, Dictionaries, and Sets
1.2.2 Conditionals, Loops, and Functions
1.2.3 Exceptions
1.2.4 Special Functions
1.3 Getting Started with Python's Basic Libraries
1.3.1 Getting Started with NumPy
1.3.2 Getting Started with Pandas
1.3.3 Getting Started with Matplotlib
1.4 Chapter Summary

Chapter 2 Acquiring, Modeling, and Analyzing Weather Data
2.1 Preparation
2.2 Acquiring Weather Data by Web Scraping
2.2.1 Web Page Parsing
2.2.2 Scraping Weather Data from a Static Page
2.2.3 Scraping Historical Weather Data
2.3 Weather Data Visualization
2.3.1 Viewing Basic Data Information
2.3.2 Transforming the Data Format
2.3.3 Line Chart of Temperature Trends
2.3.4 Year-over-Year Temperature Comparison Chart
2.3.5 Bar Chart of Weather Conditions
2.3.6 Creating a Bubble Cloud Chart of Weather Conditions with Tableau
2.3.7 Pie Chart of Wind Direction Proportions
2.3.8 Drawing a Wind Rose with the windrose Library
2.4 Applying Machine Learning to Weather Forecasting
2.4.1 Basic Concepts of Linear Regression
2.4.2 Predicting Temperature with Simple Linear Regression
2.4.3 Predicting Temperature with Multiple Linear Regression
2.5 Chapter Summary

Chapter 3 Epidemic Data Analysis
3.1 Preparation
3.2 Basic Data Analysis with the Pyecharts Library
3.2.1 Infection Count Distribution Map
3.2.2 Case Distribution Chart
3.2.3 Stacked Chart of Symptoms
3.2.4 Line Chart of Discharges and Deaths
3.2.5 Case Heat Map
3.2.6 Pictogram of Case Distribution
3.2.7 Population Movement Diagram
3.3 Analysis of Infection Cases
3.3.1 Basic Statistics
3.3.2 Showing the Infection Period with a Histogram
3.3.3 Showing Fatal Cases with a Word Cloud
3.4 Epidemic Trend Prediction
3.4.1 Predicting Infection Counts with the Logistic Equation
3.4.2 Epidemic Prediction with the SIR Model
3.4.3 Comparing the Logistic and SIR Models
3.5 Chapter Summary

Chapter 4 Aviation Data Analysis
4.1 Preparation
4.2 Basic Statistical Analysis
4.2.1 Viewing Basic Data Information
4.2.2 Distribution of Airlines and Aircraft Types
4.2.3 3D Map of Flight Counts by City
4.2.4 Sankey Diagram of Departures from Capital Airport
4.2.5 Showing Routes with a Relationship Graph
4.3 Computing the Shortest Flight Time with the Floyd Algorithm
4.3.1 Introduction to the Floyd Algorithm
4.3.2 Steps of the Floyd Algorithm
4.3.3 Implementing the Algorithm
4.3.4 Analysis of Results
4.4 Chapter Summary

Chapter 5 Text Data Analysis of a Citizen Service Hotline
5.1 Preparation
5.2 Basic Analysis
5.2.1 Basic Information on Data Distribution
5.2.2 Average Daily Ticket Volume
5.2.3 Call Time Analysis
5.2.4 Ticket Type Analysis
5.3 Showing Ticket Content with a Word Cloud
5.3.1 Ticket Word Segmentation
5.3.2 Removing Stop Words
5.3.3 Word Frequency Statistics
5.3.4 Word Cloud of Issues Reported by Citizens
5.3.5 Saving the Data
5.4 Automatic Ticket Classification and Routing Based on Naive Bayes
5.4.1 Requirements Overview
5.4.2 Basic Concepts of the Naive Bayes Model
5.4.3 Workflow of the Naive Bayes Text Classification Algorithm
5.4.4 Implementation
5.5 Mining Hot Issues with K-Means and PCA Dimensionality Reduction
5.5.1 Application Scenario
5.5.2 Basic Principles of K-Means and PCA
5.5.3 Workflow of the Hot Issue Mining Algorithm
5.5.4 Implementation
5.6 Chapter Summary

Chapter 6 Credit Risk Control with Decision Trees
6.1 Preparation
6.2 Basic Analysis of the Dataset
6.2.1 Checking Data Size and Missing Values
6.2.2 Plotting Histograms to View Data Distribution
6.2.3 Three Ways to Plot a Histogram
6.2.4 Checking Outliers with Box Plots
6.2.5 Handling Outliers and Missing Values
6.2.6 Showing the Preprocessed Data with Violin Plots
6.3 Modeling Credit Data with Decision Trees
6.3.1 Introduction to Decision Tree Principles
6.3.2 Decision Tree Credit Modeling Workflow
6.3.3 Implementing a Decision Tree Risk Control Algorithm with scikit-learn
6.3.4 Model Optimization
6.4 Chapter Summary

Chapter 7 Classifying Garbage Images with Deep Learning
7.1 Preparation
7.2 Basic Principles of Deep Learning
7.2.1 Basic Principles of CNNs
7.2.2 Introduction to the Keras Library
7.3 CNN-Based Garbage Image Classification with Keras
7.3.1 Algorithm Workflow
7.3.2 Data Preprocessing
7.3.3 Implementing the CNN Model
7.4 Optimizing the CNN Model
7.4.1 Choosing an Optimizer
7.4.2 Choosing a Loss Function
7.4.3 Adjusting the Model
7.4.4 Image Augmentation
7.4.5 Changing the Learning Rate
7.5 Applying the Model
7.6 Chapter Summary

Chapter 8 Collaborative Filtering and Matrix Factorization Recommendation Algorithms
8.1 Preparation
8.2 Analyzing Short-Video Completion with Collaborative Filtering
8.2.1 Principles of User-Based Collaborative Filtering
8.2.2 Algorithm Workflow
8.2.3 Implementation
8.3 Predicting Short-Video Completion with Matrix Factorization
8.3.1 Algorithm Principles
8.3.2 Implementing SVD with the Surprise Library
8.4 Performance of Several Methods on the Dataset
8.5 Chapter Summary

Chapter 9 Text Data Analysis of Dream of the Red Chamber (《紅樓夢》)
9.1 Preparation
9.1.1 Programming Environment
9.1.2 Overview of the Data
9.2 Word Segmentation
9.2.1 Reading the Data
9.2.2 Data Preprocessing
9.2.3 Segmentation and Stop-Word Removal
9.2.4 Creating a Word Cloud
9.3 Text Clustering Analysis
9.3.1 Building the TF-IDF Matrix of Segmented Words
9.3.2 K-Means Clustering
9.3.3 MDS Dimensionality Reduction
9.3.4 PCA Dimensionality Reduction
9.3.5 HC Clustering
9.3.6 t-SNE Visualization of High-Dimensional Data
9.4 LDA Topic Model
9.5 Character Social Network Analysis
9.6 Chapter Summary

Appendix A Looking Up Request Headers for Data Scraping
Appendix B Installing the GraphViz Library
Appendix C Installing TensorFlow on Windows 10
References
Acknowledgments