Big Data Analysis with Python (Paperback): Combine Spark and Python to unlock the powers of parallel computing and machine learning
Provisional Chinese title: 使用 Python 進行大數據分析 (平裝本)

Name: Big Data Analysis with Python (Paperback)
Price: 1273 TWD
Availability: OnlineOnly
Author: Ivan Marin, Ankit Shukla, Sarang VK
ISBN: 1789955289

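Publisher: Packt Publishing
Publication date: 2019-04-08
List price: $1,340
Member price: 5% off, $1,273
Language: English
Pages: 276
Binding: Paperback
ISBN: 1789955289
ISBN-13: 9781789955286
Related categories: Python, Hadoop, Spark
Related translation: Python大數據分析 Big Data Analysis with Python (Simplified Chinese edition)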

Overseas special-order title (requires separate checkout)

Product description

Key Features

- Get a hands-on, fast-paced introduction to the Python data science stack
- Explore ways to create useful metrics and statistics from large datasets
- Create detailed analysis reports with real-world data

Book Description

Processing big data in real time is challenging because of scalability, information inconsistency, and fault tolerance. Big Data Analysis with Python teaches you how to use tools that can control this data avalanche for you. With this book, you'll learn practical techniques to aggregate data into useful dimensions for later analysis, extract statistical measurements, and transform datasets into features for other systems.

The book begins with an introduction to data manipulation in Python using pandas. You'll then get familiar with statistical analysis and plotting techniques. Through multiple hands-on activities, you'll learn to analyze data that is distributed across several computers by using Dask. As you progress, you'll study how to aggregate data for plots when the entire dataset does not fit in memory. You'll also explore Hadoop (HDFS and YARN), which will help you tackle larger datasets. The book also covers Spark and explains how it interacts with other tools.

By the end of this book, you'll be able to bootstrap your own Python environment, process large files, and manipulate data to generate statistics, metrics, and graphs.

What you will learn

- Use Python to read and transform data into different formats
- Generate basic statistics and metrics using data on disk
- Work with computing tasks distributed over a cluster
- Convert data from various sources into storage or querying formats
- Prepare data for statistical analysis, visualization, and machine learning
- Present data in the form of effective visuals

Who this book is for

Big Data Analysis with Python is designed for Python developers, data analysts, and data scientists who want to get hands-on with methods to control data and transform it into impactful insights. Basic knowledge of statistical measurements and relational databases will help you to understand various concepts explained in this book.

About the authors

Ivan Marin is a Systems Architect and Data Scientist at Daitan Group, a Campinas-based software company. He designs Big Data systems for large volumes of data and implements Machine Learning pipelines end to end using Python and Spark. He is also an active community organizer for Data Science, Machine Learning, and Python in São Paulo, and has taught Python for Data Science courses at the university level.

Sarang VK is a data scientist whose responsibilities include identifying data sources, preparing data, and developing and evaluating predictive and optimization models to set up production-level machine learning and statistical solutions, including back-end and front-end development. He also supports pre-sales, stakeholder communication, requirement gathering, scoping, and solutions.

His strengths are Machine / Deep Learning, SQL, Predictive Analytics, Time-Series, Simulation Modelling, Optimization, Image/Text Analytics, NLP, Python, R, Spark, TensorFlow, Keras, h2o, SAP-PAL, AWS, SAP Predictive Factory, Azure, Financial Analytics, Supply Chain, Banking and Insurance, Retail/Customer Analytics, Trading Analytics, Healthcare Analytics, RPA, IPA.

Ankit Shukla is a data scientist with a passion for using data science and advanced analytics to solve real-life problems and bring ideas to fruition. He is skilled in using Machine Learning/AI and statistical modelling techniques to solve business problems and create actual dollar value for clients. He is experienced in working with copious amounts of data, using the latest Big Data technologies to design data pipelines and generate impactful data-driven insights and reports.

His skill sets are: R, Python, SQL, HiveQL, Excel, Linux Shell Scripting, SAS (working knowledge), and Docker.

- Frameworks: Keras, OpenCV, XGBoost, NumPy, Scikit-learn, Caret, ggplot2, recommenderlab
- Big Data: Hadoop, Hive, Impala, PySpark, SparkR, Pig, AWS (S3, EC2, EMR, SageMaker, Redshift)
- Machine Learning: Regression, Classification, Clustering, Feature Selection, Model Selection/Assessment, Recommender Systems, Neural Networks, Deep Learning, Transfer Learning
- Visualization: Tableau, R, Shiny
