Learning Spark: Lightning-Fast Data Analytics, 2/e (Paperback)
Tentative Chinese title: 學習 Spark:閃電般快速的數據分析,第二版(平裝本)
Jules S. Damji, Brooke Wenig, Tathagata Das, Denny Lee
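- Publisher: O'Reilly
- Publication date: 2020-08-25
- List price: $2,800
- Sale price: 12% off, $2,464 (limited-time offer through 2025-02-02)
- Language: English
- Pages: 300
- Binding: Quality paper (also called trade paper)
- ISBN: 1492050040
- ISBN-13: 9781492050049
- Related categories: Spark, Data Science
- Related translation: Spark快速大數據分析 第2版 (Simplified Chinese edition)
- Availability: ships immediately (1 in stock)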
Product Description
Data is bigger, arrives faster, and comes in a variety of formats--and it all needs to be processed at scale for analytics or machine learning. But how can you process such varied workloads efficiently? Enter Apache Spark.
Updated to include Spark 3.0, this second edition shows data engineers and data scientists why structure and unification in Spark matter. Specifically, this book explains how to perform simple and complex data analytics and employ machine learning algorithms. Through step-by-step walk-throughs, code snippets, and notebooks, you'll be able to:
- Learn Python, SQL, Scala, or Java high-level Structured APIs
- Understand Spark operations and the Spark SQL engine
- Inspect, tune, and debug Spark operations with Spark configurations and Spark UI
- Connect to data sources: JSON, Parquet, CSV, Avro, ORC, Hive, S3, or Kafka
- Perform analytics on batch and streaming data using Structured Streaming
- Build reliable data pipelines with open source Delta Lake and Spark
- Develop machine learning pipelines with MLlib and productionize models using MLflow
About the Authors
Jules S. Damji is a senior developer advocate at Databricks and an MLflow contributor. He is a hands-on developer with over 20 years of experience and has worked as a software engineer at leading companies such as Sun Microsystems, Netscape, @Home, Loudcloud/Opsware, Verisign, ProQuest, and Hortonworks, building large-scale distributed systems. He holds a B.Sc. and an M.Sc. in computer science and an M.A. in political advocacy and communication from Oregon State University, Cal State, and Johns Hopkins University, respectively.
Brooke Wenig is a machine learning practice lead at Databricks. She leads a team of data scientists who develop large-scale machine learning pipelines for customers, and she teaches courses on distributed machine learning best practices. Previously, she was a principal data science consultant at Databricks. She holds an M.S. in computer science from UCLA with a focus on distributed machine learning.
Tathagata Das is a staff software engineer at Databricks, an Apache Spark committer, and a member of the Apache Spark Project Management Committee (PMC). He is one of the original developers of Apache Spark, the lead developer of Spark Streaming (DStreams), and is currently one of the core developers of Structured Streaming and Delta Lake. Tathagata holds an M.S. in computer science from UC Berkeley.
Denny Lee is a staff developer advocate at Databricks who has been working with Apache Spark since version 0.6. He is a hands-on distributed systems and data science engineer with extensive experience developing internet-scale infrastructure, data platforms, and predictive analytics systems for both on-premises and cloud environments. He also has an M.S. in biomedical informatics from Oregon Health & Science University and has architected and implemented data solutions for enterprise healthcare customers.