Spark Cookbook (Paperback)

Rishi Yadav

  • Publisher: Packt Publishing
  • Publication date: 2015-06-19
  • List price: $1,470
  • Sale price: $1,323 (10% off)
  • Language: English
  • Pages: 221
  • Binding: Paperback
  • ISBN: 1783987065
  • ISBN-13: 9781783987061
  • Category: Spark

Product Description

Over 60 recipes on Spark, covering Spark Core, Spark SQL, Spark Streaming, MLlib, and GraphX libraries

About This Book

  • Become an expert at graph processing using GraphX
  • Use Apache Spark as your single big data compute platform and master its libraries
  • Learn with recipes that can be run on a single machine as well as on a production cluster of thousands of machines

Who This Book Is For

If you are a data engineer, an application developer, or a data scientist who would like to leverage the power of Apache Spark to get better insights from big data, then this is the book for you.

What You Will Learn

  • Install and configure Apache Spark with various cluster managers
  • Set up development environments
  • Perform interactive queries using Spark SQL
  • Get to grips with real-time streaming analytics using Spark Streaming
  • Master supervised learning and unsupervised learning using MLlib
  • Build a recommendation engine using MLlib
  • Develop common types of applications and projects that solve complex big data problems

In Detail

By introducing in-memory persistent storage, Apache Spark eliminates the need to write intermediate data to the filesystem. This can make processing up to 100 times faster than disk-based frameworks such as Hadoop MapReduce.

This book focuses on how to analyze large and complex datasets. It starts with installing and configuring Apache Spark with various cluster managers, and then shows you how to set up development environments. Next, you will work through recipes for interactive queries using Spark SQL and for real-time streaming from sources such as Twitter Stream and Apache Kafka. You will then move on to machine learning, including supervised learning, unsupervised learning, and recommendation engine algorithms. After mastering graph processing with GraphX, you will work through recipes for cluster optimization and troubleshooting.
