Learning Real-time Processing with Spark Streaming

Sumit Gupta

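  • Publisher: Packt Publishing
  • Publication date: 2015-09-29
  • List price: $1,580
  • VIP price: $1,501 (5% off)
  • Language: English
  • Pages: 200
  • Binding: Paperback
  • ISBN: 1783987669
  • ISBN-13: 9781783987665
  • Related categories: Spark
  • Ordered from the supplier once you place your order (about 3-4 weeks)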

Product Description

Building scalable and fault-tolerant streaming applications made easy with Spark Streaming

About This Book

  • Process live data streams more efficiently with better fault recovery using Spark Streaming
  • Implement and deploy real-time log file analysis
  • Learn about integration with advanced Spark libraries: GraphX, Spark SQL, and MLlib

Who This Book Is For

This book is intended for big data developers with basic knowledge of Scala but no knowledge of Spark. It will help you grasp the basics of developing real-time applications with Spark and understand efficient programming of core elements and applications.

What You Will Learn

  • Install and configure Spark and Spark Streaming to execute applications
  • Explore the architecture and components of Spark and Spark Streaming to use them as a base for other libraries
  • Process distributed log files in real time to load data from distributed sources
  • Apply transformation functions to streaming data
  • Integrate Apache Spark with advanced libraries such as MLlib and GraphX
  • Apply production deployment scenarios to deploy your application

In Detail

Using practical examples with easy-to-follow steps, this book will teach you how to build real-time applications with Spark Streaming.

Starting with installing and setting up the required environment, you will write and execute your first Spark Streaming program. You will then explore the architecture and components of Spark Streaming, along with an overview of the libraries and functions exposed by Spark. Next, you will learn about the various client APIs for coding in Spark through the use case of distributed log file processing. You will then apply various functions to transform and enrich streaming data, and learn how to cache and persist datasets. Moving on, you will integrate Apache Spark with other Spark libraries and components such as MLlib, GraphX, and Spark SQL. Finally, you will learn about deploying your application and cover different scenarios, ranging from standalone mode to distributed mode using Mesos or YARN, in private data centers or on cloud infrastructure.

Style and approach

A step-by-step approach to learning Spark Streaming in a structured manner, with detailed explanations of basic and advanced features in an easy-to-follow style. Each topic is explained sequentially and supported with real-world examples and executable code snippets that suit readers with a wide range of experience.
