Pro Spark Streaming: The Zen of Real-Time Analytics Using Apache Spark (Paperback)

Zubair Nabi

Product Description

Learn the right cutting-edge skills and knowledge to leverage Spark Streaming to implement a wide array of real-time, streaming applications. Pro Spark Streaming walks you through end-to-end real-time application development using real-world applications, data, and code. Taking an application-first approach, each chapter introduces use cases from a specific industry and uses publicly available datasets from that domain to unravel the intricacies of production-grade design and implementation. The domains covered in the book include social media, the sharing economy, finance, online advertising, telecommunication, and IoT.

In the last few years, Spark has become synonymous with big data processing. DStreams enhance the underlying Spark processing engine to support streaming analysis with a novel micro-batch processing model. Pro Spark Streaming by Zubair Nabi will enable you to become a specialist in latency-sensitive applications by leveraging the key features of DStreams, micro-batch processing, and functional programming. To this end, the book includes ready-to-deploy examples and actual code. Pro Spark Streaming will act as the bible of Spark Streaming.
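As a rough illustration of the micro-batch idea mentioned above, the following plain-Python sketch groups timestamped events into fixed-width batches and runs a small word count on each one while carrying running totals forward. It does not use Spark and is not taken from the book; the names `discretize`, `run_word_count`, and `batch_interval`, and the sample events, are assumptions made for this example.

```python
from collections import Counter, defaultdict


def discretize(events, batch_interval):
    """Group (timestamp, line) events into fixed-width micro-batches.

    Returns a time-ordered list of (batch_start, [lines]).
    Simplification: intervals that received no events are skipped here.
    """
    batches = defaultdict(list)
    for timestamp, line in events:
        # Every event falls into the interval that contains its timestamp.
        batch_start = int(timestamp // batch_interval) * batch_interval
        batches[batch_start].append(line)
    return sorted(batches.items())


def run_word_count(events, batch_interval):
    """Treat each micro-batch as a small batch job and keep running totals."""
    running_totals = Counter()  # state carried across batches
    per_batch = []
    for batch_start, lines in discretize(events, batch_interval):
        batch_counts = Counter(word for line in lines for word in line.split())
        running_totals.update(batch_counts)
        per_batch.append((batch_start, dict(batch_counts)))
    return per_batch, dict(running_totals)


# Hypothetical sample input: (arrival time in seconds, text line).
events = [
    (0.2, "spark streaming"),
    (0.9, "spark"),
    (1.4, "streaming analytics"),
    (2.1, "spark"),
]
per_batch, totals = run_word_count(events, batch_interval=1)
```

With a one-second interval, the first two events land in the same batch and are counted together, which is the essential trade-off of micro-batching: results arrive once per interval rather than once per record.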

What You'll Learn:

  • Spark Streaming application development and best practices
  • Low-level details of discretized streams
  • The application and importance of streaming analytics to a number of industries and domains
  • Optimization of production-grade deployments of Spark Streaming via configuration recipes and instrumentation using Graphite, collectd, and Nagios
  • Ingestion of data from disparate sources including MQTT, Flume, Kafka, Twitter, and a custom HTTP receiver
  • Integration and coupling with HBase, Cassandra, and Redis
  • Design patterns for side effects and maintaining state across the Spark Streaming micro-batch model
  • Real-time and scalable ETL using DataFrames, Spark SQL, Hive, and SparkR
  • Streaming machine learning, predictive analytics, and recommendations
  • Meshing batch processing with stream processing via the Lambda architecture

Who This Book Is For:

The audience includes data scientists, big data experts, BI analysts, and data architects.
