Stream Processing with Apache Spark: Best Practices for Scaling and Optimizing Apache Spark

François Garillot, Gerard Maas

  • Publisher: O'Reilly
  • Publication date: 2019-07-16
  • List price: $2,380
  • Sale price: $2,142 (10% off)
  • Language: English
  • Pages: 452
  • Binding: Paperback
  • ISBN: 1491944242
  • ISBN-13: 9781491944240
  • Related category: Spark
  • Ships immediately (stock < 4)

Product Description

To build analytics tools that provide faster insights, knowing how to process data in real time is a must, and moving from batch processing to stream processing is absolutely required. Fortunately, the Spark in-memory framework/platform for processing data has added an extension devoted to fault-tolerant stream processing: Spark Streaming.

If you're familiar with Apache Spark and want to learn how to implement it for streaming jobs, this practical book is a must.

  • Understand how Spark Streaming fits in the big picture
  • Learn core concepts such as Spark RDDs, Spark Streaming clusters, and the fundamentals of a DStream
  • Discover how to create a robust deployment
  • Dive into streaming algorithmics
  • Learn how to tune, measure, and monitor Spark Streaming

About the Authors

Gerard Maas is a Principal Engineer at Lightbend, where he works on the seamless integration of Structured Streaming and other scalable stream processing technologies into the Lightbend Platform. Previously, he worked at a cloud-native IoT startup, where he led the data processing team in building streaming pipelines that pushed Spark Streaming to its limits in terms of throughput. Back then, he published the first comprehensive guide to tuning Spark Streaming performance.

Gerard has held leading roles at several startups and large enterprises, building data science governance, cloud-native IoT platforms, telecom platforms, and scalable APIs. He is a regular speaker at technology conferences and contributes to small and large open source projects. Gerard has a degree in Computer Engineering from the Simón Bolívar University, Venezuela. You can find him on Twitter as @maasg.

François Garillot is based in Seattle, where he works on distributed computing at Facebook. He received a Ph.D. from École Polytechnique in 2011 and worked on Spark Streaming's back-pressure at Lightbend in 2015. His interests include type systems, leveraging programming languages to make analytics simpler to express, and a passion for Scala, Spark, and roasted arabica. When not at work, he can be found enjoying the mountains of the Pacific Northwest.
