Building Big Data Pipelines with Apache Beam: Use a single programming model for both batch and stream data processing (Paperback)
Tentative translated title: Building Big Data Pipelines with Apache Beam: Processing Batch and Streaming Data with a Single Programming Model

Lukavský, Jan

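  • Publisher: Packt Publishing
  • Publication date: 2022-01-21
  • List price: $2,000
  • VIP price: $1,900 (95% of list price)
  • Language: English
  • Pages: 342
  • Binding: Quality paper (also called trade paper)
  • ISBN: 1800564937
  • ISBN-13: 9781800564930
  • Related category: Big Data
  • Overseas special-order title (requires separate checkout)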

Product Description

Implement, run, operate, and test data processing pipelines using Apache Beam

Key Features:

  • Understand how to improve usability and productivity when implementing Beam pipelines
  • Learn how to use stateful processing in Apache Beam to implement complex use cases
  • Implement, test, and run Apache Beam pipelines with the help of expert tips and techniques

Book Description:

Apache Beam is an open-source, unified programming model for defining and executing data processing pipelines, including Extract, Transform, and Load (ETL), batch, and stream processing.
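
The idea behind a "unified" model is that the processing logic is written once and does not depend on whether the input is bounded (batch) or unbounded (stream). The sketch below is a plain-Java analogy of that idea, not the Apache Beam API: the names `UnifiedModelSketch`, `TOKENIZE`, `countBatch` and `onElement` are hypothetical and chosen only for illustration. The same tokenizing function feeds both a whole-input word count and an element-at-a-time running count, and both end up with the same result.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.Function;
import java.util.stream.Collectors;

// Plain-Java analogy only; this is NOT the Apache Beam API.
// The per-element logic (TOKENIZE) is written once and reused for
// bounded (batch) input and for elements that arrive one at a time (stream).
class UnifiedModelSketch {

    // Hypothetical transform: split a line into lower-case words.
    static final Function<String, List<String>> TOKENIZE = line ->
            Arrays.stream(line.toLowerCase().split("\\W+"))
                    .filter(word -> !word.isEmpty())
                    .collect(Collectors.toList());

    // "Batch": the complete input is available up front.
    static Map<String, Long> countBatch(List<String> lines) {
        return lines.stream()
                .flatMap(line -> TOKENIZE.apply(line).stream())
                .collect(Collectors.groupingBy(
                        word -> word, TreeMap::new, Collectors.counting()));
    }

    // "Streaming": counts are updated incrementally as each element arrives.
    private final Map<String, Long> runningCounts = new TreeMap<>();

    void onElement(String line) {
        for (String word : TOKENIZE.apply(line)) {
            runningCounts.merge(word, 1L, Long::sum);
        }
    }

    Map<String, Long> currentCounts() {
        return Collections.unmodifiableMap(runningCounts);
    }
}
```

Feeding `"Hello, Beam"` and `"hello world"` through either path yields the same counts (`beam=1, hello=2, world=1`). In Beam itself, the analogous step is applying the same transforms to a `PCollection`, which may be bounded or unbounded.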

This book will help you confidently build data processing pipelines with Apache Beam. You'll start with an overview of Apache Beam and learn how to use it to implement basic pipelines. You'll also learn how to test and run pipelines efficiently. As you progress, you'll explore how to structure your code for reusability and how to use various domain-specific languages (DSLs). Later chapters show you how to use schemas and query your data with (streaming) SQL. Finally, you'll explore advanced Apache Beam concepts, such as implementing your own I/O connectors.

By the end of this book, you'll have gained a deep understanding of the Apache Beam model and be able to apply it to solve problems.

What You Will Learn:

  • Understand the core concepts and architecture of Apache Beam
  • Implement stateless and stateful data processing pipelines
  • Use state and timers for real-time event processing
  • Structure your code for reusability
  • Use streaming SQL to process real-time data, increasing productivity and data accessibility
  • Run a pipeline using a portable runner and implement data processing using the Apache Beam Python SDK
  • Implement Apache Beam I/O connectors using the Splittable DoFn API
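
"State and timers" means per-key storage that persists across elements combined with callbacks that fire at a chosen time. The following is a minimal plain-Java analogy under stated assumptions, not the Beam state and timers API: the class `StateAndTimersSketch`, its methods, the fixed `flushDelay`, and the sum-and-flush behavior are all hypothetical. Each key buffers its values, sets an event-time deadline when its first value arrives, and is flushed once the watermark reaches that deadline.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Plain-Java analogy only; this is NOT the Apache Beam state and timers API.
// Each key has buffered values ("state") and an event-time deadline ("timer").
// When the watermark reaches a key's deadline, the buffer is summed, emitted and cleared.
class StateAndTimersSketch {

    private final Map<String, List<Integer>> buffers = new HashMap<>();
    // TreeMap so that timers expiring together fire in a deterministic (key) order.
    private final TreeMap<String, Long> timers = new TreeMap<>();
    private final List<String> output = new ArrayList<>();
    private final long flushDelay;

    StateAndTimersSketch(long flushDelay) {
        this.flushDelay = flushDelay;
    }

    // Like processing one element: buffer the value and, if no timer is
    // pending for this key, set one relative to the element's event time.
    void process(String key, int value, long eventTime) {
        buffers.computeIfAbsent(key, k -> new ArrayList<>()).add(value);
        timers.putIfAbsent(key, eventTime + flushDelay);
    }

    // Like the runner advancing the watermark: fire every timer whose
    // deadline is at or before the new watermark.
    void advanceWatermark(long watermark) {
        Iterator<Map.Entry<String, Long>> it = timers.entrySet().iterator();
        while (it.hasNext()) {
            Map.Entry<String, Long> timer = it.next();
            if (timer.getValue() <= watermark) {
                String key = timer.getKey();
                int sum = buffers.remove(key).stream().mapToInt(Integer::intValue).sum();
                output.add(key + "=" + sum);
                it.remove();
            }
        }
    }

    List<String> output() {
        return Collections.unmodifiableList(output);
    }
}
```

For example, with a delay of 10, values 1 and 2 for key `a` at event times 0 and 4 are emitted together as `a=3` once the watermark reaches 10. In the Beam Java SDK, the corresponding state and timers are declared on a `DoFn` (for example with the `@StateId`, `@TimerId` and `@OnTimer` annotations) and the runner manages the watermark.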

Who this book is for:

This book is for data engineers, data scientists, and data analysts who want to learn how Apache Beam works. Intermediate-level knowledge of the Java programming language is assumed.
