Building Big Data Pipelines with Apache Beam: Use a single programming model for both batch and stream data processing

Lukavský, Jan

  • Publisher: Packt Publishing
  • Publication date: 2022-01-21
  • List price: $1,750
  • Sale price: $1,575 (10% off list price)
  • Language: English
  • Pages: 342
  • Binding: Quality Paper (trade paperback)
  • ISBN: 1800564937
  • ISBN-13: 9781800564930
  • Category: Big Data
  • In stock, ships immediately (1 copy in stock)

Product Description

Implement, run, operate, and test data processing pipelines using Apache Beam


Key Features:

  • Understand how to improve usability and productivity when implementing Beam pipelines
  • Learn how to use stateful processing to implement complex use cases using Apache Beam
  • Implement, test, and run Apache Beam pipelines with the help of expert tips and techniques


Book Description:

Apache Beam is an open source unified programming model for implementing and executing data processing pipelines, including Extract, Transform, and Load (ETL), batch, and stream processing.
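
To make the unified model concrete, here is a minimal sketch of a Beam word-count pipeline in the Java SDK (my own illustration, not an excerpt from the book): the same transforms run as a batch job over the bounded in-memory source shown here, or as a streaming job if an unbounded source is plugged in instead.

    import java.util.Arrays;

    import org.apache.beam.sdk.Pipeline;
    import org.apache.beam.sdk.io.TextIO;
    import org.apache.beam.sdk.options.PipelineOptionsFactory;
    import org.apache.beam.sdk.transforms.Count;
    import org.apache.beam.sdk.transforms.Create;
    import org.apache.beam.sdk.transforms.FlatMapElements;
    import org.apache.beam.sdk.transforms.MapElements;
    import org.apache.beam.sdk.values.KV;
    import org.apache.beam.sdk.values.TypeDescriptors;

    public class MinimalWordCount {
      public static void main(String[] args) {
        Pipeline pipeline =
            Pipeline.create(PipelineOptionsFactory.fromArgs(args).create());

        pipeline
            // Bounded in-memory input; replacing it with an unbounded source
            // (e.g. KafkaIO) makes the same transforms run as a streaming job.
            .apply(Create.of("big data pipelines", "apache beam pipelines"))
            // Split each line into words.
            .apply(FlatMapElements.into(TypeDescriptors.strings())
                .via((String line) -> Arrays.asList(line.split("\\s+"))))
            // Count occurrences of each word.
            .apply(Count.perElement())
            // Format each (word, count) pair as a line of text.
            .apply(MapElements.into(TypeDescriptors.strings())
                .via((KV<String, Long> kv) -> kv.getKey() + ": " + kv.getValue()))
            // Write the results to text files with the given prefix.
            .apply(TextIO.write().to("word-counts"));

        pipeline.run().waitUntilFinish();
      }
    }

The same code is portable across execution engines: selecting a different runner through the pipeline options changes where it runs, not how it is written.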


This book will help you to confidently build data processing pipelines with Apache Beam. You'll start with an overview of Apache Beam and understand how to use it to implement basic pipelines. You'll also learn how to test and run the pipelines efficiently. As you progress, you'll explore how to structure your code for reusability and also use various Domain Specific Languages (DSLs). Later chapters will show you how to use schemas and query your data using (streaming) SQL. Finally, you'll understand advanced Apache Beam concepts, such as implementing your own I/O connectors.
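
As a taste of the schema and SQL material, the sketch below (my own example, not code from the book) attaches a schema to a PCollection of Rows and aggregates it with Beam's SqlTransform; the field names, sample rows, and query are assumptions made for illustration.

    import org.apache.beam.sdk.Pipeline;
    import org.apache.beam.sdk.extensions.sql.SqlTransform;
    import org.apache.beam.sdk.options.PipelineOptionsFactory;
    import org.apache.beam.sdk.schemas.Schema;
    import org.apache.beam.sdk.transforms.Create;
    import org.apache.beam.sdk.values.PCollection;
    import org.apache.beam.sdk.values.Row;

    public class SqlAggregation {
      public static void main(String[] args) {
        Pipeline pipeline =
            Pipeline.create(PipelineOptionsFactory.fromArgs(args).create());

        // Schema describing each record: a customer name and a purchase amount.
        Schema schema = Schema.builder()
            .addStringField("customer")
            .addInt64Field("amount")
            .build();

        // A small bounded input; in a streaming job the rows would come from an
        // unbounded, schema-aware source instead.
        PCollection<Row> purchases = pipeline.apply(
            Create.of(
                    Row.withSchema(schema).addValues("alice", 3L).build(),
                    Row.withSchema(schema).addValues("bob", 5L).build(),
                    Row.withSchema(schema).addValues("alice", 2L).build())
                .withRowSchema(schema));

        // The single input of SqlTransform is exposed to the query as PCOLLECTION.
        // Requires the beam-sdks-java-extensions-sql dependency.
        PCollection<Row> totals = purchases.apply(
            SqlTransform.query(
                "SELECT customer, SUM(amount) AS total FROM PCOLLECTION GROUP BY customer"));

        pipeline.run().waitUntilFinish();
      }
    }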


By the end of this book, you'll have gained a deep understanding of the Apache Beam model and be able to apply it to solve problems.


What You Will Learn:

  • Understand the core concepts and architecture of Apache Beam
  • Implement stateless and stateful data processing pipelines
  • Use state and timers to process real-time events (see the sketch after this list)
  • Structure your code for reusability
  • Use streaming SQL to process real-time data and increase productivity and data accessibility
  • Run a pipeline using a portable runner and implement data processing using the Apache Beam Python SDK
  • Implement Apache Beam I/O connectors using the Splittable DoFn API
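
To illustrate the state-and-timers bullet above, here is a hedged sketch (my own example, not code from the book) of a stateful DoFn that buffers values per key and emits a count when an event-time timer fires; the class name, state/timer ids, and the one-minute flush delay are all assumptions.

    import org.apache.beam.sdk.state.BagState;
    import org.apache.beam.sdk.state.StateSpec;
    import org.apache.beam.sdk.state.StateSpecs;
    import org.apache.beam.sdk.state.TimeDomain;
    import org.apache.beam.sdk.state.Timer;
    import org.apache.beam.sdk.state.TimerSpec;
    import org.apache.beam.sdk.state.TimerSpecs;
    import org.apache.beam.sdk.state.ValueState;
    import org.apache.beam.sdk.transforms.DoFn;
    import org.apache.beam.sdk.values.KV;
    import org.joda.time.Duration;

    // Buffers values per key and emits how many arrived once no new element
    // has been seen for one minute of event time (illustrative values only).
    class BufferAndFlushFn extends DoFn<KV<String, String>, KV<String, Long>> {

      @StateId("key")
      private final StateSpec<ValueState<String>> keySpec = StateSpecs.value();

      @StateId("buffer")
      private final StateSpec<BagState<String>> bufferSpec = StateSpecs.bag();

      @TimerId("flush")
      private final TimerSpec flushSpec = TimerSpecs.timer(TimeDomain.EVENT_TIME);

      @ProcessElement
      public void process(
          ProcessContext context,
          @StateId("key") ValueState<String> key,
          @StateId("buffer") BagState<String> buffer,
          @TimerId("flush") Timer flush) {
        key.write(context.element().getKey());
        buffer.add(context.element().getValue());
        // (Re)arm the event-time timer one minute past this element's timestamp.
        flush.set(context.timestamp().plus(Duration.standardMinutes(1)));
      }

      @OnTimer("flush")
      public void onFlush(
          OnTimerContext context,
          @StateId("key") ValueState<String> key,
          @StateId("buffer") BagState<String> buffer) {
        long count = 0;
        for (String ignored : buffer.read()) {
          count++;
        }
        context.output(KV.of(key.read(), count));
        buffer.clear();
      }
    }

Such a DoFn would be applied with ParDo.of(new BufferAndFlushFn()) to a keyed PCollection, since per-key state and timers require KV input.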


Who this book is for:

This book is for data engineers, data scientists, and data analysts who want to learn how Apache Beam works. Intermediate-level knowledge of the Java programming language is assumed.
