Engineering Lakehouses with Open Table Formats : Build scalable and efficient lakehouses with Apache Iceberg, Apache Hudi, and Delta Lake (Paperback)

Price: 1615 TWD
Availability: InStock
Authors: Dipankar Mazumdar, Vinoth Govindarajan
ISBN: 1836207239

Publisher: Packt Publishing
Publication date: 2025-12-26
List price: $1,700
Member price: 5% off, $1,615
Language: English
Pages: 414
Binding: Quality Paper - also called trade paper
ISBN: 1836207239
ISBN-13: 9781836207238
Related category: Big Data

Ships immediately (stock: 1)

Product Description

Jump-start your journey toward mastering open data architectural patterns by learning the fundamentals and applications of open table formats

Key Features:

- Build lakehouses with open table formats using compute engines such as Apache Spark, Flink, Trino, and Python

- Optimize lakehouses with techniques such as pruning, partitioning, compaction, indexing, and clustering

- Find out how to enable seamless integration, data management, and interoperability using Apache XTable

- Purchase of the print or Kindle book includes a free PDF eBook

Book Description:

Engineering Lakehouses with Open Table Formats provides detailed insights into lakehouse concepts, and dives deep into the practical implementation of open table formats such as Apache Iceberg, Apache Hudi, and Delta Lake.

You'll explore the internals of a table format and learn in detail about the transactional capabilities of lakehouses. You'll also get hands-on with each table format through exercises using popular compute engines, such as Apache Spark, Flink, Trino, and Python-based tools. The book addresses advanced topics, including performance optimization techniques and interoperability among different formats, equipping you to build production-ready lakehouses. With step-by-step explanations, you'll get to grips with the key components of lakehouse architecture and learn how to build, maintain, and optimize them.

By the end of this book, you'll be proficient in evaluating and implementing open table formats, optimizing lakehouse performance, and applying these concepts to real-world scenarios, ensuring you make informed decisions in selecting the right architecture for your organization's data needs.

What You Will Learn:

- Explore lakehouse fundamentals, such as table formats, file formats, compute engines, and catalogs

- Gain a complete understanding of data lifecycle management in lakehouses

- Learn how to systematically evaluate and choose the right lakehouse table format

- Optimize performance with sorting, clustering, and indexing techniques

- Use data stored in open table formats with ML frameworks like TensorFlow and MLflow

- Interoperate across different table formats with Apache XTable and UniForm

- Secure your lakehouse with access controls and ensure regulatory compliance

Who this book is for:

This book is for data engineers, software engineers, and data architects who want to deepen their understanding of open table formats, such as Apache Iceberg, Apache Hudi, and Delta Lake, and see how they are used to build lakehouses. It is also valuable for professionals working with traditional data warehouses, relational databases, and data lakes who wish to transition to an open data architectural pattern. Basic knowledge of databases, Python, Apache Spark, Java, and SQL is recommended for a smooth learning experience.

Table of Contents

- Open Data Lakehouse: A New Architectural Paradigm

- Transactional Capabilities of the Lakehouse

- Apache Iceberg Deep Dive

- Apache Hudi Deep Dive

- Delta Lake Deep Dive

- Catalog and Metadata Management

- Interoperability in Lakehouses

- Performance Optimization and Tuning in a Lakehouse

- Data Governance and Security in Lakehouses

- Evaluating and Selecting Open Table Formats

- Real-World Applications and Learnings

