Unlocking dbt: Design and Deploy Transformations in Your Cloud Data Warehouse (Paperback)

Cameron Cyr, Dustin Dorsey

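  • Publisher: Apress
  • Publication date: 2023-09-26
  • List price: $2,100
  • Sale price: $1,995 (5% off)
  • VIP price: $1,890 (10% off)
  • Language: English
  • Pages: 356
  • Binding: Quality Paper (also called trade paper)
  • ISBN: 1484296990
  • ISBN-13: 9781484296998
  • Related categories: Databases
  • Ships immediately (stock = 1)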

Product Description

This book shows how dbt is used to build data transformation pipelines that support dependency management, version control, and automated testing. It explains how dbt is changing data transformation and what advantages a command-line tool like dbt offers over database stored procedures and other ETL and ELT tools that handle data transformations. You'll see how to write custom transformations as simple SQL SELECT statements, eliminating the need for boilerplate code and making it easy to adopt dbt as the transformation layer in your data warehouse pipelines. You will also learn how dbt enables data teams to bring software engineering best practices such as code reusability, version control, and automated testing into the data transformation process.

Unlocking dbt walks you through using dbt to establish a project, build and modularize SQL models, and execute jobs in a way that is easy to maintain and scale as your data ecosystem matures. You'll begin by establishing and configuring a project, a process covered for both dbt Cloud and dbt Core, so that you can confidently stand up a project on either platform. From there, you'll move into building transformations, confident that your project will scale appropriately as you continue to develop it.

After learning the basics needed to get started, you'll build on that foundation by looking at the ways in which dbt combines SQL with Jinja to take your code beyond what is possible in plain SQL. You will learn about advanced materializations, building lineage in your data flows, the potential of macros, and more. This book also explores supported file types and the building of Python models. Rounding things out, you will learn the features of dbt that help make your transformation layer production ready, including how to implement automated testing, use dbt to generate documentation, and run CI/CD pipelines.

What You Will Learn

  • Understand what dbt is and how it is used in the modern data stack
  • Set up a project using both dbt Cloud and dbt Core
  • Connect a dbt project to a cloud data warehouse
  • Build SQL and Python models that are scalable and maintainable
  • Configure development, testing, and production environments
  • Capture reusable logic in the form of Jinja macros
  • Incorporate version control with your data transformation code

Who This Book Is For

Current and aspiring data professionals, including architects, developers, analysts, engineers, data scientists, and consultants who are beginning the journey of using dbt as part of their data pipeline's transformation layer. Readers should have a foundational knowledge of writing basic SQL statements, development best practices, and working with data in an analytical context such as a data warehouse.

About the Authors


Cameron Cyr is a data fanatic who has spent his career developing data systems enabling valuable use cases such as analytics and machine learning. During this time, he has placed a focus on building reliable and scalable data systems with an emphasis on data quality. He is active in the data community and is one of the co-organizers and founders of Nashville's Data Engineering Group. Cameron currently serves as a data engineer for a healthcare tech startup.

Dustin Dorsey is a data leader and architect who has been building and managing data solutions for nearly 15 years. He is currently leading the build-out of data infrastructure and analytics environments for a fast-growing healthcare tech startup. Dustin is a well-respected leader in the data community as an international speaker and mentor. He has previously organized several data community events and user groups and is currently one of the founders and organizers of Nashville's Data Engineering Group. Dustin is one of the authors of the popular Apress book, Pro Database Migration to Azure.
