Data Quality Fundamentals: A Practitioner's Guide to Building Trustworthy Data Pipelines (Paperback)

Barr Moses, Lior Gavish, Molly Vorwerck

  • Publisher: O'Reilly
  • Publication date: 2022-10-11
  • List price: $2,290
  • Sale price: $1,832 (80% of list price; limited-time offer through 2024-04-28)
  • Language: English
  • Pages: 308
  • Binding: Quality Paper - also called trade paper
  • ISBN: 1098112040
  • ISBN-13: 9781098112042
  • Related categories: Big Data, Data Science
  • Ships immediately

Product Description

Do your product dashboards look funky? Are your quarterly reports stale? Is the dataset you're using broken or just plain wrong? These problems affect almost every team, yet they're usually addressed on an ad hoc basis and in a reactive manner. If you answered yes to any of the questions above, this book is for you.

Many data engineering teams today face the "good pipelines, bad data" problem. It doesn't matter how advanced your data infrastructure is if the data you're piping is bad. In this book, Barr Moses, Lior Gavish, and Molly Vorwerck from the data reliability company Monte Carlo explain how to tackle data quality and trust at scale by leveraging best practices and technologies used by some of the world's most innovative companies.

  • Build more trustworthy and reliable data pipelines
  • Write scripts to run data checks and identify broken pipelines with data observability
  • Program your own data quality monitors from scratch
  • Develop and lead data quality initiatives at your company
  • Generate a dashboard to highlight your company's key data assets
  • Automate data lineage graphs across your data ecosystem
  • Build anomaly detectors for your critical data assets
