Automating Data Quality Monitoring: Scaling Beyond Rules with Machine Learning (Paperback)

Jeremy Stanley, Paige Schwartz

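  • Publisher: O'Reilly
  • Publication date: 2024-02-13
  • List price: $2,360
  • VIP price: $2,242 (5% off)
  • Language: English
  • Pages: 217
  • Binding: Quality Paper - also called trade paper
  • ISBN: 1098145933
  • ISBN-13: 9781098145934
  • Related categories: Machine Learning
  • Overseas special-order title (requires separate checkout)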

Product description

The world's businesses ingest a combined 2.5 quintillion bytes of data every day. But how much of this vast amount of data--used to build products, power AI systems, and drive business decisions--is poor quality or just plain bad? This practical book shows you how to ensure that the data your organization relies on contains only high-quality records.

Most data engineers, data analysts, and data scientists genuinely care about data quality, but they often don't have the time, resources, or understanding to create a data quality monitoring solution that succeeds at scale. In this book, Jeremy Stanley and Paige Schwartz from Anomalo explain how you can use automated data quality monitoring to cover all your tables efficiently, proactively alert on every category of issue, and resolve problems immediately.

This book will help you:

  • Learn why data quality is a business imperative
  • Understand and assess unsupervised learning models for detecting data issues
  • Implement notifications that reduce alert fatigue and let you triage and resolve issues quickly
  • Integrate automated data quality monitoring with data catalogs, orchestration layers, and BI and ML systems
  • Understand the limits of automated data quality monitoring and how to overcome them
  • Learn how to deploy and manage your monitoring solution at scale
  • Maintain automated data quality monitoring for the long term
