Data Cleaning: A Practical Perspective (Synthesis Lectures on Data Management)
Provisional Chinese title: 數據清理：實用觀點（數據管理綜合講座）

Name: Data Cleaning: A Practical Perspective (Synthesis Lectures on Data Management)
Price: 1083 TWD
Availability: OnlineOnly
Author: Venkatesh Ganti, Anish Das Sarma
ISBN: 1608456773

Publisher: Morgan & Claypool
Publication date: 2013-09-01
List price: $1,140
Member price: 5% off, $1,083
Language: English
Pages: 86
Binding: Paperback
ISBN: 1608456773
ISBN-13: 9781608456772

Overseas special-order book (must be checked out separately)

Product Description

Data warehouses consolidate various activities of a business and often form the backbone for generating reports that support important business decisions. Errors creep into the data for a variety of reasons, including mistakes during input data collection and mistakes made while merging data collected independently across different databases. These errors often propagate into the reports generated from the warehouse and can harm business decisions. Therefore, one of the critical challenges in maintaining a large data warehouse is keeping the quality of its data high. The process of maintaining high data quality is commonly referred to as data cleaning.

In this book, we first discuss the goals of data cleaning. These goals are often not well defined and can call for different solutions in different scenarios. To clarify them, we abstract out a common set of data cleaning tasks that often need to be addressed, which allows us to develop solutions for these common tasks. We then discuss a few popular approaches for developing such solutions. In particular, we focus on an operator-centric approach for developing a data cleaning platform. The operator-centric approach involves the development of customizable operators that can be used as building blocks for developing common solutions. This is similar to the approach of relational algebra for query processing, in which a basic set of operators can be put together to build complex queries. Finally, we discuss the development of custom scripts that leverage the basic data cleaning operators along with relational operators to implement effective solutions for data cleaning tasks.
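
As a rough illustration of the operator-centric idea in the description above, the sketch below composes three small operators (a similarity function, a similarity join, and a clustering step) into a deduplication script. It is not taken from the book: the function names `jaccard`, `similarity_join`, `cluster`, and `deduplicate`, the sample records, and the 0.5 threshold are all hypothetical choices made for this example, and the join is a naive all-pairs comparison rather than an efficient implementation.

```python
from itertools import combinations


def tokens(text):
    """Lowercase a string and split it into a set of word tokens."""
    return set(text.lower().split())


def jaccard(a, b):
    """Similarity function: Jaccard overlap of the two strings' token sets."""
    ta, tb = tokens(a), tokens(b)
    if not ta and not tb:
        return 1.0
    return len(ta & tb) / len(ta | tb)


def similarity_join(records, sim, threshold):
    """Operator: return index pairs (i, j), i < j, with sim >= threshold.

    Naive all-pairs comparison, for illustration only.
    """
    return [
        (i, j)
        for i, j in combinations(range(len(records)), 2)
        if sim(records[i], records[j]) >= threshold
    ]


def cluster(n, pairs):
    """Operator: group indices 0..n-1 into connected components of the pair graph."""
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for i, j in pairs:
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[max(ri, rj)] = min(ri, rj)

    groups = {}
    for i in range(n):
        groups.setdefault(find(i), []).append(i)
    return list(groups.values())


def deduplicate(records, threshold=0.5):
    """Script: similarity join, then clustering, then keep the first record per cluster."""
    pairs = similarity_join(records, jaccard, threshold)
    return [records[group[0]] for group in cluster(len(records), pairs)]


# Hypothetical sample data
customers = [
    "Acme Corp Seattle WA",
    "ACME Corp Seattle",
    "Globex Inc Portland OR",
    "acme corp seattle wa",
]
print(deduplicate(customers))
# → ['Acme Corp Seattle WA', 'Globex Inc Portland OR']
```

Because each operator takes its similarity function or threshold as a parameter, a different task (for example, matching records across two tables) could reuse the same building blocks with a different script on top.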

Table of Contents: Preface / Acknowledgments / Introduction / Technological Approaches / Similarity Functions / Operator: Similarity Join / Operator: Clustering / Operator: Parsing / Task: Record Matching / Task: Deduplication / Data Cleaning Scripts / Conclusion / Bibliography / Authors' Biographies
