Probabilistic Databases (Synthesis Lectures on Data Management)

Dan Suciu, Dan Olteanu, Christopher Ré, Christoph Koch

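  • Publisher: Morgan & Claypool
  • Publication date: 2011-06-02
  • Price: $1,780
  • VIP price: $1,691 (5% off)
  • Language: English
  • Pages: 180
  • Binding: Paperback
  • ISBN: 1608456803
  • ISBN-13: 9781608456802
  • Category: Databases
  • Overseas special-order title (requires separate checkout)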

Product description

Probabilistic databases are databases where the value of some attributes or the presence of some records is uncertain and known only with some probability. Applications in many areas, such as information extraction, RFID and scientific data management, data cleaning, data integration, and financial risk assessment, produce large volumes of uncertain data, which are best modeled and processed by a probabilistic database.
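The usual reading of "known only with some probability" is a distribution over possible worlds: ordinary database instances, each with a probability. The sketch below illustrates this for attribute-value uncertainty, assuming a small block-independent-disjoint (BID) table of RFID readings (one of the representations named in the book description). The data and the function name `possible_worlds` are hypothetical. Each tag is a block of mutually exclusive locations, blocks are independent, and any unassigned probability mass means the tag produced no reading.

```python
from itertools import product

# Hypothetical BID table of RFID readings (illustrative data, not from the book).
# Each tag is one block; its alternatives are mutually exclusive locations,
# and blocks are independent of each other. Mass not assigned to any
# location is the probability that the tag produced no reading at all.
readings = {
    "tag1": {"dock": 0.6, "shelf": 0.3},  # 0.1 left over: tag1 absent
    "tag2": {"dock": 0.2, "shelf": 0.8},  # sums to 1: tag2 always present
}


def possible_worlds(table):
    """Yield (world, probability) pairs; a world maps each present tag to one location."""
    blocks = []
    for tag, alternatives in table.items():
        options = list(alternatives.items())
        leftover = 1.0 - sum(alternatives.values())
        if leftover > 1e-12:
            options.append((None, leftover))  # None: no tuple for this tag
        blocks.append([(tag, loc, p) for loc, p in options])
    # Independence across blocks: pick one option per block and multiply.
    for choice in product(*blocks):
        world = {tag: loc for tag, loc, _ in choice if loc is not None}
        prob = 1.0
        for _, _, p in choice:
            prob *= p
        yield world, prob


worlds = list(possible_worlds(readings))
# A query's probability is the total mass of the worlds in which it holds.
p_dock = sum(p for world, p in worlds if "dock" in world.values())
print(len(worlds), round(p_dock, 6))
```

With these numbers there are 6 worlds, and the probability that some tag is at the dock is 1 − (1 − 0.6)(1 − 0.2) = 0.68. Enumerating worlds is exponential in the number of blocks, so it only serves to pin down the semantics, not as a query evaluation strategy.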

This book presents the state of the art in representation formalisms and query processing techniques for probabilistic data. It starts by discussing the basic principles for representing large probabilistic databases by decomposing them into tuple-independent tables, block-independent-disjoint tables, or U-databases. Then it discusses two classes of techniques for query evaluation on probabilistic databases. In extensional query evaluation, the entire probabilistic inference can be pushed into the database engine and, therefore, processed as efficiently as the evaluation of standard SQL queries. The relational queries that can be evaluated this way are called safe queries. In intensional query evaluation, the probabilistic inference is performed over a propositional formula called the lineage expression: every relational query can be evaluated this way, but the data complexity depends dramatically on the query being evaluated and can be #P-hard. The book also discusses some advanced topics in probabilistic data management, such as top-k query processing, sequential probabilistic databases, indexing and materialized views, and Monte Carlo databases.
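The contrast between the two evaluation styles can be seen on a toy example. The sketch below assumes two tuple-independent tables R and S (hypothetical data) and the Boolean query Q = ∃x∃y R(x) ∧ S(x, y), which is safe. `extensional_prob` evaluates Q with a plan of independent joins and projections, the kind of arithmetic that can be expressed inside a query engine. `lineage` builds Q's lineage as a DNF over tuple variables, and `intensional_prob` computes its probability by brute-force enumeration of truth assignments. The enumeration is a stand-in chosen for brevity, not one of the inference methods the book presents, and all function names are hypothetical.

```python
from itertools import product
from math import prod

# Hypothetical tuple-independent tables (illustrative data): tuple -> marginal probability.
R = {"a1": 0.5, "a2": 0.8}
S = {("a1", "b1"): 0.4, ("a1", "b2"): 0.3, ("a2", "b1"): 0.9}


def extensional_prob(R, S):
    """Safe plan for Q = EXISTS x, y: R(x) AND S(x, y), using independent joins and projections."""
    none_true = 1.0
    for x, p_r in R.items():
        # Independent project: P(EXISTS y: S(x, y)) = 1 - prod(1 - p).
        p_s = 1.0 - prod(1.0 - p for (sx, _), p in S.items() if sx == x)
        # Independent join with R(x), then fold into the outer independent project over x.
        none_true *= 1.0 - p_r * p_s
    return 1.0 - none_true


def lineage(R, S):
    """Lineage of Q as a DNF: one clause (R-tuple AND S-tuple) per joining pair."""
    return [(("R", x), ("S", (x, y))) for (x, y) in S if x in R]


def intensional_prob(clauses, probs):
    """Exact P(lineage) by enumerating every truth assignment (exponential in the variable count)."""
    variables = sorted({v for clause in clauses for v in clause}, key=repr)
    total = 0.0
    for bits in product([False, True], repeat=len(variables)):
        world = dict(zip(variables, bits))
        if any(all(world[v] for v in clause) for clause in clauses):
            total += prod(probs[v] if world[v] else 1.0 - probs[v] for v in variables)
    return total


# Map each lineage variable (a tagged tuple identifier) to its marginal probability.
probs = {("R", x): p for x, p in R.items()} | {("S", k): p for k, p in S.items()}
print(extensional_prob(R, S), intensional_prob(lineage(R, S), probs))
```

On this data both functions agree, up to floating-point rounding, on 1 − (1 − 0.5 · 0.58)(1 − 0.8 · 0.9) = 0.8012. The extensional plan touches each tuple a constant number of times, while the enumeration grows as 2^n in the number of lineage variables. For queries that are not safe only the intensional route applies, and that is where the cost can become #P-hard.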

Table of Contents: Overview / Data and Query Model / The Query Evaluation Problem / Extensional Query Evaluation / Intensional Query Evaluation / Advanced Techniques
