Dataset Shift in Machine Learning (Hardcover)

Joaquin Quiñonero-Candela, Masashi Sugiyama, Anton Schwaighofer, Neil D. Lawrence

  • Publisher: MIT
  • Publication date: 2009-02-01
  • List price: $1,575
  • VIP price: $1,544 (98% of list)
  • Language: English
  • Pages: 248
  • Binding: Hardcover
  • ISBN: 0262170051
  • ISBN-13: 9780262170055
  • Categories: Machine Learning
  • Availability: in stock, ships immediately (stock < 3)

Description

Dataset shift is a common problem in predictive modeling that occurs when the joint distribution of inputs and outputs differs between training and test stages. Covariate shift, a particular case of dataset shift, occurs when only the input distribution changes. Dataset shift is present in most practical applications, for reasons ranging from the bias introduced by experimental design to the irreproducibility of the testing conditions at training time. (An example is email spam filtering, which may fail to recognize spam that differs in form from the spam the automatic filter has been built on.) Despite this, and despite the attention given to the apparently similar problems of semi-supervised learning and active learning, dataset shift has received relatively little attention in the machine learning community until recently. This volume offers an overview of current efforts to deal with dataset and covariate shift.
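To make the two definitions precise, here is the standard formalization in the usual notation, with p_tr and p_te denoting the training- and test-time distributions (a sketch consistent with the description above, not a quotation from the book):

```latex
% Dataset shift: the joint distribution over inputs x and outputs y
% differs between the training and test stages.
p_{\mathrm{tr}}(x, y) \neq p_{\mathrm{te}}(x, y)

% Covariate shift: only the input (covariate) distribution changes,
% while the conditional relationship p(y | x) stays fixed.
p_{\mathrm{tr}}(x) \neq p_{\mathrm{te}}(x),
\qquad
p_{\mathrm{tr}}(y \mid x) = p_{\mathrm{te}}(y \mid x)
```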

The chapters offer a mathematical and philosophical introduction to the problem, place dataset shift in relation to transfer learning, transduction, local learning, active learning, and semi-supervised learning, provide theoretical views of dataset and covariate shift (including decision theoretic and Bayesian perspectives), and present algorithms for covariate shift.
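One standard family of covariate-shift algorithms in this literature is importance weighting, which reweights each training example by the density ratio p_te(x) / p_tr(x) so that training-set averages approximate test-set expectations. Below is a minimal Python sketch of importance-weighted least squares; the Gaussian input densities, the sinusoidal target, and all data here are illustrative assumptions, and the density ratio is exact only because both densities are known (in practice it must be estimated):

```python
# Minimal sketch of importance-weighted learning under covariate shift.
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)

# p(y|x) is fixed (the covariate-shift assumption); only p(x) changes.
def f(x):
    return np.sin(2 * x)

x_tr = rng.normal(loc=-1.0, scale=0.7, size=200)   # training inputs ~ p_tr
y_tr = f(x_tr) + 0.2 * rng.standard_normal(200)
x_te = rng.normal(loc=1.0, scale=0.5, size=200)    # test inputs ~ p_te
y_te = f(x_te) + 0.2 * rng.standard_normal(200)

# Density ratio w(x) = p_te(x) / p_tr(x); exact here because both
# input densities are known by construction.
w = norm.pdf(x_tr, 1.0, 0.5) / norm.pdf(x_tr, -1.0, 0.7)

# Fit a linear model by ordinary vs. importance-weighted least squares
# (weighting rows by sqrt(w) solves the weighted problem).
X = np.column_stack([np.ones_like(x_tr), x_tr])
beta_ols = np.linalg.lstsq(X, y_tr, rcond=None)[0]
s = np.sqrt(w)
beta_iw = np.linalg.lstsq(X * s[:, None], y_tr * s, rcond=None)[0]

X_te = np.column_stack([np.ones_like(x_te), x_te])
for name, b in [("unweighted", beta_ols), ("importance-weighted", beta_iw)]:
    print(f"{name}: test MSE = {np.mean((X_te @ b - y_te) ** 2):.3f}")
```

Because the linear model is misspecified for the sinusoidal target, the weighted fit concentrates on the test region and typically achieves lower test error, at the cost of higher variance: the usual bias-variance trade-off of importance weighting.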

Contributors: Shai Ben-David, Steffen Bickel, Karsten Borgwardt, Michael Brückner, David Corfield, Amir Globerson, Arthur Gretton, Lars Kai Hansen, Matthias Hein, Jiayuan Huang, Takafumi Kanamori, Klaus-Robert Müller, Sam Roweis, Neil Rubens, Tobias Scheffer, Marcel Schmittfull, Bernhard Schölkopf, Hidetoshi Shimodaira, Alex Smola, Amos Storkey, Masashi Sugiyama, Choon Hui Teo

Neural Information Processing series
