Machine Learning for Data Streams: with Practical Examples in MOA (Adaptive Computation and Machine Learning series)

Albert Bifet, Ricard Gavaldà, Geoff Holmes, Bernhard Pfahringer

Product Description

A hands-on approach to tasks and techniques in data stream mining and real-time analytics, with examples in MOA, a popular freely available open-source software framework.

Today, many information sources (including sensor networks, financial markets, social networks, and healthcare monitoring) are so-called data streams, arriving sequentially and at high speed. Analysis must take place in real time, with partial data and without the capacity to store the entire data set. This book presents algorithms and techniques used in data stream mining and real-time analytics. Taking a hands-on approach, the book demonstrates the techniques using MOA (Massive Online Analysis), a popular, freely available open-source software framework, allowing readers to try out the techniques after reading the explanations.

The book first offers a brief introduction to the topic, covering big data mining, basic methodologies for mining data streams, and a simple example of MOA. More detailed discussions follow, with chapters on sketching techniques, change, classification, ensemble methods, regression, clustering, and frequent pattern mining. Most of these chapters include exercises, an MOA-based lab session, or both. Finally, the book discusses the MOA software, covering the MOA graphical user interface, the command line, use of its API, and the development of new methods within MOA. The book will be an essential reference for readers who want to use data stream mining as a tool, researchers in innovation or data stream mining, and programmers who want to create new algorithms for MOA.
