Data Mining Algorithms in C++: Data Patterns and Algorithms for Modern Applications

Timothy Masters

Product Description

Discover hidden relationships among the variables in your data, and learn how to exploit these relationships.  This book presents a collection of data-mining algorithms that are effective in a wide variety of prediction and classification applications.  All algorithms include an intuitive explanation of operation, essential equations, references to more rigorous theory, and commented C++ source code.
 
Many of these techniques are recent developments, still not in widespread use.  Others are standard algorithms given a fresh look.  In every case, the focus is on practical applicability, with all code written in such a way that it can easily be included in any program.  The Windows-based DATAMINE program lets you experiment with the techniques before incorporating them into your own work.
 
What you'll learn
  • Monte-Carlo permutation tests provide statistically sound assessment of relationships present in your data.
  • Combinatorially symmetric cross validation reveals whether your model has true power or has just learned noise by overfitting the data.
  • Feature weighting as regularized energy-based learning ranks variables according to their predictive power when there is too little data for traditional methods.
  • The eigenstructure of a dataset enables clustering of variables into groups that exist only within meaningful subspaces of the data.
  • Plotting regions of the variable space where there is disagreement between marginal and actual densities, or where contribution to mutual information is high, provides visual insight into anomalous relationships.
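
As an illustration of the first item in the list above, here is a minimal sketch of a Monte-Carlo permutation test in C++. It is not taken from the book or from the DATAMINE program. The function names `abs_correlation` and `permutation_p_value`, and the choice of absolute Pearson correlation as the test statistic, are assumptions made only for this example. The idea is to shuffle one variable many times, which destroys any real relationship with the other, and count how often the shuffled data look at least as strongly related as the original data.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <random>
#include <vector>

// Absolute Pearson correlation between two equal-length samples.
// (Hypothetical helper; any measure of relationship could be substituted.)
double abs_correlation(const std::vector<double>& x,
                       const std::vector<double>& y) {
    assert(x.size() == y.size() && x.size() > 1);
    const double n = static_cast<double>(x.size());
    double mean_x = 0.0, mean_y = 0.0;
    for (std::size_t i = 0; i < x.size(); ++i) {
        mean_x += x[i];
        mean_y += y[i];
    }
    mean_x /= n;
    mean_y /= n;

    double sxy = 0.0, sxx = 0.0, syy = 0.0;
    for (std::size_t i = 0; i < x.size(); ++i) {
        const double dx = x[i] - mean_x;
        const double dy = y[i] - mean_y;
        sxy += dx * dy;
        sxx += dx * dx;
        syy += dy * dy;
    }
    // A constant variable carries no measurable linear relationship.
    if (sxx == 0.0 || syy == 0.0) return 0.0;
    return std::fabs(sxy / std::sqrt(sxx * syy));
}

// Estimated probability that a relationship at least as strong as the
// observed one would arise if x and y were unrelated.
// y is taken by value so the caller's data are left untouched.
double permutation_p_value(const std::vector<double>& x,
                           std::vector<double> y,
                           int n_perm, unsigned seed) {
    assert(n_perm > 0);
    const double observed = abs_correlation(x, y);
    std::mt19937 rng(seed);

    // Start at 1: the original, unpermuted arrangement counts as one case.
    int at_least_as_strong = 1;
    for (int rep = 0; rep < n_perm; ++rep) {
        std::shuffle(y.begin(), y.end(), rng);  // break any x-y pairing
        if (abs_correlation(x, y) >= observed) ++at_least_as_strong;
    }
    return static_cast<double>(at_least_as_strong) / (n_perm + 1);
}
```

Calling `permutation_p_value(x, y, 999, 42u)` on two paired samples returns a value in (0, 1]. A small value suggests the observed relationship is unlikely to be due to chance alone. Counting the original arrangement in both numerator and denominator keeps the estimate from ever being exactly zero.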
 
Who this book is for
 
The techniques presented in this book and in the DATAMINE program will be useful to anyone interested in discovering and exploiting relationships among variables.  Although all code examples are written in C++, the algorithms are described in sufficient detail that they can easily be programmed in any language.
