Product Description
Assess the quality of your prediction and classification models in ways that accurately reflect their real-world performance, and then improve this performance using state-of-the-art techniques such as committee-based decision making, dataset resampling, and boosting. This book presents many important techniques for building powerful, robust models and quantifying their expected behavior when put to work in your application.
Considerable attention is given to information theory, especially as it relates to discovering and exploiting relationships between the variables employed by your models. This presentation of an often-confusing subject avoids advanced mathematics, focusing instead on concepts easily understood by those with a modest background in mathematics.
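As a rough illustration of what "discovering relationships between variables" can mean in information-theoretic terms, the sketch below estimates the mutual information between two discrete variables from simple frequency counts. It is not taken from the book: the function name `mutual_information`, the use of integer category codes, and the choice of bits as the unit are all assumptions made for this example.

```cpp
#include <cmath>
#include <cstddef>
#include <map>
#include <utility>
#include <vector>

// Hypothetical helper (not from the book): mutual information, in bits,
// between two discrete variables given as paired integer category codes.
// Probabilities are estimated by plain frequency counts.
double mutual_information(const std::vector<int>& x, const std::vector<int>& y)
{
    const std::size_t n = x.size();
    if (n == 0 || y.size() != n)
        return 0.0;  // nothing sensible to estimate

    std::map<int, double> count_x, count_y;
    std::map<std::pair<int, int>, double> count_xy;
    for (std::size_t i = 0; i < n; ++i) {
        count_x[x[i]] += 1.0;
        count_y[y[i]] += 1.0;
        count_xy[std::make_pair(x[i], y[i])] += 1.0;
    }

    // Sum p(x,y) * log2( p(x,y) / (p(x) p(y)) ) over observed cells only;
    // unobserved cells contribute zero.
    double mi = 0.0;
    for (const auto& cell : count_xy) {
        const double joint = cell.second / n;
        const double marg_x = count_x[cell.first.first] / n;
        const double marg_y = count_y[cell.first.second] / n;
        mi += joint * std::log2(joint / (marg_x * marg_y));
    }
    return mi;
}
```

A value near zero suggests the two variables share little information, while larger values suggest one is useful for predicting the other. Continuous predictors would first have to be binned, which this sketch leaves out.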
All algorithms include an intuitive explanation of operation, essential equations, references to more rigorous theory, and commented C++ source code.  Many of these techniques are recent developments, still not in widespread use.  Others are standard algorithms given a fresh look.  In every case, the emphasis is on practical applicability, with all code written in such a way that it can easily be included in any program.
What You Will Learn
- Compute entropy to detect problematic predictors.
- Compute confidence and tolerance intervals for predictions, as well as confidence levels for classification decisions.
- Improve numeric predictions using constrained and unconstrained combinations, variance-weighted interpolation, and kernel-regression smoothing.
- Improve classification decisions using Borda counts, MinMax and MaxMin rules, union and intersection rules, logistic regression, selection by local accuracy, maximization of the fuzzy integral, and pairwise coupling.
- Use information-theoretic techniques to rapidly screen large numbers of candidate predictors, identifying those that are especially promising.
- Use Monte-Carlo permutation methods to assess the role of good luck in performance results.
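To make the last item in the list above more concrete, here is a minimal sketch of a Monte-Carlo permutation test. It is an illustration under stated assumptions, not the book's code: the names `correlation` and `permutation_p_value` are hypothetical, Pearson correlation stands in for whatever performance measure a model actually uses, and the two input vectors are assumed to have equal length.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <random>
#include <vector>

// Pearson correlation, used here as a stand-in performance measure.
// Assumes a and b have the same length; returns 0 if either is constant.
double correlation(const std::vector<double>& a, const std::vector<double>& b)
{
    const std::size_t n = a.size();
    double mean_a = 0.0, mean_b = 0.0;
    for (std::size_t i = 0; i < n; ++i) {
        mean_a += a[i];
        mean_b += b[i];
    }
    mean_a /= n;
    mean_b /= n;

    double sab = 0.0, saa = 0.0, sbb = 0.0;
    for (std::size_t i = 0; i < n; ++i) {
        const double da = a[i] - mean_a;
        const double db = b[i] - mean_b;
        sab += da * db;
        saa += da * da;
        sbb += db * db;
    }
    const double denom = std::sqrt(saa * sbb);
    return denom > 0.0 ? sab / denom : 0.0;
}

// Hypothetical helper: estimates how often shuffled targets score at least
// as well as the real pairing. A small result suggests the observed
// performance is unlikely to be due to luck alone.
double permutation_p_value(const std::vector<double>& predicted,
                           std::vector<double> actual,  // copied, then shuffled
                           int reps, unsigned seed)
{
    const double observed = correlation(predicted, actual);
    std::mt19937 rng(seed);
    int at_least_as_good = 1;  // the unpermuted result counts as one trial
    for (int r = 0; r < reps; ++r) {
        std::shuffle(actual.begin(), actual.end(), rng);
        if (correlation(predicted, actual) >= observed)
            ++at_least_as_good;
    }
    return static_cast<double>(at_least_as_good) / (reps + 1);
}
```

Shuffling the targets destroys any real relationship with the predictions while keeping both marginal distributions intact, so the shuffled scores approximate what pure luck would produce. Counting the original arrangement as one of the trials keeps the estimate from ever being exactly zero.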
Who This Book Is For
Anyone who creates prediction or classification models will find a wealth of useful algorithms in this book.  Although all code examples are written in C++, the algorithms are described in sufficient detail that they can easily be programmed in any language.