Learning Apache Mahout

Chandramani Tiwary

  • Publisher: Packt Publishing
  • Publication date: 2015-04-03
  • List price: $1,970
  • VIP price: $1,872 (5% off)
  • Language: English
  • Pages: 275
  • Binding: Paperback
  • ISBN: 1783555211
  • ISBN-13: 9781783555215
  • Overseas special-order title (requires separate checkout)

Product Description

Acquire practical skills in Big Data Analytics and explore data science with Apache Mahout

About This Book

  • Learn to use Apache Mahout for Big Data Analytics
  • Understand machine learning concepts and algorithms and their implementation in Mahout
  • A comprehensive guide with numerous code examples and end-to-end case studies on Customer Analytics and Text Analytics

Who This Book Is For

If you are a Java developer and want to use Mahout and machine learning to solve Big Data Analytics use cases, then this book is for you. Familiarity with shell scripts is assumed, but no prior experience is required.

What You Will Learn

  • Configure Mahout on Linux systems and set up the development environment
  • Become familiar with the Mahout command line utilities and Java APIs
  • Understand the core concepts of machine learning and the classes that implement them
  • Integrate Apache Mahout with newer platforms such as Apache Spark
  • Solve classification, clustering, and recommendation problems with Mahout
  • Explore frequent pattern mining and topic modeling, two important application areas of machine learning
  • Understand feature extraction, reduction, and the curse of dimensionality

In Detail

In the past few years, the generation of data and our capability to store and process it have grown exponentially. There is a need for scalable analytics frameworks and for people with the right skills to extract the information needed from this Big Data. Apache Mahout is one of the first and most prominent Big Data machine learning platforms. It implements machine learning algorithms on top of distributed processing platforms such as Hadoop and Spark.

Starting with the basics of Mahout and machine learning, you will explore prominent algorithms and their implementation in Mahout. You will learn about Mahout's building blocks, address feature extraction, reduction, and the curse of dimensionality, and delve into classification use cases with the random forest and Naive Bayes classifiers as well as item-based and user-based recommendation. You will then work with clustering in Mahout using the K-means algorithm and use Mahout without MapReduce. Finish by exploring end-to-end use cases on customer analytics and text analytics to gain practical, real-world know-how of analytics projects.
