Large Scale Machine Learning with Spark

Md. Rezaul Karim, Md. Mahedi Kaysar

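  • Publisher: Packt Publishing
  • Publication date: 2016-10-27
  • List price: $2,030
  • VIP price: $1,929 (5% off)
  • Language: English
  • Pages: 476
  • Binding: Paperback
  • ISBN: 1785888749
  • ISBN-13: 9781785888748
  • Related categories: Spark, Machine Learning
  • Overseas special-order book (requires separate checkout)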

Product Description

Discover everything you need to build robust machine learning applications with Spark 2.0

About This Book

  • Get the most up-to-date book on the market that focuses on design, engineering, and scalable solutions in machine learning with Spark 2.0.0
  • Use Spark’s machine learning library in a big data environment
  • Learn how to develop high-value applications at scale with ease and create a personalized design

Who This Book Is For

This book is for data science engineers and scientists who work with large and complex data sets. You should be familiar with the basics of machine learning concepts, statistics, and computational mathematics. Knowledge of Scala and Java is advisable.

What You Will Learn

  • Get a solid theoretical understanding of ML algorithms
  • Configure Spark on cluster and cloud infrastructure to develop applications using Scala, Java, Python, and R
  • Scale up ML applications on large cluster or cloud infrastructures
  • Use Spark ML and MLlib to develop ML pipelines for recommendation systems, classification, regression, clustering, sentiment analysis, and dimensionality reduction
  • Handle large texts for developing ML applications with a strong focus on feature engineering
  • Use Spark Streaming to develop ML applications for real-time streaming
  • Tune ML models with cross-validation, hyperparameter tuning, and train-validation split
  • Enhance ML models to make them adaptable for new data in dynamic and incremental environments

In Detail

Data processing, implementing related algorithms, tuning, scaling up and finally deploying are some crucial steps in the process of optimising any application.

Spark can handle large-scale batch and streaming data, deciding when to cache data in memory and processing it up to 100 times faster than Hadoop-based MapReduce. This means predictive analytics can be applied to both streaming and batch data to develop complete machine learning (ML) applications much more quickly, making Spark an ideal candidate for large, data-intensive applications.

This book focuses on design, engineering, and scalable solutions using ML with Spark. First, you will learn how to install Spark with all the new features from the Spark 2.0 release. Moving on, you’ll explore important concepts such as advanced feature engineering with RDDs and Datasets. After studying how to develop and deploy applications, you will see how to use external libraries with Spark.

In summary, you will be able to develop complete and personalised ML applications, from data collection, model building, tuning, and scaling up to deployment on a cluster or the cloud.

Style and approach

This book takes a practical approach where all the topics explained are demonstrated with the help of real-world use cases.
