Machine Learning with PySpark: With Natural Language Processing and Recommender Systems

Pramod Singh

Product Description

Build machine learning models, natural language processing applications, and recommender systems with PySpark to solve various business challenges. This book starts with the fundamentals of Spark and its evolution and then covers the entire spectrum of traditional machine learning algorithms along with natural language processing and recommender systems using PySpark. 
 
Machine Learning with PySpark shows you how to build supervised machine learning models such as linear regression, logistic regression, decision trees, and random forest. You’ll also see unsupervised machine learning models such as K-means and hierarchical clustering. A major portion of the book focuses on feature engineering to create useful features with PySpark to train the machine learning models. The natural language processing section covers text processing, text mining, and embedding for classification. 
 
After reading this book, you will understand how to use PySpark’s machine learning library to build and train various machine learning models. Additionally, you’ll become comfortable with related PySpark components, such as data ingestion, data processing, and data analysis, that you can use to develop data-driven intelligent applications.
 
What You Will Learn
  • Build a spectrum of supervised and unsupervised machine learning algorithms
  • Implement machine learning algorithms with Spark MLlib libraries
  • Develop a recommender system with Spark MLlib libraries
  • Handle issues related to feature engineering, class balance, bias and variance, and cross-validation when building an optimally fitted model
 
Who This Book Is For 
 
Data science and machine learning professionals. 
 
 
