Spark for Data Science

Srinivas Duvvuri, Bikramaditya Singhal

  • Publisher: Packt Publishing
  • Publication date: 2016-09-30
  • Price: $2,040
  • VIP price: $1,938 (5% off)
  • Language: English
  • Pages: 344
  • Binding: Paperback
  • ISBN: 1785885650
  • ISBN-13: 9781785885655
  • Related categories: Spark, Data Science

Product Description

Analyze your data and delve deep into the world of machine learning with the latest Spark version, 2.0

About This Book

  • Leverage Apache Spark to perform data analysis and build predictive models on huge datasets
  • Learn to integrate data science algorithms and techniques with the fast and scalable computing features of Spark to address big data challenges
  • Work through practical examples on real-world problems with sample code snippets

Who This Book Is For

This book is for anyone who wants to leverage Apache Spark for data science and machine learning. It is for you if you are a technologist who wants to expand your knowledge to perform data science operations in Spark, a data scientist who wants to understand how algorithms are implemented in Spark, or a newbie with minimal development experience who wants to learn about Big Data analytics.

What You Will Learn

  • Consolidate, clean, and transform your data acquired from various data sources
  • Perform statistical analysis of data to find hidden insights
  • Explore graphical techniques to see what your data looks like
  • Use machine learning techniques to build predictive models
  • Build scalable data products and solutions
  • Start programming using the RDD, DataFrame and Dataset APIs
  • Become an expert by improving your data analytical skills

In Detail

This is the era of Big Data. The term Big Data implies big innovation and gives businesses a competitive advantage. Apache Spark was designed to perform Big Data analytics at scale, so it is equipped with the necessary algorithms and supports multiple programming languages.

Whether you are a technologist, a data scientist, or a beginner to Big Data analytics, this book will provide you with the skills necessary to perform statistical data analysis, data visualization, and predictive modeling, and to build scalable data products or solutions using Python, Scala, and R.

With ample case studies and real-world examples, Spark for Data Science will help you execute your data science projects successfully.

Style and Approach

This book takes a step-by-step approach to statistical analysis and machine learning, written in a conversational, easy-to-follow style. Each topic is explained sequentially, with a focus on both the fundamentals and the advanced concepts of algorithms and techniques. Real-world examples with sample code snippets are also included.
