PySpark SQL Recipes: With HiveQL, Dataframe and Graphframes

Raju Kumar Mishra, Sundar Rajan Raman

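  • Publisher: Apress
  • Publication date: 2019-03-19
  • List price: $1,680
  • VIP price: $1,596 (5% off)
  • Language: English
  • Pages: 323
  • Binding: Paperback
  • ISBN: 148424334X
  • ISBN-13: 9781484243343
  • Related category: SparkSQL
  • Overseas special-order title (requires separate checkout)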

Product Description

Carry out data analysis with PySpark SQL, graphframes, and graph data processing using a problem-solution approach. This book provides solutions to problems related to dataframes, data manipulation, summarization, and exploratory analysis. You will improve your skills in graph data analysis using graphframes and see how to optimize your PySpark SQL code.

PySpark SQL Recipes starts with recipes on creating dataframes from different types of data sources, data aggregation and summarization, and exploratory data analysis using PySpark SQL. You’ll also discover how to solve problems in graph analysis using graphframes.

On completing this book, you’ll have ready-made code for all your PySpark SQL tasks, including creating dataframes using data from different file formats as well as from SQL or NoSQL databases.

What You Will Learn

  • Understand PySpark SQL and its advanced features
  • Use SQL and HiveQL with PySpark SQL
  • Work with structured streaming
  • Optimize PySpark SQL 
  • Master graphframes and graph processing

About the Authors

Raju Kumar Mishra has a strong interest in data science and in systems capable of handling large amounts of data and running complex mathematical models through computational programming. This interest led him to pursue an M.Tech in computational sciences at the Indian Institute of Science in Bangalore, India. Raju works primarily in data science and its applications. As a corporate trainer, he has developed insights that help him teach and explain complex ideas with ease. He is also a data science consultant who solves complex industrial problems. He works with tools such as R, Python, scikit-learn, Statsmodels, Hadoop, Hive, Pig, Spark, and many others. His venture, Walsoul Private Ltd, provides training in data science, programming, and big data.

Sundar Rajan Raman is an artificial intelligence practitioner currently working at Bank of America. He holds a Bachelor of Technology degree from the National Institute of Technology, India. A seasoned Java and J2EE programmer, he has worked on critical applications for companies such as AT&T, Singtel, and Deutsche Bank. He is also a seasoned big data architect. His current focus is on artificial intelligence, including machine learning and deep learning.
