Introducing .NET for Apache Spark: Distributed Processing for Massive Datasets

Elliott, Ed

  • Publisher: Apress
  • Publication date: 2021-04-14
  • List price: $2,100
  • Sale price: $1,995 (5% off)
  • VIP price: $1,890 (10% off)
  • Language: English
  • Pages: 262
  • Binding: Quality Paper (also called trade paper)
  • ISBN: 1484269918
  • ISBN-13: 9781484269916
  • Related categories: .NET, Spark
  • Ships immediately (stock = 1)

Product Description

Get started using Apache Spark via C# or F# and the .NET for Apache Spark bindings. This book is an introduction to both Apache Spark and the .NET bindings. Readers new to Apache Spark will get up to speed quickly using Spark for data processing tasks performed against large and very large datasets. You will learn how to combine your knowledge of .NET with Apache Spark to bring massive computing power to bear by distributing the processing of extremely large datasets across multiple servers.
This book covers how to get a local instance of Apache Spark running on your developer machine and shows you how to create your first .NET program that uses the Microsoft .NET bindings for Apache Spark. Techniques shown in the book allow you to use Apache Spark to distribute your data processing tasks over multiple compute nodes. You will learn to process data in both batch mode and streaming mode, so you can make the right choice depending on whether you are processing an existing dataset or working against new records in micro-batches as they arrive. The goal of the book is to leave you comfortable bringing the power of Apache Spark to your favorite .NET language.

What You Will Learn

  • Install and configure .NET for Apache Spark on Windows, Linux, and macOS
  • Write Apache Spark programs in C# and F# using the .NET bindings
  • Access and invoke the Apache Spark APIs from .NET with the same high performance as Python, Scala, and R
  • Encapsulate functionality in user-defined functions
  • Transform and aggregate large datasets
  • Execute SQL queries against files through Apache Hive
  • Distribute processing of large datasets across multiple servers
  • Create your own batch, streaming, and machine learning programs

Who This Book Is For
.NET developers who want to perform big data processing without having to migrate to Python, Scala, or R, and Apache Spark developers who want to run natively on .NET and take advantage of the C# and F# ecosystems.

About the Author

Ed Elliott is a data engineer who has been working in IT for 20 years and has focused on data for the last 15 years. He uses Apache Spark at work and has been contributing to the Microsoft .NET for Apache Spark open source project since it was released in 2019. Ed has been blogging and writing since 2014, both on his own blog and for SQL Server Central and Redgate. He has spoken at a number of events, including SQLBits, SQL Saturday, and the GroupBy conference.
