Beginning Apache Spark Using Azure Databricks: Unleashing Large Cluster Analytics in the Cloud

Ilijason, Robert

Product Description

Analyze vast amounts of data in record time using Apache Spark with Databricks in the cloud. Learn the fundamentals, and more, of running analytics on large clusters in Azure and AWS, using Apache Spark with Databricks on top. Discover how to squeeze the most value out of your data at a fraction of what classical analytics solutions cost, while getting the results you need faster.

This book explains how the confluence of these pivotal technologies gives you enormous power, cheaply, when working with huge datasets. You will begin by learning how cloud infrastructure makes it possible to scale your code to large numbers of processing units without having to pay for the machinery in advance. From there you will learn how Apache Spark, an open source framework, can put all those CPUs to work on data analytics. Finally, you will see how services such as Databricks provide the power of Apache Spark without you having to know anything about configuring hardware or software. By removing the need for expensive experts and hardware, your resources can instead be allocated to actually finding business value in the data.

This book guides you through some advanced topics such as analytics in the cloud, data lakes, data ingestion, architecture, machine learning, and tools, including Apache Spark, Apache Hadoop, Apache Hive, Python, and SQL. Valuable exercises help reinforce what you have learned.


What You Will Learn

  • Discover the value of big data analytics that leverage the power of the cloud
  • Get started with Databricks using SQL and Python in either Microsoft Azure or AWS
  • Understand the underlying technology, and how the cloud and Apache Spark fit into the bigger picture
  • See how these tools are used in the real world
  • Run basic analytics, including machine learning, on billions of rows at a fraction of the cost, or for free


Who This Book Is For

Data engineers, data scientists, and cloud architects who want or need to run advanced analytics in the cloud. It is assumed that the reader has data experience, but perhaps minimal exposure to Apache Spark and Azure Databricks. The book is also recommended for people who want to get started in the analytics field, as it provides a strong foundation.

About the Author

Robert Ilijason is a 20-year veteran in the business intelligence (BI) segment. He has worked as a contractor for some of Europe's biggest companies and has conducted large-scale analytics projects within the areas of retail, telecom, banking, government, and more. He has seen his share of analytics trends come and go over the years, but he strongly believes that, unlike most of them, Apache Spark in the cloud, especially with Azure Databricks, is a game changer.
