Data Engineering with AWS: Learn how to design and build cloud-based data transformation pipelines using AWS (Paperback)

Gareth Eagar



Key Features

  • Learn about common data architectures and modern approaches to generating value from big data
  • Explore AWS tools for ingesting, transforming, and consuming data, and for orchestrating pipelines
  • Learn how to architect and implement data lakes and data lakehouses for big data analytics

Book Description

Knowing how to architect and implement complex data pipelines is a highly sought-after skill. Data engineers are responsible for building these pipelines that ingest, transform, and join raw datasets, creating new value from the data in the process.

Amazon Web Services (AWS) offers a range of tools to simplify a data engineer's job, making it a popular platform for data engineering tasks.

This book will take you through the services and the skills you need to architect and implement data pipelines on AWS. You'll begin by reviewing important data engineering concepts and some of the core AWS services that form part of the data engineer's toolkit. You'll then architect a data pipeline, review raw data sources, transform the data, and learn how the transformed data is used by various data consumers. The book also covers populating data marts and data warehouses, and how a data lakehouse fits into the picture. Later, you'll be introduced to AWS tools for analyzing data, including tools for ad hoc SQL queries and for creating visualizations. In the final chapters, you'll learn how machine learning and artificial intelligence can be used to draw new insights from data.

By the end of this book, you'll be able to carry out data engineering tasks and implement a data pipeline on AWS independently.

What you will learn

  • Understand data engineering concepts and emerging technologies
  • Ingest streaming data with Amazon Kinesis Data Firehose
  • Optimize, denormalize, and join datasets with AWS Glue Studio
  • Use Amazon S3 events to trigger an AWS Lambda function that transforms a file
  • Run complex SQL queries on data lake data using Amazon Athena
  • Load data into an Amazon Redshift data warehouse and run queries
  • Create a visualization of your data using Amazon QuickSight
  • Extract sentiment data from a dataset using Amazon Comprehend
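As a rough illustration of the S3-to-Lambda pattern in the list above, here is a minimal Python sketch. It assumes the standard S3 event notification JSON layout (`Records[].s3.bucket.name` and `Records[].s3.object.key`, with keys URL-encoded). The handler name `lambda_handler`, the helper `extract_s3_objects`, and the sample bucket and key are hypothetical, and the actual transformation step is deliberately left out. This is not the book's own code.

```python
from urllib.parse import unquote_plus


def extract_s3_objects(event: dict) -> list[tuple[str, str]]:
    """Return (bucket, key) pairs from an S3 event notification payload."""
    objects = []
    for record in event.get("Records", []):
        s3_info = record["s3"]
        bucket = s3_info["bucket"]["name"]
        # Object keys arrive URL-encoded (for example, spaces become '+'),
        # so decode them before using them as real object keys.
        key = unquote_plus(s3_info["object"]["key"])
        objects.append((bucket, key))
    return objects


def lambda_handler(event, context):
    """Hypothetical handler: report each object a transform step would process."""
    objects = extract_s3_objects(event)
    for bucket, key in objects:
        # A real pipeline would read, transform, and write the file here
        # (for example with boto3); this sketch only prints the target.
        print(f"Would transform s3://{bucket}/{key}")
    return {"processed": len(objects)}


if __name__ == "__main__":
    # Hypothetical sample event shaped like an S3 "ObjectCreated" notification.
    sample_event = {
        "Records": [
            {
                "s3": {
                    "bucket": {"name": "my-raw-bucket"},
                    "object": {"key": "landing/sales+2023.csv"},
                }
            }
        ]
    }
    print(lambda_handler(sample_event, None))
```

Keeping the event parsing in a separate pure function means it can be tested locally with a hand-built event, without deploying to AWS.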

Who this book is for

This book is for data engineers, data analysts, and data architects who are new to AWS and looking to extend their skills to the AWS cloud. Anyone who is new to data engineering and wants to learn about the foundational concepts while gaining practical experience with common data engineering services on AWS will also find this book useful.

A basic understanding of big data-related topics and Python coding will help you get the most out of this book but is not needed. Familiarity with the AWS console and core services is also useful but not necessary.


Gareth Eagar has worked in the IT industry for over 25 years, starting in South Africa, then working in the United Kingdom, and now based in the United States. In 2017, he started working at Amazon Web Services (AWS) as a solutions architect, working with enterprise customers in the NYC metro area. Gareth has become a recognized subject matter expert for building data lakes on AWS, and in 2019 he launched the Data Lake Day educational event at the AWS Lofts in NYC and San Francisco. He has also delivered a number of public talks and webinars on big data topics. In 2020, Gareth transitioned to the AWS Professional Services organization as a senior data architect, helping customers architect and build complex data pipelines.


Table of Contents

  1. An Introduction to Data Engineering
  2. Data Management Architectures for Analytics
  3. The AWS Data Engineer's Toolkit
  4. Data Cataloging, Security and Governance
  5. Architecting Data Engineering Pipelines
  6. Ingesting Batch and Streaming Data
  7. Transforming Data to Optimize for Analytics
  8. Identifying and Enabling Data Consumers
  9. Loading Data into a Data Mart
  10. Orchestrating the Data Pipeline
  11. Ad Hoc Queries with Amazon Athena
  12. Visualizing Data with Amazon QuickSight
  13. Enabling Artificial Intelligence and Machine Learning
  14. Wrapping Up the First Part of Your Learning Journey
