Hadoop: The Definitive Guide, 4/e (Paperback)

Tom White

Product description

Ready to unlock the power of your data? With the fourth edition of this comprehensive guide, you’ll learn how to build and maintain reliable, scalable, distributed systems with Apache Hadoop. This book is ideal for programmers looking to analyze datasets of any size, and for administrators who want to set up and run Hadoop clusters.

You’ll find illuminating case studies that demonstrate how Hadoop is used to solve specific problems. This edition includes new case studies, updates on Hadoop 2, a refreshed HBase chapter, and new chapters on Crunch and Flume. Author Tom White also suggests learning paths for the book.

  • Store large datasets with the Hadoop Distributed File System (HDFS)
  • Run distributed computations with MapReduce
  • Use Hadoop’s data and I/O building blocks for compression, data integrity, serialization (including Avro), and persistence
  • Discover common pitfalls and advanced features for writing real-world MapReduce programs
  • Design, build, and administer a dedicated Hadoop cluster—or run Hadoop in the cloud
  • Load data from relational databases into HDFS, using Sqoop
  • Perform large-scale data processing with the Pig query language
  • Analyze datasets with Hive, Hadoop’s data warehousing system
  • Take advantage of HBase for structured and semi-structured data, and ZooKeeper for building distributed systems
