Field Guide to Hadoop: An Introduction to Hadoop, Its Ecosystem, and Aligned Technologies (Paperback)

Kevin Sitto, Marshall Presser

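  • Publisher: O'Reilly
  • Publication date: 2015-04-21
  • List price: $1,320
  • Sale price: $1,056 (20% off)
  • Language: English
  • Pages: 132
  • Binding: Paperback
  • ISBN: 1491947934
  • ISBN-13: 9781491947937
  • Related category: Hadoop
  • Ships immediately (limited quantity; in stock: 4)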

Product description

If your organization is about to enter the world of big data, you not only need to decide whether Apache Hadoop is the right platform to use, but also which of its many components are best suited to your task. This field guide makes the exercise manageable by breaking down the Hadoop ecosystem into short, digestible sections. You’ll quickly understand how Hadoop’s projects, subprojects, and related technologies work together.

Each chapter introduces a different topic—such as core technologies or data transfer—and explains why certain components may or may not be useful for particular needs. When it comes to data, Hadoop is a whole new ballgame, but with this handy reference, you’ll have a good grasp of the playing field.

Topics include:

  • Core technologies: Hadoop Distributed File System (HDFS), MapReduce, YARN, and Spark
  • Database and data management: Cassandra, HBase, MongoDB, and Hive
  • Serialization: Avro, JSON, and Parquet
  • Management and monitoring: Puppet, Chef, ZooKeeper, and Oozie
  • Analytic helpers: Pig, Mahout, and MLlib
  • Data transfer: Sqoop, Flume, distcp, and Storm
  • Security, access control, auditing: Sentry, Kerberos, and Knox
  • Cloud computing and virtualization: Serengeti, Docker, and Whirr
