A Data Scientist's Guide to Acquiring, Cleaning, and Managing Data in R

Samuel E. Buttrey, Lyn R. Whitaker

  • Publisher: Wiley
  • Publication date: 2017-12-18
  • List price: $2,800
  • Sale price: $2,660 (5% off)
  • Language: English
  • Pages: 312
  • Binding: Hardcover
  • ISBN: 1119080029
  • ISBN-13: 9781119080022
  • Related categories: R Language, Data Science

Product description

The only how-to guide offering a unified, systematic approach to acquiring, cleaning, and managing data in R

Every experienced practitioner knows that preparing data for modeling is a painstaking, time-consuming process. Compounding the difficulty, most modelers learn the steps involved in cleaning and managing data piecemeal, often on the fly, or develop their own ad hoc methods. This book helps simplify their task by providing a unified, systematic approach to acquiring, modeling, manipulating, cleaning, and maintaining data in R.

Starting with the very basics, data scientists Samuel E. Buttrey and Lyn R. Whitaker walk readers through the entire process. They first consider what data looks like and what it should look like, then progress through all the steps involved in getting data ready for modeling. They describe best practices for acquiring data from numerous sources. They also explore key issues in data handling, including text and regular expressions, big data, parallel processing, merging, matching, and checking for duplicates. Finally, they outline highly efficient and reliable techniques for documenting data and keeping records, including audit trails, getting data back out of R, and more.

  • The only single-source guide to R data and its preparation, it describes best practices for acquiring, manipulating, cleaning, and maintaining data
  • Begins with the basics and walks readers through all the steps necessary to get data ready for the modeling process
  • Provides expert guidance on how to document the processes described so that they are reproducible
  • Written by seasoned professionals, it provides both introductory and advanced techniques
  • Features case studies with supporting data and R code, hosted on a companion website

A Data Scientist's Guide to Acquiring, Cleaning, and Managing Data in R is a valuable working resource and bench manual for several audiences: practitioners who collect and analyze data, lab scientists and research associates at all levels of experience, and graduate-level data mining students.
